phc Documentation

Contents

1. Assignment | Cast | Unary_op | Bin_op | Constant | Instanceof | Variable | Pre_op | Method_invocation | New | Literal | Op_assignment | List_assignment | Post_op | Array | Conditional_expr | Ignore_errors ;

    Literal ::= INT<long> | REAL<double> | STRING<String> | BOOL<bool> | NIL<> ;
    Assignment ::= Variable is_ref:"&"? Expr ;
    Op_assignment ::= Variable OP Expr ;
    List_assignment ::= List_element?* Expr ;
    List_element ::= Variable | Nested_list_elements ;
    Nested_list_elements ::= List_element?* ;
    Cast ::= CAST Expr ;
    Unary_op ::= OP Expr ;
    Bin_op ::= left:Expr OP right:Expr ;
    Conditional_expr ::= cond:Expr iftrue:Expr iffalse:Expr ;
    Ignore_errors ::= Expr ;
    Constant ::= CLASS_NAME? CONSTANT_NAME ;
    Instanceof ::= Expr Class_name ;
    Variable ::= Target? Variable_name array_indices:Expr?* ;
    Variable_name ::= VARIABLE_NAME | Reflection ;
    Reflection ::= Expr ;
    Target ::= Expr | CLASS_NAME ;
    Pre_op ::= OP Variable ;
    Post_op ::= Variable OP ;
    Array ::= Array_elem* ;
    Array_elem ::= key:Expr? is_ref:"&"? val:Expr ;
    Method_invocation ::= Target? Method_name Actual_parameter* ;
    Method_name ::= METHOD_NAME | Reflection ;
    Actual_parameter ::= is_ref:"&"? Expr ;
    New ::= Class_name Actual_parameter* ;
    Class_name ::= CLASS_NAME | Reflection ;

Additional Structure

    Commented_node ::= Member
2. Chapter 5. Restructuring the Tree

    class Remove_concat_null : public Visitor
    {
    public:
       void pre_bin_op(Bin_op* in)
       {
          // Find concat operators
          if(*in->op->value == ".")
          {
          }
       }
    };

The problem is: what are we going to do inside the if? Tree visitors can only inspect and modify in; they cannot restructure the tree. In particular, we cannot replace in by a new node. For this purpose, phc offers a separate API, the tree transformation API. It looks very similar to the tree visitor API, but there are two important differences. First, the pre and post methods can modify the structure of the tree by returning new nodes. Second, there are no generic methods in the tree transform API, so it is not possible to define a transformation that would replace all statements by something else. (It is not clear how that would be useful anyway.)

So we need to write our transformation using the Tree transform API, defined in AST_transform.h. Restructuring the class above yields

    class Remove_concat_null : public Transform
    {
    public:
       Expr* pre_bin_op(Bin_op* in)
       {
          // Find concat operators
          if(*in->op->value == ".")
          {
          }
       }
    };

There are two differences from the previous version. We inherit from a different class, and pre_bin_op now has a return value, which is the node that will replace in. If you check the default implementation of pre_bin_op in AST_transform.cpp, you'll find

    Expr* Transform::pre_bin_op(Bin
3. [Diagram omitted: an AST_variable with a NULL AST_target, a Token_variable_name and a NULL AST_expr.]

The name of the variable is x, not $x.

Using array indices

[Diagram omitted.]

The empty array index means "next available" in PHP.

Class constants (X::$y)

[Diagram omitted.]

Again, the variable name is y, not $y. The fact that you must write $x->y but X::$y in PHP disappears in the abstract syntax.

Variable variables ($$x)

[Diagram omitted.]

Note how the name of the variable (second component) is now given by another variable.

Object attributes ($x->y)

[Diagram omitted.]

The target is now given by a variable.

Variable object attributes ($x->$y)

[Diagram omitted.]

Both the target and the variable name are given by other variables.

Comments

A number of nodes in the AST are dedicated "commented nodes". Their corresponding C++ classes inherit from Commented_node, which introduces a String_list* attribute called comments. The commente
4. | Statement | Interface_def | Class_def | Switch_case | Catch ;

    Identifier ::= INTERFACE_NAME | CLASS_NAME | METHOD_NAME | VARIABLE_NAME | CAST | OP | CONSTANT_NAME | DIRECTIVE_NAME ;
    Source_rep ::= Identifier | Literal ;

Mix-in Code

The code generated from the grammar listed above can be extended by "mix-in" code, which adds fields or methods to the class structure generated by phc. For a full listing of the mix-in code, see src/generated_src/ast.tea in the phc distribution.

Chapter 10. Representing PHP

Most PHP constructs can immediately be represented in terms of the phc grammar (Chapter 9). There are a few constructs that present some difficulties. This document describes how these difficulties are resolved, and it explains some of the more difficult rules in the grammar.

Variables

The grammar rule for variables reads

    variable ::= target? variable_name array_indices:expr?* string_index:expr? ;
    variable_name ::= VARIABLE_NAME | reflection ;

This is probably one of the more difficult rules in the grammar, so it is worth explaining in a bit more detail. The following table describes each element of the first rule in detail.

Target: Just like function calls, variables can have a target, and just as for function calls, this target can be an expression (for an object, e.g. $x->y) or a class name (for a
5. "list of".

As a second example, consider the rule that describes arrays in PHP. This rule should cover things such as array(), array("a", "b") and array(1 => "a", 2 => "g"). Arrays are described by the following two rules.

    Array ::= Array_elem* ;
    Array_elem ::= key:Expr? val:Expr ;

(Actually, this is a simplification, but it will do for the moment.) These two rules say that an array consists of a list of array elements, and an array element has an optional expression called key and a second expression called val. The question mark (?) means "optional". Note that the grammar does not record the need for the keyword array, or for the parentheses and commas. We do not need to record these because we already know that we are talking about an array; all we need to know is what the array elements are.

The Abstract Syntax Tree

When phc reads a PHP script, it builds up an internal representation of the script. This representation is known as an abstract syntax tree (or AST for short). The structure of the AST follows directly from the abstract grammar. For people familiar with XML, this tree can be compared to the DOM representation of an XML document (and in fact, phc can output the AST as an XML document; see Chapter 3 in The phc User's Manual).

For example, consider if statements again. An if statement is represented by an instance of the If class, which is (approximately) define
6. "private"? "static"? "abstract"? "final"? ;

    Formal_parameter ::= Type is_ref:"&"? var:Name_with_default ;
    Type ::= CLASS_NAME? ;
    Name_with_default ::= VARIABLE_NAME Expr? ;
    Attribute ::= Attr_mod vars:Name_with_default* ;
    Attr_mod ::= "public"? "protected"? "private"? "static"? "const"? ;

Statements

    Statement ::= Class_def | Interface_def | Method | Return | Static_declaration | Global | Try | Throw | Eval_expr | If | While | Do | For | Foreach | Switch | Break | Continue | Declare | Nop ;
    If ::= Expr iftrue:Statement* iffalse:Statement* ;
    While ::= Expr Statement* ;
    Do ::= Statement* Expr ;
    For ::= init:Expr? cond:Expr? incr:Expr? Statement* ;
    Foreach ::= Expr key:Variable? is_ref:"&"? val:Variable Statement* ;
    Switch ::= Expr Switch_case* ;
    Switch_case ::= Expr? Statement* ;
    Break ::= Expr? ;
    Continue ::= Expr? ;
    Return ::= Expr? ;
    Static_declaration ::= vars:Name_with_default* ;
    Global ::= Variable_name* ;
    Declare ::= Directive+ Statement* ;
    Directive ::= DIRECTIVE_NAME Expr ;
    Try ::= Statement* catches:Catch* ;
    Catch ::= CLASS_NAME VARIABLE_NAME Statement* ;
    Throw ::= Expr ;
    Eval_expr ::= Expr ;
    Nop ::= ;

Expressions

    Expr ::=
7.        return wildcard->value;

       // Replace with left operand if right operand is the empty string
       if(in->match(new Bin_op(wildcard, ".", empty)))
          return wildcard->value;

       return in;

We already explained what match does in Chapter 4, but we have not yet explained the use of wildcards. If you are using a wildcard (WILDCARD) in a pattern passed to match, match will not take that subtree into account. Thus,

    if(in->match(new Bin_op(empty, ".", WILDCARD)))

can be paraphrased as "is in a binary operator with the empty string as the left operand and "." as the operator? (I don't care about the right operand.)" If the match succeeded, you can find out which expression was matched by the wildcard by accessing wildcard->value.

Running Transformations

Recall from the previous two tutorials that visitors are run with a call to visit:

    extern "C" void run_ast (PHP_script* in, Pass_manager* pm, String* option)
    {
       SomeVisitor visitor;
       in->visit(&visitor);
    }

Likewise, transformations are run with a call to transform_children:

    extern "C" void run_ast (PHP_script* in, Pass_manager* pm, String* option)
    {
       SomeTransform transform;
       in->transform_children(&transform);
    }

We invoke transform_children because we should not replace the top-level node in the AST (the PHP_script node itself).

A Subtlety

If you don't understand this section right now, don't worry about it; you m
8. Create a new directory, say myplugins, and create a new file helloworld.cpp:

    #include <AST.h>
    #include <pass_manager/Plugin_pass.h>

    extern "C" void load (Pass_manager* pm, Plugin_pass* pass)
    {
       pm->add_after_named_pass (pass, new String ("ast"));
    }

    extern "C" void run_ast (AST::PHP_script* in, Pass_manager* pm, String* option)
    {
       cout << "Hello world (I'm a phc plugin!)" << endl;
    }

This is an example of an (almost) minimal plugin. Every plugin you write must contain these functions, with these exact signatures. load is run when phc starts, giving your plugin the opportunity to add itself to the list of passes phc runs. In this example, it is added after the "ast" pass. When phc processes a PHP script, it runs all of the passes on it in turn. When it is your plugin's turn, it calls your version of run_ast.

To compile the plugin, run the following from the myplugins directory:

    phc_compile_plugin helloworld.cpp

(phc_compile_plugin is a small shell script that makes the task of compiling plugins easier; it calls g++ in a platform-independent way. If you're curious, you can open it in any text editor.) Finally, run the plugin using

    phc --run helloworld.la sometest.php

(You need to pass in an input script to phc even though our plugin does not use it.) If that worked as expected, congratulations: you've just written your first phc plugin!

About extern "C"

You may have been wondering what the extern "C"
9.     /* Some comment
          with multiple lines */
       foo();
    ?>

The problem is that the whitespace at the start of each line is included in the comment. This means that when the unparser outputs the comment, it outputs something like

    <?php
    /* Some comment
          with multiple lines */
    foo();
    ?>

It is unclear how to solve this problem nicely. Suggestions are welcome.

Second, it is not currently possible to associate a comment with the else clause of an if statement. Thus, in

    <?php
       // Comment 1
       if($c)
       {
          foo();
       }
       // Comment 2
       else
       {
          bar();
       }
    ?>

"Comment 2" will be associated with the call to bar, but "Comment 1" will be associated with the if statement itself. A similar problem occurs with comments for elseif statements.

Finally, if a scope ends on a comment, that comment will be associated with the wrong node. For example, in

    <?php
       if($c)
       {
          echo "Hi";
       }
       else
       {
          // Do nothing
       }
       echo "World";
    ?>

the comment will be associated with the echo "World" statement. A similar problem occurs when a script ends on a comment; that comment will not be lost, but will be associated with the last node in the script.

Numbers

PHP accepts invalid octal numbers such as 01090; the "incorrect tail" is silently ignored (so this number should evaluate to 8 decimal). The phc lexical analyser will generate an "invalid token" instead, which will result in a syntax error.

Scopes
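The octal behaviour described under Numbers can be sketched in a few lines of C++. This is only an illustration of "ignore the incorrect tail"; parse_lenient_octal is a hypothetical helper, not a function from phc or from PHP itself.

```cpp
#include <string>

// Hypothetical helper (not part of phc or PHP): evaluates an octal literal
// the way the text describes, stopping at the first digit outside 0-7.
long parse_lenient_octal(const std::string& literal)
{
    long value = 0;
    // Skip the leading '0' that marks the literal as octal.
    std::size_t i = (!literal.empty() && literal[0] == '0') ? 1 : 0;
    for (; i < literal.size(); ++i)
    {
        char c = literal[i];
        if (c < '0' || c > '7')
            break;                        // invalid tail: silently ignored
        value = value * 8 + (c - '0');    // accumulate one octal digit
    }
    return value;
}
```

For the literal 01090, the loop reads 1 and 0 (octal 10, decimal 8) and stops at the 9, which is the result the text says PHP produces and phc currently rejects.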
10. a AST_foo_list instead. See the section Context Resolution in the grammar formalism for details; here we consider the programmer's perspective only. The exact signatures for a particular class can always be found in <AST.h>.

As with the visitor API, transform calls pre_transform, transform_children and post_transform. However, while transform calls pre_transform on itself, it calls transform_children and post_transform on the node returned by pre_transform. If pre_transform returns a vector, transform calls transform_children and post_transform on every element in that vector, assembling all the results.

pre_transform and post_transform simply call the appropriate method in the AST::Transform object. However, if pre_transform (or post_transform) returns a list of nodes, the corresponding method in the tree transform object will expect two arguments: the node to be transformed, and an empty list of nodes that will be the return value of pre_transform. In that case, pre_transform will first create a new empty list, pass that in as the second argument to the corresponding method in the tree transform object, and then return that list.

transform_children just calls the corresponding method in the tree transform object.

Chapter 13. Maketea Theory

Introduction

maketea is available separately (http://www.maketea.org) to phc. Based on a grammar definition of a language, it generates a C++ hierarchy for the corresponding abstract syntax tree, a tree
11. a less than satisfactory way; see Limitations for details.

String parsing

Double-quoted strings and those written using the HEREDOC syntax are treated specially by PHP: it parses variables used inside these strings and automatically expands them with their value. phc handles both the simple and complex syntax defined by PHP for variables in strings. We transform a string like

    "Total cost is: $total (includes shipping of ${shipping})"

into

    "Total cost is: " . $total . " (includes shipping of " . $shipping . ")"

which is represented in the phc abstract syntax tree by a number of strings and expressions concatenated together. Thus, as a programmer, you don't need to do anything special to process variables inside strings. Any code you write for processing variables will also appropriately handle variables inside strings. Note that as of version 0.2.0, interpolated strings are correctly unparsed by phc.

elseif

The abstract grammar does not have a construct for elseif. The following PHP code

    <?php
       if($x)
          c1();
       elseif($y)
          c2();
       else
          c3();
    ?>

gets interpreted as

    <?php
       if($x)
          c1();
       else
       {
          if($y)
             c2();
          else
             c3();
       }
    ?>

The higher the number of elseifs, the greater the level of nesting. This transformation is hidden by the unparser.

Miscellaneous Other Changes

Fragments of inline HTML become arguments to a function call to echo. The keyword
12. can view the AST graphically. First, ask phc to output the AST in DOT format:

    phc --dump-dot=ast helloworld.php > helloworld.dot

You can then view the tree (helloworld.dot) using Graphviz. In most Unix/Linux systems, you should be able to do

    dotty helloworld.dot

and you should see the tree; it should look similar to the tree shown in Figure 3.1.

Figure 3.1. Abstract syntax tree for "Hello world" [figure omitted]

Including files

phc has initial support for compile-time processing of PHP's include built-in. Enabling this feature inserts the included statements in the AST in the place of the include statement. Included functions, classes and interfaces become part of the file's top-level scope. In the event that phc is not able to process the include statement (for example, if the file cannot be found), a warning is issued and the include statement is left in place. To enable this support, run

    phc --include script_with_includes.php

The include support is intended to mimic PHP's include built-in (http://php.net/manual/en/function.include.php) as far as can be achieved at compile time. phc supports:

Moving included statements to the point at which include was called. Naturally, these statements use the variable scope at the point at which they are included, e
13. endl

The overall structure of this plugin should be fairly clear. We count all the statements in the outermost list of statements, and then consider each statement in turn to check if there are any statements nested inside them.

This plugin will now report the correct number of statements for our example with the if statement. However, it will report an invalid number of statements for examples with other types of statements. For example, it will report only two statements for

    <?php
       $x = 5;
       while($x--)
       {
          echo $x;
       }
    ?>

Of course, we can fix the plugin by testing for while statements. And for do statements. And for foreach, switch, try, etc. As mentioned, manually dealing with the syntax tree is a laborious process. Even something as simple as counting the number of statements in a script becomes a large program.

The Easy Solution

Fortunately, there is a much easier solution: phc will do all this for you automatically. There is a standard "do-nothing" tree traversal predefined in phc in the form of a class called AST::Visitor (defined in <AST_visitor.h>). AST::Visitor contains methods for each type of node in the tree. phc will automatically traverse the abstract syntax tree for you, and call the appropriate method at each node.

In fact, there are two methods defined for each type of node. The first method, called pre_something, gets called on a node before phc visits the children of the node. The second method, calle
14. following commands. Instead, you should use the same version as your webserver uses. From the ext directory, run:

    phpize --with-php-config=/usr/bin/php-config
    ./configure --enable-helloworld

Build and install the extension (if you don't have root, refer instead to Alternatives):

    make
    sudo make install

In your web folder, replace the existing helloworld.php file contents with the following:

    <?php
       dl("helloworld.so");
       __MAIN__();
    ?>

If the dl function is not enabled in your php.ini file, enable it:

    enable_dl = On

Accessing helloworld.php should now work.

Alternatives

Instead of setting enable_dl, you can load the extension manually in your php.ini file:

    extension=helloworld.so

You can also avoid installing the extension using sudo make install by adding an alternate extension directory:

    extension_dir="/full/path/to/ext"

Writing and Reading XML

phc can output an XML representation of the PHP script. You can use this representation if you want to process PHP scripts using tools in your desired framework, instead of using phc plugins. After processing the XML representation, phc can convert it back into PHP. To generate an XML version of a PHP script, run

    phc --dump-xml=ast helloworld.php > helloworld.xml

When reading the XML back in, all the usual features of phc are again available; in particular, it is possible to read an XML file and write PHP syntax. To convert t
15. function calls, reference, variable, variable property, etc.; the concepts used in the phc tree map directly to constructs in the PHP language. That does not hold true for the PHP tree. Moreover, the fact that this expression is a method invocation (function call) is immediately obvious from the root of the expression in the phc tree; the root of the PHP tree says that the expression is a variable, and only deeper down the tree does it become apparent that the expression is in fact a function call.

Chapter 11. Limitations

This document describes the known limitations of the current phc implementation. These limitations are things that we are aware of but that are not high on our priority list of things to deal with at the moment. However, if any of them are bothering you, let us know (http://www.phpcompiler.org/mailinglist.html) and we might look into it.

Comments

Representing PHP explains how we deal with comments. Most comments in a PHP script should get attached to the right token in the tree, and no comments should ever be lost. If that is not true, please send us a sample program that demonstrates where it breaks. There are a few problems that we are aware of, and there are probably others too. Dealing with comments in a completely satisfactory way is a difficult task.

The first problem with our method of dealing with comments is how we deal with whitespace in multi-line comments. Consider the following example.

    <?php
16. in the definition of load and run_ast is for. The reason is that phc uses libtool's libltdl interface to load your plugin; if the functions are not declared as extern "C", phc will not be able to find them in your plugin because the name of that function will have been mangled by the C++ compiler. It does not mean that you cannot write C++ code inside these functions.

If you don't understand any of that, don't worry about it: just remember that you need to declare load, run_ast and a small number of other functions (which we'll name later) as extern "C" and everything will be fine. You don't need extern "C" for any other functions you might define.

Abstract Syntax

To be able to do anything useful in your plugins, you need to know how phc represents PHP code internally. phc's view of PHP scripts is described by an abstract grammar. An abstract grammar describes how the contents of a PHP script are structured. A grammar consists of a number of rules. For example, there is a rule in the grammar that describes how if statements work:

    If ::= Expr iftrue:Statement* iffalse:Statement* ;

This rule reads: "An if statement consists of an expression (the condition of the if statement), a list of statements called iftrue (the instructions that get executed when the condition holds), and another list of statements called iffalse (the instructions that get executed when the condition does not hold)". The asterisk (*) in the rule means
17. in the old tree. The only exception to this rule is that cloning the WILDCARD objects (see pattern matching, below) returns the WILDCARD object itself.

Pattern Matching

Pattern matching is implemented by

    bool match(Object* pattern)

Pattern matching differs from deep equality in two ways. First, it does not take into account any fields added by the mix-in code (for example, it does not compare line numbers or comments). Second, match supports the use of wildcards. Maketea generates a special class called Wildcard. You should never instantiate this class directly; in <AST.h> you will find the following declaration:

    extern Wildcard* WILDCARD;

This WILDCARD is the sole instance of Wildcard. When match encounters a reference to this object in a pattern, it does two things: it skips that field in the comparison (so it acts as a "don't care"), and it replaces the value of the field in the pattern by the value in the tree. For example, in the body of the if in

    CLASS_NAME* name = new CLASS_NAME(new String("SOME_CLASS"));
    CLASS_NAME* pattern = new CLASS_NAME(WILDCARD);

    if(name->match(pattern))
    {
       // ...
    }

pattern->value will be set to the corresponding value in name. Tutorials Restructuring the Tree and Using State include examples of the use of wildcards.

Calling any methods on the WILDCARD object (other than deep_clone) will lead to a runtime e
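The two effects of a wildcard in match (skip the comparison, then bind the matched value into the pattern) can be modelled with a small stand-alone sketch. Name_node and this free function match are hypothetical stand-ins written for illustration; they are not the classes maketea generates, and a null pointer plays the role of WILDCARD.

```cpp
#include <string>

// Simplified stand-in for a token node; not a phc class.
struct Name_node
{
    const std::string* value;   // nullptr plays the role of WILDCARD here
};

// Returns true if `node` matches `pattern`. A wildcard field always matches
// and is overwritten in the pattern with the value found in the tree.
bool match(const Name_node& node, Name_node& pattern)
{
    if (pattern.value == nullptr)
    {
        pattern.value = node.value;   // bind the wildcard to the tree's value
        return true;
    }
    return *pattern.value == *node.value;   // ordinary field comparison
}
```

After a successful match against a pattern whose value was the wildcard, the pattern's value points at the string held by the matched node, which mirrors how pattern->value is filled in above.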
18. only difference is that we leave the actual filename a wildcard; obviously, we want to be able to match against any include, not just include("a.php"). Running this transform should remove the include from the file but leave the other statements untouched (note that we need to push_back in into out to make sure a statement does not get deleted).

The Full Transform

We are nearly done. All that's left is to parse the file (we can use the filename wildcard to find out which file we need to include) and insert all statements into the parsed file at the point of the include. Parsing PHP is hard, but of course phc comes with a PHP parser. To use this parser, include the <parsing/parse.h> header and call parse. Here then is the full transform:

    #include "AST_transform.h"
    #include "parsing/parse.h"
    #include "process_ir/XML_unparser.h"

    class Expand_includes : public Transform
    {
    private:
       XML_unparser* xml_unparser;
       Wildcard<STRING>* filename;
       Method_invocation* pattern;

    public:
       Expand_includes()
       {
          xml_unparser = new XML_unparser(cout, false);
          filename = new Wildcard<STRING>;
          pattern = new Method_invocation(
             NULL,
             new METHOD_NAME(new String("include")),
             new List<Actual_parameter*>(
                new Actual_parameter(false, filename)
             )
          );
       }

    public:
       void pre_eval_expr(Eval_expr* in, List<Statement*>* out)
       {
          // in->visit(xml_unparser);

          // Chec
19. shared hosting environment.

Warning: This section is experimental. Please report any problems (http://www.phpcompiler.org/mailinglist.html).

We have created the command line option --web-app, which will in the future automate the process of compiling a web application. Unfortunately, for now, please follow these steps.

We describe how to create and install an extension using the C code generated by phc. While we give an overview of creating extensions, significantly more detail can be found in the Zend Extension Writing Tutorial (http://devzone.zend.com/node/view/id/1021) and in Extending and Embedding PHP (http://www.amazon.com/dp/067232704X).

To begin, create a new directory for the extension. We'll use ext/ in our example. Generate C code from helloworld.php using phc:

    phc --generate-c helloworld.php > ext/helloworld.c

Create a new file, ext/config.m4, by copying the following and changing instances of helloworld appropriately:

    PHP_ARG_ENABLE(helloworld, whether to enable Hello World support,
    [ --enable-helloworld   Enable Hello World support])

    if test "$PHP_HELLOWORLD" = "yes"; then
      AC_DEFINE(HAVE_HELLOWORLD, 1, [Whether you have Hello World])
      PHP_NEW_EXTENSION(helloworld, helloworld.c, $ext_shared)
    fi

In the previous section, we described using the PHP embed SAPI. If you installed a copy of PHP with --enable-embed enabled, it is important NOT to use that version for the
20. static class attribute, e.g. FOO::$y). As in function calls, in variables the target is optional (indicated by the question mark). If no target is specified, the variable refers to a local variable in a method.

Variable name: Again, as for function calls, the name of the variable may be a literal VARIABLE_NAME ($x), or be given by an expression (which is wrapped up in a Reflection node). The latter possibility is referred to as "variable variables" in the PHP manual. For example, $$x is the variable whose name is currently stored in (another) variable called x.

array_indices:Expr?*: A variable may have one or more array indices, for example $x[3][5]. The strange construct (Expr?*) means: a list of (*) optional (?) expressions. For example, $x[4][] is a list of two expressions, but the second expression is not given. In PHP, this means "use the next available index". String and array indexing ($x[3]) are equivalent in PHP, so string indexing is also represented by array indices.

We illustrate the various possibilities using diagrams.

Warning: These diagrams use old names for AST nodes. Where you see AST_variable, it is now called Variable (it uses the AST namespace); Token_variable_name is called VARIABLE_NAME and Token_int is called INT. It is possible that the structure of some nodes has changed slightly since this was written.

The simple case ($x)
21. transformation and visitor API, and deep cloning, deep equality and pattern matching on the AST. In this document, we describe the grammar formalism used by phc and how a C++ class structure is derived from such a grammar, and we explain how the tree transformation API is generated. The generated code itself is explained in Overview of the AST classes and transformation API.

The Grammar Formalism

The style of grammar formalism used by maketea is sometimes referred to as an "object oriented" context-free grammar. It facilitates a trivial and reliable mapping between the grammar (Chapter 9) and the actual (C++) abstract syntax tree (AST) that is generated by the phc parser.

We make a distinction between three types of symbols: non-terminal symbols, terminal symbols and markers. Non-terminal symbols have the same function in our formalism as in the usual BNF formalism and will not be further explained. We denote non-terminal symbols in lower case in the grammar (e.g., expr). The distinction between terminal symbols and markers is non-standard. Markers have no semantic value other than their presence; an example is "abstract". Thus, the semantic value of a marker is a boolean value: it is either there or it is not. (Note that this is different from a symbol such as the semi-colon, which has no semantic value whatsoever and thus does not need to be included in an abstract syntax tree.) Conversely, the semantic value of a terminal symbol is an arbitrary v
22. tree, as shown in Figure 1.1.

Figure 1.1. Abstract syntax tree for the demo example [figure omitted]

The Transform

Suppose we want to rename function foo to bar. This is done by the following plugin:

    #include "AST_visitor.h"
    #include "pass_manager/Plugin_pass.h"

    class Rename_foo_to_bar : public Visitor
    {
       void pre_method_name(METHOD_NAME* in)
       {
          if(*in->value == "foo")
             in->value = new String("bar");
       }
    };

    extern "C" void run_ast (AST::PHP_script* in, Pass_manager* pm, String* option)
    {
       Rename_foo_to_bar f2b;
       in->visit(&f2b);
    }

    extern "C" void load (Pass_manager* pm, Plugin_pass* pass)
    {
       pm->add_after_named_pass (pass, new String ("ast"));
    }

The Result

Running phc gives

    <?php
       function bar()
       {
          return 5;
       }

       $foo = bar();
       echo "foo is $foo<br>";
    ?>

where the name of the function has been changed, while the name of the variable remained unaltered, as has the text "foo" inside the string. It's that simple! Of course, in this example it would have been quicker to do it by hand, but that's not the point; the example shows how easy it is to operate on PHP scripts within the phc framework.

Writing Plugins

Getting Started introduces writing plugins for phc. It then explains how phc represents PHP scripts internally, and shows how to write a first (but ultimately wrong) attempt at
23. [Table of contents fragment; only the recoverable entries are kept.]

12. Overview of the AST classes and transformation API
    The AST classes
    Deep Equality
    Pattern Matching
    The Transform API
13. Maketea Theory
    Introduction
    The Grammar Formalism
    Context Resolution
    Reducing Context
    Resolution for Disjunctions
14. Porting and Packaging
    We need porters, packagers and maintainers
    Packaging hints
    phc packages

List of Figures

1-1. Abstract syntax tree for the demo example
3-1. Abstract syntax tree for the running example
10-1. Function call in the AST
10-2. Function call as represented by PHP
12-1. Sequence Diagram for the Visitor API
12-2. Sequence Diagram for the Transform API

Chapter 1. Introduction

From the start, one of the design goals of phc has been to provide a useful framework for writing applications that process PHP scripts. phc parses PHP code i
24.           Actual_parameter(false, new NIL());
          in->actual_parameters->insert(pos, param);
       }
    };

    extern "C" void load (Pass_manager* pm, Plugin_pass* pass)
    {
       pm->add_after_named_pass (pass, new String ("ast"));
    }

    extern "C" void run_ast (PHP_script* in, Pass_manager* pm, String* option)
    {
       MySQL2DBX m2d;
       in->visit(&m2d);
    }

If we apply this transformation to

    $link = mysql_connect("host", "user", "pass");

we get

    $link = dbx_connect(DBX_MYSQL, "host", NULL, "user", "pass");

Refactoring

A quick note on refactoring. Refactoring is the process of modifying existing programs (PHP scripts), usually to work in new projects or in different setups (for example, with a different database engine). Manual refactoring is laborious and error-prone, so tool support is a must. Although phc can be used to refactor PHP code as shown in this tutorial, a dedicated refactoring tool for PHP would be easier to use (though, of course, less flexible). Such a tool can however be built on top of phc.

What's Next?

Chapter 5 explains how you can modify the structure of the tree (as well as the tree nodes).

Chapter 5. Restructuring the Tree

Now that we have seen in Chapter 3 how we can traverse the tree, and in Chapter 4 how we can modify individual nodes in the tree, in this tutorial we will look at modifying the structure of the tree itself. The transform that we will be considering in this tutori
25. As a matter of fact, it does. If you check AST_transform.h, you will see that the signature for pre_eval_expr is

    void pre_eval_expr(Eval_expr* in, Statement_list* out)

This is different from the signatures we have seen so far. For nodes that can be replaced by a number of new nodes, the pre transform and post transform methods will not have a return value in their signature, but have an extra xxx_list argument. This list is initialised to be empty before pre_eval_expr is invoked, and when pre_eval_expr returns, the nodes in this list will replace in. If the list is empty, the node is simply deleted from the tree.

So, we will use the following plugin as our starting point. Executing this plugin deletes all Eval_expr nodes from the tree (try it!).

    #include "AST_transform.h"

    class Expand_includes : public Transform
    {
    public:
       void pre_eval_expr(Eval_expr* in, Statement_list* out)
       {
       }
    };

    extern "C" void load (Pass_manager* pm, Plugin_pass* pass)
    {
       pm->add_after_named_pass (pass, new String ("ast"));
    }

    extern "C" void run_ast (PHP_script* in, Pass_manager* pm, String* option)
    {
       Expand_includes einc;
       in->transform_children(&einc);
    }

Using the XML unparser

So, we now want to do something more useful than deleting all eval_expr nodes from the tree. The first thing we need to be able to do is distinguish include statements from other eval_expr nodes. We can use pattern matching (see Chapter 5 and Chapter 6
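The "empty out list deletes, filled out list replaces" convention can be tried out without phc in a small stand-alone model. Everything in this sketch is hypothetical: Statement is just a string, and transform_statements, delete_all and expand_include are names invented for the illustration, not part of the phc API.

```cpp
#include <functional>
#include <string>
#include <vector>

using Statement = std::string;                 // stand-in for an AST statement
using Statement_list = std::vector<Statement>;

// Calls `pre` once per statement with a fresh, empty `out` list and splices
// whatever `pre` put into `out` in place of the original statement.
Statement_list transform_statements(
    const Statement_list& in,
    const std::function<void(const Statement&, Statement_list&)>& pre)
{
    Statement_list result;
    for (const Statement& s : in)
    {
        Statement_list out;                    // initialised to be empty
        pre(s, out);
        result.insert(result.end(), out.begin(), out.end());
    }
    return result;
}

// Leaves `out` empty, so every statement is deleted.
void delete_all(const Statement&, Statement_list&) {}

// Replaces "include" by two statements and keeps everything else.
void expand_include(const Statement& s, Statement_list& out)
{
    if (s == "include")
    {
        out.push_back("a");
        out.push_back("b");
    }
    else
        out.push_back(s);                      // keep: push the node back
}
```

Passing delete_all reproduces the behaviour of the starting-point plugin (all nodes vanish), while expand_include shows why a transform that wants to keep a node has to push it into out itself.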
26. Preserving __FILE__ and __LINE__ statements.
- Moving included functions to the %MAIN% class, and importing the included classes.
- include and require. If the specified file cannot be found or parsed, or if the argument to include is not a string literal, the include statement is left in place.

phc does not support:

- Return values in included scripts. We intend to support these in the future. They will likely be supported in a later stage of the compilation process, instead of in the AST.
- Calling include on anything other than a literal string containing the filename of a local file. This excludes variables and remote files. These may be supported when more static analyses are available.
- include_once and require_once, as we cannot guarantee that the file to be included is not included elsewhere. These statements will not be processed, and combinations of include or require with include_once or require_once may cause incorrect behaviour with this option set.
- Updating get_included_files() to reflect the included files.

The phc Developer's Manual

Table of Contents

1. Introduction
    System Requirements
    Building PHP for phc development
    Demonstration
    The Transform
    The Result
27. We incorrectly represent <?php function x ?> as <?php function x ?>. In the former, x is only declared when its declaration is executed. In the latter, it is declared as soon as the program starts.

Other issues

There are quite a number of minor bugs and issues with phc that we are aware of. Our bug tracker is available at our project site (http://code.google.com/p/phc/issues/list). We are looking for contributors to help us fix many of these bugs. Please see our contributors page (http://phpcompiler.org/contribute.html) if you're interested in helping out.

Chapter 12. Overview of the AST classes and transformation API

This document explains the code for the AST classes, the tree visitor API and the tree transformation API. All this code is generated by a tool called maketea. It does not explain how this code is derived from the phc grammar; some of the details of this process are explained in Maketea Theory.

The AST classes

There are two main kinds of AST classes: classes that correspond to non-terminals in the grammar, and classes that correspond to terminals in the grammar. Non-terminal class names start with an upper-case letter. Terminals, or tokens, are entirely upper case. Examples are While, Expr, METHOD_NAME and INT. The main difference is that token classes have one additional field, and sometimes two. Every token class gets an attribute
28. _op* in) { return in; }

The return in; is very important: as we mentioned before, the return value of pre_bin_op will replace *in in the tree. Therefore, if we don't want to replace *in, or if we want to replace *in only when a particular condition holds, we must return in. This will replace *in by in itself. The second thing to note is that the return type of pre_bin_op is Expr* instead of Bin_op*. This means that we can replace a binary operator node by any other expression node. Maketea Theory explains exactly how the signatures for the pre and post methods are derived, but in most cases they are what you'd expect. The easiest way to check is to simply look them up in <AST_transform.h>.

The Implementation

We wanted to get rid of useless concatenation operators. To be precise, if the binary operator is the concatenation operator and the left operand is the empty string, we want to replace the node by the right operand; similarly, if the right operand is the empty string, we want to replace the operator by its left operand. Here's the full transform:

    class Remove_concat_null : public Transform
    {
    public:
        Expr* post_bin_op(Bin_op* in)
        {
            STRING* empty = new STRING(new String(""));
            Wildcard<Expr>* wildcard = new Wildcard<Expr>;

            // Replace with right operand if left operand is the empty string
            if(in->match(new Bin_op(empty, wildcard
29. al is one that is used in phc itself. The transform is called Remove_concat_null and can be found in src/process_ast/Remove_concat_null.h. The purpose of the transform is to remove string concatenation with the empty string. For example,

    <?php $s = "foo" . ""; ?>

is translated to

    <?php $s = "foo"; ?>

The reason that this transform is implemented in phc is due to how the phc parser deals with in-string syntax. For example, if you write

    $a = "foo $b bar";

the corresponding tree generated by phc is

    $a = "foo " . $b . " bar";

In other words, the variables are pulled out of the string, and the various components are then concatenated together. However, taken to its logical conclusion, that means that if you write

    $a = "foo $b";

the parser generates

    $a = "foo " . $b . "";

Obviously, the second concatenation is unnecessary, and the Remove_concat_null transform cleans this up. In this tutorial we will explain how this transform can be written.

Introducing the Tree_transform API

Concatenation is a binary operator, so we are interested in nodes of type Bin_op. If you check the grammar (or, alternatively, src/generated/AST.h), you will find that Bin_op has three attributes: a left and a right expression (of type Expr) and the operator itself (OP* op). Thus, we are interested in nodes of type Bin_op whose op equals the single dot (for string concatenation). Based on the previous two tutorials, we might try something like this:
30. alue (an example is CLASS_NAME); the structure of a terminal symbol may be defined by a regular expression, but this is irrelevant as far as the abstract grammar is concerned. We denote markers in quotes ("abstract"), and terminal symbols in capitals (CLASS_NAME).

Each non-terminal symbol aa will have a single production in the grammar. Instances of aa in the AST will be represented by a class called Aa. The attributes of Aa will depend on the production for aa (see below).

A terminal symbol xx will be represented by a class XX. Every token class gets an attribute called value. The type of this attribute depends on the token: for most tokens it is String* (this is the default); however, if the grammar explicitly specifies a type for the value in angular brackets (for example, REAL<double>), this overrides the default. If the type of the value attribute is set to be empty, the token class does not get a value.

Finally, a marker will not be represented by a specialised class. Instead, a marker "foo" may only appear as an optional symbol in a production rule ("foo"?) and will appear as a boolean attribute is_foo in the class representing aa (Aa).

There are only two types of rules in the grammar. The first is the simplest, and lists a number of alternatives for a non-terminal symbol aa:

    aa ::= b | c | ... | z ;

Here, each of b, c, ..., z must be a single non-terminal symbol. This rule results in a u
31. ansform in the src/process_ast/Token_conversion.cpp). To do this, you would replace

    return in->right;

by

    return in->right->pre_transform(this);

What's Next

The next tutorial in this series, Using State, introduces a very important notion in transforms: the use of state.

Chapter 6. Using State

This tutorial explains an advanced feature of pattern matching, and shows an important technique in writing tree transforms: the use of state. Suppose we are continuing the refactoring tool that we began in Chapter 4, and suppose that we have replaced all calls to database-specific functions by calls to the generic DBX functions. To finish the refactoring, we want to rename any function foo in the script to foo_DB if it makes use of the database. This clearly sets functions that use the database apart, which may make the structure of the script clearer. So, we want to write a transform that renames every function foo to foo_DB if there is one or more call within that function to any dbx_something function. Here is a simple example:

    <?php
        function first()
        {
            global $link;
            $error = dbx_error($link);
        }

        function second()
        {
            echo "Do something else";
        }
    ?>

After the transform, we should get

    <?php
        function first_DB()
        {
            global $link;
            $error = dbx_error($link);
        }

        function second()
        {
            echo "Do something else";
        }
    ?>

The Implementation

Since we have to modi
32. called value. The type of this attribute depends on the token: for most tokens it is String* (this is the default); however, if the grammar explicitly specifies a type for the value in angular brackets (for example, REAL<double>), this overrides the default. In addition, all the token classes have a method called get_value_as_string() and, when applicable, a method get_source_rep(). This is useful for programs that operate on general Identifier objects (such as METHOD_NAME or CLASS_NAME) or on Literal objects (such as REAL or INT). Note that the values returned by get_value_as_string() and get_source_rep() may be different; for example, get_source_rep() might return 0 5E 1 while get_value_as_string() might return 0 5.

All non-terminal and terminal classes then provide the following methods: deep equality, pattern matching, cloning, calling a tree visitor and calling a tree transformer. These methods are explained separately in the sections below.

Equality

Deep equality is implemented by bool deep_equals(Object* other). It takes into account the entire tree structure generated by maketea, including any fields that are specified in the code in the grammar (see the section called Mix-in Code in Chapter 9). Thus, deep_equals also compares line numbers, comments, etc.

Cloning

Cloning is implemented by deep_clone(). Cloning makes a deep copy of a tree, so the set of all pointers in the new tree is completely distinct from the set of pointers
33. d post_something gets called on a node after phc has visited the children of the node. For example, pre_if gets called on an If before visiting the statements in the iftrue and iffalse clauses of the If. After all the statements have been visited, post_if gets called.

So, here is an alternative and much easier solution for our problem. This plugin will actually count all statements in a script, without having to worry about all the different ways statements can be embedded in other statements. Moreover, even if the internal representation of phc changes (for example, if another type of statement gets added), this plugin will still work as is.

    #include "AST_visitor.h"
    #include "pass_manager/Plugin_pass.h"

    class Count_statements : public AST::Visitor
    {
    private:
        int num_statements;

    public:
        // Set num_statements to zero before we begin
        void pre_php_script(AST::PHP_script* in)
        {
            num_statements = 0;
        }

        // Print the number of statements when we are done
        void post_php_script(AST::PHP_script* in)
        {
            cout << num_statements << " statements found" << endl;
        }

        // Count the number of statements
        void post_statement(AST::Statement* in)
        {
            num_statements++;
        }
    };

    extern "C" void load(Pass_manager* pm, Plugin_pass* pass)
    {
        pm->add_after_named_pass(pass, new String("ast"));
    }

    extern "C" void run_ast(AST::PHP_script* in, Pass_manager* pm, String* option)
    {
        Co
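The pre/post traversal order described above can be sketched without the phc headers. This is a minimal, self-contained illustration: Stmt, MiniVisitor and count_example are hypothetical stand-ins invented for this sketch, not phc classes. It shows only why a post_statement hook sees nested statements as well as top-level ones.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an AST statement (not a phc class).
struct Stmt {
    std::vector<std::unique_ptr<Stmt>> children;  // nested statements
};

// Hypothetical visitor with pre/post hooks, modelled on the tutorial text.
struct MiniVisitor {
    virtual ~MiniVisitor() = default;
    virtual void pre_statement(Stmt&) {}
    virtual void post_statement(Stmt&) {}

    // Generic traversal: pre hook, then all children, then post hook.
    void visit(Stmt& s) {
        pre_statement(s);
        for (auto& child : s.children) visit(*child);
        post_statement(s);
    }
};

// Counts every statement, however deeply it is nested.
struct CountStatements : MiniVisitor {
    int num_statements = 0;
    void post_statement(Stmt&) override { ++num_statements; }
};

// Models a script with an assignment and an if with two echo branches.
int count_example() {
    std::vector<std::unique_ptr<Stmt>> script;
    script.push_back(std::make_unique<Stmt>());             // assignment
    auto if_stmt = std::make_unique<Stmt>();                 // if statement
    if_stmt->children.push_back(std::make_unique<Stmt>());  // echo in true branch
    if_stmt->children.push_back(std::make_unique<Stmt>());  // echo in false branch
    script.push_back(std::move(if_stmt));

    CountStatements counter;
    for (auto& s : script) counter.visit(*s);
    return counter.num_statements;
}
```

Calling count_example() walks two top-level statements and two nested ones, so the counter covers the nested echo statements without any special handling of the if.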
34. d as follows:

    class If
    {
    public:
        Expr* expr;
        Statement_list* iftrue;
        Statement_list* iffalse;
    };

Thus, the name of the rule (if) translates into a class If, and the elements on the right hand side of the rule (Expr, iftrue:Statement*, iffalse:Statement*) correspond directly to the class members. The class Statement_list inherits from the STL list class, and can thus be treated as such. Similarly, the class definitions for arrays and array elements look like

    class Array
    {
    public:
        Array_elem_list* array_elems;
    };

    class Array_elem
    {
    public:
        Expr* key;
        Expr* val;
    };

When you start developing applications with phc, you will find it useful to consult the full description of the grammar, which can be found in Chapter 9. A detailed explanation of the structure of this grammar, and how it converts to the C++ class structure, can be found in Chapter 13. Some notes on how phc converts normal PHP code into abstract syntax can be found in Chapter 10.

Working with the AST

When you want to build tools based on phc, you do not have to understand how the abstract syntax tree is built, because this is done for you. Once the tree has been built, you can examine or modify the tree in any way you want. When you are finished, you can ask phc to output the tree to normal PHP code again. Let's write a very simple plugin that counts the number of statements in a script. Create a new file myplugins/count_statements.cp
35. d nodes are class members (Member), statements (Statement), interface and class definitions (Interface_def, Class_def), switch cases (Switch_case) and catches (Catch).

When the parser encounters a comment in the input, it attaches it either to the previous node in the AST, or to the next, according to a variable attach_to_previous. This variable is set as follows:

- It is reset to false at the start of each line.
- It is set to true after seeing a semicolon, or either of the keywords class or function.

Thus, in

    foo();
    // Comment
    bar();

the comment gets attached to bar (to be precise, to the corresponding Eval_expr node; the function call itself is an expression, and phc does not associate comments with expressions), but in

    foo(); // Comment
    bar();

the comment gets attached to foo instead. The same applies to multiple comments:

    foo(); /* A */ /* B */
    /* C */ /* D */
    bar();

In this snippet, A and B get attached to foo, but C and D get attached to bar. Also, in the following snippet,

    Comment echo one 1 x two 2

all comments get attached to the same node. This should work most of the time, if not all the time. In particular, it should never lose any comments. If something goes wrong with comments, please send us (http://www.phpcompiler.org/contact.html) a sample program that shows where it goes wrong. Note that whitespace in multi-line comments gets dealt with in
36. Running Transformations
    A Subtlety
    What's Next
6. Using State
    The Implementation
    What's Next
7. Modifying the Traversal Order
    The Solution
    What's Next
8. Returning Lists
    Deleting Nodes
    Using the XML Unparser
    What's Next
9. The Abstract Grammar
    Overall Structure
    Expressions
    Additional Structure
    Mix-in Code
10. Representing PHP
    Variables
    Comments
    Miscellaneous Other Changes
    Comparison to the PHP grammar
    Comments
37. ee, when an AST visitor traverses a tree, it first calls pre_xxx for a node of type xxx, it then visits all the children of the node, and finally it calls post_xxx on the node. For many transforms, this is sufficient, but not for all. Consider the following transform. Suppose we want to add comments to the true and false branches of an if statement, so that the following example

    <?php
        if($expr)
        {
            echo "Do something";
        }
        else
        {
            echo "Do something else";
        }
    ?>

is translated to

    <?php
        if($expr)
        {
            /* TODO: Insert comment */
            echo "Do something";
        }
        else
        {
            /* TODO: Insert comment */
            echo "Do something else";
        }
    ?>

This appears to be a simple transform. One way to implement it would be to introduce a flag comment that is set to true when we encounter an If (i.e., in pre_if). Then, in post_statement, we could check for this flag, and if it is set, add the required comment to the statement and reset the flag to false. However, this will only add a comment to the first statement in the true branch (try it!). To add a comment to the first statement in the false branch too, we should set the flag to true in between visiting the children of the true branch and visiting the children of the false branch. To be able to do this, we need to modify children_if, as explained in the next section.

The Solution

For every AST node type xxx, the AST Transform API defines a method called children_xxx. This method is r
38. ement should be allowed to return a list of statements of any kind, because it is safe to replace an if statement by a list of statements. Similarly, a binary operator should be allowed to return any other expression, but not a list of them. For reasons that will become clear very soon, we call the process of deciding these signatures context resolution.

Contexts

A context is essentially a use of a symbol somewhere in a concrete rule in the grammar. There are four possibilities. Consider the concrete symbols concrete1, concrete2, concrete3, concrete4, concrete5 and concrete6, together with the rules

    abstract1 ::= concrete3 | concrete4 ;
    abstract2 ::= concrete5 | concrete6 ;
    some_concrete_rule ::= concrete1 concrete2* abstract1 abstract2* ;

Then, based on the rule for some_concrete_rule, concrete1 occurs in the context (concrete1, concrete1, Single), i.e., as a single instance of itself. concrete2 occurs in the context (concrete2, concrete2, List), i.e., as a list of instances of itself. The use of the abstract1 class leads to a number of contexts:

    (abstract1, abstract1, Single)
    (concrete3, abstract1, Single)
    (concrete4, abstract1, Single)

And finally, the use of abstract2* leads to the contexts

    (abstract2, abstract2, List)
    (concrete5, abstract2, List)
    (concrete6, abstract2, List)

These contexts essentially mean that an instance of, say, concrete5 can be replaced by any number of instances of any concrete class of abstract2.

Reducing Contexts

If there are two or more conflicti
39. erits from both x1a and x1b, the programmer should not rely on the relative order of post_x1a and post_x1b. The only guarantee made by maketea is that the order of the pre-methods will be the exact reverse of the order of the post-methods.

The Transform API

[Figure 12-2. Sequence Diagram for the Transform API]

Every AST class AST_foo (which inherits from AST_gen_foo) provides four methods to support the tree transform API: AST_gen_foo* transform(AST_Transform*), AST_gen_foo* pre_transform(AST_Transform*), void transform_children(AST_Transform*) and AST_gen_foo* post_transform(AST_Transform*). It is not entirely as straightforward as this: if AST_foo inherits from more than one class, the return type would probably be AST_foo*; in some cases, transform might return
40. es gengetopt (http://www.gnu.org/software/gengetopt/gengetopt.html) if you need to add additional command line arguments; you will need version 0.20 or higher
- gperf (http://www.gnu.org/software/gperf/gperf.html) if you need to modify the list of keywords recognized by the lexical analyser

Building PHP for phc development

When compiling PHP for use with phc, there are a few options.

Development: For developing phc, or debugging phc problems, it is worthwhile to have debugging symbols and leak checkers enabled.

    CFLAGS="-O0 -ggdb3" ./configure --enable-debug --enable-maintainer-zts --enable-embed

Deployment: For performance, optimization should be used.

    CFLAGS="-O3 -g" ./configure --enable-embed

Benchmarking: In order to be fair, both phc-generated code and PHP should be compiled with -O3. There are also some options required to run some benchmarks. The prefix is supplied to correspond to the benchmarking scripts we provide.

    CFLAGS="-O3 -DNDEBUG" ./configure --enable-embed --enable-bcmath --with-gmp --prefix=/usr/local/php-opt

Demonstration

This section is intended as a quick introduction outlining what the current release of phc can do for you. It does not explain everything in detail.

The Source Program

Consider the following simple PHP script.

    <?php
        function foo()
        {
            return 5;
        }

        $foo = foo();
        echo "foo is $foo<br>";
    ?>

Internally, this program gets represented as an abstract syntax
41. esponsible for visiting all the children of the node. The default implementation for If is

    void Visitor::children_if(If* in)
    {
        visit_expr(in->expr);
        visit_statement_list(in->iftrue);
        visit_statement_list(in->iffalse);
    }

(you can find this definition in AST_visitor.cpp). If you want to change the order in which the children of a node are visited, entirely avoid visiting some children, or simply execute a piece of code in between two children, this is the method you will need to modify. Here is the transform that does what we need (available as plugins/tutorials/Comment_ifs.la):

    #include "AST_visitor.h"

    class Comment_ifs : public Visitor
    {
    private:
        bool comment;

    public:
        Comment_ifs()
        {
            comment = false;
        }

        void children_if(If* in)
        {
            visit_expr(in->expr);
            comment = true;
            visit_statement_list(in->iftrue);
            comment = true;
            visit_statement_list(in->iffalse);
            comment = false;
        }

        void post_statement(Statement* in)
        {
            if(comment && in->get_comments()->empty())
                in->get_comments()->push_back(new String("/* TODO: Insert comment */"));
            comment = false;
        }
    };

What's Next

Chapter 8 explains how to deal with transforms that can replace a single node by multiple new nodes, and shows how to call the phc parser and unparser from your plugins.

Chapter 8. Returning Lists

In this tutorial we will develop, step by step, a transform tha
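The effect of overriding the children method can be checked in isolation. The sketch below is a self-contained model, not phc code: Stmt, IfNode and CommentIfs are hypothetical types invented for illustration. The flag is re-armed between the two branches, so the first statement of each branch is annotated and later statements are left alone.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for phc's Statement and If nodes.
struct Stmt {
    std::string text;
    std::vector<std::string> comments;
};

struct IfNode {
    std::vector<Stmt> iftrue;
    std::vector<Stmt> iffalse;
};

struct CommentIfs {
    bool comment = false;

    // Annotate a statement only while the flag is set, then clear the flag.
    void post_statement(Stmt& s) {
        if (comment && s.comments.empty())
            s.comments.push_back("/* TODO: Insert comment */");
        comment = false;
    }

    void visit_statement_list(std::vector<Stmt>& list) {
        for (Stmt& s : list) post_statement(s);
    }

    // Custom traversal: set the flag before each branch is visited.
    void children_if(IfNode& in) {
        comment = true;
        visit_statement_list(in.iftrue);
        comment = true;
        visit_statement_list(in.iffalse);
        comment = false;
    }
};
```

Setting the flag only once, before the true branch, would leave the false branch without a comment. That is the behaviour the tutorial attributes to the pre_if approach.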
42. found in the archive (http://www.phpcompiler.org/src/archive). Moreover, although we have tried to document phc as well as we can, if anything is still unclear, please let us know by sending an email to the mailing list (http://www.phpcompiler.org/mailinglist.html).

Chapter 2. Installation Instructions

System Requirements

Warning: These instructions only apply if you don't intend to modify phc, and you are using a downloaded phc release. If you intend to modify it, or you are using the phc SVN repository (http://code.google.com/p/phc), please refer to the instructions for developers.

phc needs a Unix-like environment to run; it has been tested on Linux, Solaris, FreeBSD, Cygwin and Mac OS X. To compile phc, you will need:

- g++ version 3.4.0 or higher
- make
- Boost version 1.34 or higher
- PHP5 embed SAPI (version 5.2.x recommended; refer to the PHP embed SAPI installation instructions for more details). This is required to compile PHP code with phc.
- Xerces-C (http://xml.apache.org/xerces-c) if you want support for XML parsing (you don't need Xerces for XML unparsing).
- The Boehm garbage collector is used in phc, but not in code compiled by phc. If unavailable, it can be disabled with --disable-gc, but phc will leak all memory it uses.

The following dependencies are optional:

- a DOT viewer such as graphviz (http://www.graphviz.org) if you want to be able to view the graphical output generated by phc (for example, syntax
43. fy method (function) names, the nodes we are interested in are the nodes of type Method. However, how do we know when to modify a particular method? Should we search the method body for function calls to dbx_xxx? As we saw in Chapter 3, manual searching through the tree is cumbersome; there must be a better solution.

The solution is in fact very easy. At the start of each method, we set a variable uses_dbx to false. When we process the method, we set uses_dbx to true when we find a function call to a DBX function. Then at the end of the method, we check uses_dbx; if it was set to true, we modify the name of the method. This tactic is implemented by the following transform (available as plugins/tutorials/InsertDB.la in the phc distribution). Note the use of pre_method and post_method to initialise and check uses_dbx, respectively. Because we don't need to modify the structure of the tree in this transform, we use the simpler AST_visitor API instead of the AST_transform API.

    class InsertDB : public Visitor
    {
    private:
        bool uses_dbx;

    public:
        void pre_method(Method* in)
        {
            uses_dbx = false;
        }

        void post_method(Method* in)
        {
            if(uses_dbx)
                in->signature->method_name->value->append("_DB");
        }

        void post_method_invocation(Method_invocation* in)
        {
            Wildcard<METHOD_NAME>* pattern = new Wildcard<METHOD_NAME>;

            // Check for dbx_
            if(in->method_name->match(pattern) &&
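The state pattern described here (reset in the pre hook, update while visiting, act in the post hook) can be sketched independently of phc. Everything below is a hypothetical model: Call, Method and InsertDbSuffix are invented names, and a plain prefix test stands in for phc's pattern matching.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for a method invocation and a method definition.
struct Call { std::string name; };
struct Method {
    std::string name;
    std::vector<Call> calls;  // invocations found in the method body
};

struct InsertDbSuffix {
    bool uses_dbx = false;  // state shared between the hooks

    // Reset the state at the start of every method.
    void pre_method(Method&) { uses_dbx = false; }

    // Record whether any call targets a dbx_ function (prefix test).
    void post_method_invocation(const Call& c) {
        if (c.name.rfind("dbx_", 0) == 0) uses_dbx = true;
    }

    // Act on the accumulated state once the body has been visited.
    void post_method(Method& m) {
        if (uses_dbx) m.name += "_DB";
    }

    void visit(Method& m) {
        pre_method(m);
        for (const Call& c : m.calls) post_method_invocation(c);
        post_method(m);
    }
};
```

Because pre_method clears the flag, one visitor object can be reused across methods without a dbx_ call in one method leaking into the next.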
44. ge collector, so there is never any need to free objects (you never have to call delete). This makes programming much easier and less error-prone (smaller chance of bugs).

match compares two sub-trees for deep equality. There is also another function called deep_equals, which does nearly the same thing, but there are two important differences. The first is that match does not take comments, line numbers and other additional information into account, whereas deep_equals does. The second is that match supports wildcards; this will be explained in Chapter 5.

Modifying the Parameters

Unfortunately, renaming mysql_connect to dbx_connect is not sufficient, because the parameters to the two functions differ. According to the PHP manual (http://www.php.net/manual/en/index.php), the signatures for both functions are

    mysql_connect(server, username, password, new_link, int client_flags)

and

    dbx_connect(module, host, database, username, password, persistent)

The module parameter to dbx_connect should be set to DBX_MYSQL to connect to a MySQL database. Then host corresponds to server, and username and password have the same purpose too. So, we should insert DBX_MYSQL at the front of the list, and insert NULL in between host and username (the mysql_connect command does not select a database). The last two parameters to mysql_connect do not have an equivalent in dbx_connect, so if they are specified, we cannot perform the conversion. The last paramete
45. he XML file we just generated back to PHP syntax, run

    phc --read-xml=ast --pretty-print helloworld.xml

The generated XML should use the schema http://www.phpcompiler.org/phc-1.0. However, our XML schema is currently broken.

Internal Representations

After parsing, phc converts a PHP script into an Abstract Syntax Tree (AST); this is further explained in Chapter 3 of The phc Developer's Manual. This is very useful for processing PHP scripts which you wish to convert back into PHP. However, for some tasks, especially program analysis, a simpler form of the PHP script is more suitable. phc offers two other Internal Representations (IRs). The High-level Internal Representation (HIR) simplifies most expressions by assigning them to temporary variables. However, code represented in the HIR is still valid PHP. The Medium-level Internal Representation (MIR) converts HIR statements to simpler components, for example converting control-flow statements like the for-loop into gotos. To view PHP in any of these forms, use the --dump option:

    phc --dump=ast helloworld.php
    phc --dump=hir helloworld.php
    phc --dump=mir helloworld.php

Nearly all phc options work as well on the HIR and MIR as on the AST. For example, XML can be read and written:

    phc --dump-xml=hir myprog | phc --read-xml=hir

Graphical Output

If you have a DOT viewer installed on your system (for example, graphviz, http://www.graphviz.org), you
46. I. Tree Traversal API Tutorials
    Compiling a Plugin
    About extern "C"
    Abstract Syntax
    The Abstract Syntax Tree
    Working with the AST
    Writing Stand-Alone Applications
    What's Next
3. Traversing the Tree
    The Grammar (Revisited)
    Statements and Expressions
    The Difficult Solution
    The Easy Solution
    Pre and Post Methods
4. Modifying Tree Nodes
    First Attempt
    Modifying the Parameters
    Refactoring
    What's Next
5. Restructuring the Tree
    Introducing the Tree_transform API
    The Implementation
47. ight find it useful to read it again after having gained some experience with the transformation API.

We have implemented the transform as a post-transform rather than a pre-transform. Why? Suppose we implemented the transform as a pre-transform. Consider the following PHP expression (bracketed explicitly for emphasis):

    ("" . $a) . ""

The first binary operator we encounter is the second one (get phc to print the tree if you don't see why). So, we apply the transform and replace the operator by its left operand, which happens to be ("" . $a). We then continue and transform the children of that node, because that is how the tree transform API is defined. But the children of that node are "" and $a. So that means that the other binary operator itself will never be processed.

There are two solutions to this problem. The first is the one we used above: use a post-transform instead of a pre-transform. You should try to reason out why this works, but a rule of thumb is that unless there is a good reason to use a pre-transform, it is safer to use the post-transform, because in the post-transform the children of the node have already been transformed, so that you are looking at the final version of the node. The second solution is to use a pre-transform, but explicitly tell phc to transform the new node in turn. This is the less elegant solution, but sometimes it is the only solution that will work (see for example the Token_conversion tr
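To make the post-transform argument concrete, here is a small self-contained model. It is a sketch under stated assumptions, not the phc implementation: Expr, leaf, concat and remove_concat_null are hypothetical names, a leaf with empty text stands for the empty string literal, and any other leaf stands for a variable or non-empty string.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

// Hypothetical expression node: a leaf (text) or a "." concatenation.
struct Expr {
    std::string text;     // leaf value; unused for concatenations
    ExprPtr left, right;  // both set for a concatenation node
    bool is_concat() const { return left && right; }
    bool is_empty_string() const { return !is_concat() && text.empty(); }
};

ExprPtr leaf(std::string t) {
    auto e = std::make_shared<Expr>();
    e->text = std::move(t);
    return e;
}

ExprPtr concat(ExprPtr l, ExprPtr r) {
    auto e = std::make_shared<Expr>();
    e->left = std::move(l);
    e->right = std::move(r);
    return e;
}

// Post-order rewrite: the children are transformed first, so the node
// is inspected in its final form. The return value replaces the node.
ExprPtr remove_concat_null(ExprPtr in) {
    if (!in->is_concat()) return in;
    in->left = remove_concat_null(in->left);
    in->right = remove_concat_null(in->right);
    if (in->left->is_empty_string()) return in->right;
    if (in->right->is_empty_string()) return in->left;
    return in;
}
```

Applied to the bracketed expression from the text, the inner concatenation is reduced to $a before the outer one is examined, so both useless operators disappear in a single pass.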
48. ins will not work unless the symbol information is available.

Test suite

phc is shipped with its tests, which can be run after compilation:

    make test

You can also add your own code to test/subjects/3rdparty/ and test it by running make long-test to run the entire suite. Note that many tests still fail. Please submit the results to us on the mailing list (http://phpcompiler.org/mailinglist.html).

phc packages

See the downloads page (http://www.phpcompiler.org/src) for existing packages. In addition, we're looking for people to create and/or maintain packages for more systems, including Debian/Ubuntu (especially Debian/Ubuntu), Gentoo, Slackware, Darwin and Solaris.
49. k for calls to include

            if(in->expr->match(pattern))
            {
                // Matched! Try to parse the file
                PHP_script* php_script = parse(filename->value->value, NULL, false);
                if(php_script == NULL)
                {
                    cerr
                        << "Could not parse file " << *filename->value->value
                        << " on line " << in->get_line_number() << endl;
                    exit(1);
                }

                // Replace the include by the statements in the parsed file
                out->push_back_all(php_script->statements);
            }
            else
            {
                // No match; leave untouched
                out->push_back(in);
            }
        }
    };

    extern "C" void load(Pass_manager* pm, Plugin_pass* pass)
    {
        pm->add_after_named_pass(pass, new String("ast"));
    }

    extern "C" void run_ast(PHP_script* in, Pass_manager* pm, String* option)
    {
        Expand_includes einc;
        in->transform_children(&einc);
    }

Exercise: One problem with the plugin we have developed is that if the file we are including in turn has include statements, they will not be processed. Modify the plugin to invoke the transform on the list of statements from the parsed file, taking care to deal with infinite loops (if the first file includes the second, and the second the first).

What's Next

This is the last tutorial in this series on using the AST_visitor and AST_transform classes. Of course, the only way to really learn this stuff is to try it out for yourself. Hopefully, the tutorials will help you do so. The following sources sh
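The list-returning convention used by this plugin (an initially empty output list whose contents replace the input node) can be modelled in a few lines. The sketch below is hypothetical throughout: statements are plain strings, the files map stands in for the parser, and pre_statement and expand_includes are invented names rather than phc API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using StatementList = std::vector<std::string>;

// Hypothetical hook: whatever ends up in `out` replaces `in`.
// An empty `out` would delete the statement.
void pre_statement(const std::string& in,
                   const std::map<std::string, StatementList>& files,
                   StatementList& out) {
    const std::string prefix = "include ";
    if (in.compare(0, prefix.size(), prefix) == 0) {
        auto it = files.find(in.substr(prefix.size()));
        if (it != files.end()) {
            // Matched: replace the include by the file's statements.
            out.insert(out.end(), it->second.begin(), it->second.end());
            return;
        }
    }
    out.push_back(in);  // no match or unknown file: leave untouched
}

// Driver: every statement gets a fresh, empty output list.
StatementList expand_includes(const StatementList& script,
                              const std::map<std::string, StatementList>& files) {
    StatementList result;
    for (const std::string& s : script) {
        StatementList out;
        pre_statement(s, files, out);
        result.insert(result.end(), out.begin(), out.end());
    }
    return result;
}
```

Unlike the plugin above, which exits when a file cannot be parsed, this sketch keeps the include statement in place when the file is unknown. That is a simplification chosen for the example.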
50. l look at modifying the tree. The task we set ourselves is: replace all calls to mysql_connect by calls to dbx_connect. dbx (http://pecl.php.net/package/dbx) is a PECL extension to PHP that allows scripts to interface with a database independent of the type of the database; this conversion could be part of a larger refactoring process that makes a script written for MySQL work with other databases. The plugin we develop in this tutorial is available as MySQL2DBX.la in the phc distribution. To see its effect, run phc as follows:

    phc --run plugins/tutorials/MySQL2DBX.la --pretty-print test.php

First Attempt

We are interested in all function calls to mysql_connect. Let us have a look at the precise definition of a function call according to The Abstract Grammar:

    Method_invocation ::= Target? Method_name Actual_parameter* ;
    Method_name ::= METHOD_NAME | Reflection ;
    Actual_parameter ::= is_ref:"&"? Expr ;
    Reflection ::= Expr ;

The target of a method invocation is the class or object the function gets invoked on, if any. It need not worry us here. For now, we are only interested in the Method_name. The grammar tells us that a Method_name is either a METHOD_NAME or a node of type Reflection. If a symbol is written in CAPITALS in the grammar, that means it refers to a token, a literal value. In this case, to an actual method name (such as mysql_connect). In PHP, it is also possible to call a method whose name is sto
51. les use the -e flag: phc will compile your program, then immediately execute it. You can also view the C code generated by phc:

    phc --generate-c helloworld.php > helloworld.c

One of the advantages of phc is that it can optimize your program. Using the -O flag, you can instruct phc to analyse your source code and perform simple optimizations. On simple benchmarks, this can increase the speed of your application by 50%. To optimize:

    phc -O2 -c helloworld.php -o helloworld

phc generates C code, which is then compiled by gcc. To see the command passed to gcc by phc, use the -v flag. If you specify the -O flag, phc will also pass the -O flag to gcc, which will optimize your code further. The argument to the -O flag must therefore be usable by gcc, so it must be any of -O0 (default), -O1, -O2, -O3 or -Os. Consult the gcc manual (http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options) for more details.

It is also possible to pass command-line arguments to gcc through phc, using the -C flag. For example, to disable inlining of the generated code by gcc using -fno-inline:

    phc -c -O2 helloworld.php -o helloworld -C-fno-inline

Compiling web applications

Warning: In order to compile web applications, it is currently necessary to alter your php.ini file, or have access to the root account. We welcome suggestions of a different method which avoids these requirements, especially if they would work in a
52. news is that phc provides sophisticated support for examining and modifying this tree. This is explained in detail in the follow-up tutorials.

Chapter 3 Traversing the Tree

In Chapter 2, we explained that phc represents PHP scripts internally as an abstract syntax tree, and that the structure of this tree is determined by The Abstract Grammar. We then showed how to make use of this tree to count the number of statements. However, the plugin we wrote only counted the top-level statements. Statements nested inside other statements (for example, statements inside the true or false branch of an if statement) were ignored. In this tutorial, we will rectify this problem and write a plugin that counts all statements in a script. So, for

    <?php
       $x = 5;
       if($x == 5)
          echo "yes";
       else
          echo "no";
    ?>

we should report four statements.

Note that all the plugins that we will develop in these tutorials are included in the phc distribution. For example, in this tutorial we will be developing two plugins: a difficult solution to the problem and an easy solution to the problem. You can run these plugins by running

    phc --run plugins/tutorials/count_statements_difficult.la test.php

or

    phc --run plugins/tutorials/count_statements_easy.la test.php

The Grammar (Revisited)

How do we go about counting all statements in a script? Remember that, as far as phc is concerned, a PHP script consists of a number of statements, but some of those stateme
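To make the difference between the two counts concrete, here is a minimal sketch on a toy tree. It is not the plugin from the phc distribution and does not use the phc classes: `Stmt`, `count_statements` and `count_top_level` are hypothetical names, and every statement simply carries a list of the statements nested inside it.

```cpp
#include <cassert>  // used by the check described in the note below
#include <string>
#include <vector>

// Hypothetical, simplified statement node (not phc's AST classes).
struct Stmt {
    std::string kind;          // e.g. "eval_expr", "if", "while"
    std::vector<Stmt> nested;  // statements nested inside this one
};

// Counts every statement in the list, including nested ones.
int count_statements(const std::vector<Stmt>& stmts)
{
    int total = 0;
    for (const Stmt& s : stmts) {
        total += 1 + count_statements(s.nested);
    }
    return total;
}

// Counts only the outer list, which is what the Chapter 2 plugin did.
int count_top_level(const std::vector<Stmt>& stmts)
{
    return static_cast<int>(stmts.size());
}
```

For the running example (an assignment followed by an if statement with one echo in each branch), `count_top_level` gives 2 and `count_statements` gives 4, so `assert(count_statements(script) == 4)` holds for a `script` built that way.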
53. ng contexts for a single symbol, we must resolve the contexts to their most specific (restrictive) form. For instance, for the phc grammar, this yields

    (if, statement, List)
    (CLASS_NAME, CLASS_NAME, Single)
    (INTERFACE_NAME, INTERFACE_NAME, Single)

So, a context is a triplet (symbol, symbol, multiplicity), where the symbols are terminal or non-terminal symbols, and the multiplicity is either Single, Optional, List, OptionalList or ListOptional (list of optionals). When reducing two contexts (a, b, c) and (a, b', c'), we take the meet of b and b' (that is, the most general common subclass of b and b', where "more general" means higher up in the inheritance hierarchy), and opt for the most restrictive multiplicity (Single over Optional, Single over List, etc.).

The general idea is that we want the most permissive context for a non-terminal that is still safe: if it is safe to replace an a by a list of bs everywhere in a tree, the context we want for a is (a, b, List).

To see the reason for taking the meet, consider this fragment of the phc grammar:

    Expr ::= ... | BOOL | ... ;
    Cast ::= CAST Expr ;
    Method_invocation ::= Target? ... ;
    Target ::= Expr | CLASS_NAME ;

The use of expr in the rule for cast leads to the context (BOOL, expr, Single). The use of target in the rule for method_invocation leads to the context (BOOL, target, Single). By taking the meet of expr and target, this gives the context (BOOL, ex
54. ng phc. If you can follow those instructions, and you get the output you should get, congratulations! You have successfully installed phc.

Chapter 3 Running phc

Once you have installed phc (see Installation Instructions), run it by typing

    phc --help

You should see

    phc 0.2.0

    Usage: phc [OPTIONS]... [FILES]...

      -h, --help              Print help and exit
          --full-help         Print help, including hidden options, and exit
      -V, --version           Print version and exit

    GENERAL OPTIONS:
      -v, --verbose           Verbose output (default=off)
      -c, --compile           Compile (default=off)
          --pretty-print      Pretty print input according to the Zend style guidelines (default=off)
          --obfuscate         Obfuscate input (default=off)
          --run=STRING        Run the specified plugin (may be specified multiple times)
          --r-option=STRING   Pass option to a plugin (specify multiple flags in the same order as multiple plugins; 1 option only per plugin)
      -d, --define=STRING     Define ini entry (only affects -c and --include)

    INPUT OPTIONS:
          --read-xml=passname Assume the input is in XML format. Start processing after the named pass
          --include           Parse included or required files at compile-time (default=off)

    COMPILATION OPTIONS: -C, --c-option=STRING; --extension=NAME; -O, --optimize=STRING; -o, --output=FILE; -e, --execute
    PRETTY PRINTING OPTIONS: --next-line-curlies; --no-leading-tab; --tab=STRING

    Pass opti
55. nto an internal representation known as an abstract syntax tree or AST. Applications can process PHP code by analysing and modifying this abstract representation in one of two ways:

phc supports plugins. Plugins are modules that can be loaded into phc, which get access to the AST. phc provides sophisticated support for writing operations over the AST through the Tree Transformation API.

Alternatively, you can export the AST to XML. You can then process the XML in any way you like, and then use phc to convert the XML back to PHP.

The Tree Traversal API Tutorials explain how to write plugins for phc, and provide numerous examples. You will find the Reference very useful when writing serious applications using phc. Although we have tried to document phc as well as we can, if anything is still unclear, please let us know by sending an email to the mailing list (http://www.phpcompiler.org/mailinglist.html).

System Requirements

If you want to modify the internals of phc in other ways than through the explicit API we provide for doing so, you will need the tools listed below, in addition to those detailed in the user manual (Chapter 2 in The phc User's Manual). However, most people should not need these tools, even if they are implementing tools based on phc.

flex, if you need to modify the lexical analyser
bison, if you need to modify the parser
maketea (http://www.maketea.org), if you want to modify the phc grammars (or the AST/HIR/MIR class
56. nts may have other statements nested inside them. Here is part of the phc grammar:

    PHP_script ::= Statement* ;
    Statement ::= Eval_expr | If | While | ... ;
    If ::= Expr iftrue:Statement* iffalse:Statement* ;
    While ::= Expr Statement* ;

The vertical bar (|) means "or". So, a statement is either an evaluation of an expression (eval_expr), an if statement, or a while statement, or ... Thus, our running example is represented by the tree in Figure 3.1. The four statements that we are interested in have been highlighted.

[Figure 3.1: Abstract syntax tree for the running example]

Statements and Expressions

The Eval_expr nodes in the tree probably need some explanation. There are many different types of statements in PHP: if statements, while statements, for loops, etc. You can find the full list in The Abstract Grammar. If you do look at the grammar, you will notice in particular that a function call is not actually a statement. Instead, a function call is an expression.

The difference between statements and expressions is that a statement does something (for example, a for loop repeats a bunch of other statements), but an expression has a value. For example, "5" is an expression (with value 5), "1+1" is an expression (with value 2), etc. A function call is also considered an expression. The value of a function call is the value that the f
57. on to the C compile (e.g., -C-g). Can be specified multiple times
    Generate a PHP extension called NAME instead of a standalone application
    Optimize (default=0)
    Place executable into file FILE
    Run executable after compiling (implies -c) (default=off)
    Output the opening curly on the next line instead of on the same line (default=off)
    Don't start every line in between <?php .. ?> with a tab (default=off)
    String to use for tabs while unparsing (default: a tab)
    --no-hash-bang    Do not output any #! lines (default=off)

Now write a very small PHP script, for example

    <? echo "Hello world!"; ?>

and save it to helloworld.php. Then run phc:

    phc --pretty-print helloworld.php

This should output a pretty-printed version of your PHP script back to standard output:

    <?php
       echo "Hello world!";
    ?>

You can see a list of options controlling the style of pretty printing using the --full-help option.

Compiling executables

phc can compile either executables or extensions. To compile an executable, phc creates C code, which it compiles and links to the PHP embed SAPI. Since it links to PHP, you have access to all of PHP's large built-in standard library. In order to compile the "hello world" executable from before, run

    phc -c helloworld.php -o helloworld

This creates an executable helloworld, which can then be run:

    ./helloworld

If you prefer to run your executable immediately after it compi
58. ou may have found that our plugin isn't quite correct. Consider the following example:

    <?php
       $x = 5;
       if($x == 5)
          echo "yes";
       else
          echo "no";
    ?>

If you run our plugin on this example, it will report two statements. Why? Well, the first statement is the assignment, and the second is the conditional (the if statement). The statements inside the if statement are not counted, because they are not part of the outer list of statements of the script. In the next tutorial we will see how to fix this.

Writing Stand-Alone Applications

If you prefer not to write a plugin but want to modify phc itself to derive a new, stand-alone application, you can add your passes in src/phc.cpp in the phc source tree instead. This has the effect of hardcoding your plugin into phc; in versions before 0.1.7, this was the only way to write extensions. However, in the rest of the tutorials we will assume that you are writing your extension as a plugin.

What's Next?

In theory, you now know enough to start implementing your own tools for PHP. Write a new plugin, run the plugin using the --run option, and optionally pass in the --pretty-print option also to request that phc outputs the tree back to PHP syntax after having executed your plugin. However, you will probably find that modifying the tree, despite being well-defined and easy to understand, is actually rather laborious. It requires a lot of boring boilerplate code. The good
59. ould also be useful:

The Abstract Grammar and the Maketea Theory
The explanation of how PHP gets represented in the abstract syntax, as detailed in Representing PHP
The definition of the C++ classes for the AST nodes in src/generated/AST.h
The definition of the AST_visitor and AST_transform classes in src/generated/AST_visitor.h and src/generated/AST_transform.h respectively

And of course, we are more than happy to answer any other questions you might still have. Just send an email to the mailing list (http://www.phpcompiler.org/mailinglist.html) and we'll do our best to answer you as quickly as possible. Happy coding!

II Reference

Chapter 9 The Abstract Grammar

This is the full and authoritative definition of the phc abstract grammar for PHP in maketea format (this can also be found in src/generated_src/ast.tea in the distribution). For a description of the structure of the grammar, and how it converts to C++ code, refer to Chapter 13.

Overall Structure

    PHP_script ::= Statement* ;

    Class_def ::= Class_mod CLASS_NAME extends:CLASS_NAME? implements:INTERFACE_NAME* Member* ;
    Class_mod ::= "abstract"? "final"? ;
    Interface_def ::= INTERFACE_NAME extends:INTERFACE_NAME* Member* ;
    Member ::= Method | Attribute ;
    Method ::= Signature Statement*? ;
    Signature ::= Method_mod is_ref:"&"? METHOD_NAME Formal_parameter* ;
    Method_mod ::= "public"? "protected"
60. p pattern->value->value->find("dbx_") == 0) uses_dbx = true; }

In Chapter 4, we simply wanted to check for a particular function name, and we used match to do this:

    if(in->match(new METHOD_NAME("mysql_connect")))

Here, we need to check for method names that start with dbx_. We use the STL method find to do this, but we cannot call this directly on in->method_name, because in->method_name has type Method_name (it could either be a METHOD_NAME or a Reflection node). However, calling match on a pattern has the side effect of setting the value to point to the node that was matched by the wildcard. So, if the match succeeds, we know that the name of the method must have been a METHOD_NAME, and we can access this name by accessing pattern->value. (pattern->value->value is the value field of the METHOD_NAME itself, i.e., the actual string that stores the name of the method.)

Of course, this transform is not complete: renaming methods is not enough; we must also rename the corresponding method invocations. This is left as an exercise for the reader.

What's Next?

Chapter 7 explains how to change the order in which the children of a node are visited, avoid visiting some children, or how to execute a piece of code in between visiting two children.

Chapter 7 Modifying the Traversal Order

As explained in the previous tutorials (in particular, Traversing the Tr
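The prefix test described above relies on a small property of `std::string::find`: it returns 0 exactly when the searched text occurs at the very start of the string. The sketch below illustrates that check in isolation. It does not use the phc classes: `Method_name` here is a hypothetical two-field stand-in for "either a METHOD_NAME token or a Reflection node", and `starts_with` and `calls_dbx` are illustrative helper names.

```cpp
#include <cassert>  // used by the check described in the note below
#include <string>

// Hypothetical stand-in for a method name node: either a plain token
// (METHOD_NAME in the grammar) or a reflection node with no fixed name.
struct Method_name {
    bool is_token;
    std::string value;  // only meaningful when is_token is true
};

// True when `prefix` occurs at position 0 of `s`.
bool starts_with(const std::string& s, const std::string& prefix)
{
    return s.find(prefix) == 0;
}

// Only a plain token can be compared against a name prefix; a reflection
// node's name is not known until run time.
bool calls_dbx(const Method_name& name)
{
    return name.is_token && starts_with(name.value, "dbx_");
}
```

With this, a token named `dbx_connect` satisfies `calls_dbx`, while `my_dbx_connect` does not, because `find` returns a non-zero position for it; `assert(!calls_dbx(reflection))` also holds for any non-token name.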
61. p Recall the skeleton plugin:

    #include <AST.h>
    #include <pass_manager/Plugin_pass.h>

    extern "C" void load(Pass_manager* pm, Plugin_pass* pass)
    {
       pm->add_after_named_pass(pass, new String("ast"));
    }

    extern "C" void run_ast(AST::PHP_script* in, Pass_manager* pm, String* option)
    {
    }

You will notice that run_ast gets passed an object of type PHP_script. This is the top-level node of the generated AST. If you look at the grammar (Chapter 9), you will find that PHP_script corresponds to the following rule:

    PHP_script ::= Statement* ;

Thus, as far as phc is concerned, a PHP script consists of a number of statements. The class PHP_script will therefore have one member, called statements, the list of statements. So, to count the number of statements, all we have to do is query the number of elements in the statements list:

    #include <AST.h>
    #include <pass_manager/Plugin_pass.h>

    extern "C" void load(Pass_manager* pm, Plugin_pass* pass)
    {
       pm->add_after_named_pass(pass, new String("ast"));
    }

    extern "C" void run_ast(AST::PHP_script* in, Pass_manager* pm, String* option)
    {
       printf("%d statement(s) found\n", in->statements->size());
    }

Save this file to ~/myplugins/count_statements.cpp. Compile:

    ~/myplugins$ phc_compile_plugin count_statements.cpp

And run:

    ~/myplugins$ phc --run count_statements.la hello.php

Actually...

If you actually did try to run your plugin, y
62. phc Documentation

The phc User's Manual

Table of Contents
System Requirements
PHP embed SAPI installation instructions
Installation Instructions
Compiling executables
Compiling web applications
Writing and Reading XML
Internal Representations
Graphical Output

List of Figures
3.1 Abstract syntax tree for "Hello World"

Chapter 1 Introduction

phc supports limited code generation, and can be used as a front end to parse PHP for other applications. This manual explains how to compile, install and use phc, how to compile command-line and web applications, and how to convert PHP to an XML representation and back.

Note: Documentation of the phc API, including how to write plugins, can be found in The phc Developers Manual. The documentation for this and for older versions of phc can be
63. pr, Single). This means that it is always safe to replace a boolean by any other expression, but it is not always safe to replace a boolean by any other target.

In the case of CLASS_NAME, we have the contexts

    (CLASS_NAME, class_name, Single)
    (CLASS_NAME, target, Single)

The meet of class_name and target does not exist; hence, this gives the context (CLASS_NAME, CLASS_NAME, Single). That is, the only safe transformation for CLASS_NAME is from CLASS_NAME to CLASS_NAME.

To be precise about "the most specific multiplicity", here is a Haskell definition that returns the meet of two multiplicities:

    meet_mult :: Multiplicity -> Multiplicity -> Multiplicity
    meet_mult a b | a == b = a
    meet_mult Single _ = Single
    meet_mult List Optional = Single
    meet_mult List OptList = List
    meet_mult List ListOpt = List
    meet_mult Optional OptList = Single
    meet_mult Optional ListOpt = Optional
    meet_mult OptList ListOpt = List
    meet_mult a b = meet_mult b a -- meet is commutative

Resolution for Disjunctions

We cannot deal with this situation:

    [grammar fragment lost in extraction]

This grammar leads to the following contexts:

    (a, a, Single)
    (b, a, Single)
    (b, b, Single)
    (c, a, Single)
    (c, c, List)

Resolving these contexts leads to

    (a, a, Single)
    (b, b, Single)
    (c, c, List)

However, this is incorrect, because this indicates that an a will only be replaced by another, single, a, but a c (which i
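For readers more at home in the C++ that maketea generates than in Haskell, the same meet can be sketched as follows. This is a hand translation of the definition above, not code from maketea; the names `Mult` and `meet_mult` and the enum ordering are choices made for this sketch.

```cpp
#include <cassert>  // used by the checks described in the note below
#include <utility>

// The five multiplicities named in the text.
enum class Mult { Single, Optional, List, OptList, ListOpt };

// Meet of two multiplicities, following the Haskell definition case by case.
Mult meet_mult(Mult a, Mult b)
{
    if (a == b) return a;
    if (a == Mult::Single || b == Mult::Single) return Mult::Single;

    // meet is commutative: order the pair by enum position so that each
    // unordered pair is handled once.
    if (static_cast<int>(a) > static_cast<int>(b)) std::swap(a, b);

    if (a == Mult::Optional) {
        // Optional with List or OptList gives Single; with ListOpt, Optional.
        return b == Mult::ListOpt ? Mult::Optional : Mult::Single;
    }
    // Remaining pairs: List/OptList, List/ListOpt and OptList/ListOpt.
    return Mult::List;
}
```

For example, `meet_mult(Mult::List, Mult::Optional)` is `Mult::Single`, matching the third Haskell equation, and swapping the arguments gives the same result, so checks such as `assert(meet_mult(a, b) == meet_mult(b, a))` hold for every pair.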
64. r to dbx_connect (persistent) is optional, and we will ignore it in this tutorial.

Now, in phc, DBX_MYSQL is a Constant, which has two fields: an optional class name (for class constants) and the name of the constant, of type CONSTANT_NAME. NULL is represented by NIL (to avoid getting confused with the C++ value NULL). We are now ready to write our conversion function.

    #include "AST_visitor.h"
    #include <pass_manager/Plugin_pass.h>

    using namespace AST;

    class MySQL2DBX : public Visitor
    {
    public:
       void post_method_invocation(Method_invocation* in)
       {
          Actual_parameter_list::iterator pos;
          CONSTANT_NAME* module_name;
          Constant* module_constant;
          Actual_parameter* param;

          if(in->method_name->match(new METHOD_NAME(new String("mysql_connect"))))
          {
             // Check for too many parameters
             if(in->actual_parameters->size() > 3)
             {
                printf("Error: unable to translate call "
                   "to mysql_connect on line %d\n", in->get_line_number());
                return;
             }

             // Modify name
             in->method_name = new METHOD_NAME(new String("dbx_connect"));

             // Modify parameters
             module_name = new CONSTANT_NAME(new String("DBX_MYSQL"));
             module_constant = new Constant(NULL, module_name);

             pos = in->actual_parameters->begin();
             param = new Actual_parameter(false, module_constant);
             in->actual_parameters->insert(pos, param); pos++;
             /* Skip host */ pos++;
             param = new
65. red in variable; in this case, the function name will be a Reflection node (which contains an Expr). In this tutorial we are interested in "normal" method invocations only.

All tokens have an attribute called value which corresponds to the value of the token. For most tokens, the type of value is a String* (consider a String to be an STL string). However, for some tokens, for example INT, value has a different type (e.g., int). If the token has a non-standard type, it will have a method called get_source_rep, which returns a String* representing the token in the source. For example, the real number 5E-1 would have value equal to the double 0.5, but get_source_rep would return the String* "5E-1".

Thus, we arrive at the following first attempt.

    #include "AST_visitor.h"
    #include <pass_manager/Plugin_pass.h>

    using namespace AST;

    class MySQL2DBX : public Visitor
    {
    public:
       void post_method_invocation(Method_invocation* in)
       {
          if(in->method_name->match(new METHOD_NAME(new String("mysql_connect"))))
          {
             // Modify name
             in->method_name = new METHOD_NAME(new String("dbx_connect"));
          }
       }
    };

    extern "C" void load(Pass_manager* pm, Plugin_pass* pass)
    {
       pm->add_after_named_pass(pass, new String("ast"));
    }

    extern "C" void run_ast(PHP_script* in, Pass_manager* pm, String* option)
    {
       MySQL2DBX m2d;
       in->visit(&m2d);
    }

Note: phc uses a garba
66. rror

Chapter 12 Overview of the AST classes and transformation API

The Visitor API

[Figure 12.1: Sequence Diagram for the Visitor API]

Every AST class provides four methods to support the visitor API: void visit(AST::Visitor*), void pre_visit(AST::Visitor*), void visit_children(AST::Visitor*) and void post_visit(AST::Visitor*). The implementation of each of these methods is very simple.

visit simply calls pre_visit, visit_children and post_visit, in order. (It could have been implemented once and for all in the Node class, but is not, for no particular reason.)

For a node x0 which inherits from x1, which inherits from x2, which in turn inherits from x3, etc., x0::pre_visit calls pre_x3, pre_x2, pre_x1 and pre_x0, in that order, on the tree visitor object, passing itself as an argument. If x0 inherits from multiple classes, all of the appropriate visitor methods will be invoked. However, if x0 inherits from both x1a and x1b, the programmer should not rely on the relative order of pre_x1a and pre_x1b.

x0::visit_children simply calls children_x0.

x0::post_visit will call post_x0, post_x1, etc. Again, if x0 inh
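The call order described above can be pictured with a small self-contained model. This is not the generated phc code: the `Visitor` below only records hook names in a log and takes no node argument, and `Bin_op` is a hypothetical class playing the role of x0, with `Expr` as x1 and `Node` as x2.

```cpp
#include <cassert>  // used by the check described in the note below
#include <string>
#include <vector>

// Hypothetical visitor that records which hooks fire, in order.
struct Visitor {
    std::vector<std::string> log;
    void pre_node()        { log.push_back("pre_node"); }
    void pre_expr()        { log.push_back("pre_expr"); }
    void pre_bin_op()      { log.push_back("pre_bin_op"); }
    void children_bin_op() { log.push_back("children_bin_op"); }
    void post_bin_op()     { log.push_back("post_bin_op"); }
    void post_expr()       { log.push_back("post_expr"); }
    void post_node()       { log.push_back("post_node"); }
};

// Models a class x0 = Bin_op that inherits from x1 = Expr and x2 = Node.
struct Bin_op {
    // Most general class first on the way in ...
    void pre_visit(Visitor& v)      { v.pre_node(); v.pre_expr(); v.pre_bin_op(); }
    void visit_children(Visitor& v) { v.children_bin_op(); }
    // ... and most specific class first on the way out.
    void post_visit(Visitor& v)     { v.post_bin_op(); v.post_expr(); v.post_node(); }
    void visit(Visitor& v)          { pre_visit(v); visit_children(v); post_visit(v); }
};
```

Calling `visit` on a `Bin_op` leaves the log as pre_node, pre_expr, pre_bin_op, children_bin_op, post_bin_op, post_expr, post_node, which can be confirmed with an `assert` comparing `v.log` against that sequence.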
67. s an a) will in fact return a list of cs. The problem is that the non-terminals in the rule for a have a different multiplicity in their contexts (single for b, list for c). maketea disallows this; if this happens in a grammar, maketea will exit with a "cannot deal with mixed multiplicity in disjunction" error. Otherwise, for a rule a ::= b1 | b2 | ..., if the multiplicity of a is list and the multiplicities of all the bs are lists, the multiplicity for a will be list; if the multiplicity of all the bs is single, the multiplicity for a will be set to single, independent of the original multiplicity for a.

III Development guide

Chapter 14 Porting and Packaging

We need porters, packagers and maintainers.

Now that phc has a plugin architecture, it is no longer necessary for users to integrate their source with it. As a result, it is much more useful to package phc and integrate it within various distributions' package management systems. If you are interested in packaging phc for your favourite OS, please contact us (http://www.phpcompiler.org/mailinglist.html).

Currently phc runs on x86 Linux and is mostly tested using Ubuntu. If you have access to other machines, architectures or operating systems, and would be willing to test phc on them, please contact us (http://www.phpcompiler.org/mailinglist.html).

Packaging hints

Do not strip the binaries. Since the plugins use dlopen and link dynamically against the phc binary, the plug
68. s use, require, require_once, include, include_once, isset and empty all get translated into a function call to a function with the same name as the keyword. exit also becomes a call to the function exit; "exit;" and "exit();" are interpreted as "exit(0)".

Comparison to the PHP grammar

Finally, the phc grammar is much simpler than the official grammar, and as a consequence more general. The class of programs that are valid according to the abstract grammar is larger than the class of programs actually accepted by the PHP parser. In other words, it is possible to represent a program in the abstract syntax that does not have a valid PHP equivalent. The advantage of our grammar is that it is much, much easier to work with.

To compare, consider the tree for

    $g->greet("TACS");

Using the phc abstract syntax, this looks like the tree shown in Figure 10.1.

[Figure 10.1: Function call in the AST]

However, in the official PHP grammar, the tree would look like the tree shown in Figure 10.2.

[Figure 10.2: Function call as represented by PHP]

Not only is the number of concepts used in the tree much larger (base_variable_with
69. sider the rule for variable in the grammar:

    Expr ::= ... | Variable | ... ;
    Variable ::= Target? Variable_name array_indices:Expr?* ;

A Variable is an Expr, so Variable is represented by the class shown below.

    class Variable : virtual public Expr
    {
    public:
       Target* target;
       Variable_name* variable_name;
       Expr_list* array_indices;
    };

A final note on combining * and ?. The construct (a*)? denotes an optional list of as. Thus, it will be represented by an A_list. If a list is specified but empty, the list will simply contain no elements. If the list is not specified at all, the list will be NULL. This is used, for example, to distinguish between methods that contain no statements and abstract methods. Similarly, (a?)* is a (non-optional) list of optional as. Thus, this is a list, but elements of the list may be NULL. This is used, for example, to denote empty array indices ($a[]) in the rule for Variable.

Context Resolution

We also derive the tree visitor API and tree transformation API from the grammar. The tree visitor API is very simple to derive; see the Overview of the AST classes and transformation API for an explanation.

The tree transformation API, however, is slightly more difficult to derive. The problem is to decide the signatures for the transform methods, or in other words, what can transform into what? For example, in the phc grammar for PHP, the transform for an if stat
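The two combinations of * and ? discussed earlier in this chunk (an optional list versus a list of optionals) differ only in where the null may appear. The sketch below shows both shapes using standard smart pointers instead of the generated list classes; `Method`, `Variable`, `is_abstract` and `count_empty_indices` are hypothetical names, and strings stand in for statements and index expressions.

```cpp
#include <cassert>  // used by the checks described in the note below
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Optional list: the list pointer itself may be null.
struct Method {
    // Null means "no body at all" (an abstract method); an empty vector
    // means a body with no statements.
    std::unique_ptr<std::vector<std::string>> statements;
};

// List of optionals: the list always exists, but elements may be null.
struct Variable {
    // A null element models an empty array index, as in $a[].
    std::vector<std::shared_ptr<std::string>> array_indices;
};

bool is_abstract(const Method& m)
{
    return m.statements == nullptr;
}

std::size_t count_empty_indices(const Variable& v)
{
    std::size_t n = 0;
    for (const auto& index : v.array_indices) {
        if (!index) ++n;
    }
    return n;
}
```

A default-constructed `Method` satisfies `is_abstract`, whereas one whose `statements` points at an empty vector does not; a `Variable` with indices `{"0", null}` has exactly one empty index, so `assert(count_empty_indices(v) == 1)` holds.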
70. simple plugin that counts the number of statements in a PHP script.

Traversing the Tree introduces the support that phc offers for traversing and transforming scripts. It shows how to write a plugin that correctly counts the number of statements in a script.

Modifying Tree Nodes shows how you can modify nodes in the tree (without modifying the structure of the tree). It shows how to replace calls to mysql_connect by calls to dbx_connect.

Restructuring the Tree shows how you can modify the structure of the tree. It works through an example that removes unnecessary string concatenations (for example, $a . "" is replaced by just $a).

Using State explains an advanced feature of pattern matching, and shows an important technique: the use of state in transformations, where one transformation depends on a previous transformation. It shows how to write a program that renames all functions foo in a script to db_foo, if there are calls to a database engine within foo.

Modifying the Traversal Order explains how to change the order in which the children of a node are visited, avoid visiting some children, or how to execute a piece of code in between visiting two children.

Returning Lists shows how to define transformations that replace nodes in the tree by multiple other nodes, and how to delete nodes from the tree. It also shows how to call the phc parser and unparsers from plugins.

Reference

phc represents PHP scripts internally as an abstract syntax
71. stem, and you should file a bug report with the PHP group (http://bugs.php.net). There is a known bug (and long work-around) for OSX already filed in the PHP bug system.

The most important part of the command is --enable-embed. While the CFLAGS=-O3 environment variable is optional, we find it speeds up the executable by about four times. If PHP is already installed on your system, you may want to install this version separately, using the --prefix option. Other configuration options are discussed in the developer manual.

Finally, install the embed SAPI:

    make install

Installation Instructions

First of all, you must download (http://www.phpcompiler.org/downloads.html) the latest release of phc. To extract phc:

    tar zxvf phc-0.2.0.tar.gz

This will create a new directory phc-0.2.0 that contains the phc source tree. Finally, you must compile phc. If the dependencies are in their standard locations, you should be able to simply type

    cd phc-0.2.0
    ./configure
    make

Consult ./configure --help for configuration options if your dependencies are not in standard locations. This should compile without any warnings or errors. If this step fails, please send a bug report to the mailing list (http://www.phpcompiler.org/mailinglist.html) with as much information about your system as you can give, and we will try to resolve it.

Finally, install phc using

    make install

For information on running phc, see Runni
72. sually empty) class A, which acts as a superclass for the classes for b, c, ..., z. This reflects the semantics of the rule: a b is an a. If there are multiple rules a ::= ... | c | ... and b ::= ... | c | ..., class C will inherit from both A and B. This type of rule is exemplified by the production for Statement in the grammar. There is one additional requirement for disjunction rules, which will be explained in the section on context resolution below.

The second type is the most common:

    a ::= b c ... z

In this rule, each of the b, c, ..., z is an arbitrary symbol (non-terminal, terminal or marker), which may be optional (b?) or repeated (b* or b+). This type of rule must not include any disjunctions (b | c), and only single symbols can be repeated (no grouping). If a symbol b can be repeated, it will be represented by a specialised list class B_list (which inherits from the STL list class) in the tree.

In addition, the symbols may be labeled (label:symbol). This does not add to the grammar structure, but explains the purpose of the symbol in the rule, and will be used for the name of the attribute of the corresponding class. The default name for each class attribute depends on the corresponding type: an attribute of type Variable_name (corresponding to a non-terminal Variable_name) will be called variable_name. The default name for an attribute of type Foo_list will be foos. However, as mentioned above, this can be overridden by specifying a label.

As an example, con
73. t expands include statements. For example, if b.php is

    <?php
       echo "Hello world";
    ?>

and a.php is

    <?php
       include "b.php";
       echo "Goodbye!";
    ?>

then running the transform on a.php yields

    <?php
       echo "Hello world\n";
       echo "Goodbye\n";
    ?>

The transform we will develop in this tutorial is only a simple implementation of includes, and we won't take every feature of include into account. However, it can serve as a basis for a more full-featured version. The transform we will develop here is available as plugins/tutorials/Expand_includes.la.

Deleting Nodes

Our transform should process include statements. In the AST, includes are represented as method invocations. Thus, we might start like this:

    class Expand_includes : public Transform
    {
    public:
       Expr* pre_method_invocation(Method_invocation* in)
       {
          // Process includes
       }
    };

However, this will not get us very far. The return type of pre_method_invocation is an Expr. That means that we can replace the method invocation (the include statement) only by another, single, expression. But we want to replace it by the contents of the specified file!

Recall from Chapter 3 that, to turn an expression into a statement, phc inserts an Eval_expr in the abstract syntax tree. Thus, if we want to process include statements, we could also look at all eval_expr nodes. Assuming for the moment we can make that work, does it get us any further
74. to do that, but what should we match against? If you are unsure about the structure of the tree, it can be quite useful to use the XML unparser to find out what the tree looks like. We modify the plugin as follows:

    #include "AST_transform.h"
    #include "process_ir/XML_unparser.h"

    class Expand_includes : public Transform
    {
    private:
       XML_unparser* xml_unparser;

    public:
       Expand_includes()
       {
          // Send output to cout, do not print attributes
          xml_unparser = new XML_unparser(cout, false);
       }

    public:
       void pre_eval_expr(Eval_expr* in, Statement_list* out)
       {
          in->visit(xml_unparser);
       }
    };

The XML unparser is implemented using the Visitor API, so it can be invoked just like you run any other visitor. There is a similar visitor called AST_unparser (in <process_ast/AST_unparser.h>) that you can use to print (parts of) the AST to PHP syntax.

When you run this transform on a.php, it will print two eval_expr nodes (shown in XML syntax), one for the include and one for the echo. We are interested in the first, the include:

    <AST:Eval_expr>
       <AST:Method_invocation>
          <AST:Target xsi:nil="true" />
          <AST:METHOD_NAME>
             <value>include</value>
          </AST:METHOD_NAME>
          <AST:Actual_parameter_list>
             <AST:Actual_parameter>
                <bool><!-- is_ref -->false</bool>
                <AST:STRING>
                   <value>b.php</value>
                </AST:STRING>
             </AST:Ac
75. tree. The structure of this tree is dictated by the grammar described in "The Abstract Grammar". The grammar definition is a very important part of phc. phc's view on the world (as dictated by the grammar) does not completely agree with the PHP standard view. "Representing PHP" describes how the various PHP constructs get translated into the abstract syntax. "Overview of the AST classes and transformation API" gives an overview of the AST classes, the tree visitor API and the tree transformation API from a programmer's perspective. Maketea is a tool bundled with phc which, based on a grammar definition of a language, generates a C++ hierarchy for the corresponding abstract syntax tree, a tree transformation and visitor API, and deep cloning, deep equality and pattern matching on the AST. "Maketea Theory" explains some of the theory behind maketea, in particular the grammar formalism, the mapping from the grammar to the AST classes, and the derivation of the tree transformation API.

I. Tree Traversal API Tutorials

Chapter 2. Getting Started

For this introductory tutorial, we assume that you have successfully downloaded and installed phc, and that you know how to run it (Chapter 2 in The phc User's Manual and Chapter 3 in The phc User's Manual). This tutorial gets you started with using phc to develop your own tools for PHP by writing plugins.

Compiling a Plugin

To get up and running, we'll first write a "hello world" plugin that does nothing except print a string
76. trees. Under Debian/Ubuntu, the following command will install nearly all dependencies:

    apt-get install build-essential libboost-dev libxerces27-dev graphviz libgc-dev

You will still need to install the PHP embed SAPI manually.

PHP embed SAPI installation instructions

If you do not intend to compile PHP code using phc, you may skip this section. In order to compile code, phc must have the PHP embed SAPI available, which is typically not available via standard package managers. The embed SAPI is also required for compiling stand-alone executables.

Download the PHP source tar.gz package from php.net (http://www.php.net). We will assume you downloaded PHP version 5.2.6, the latest version available at time of writing. To extract PHP:

    tar zxvf php-5.2.6.tar.gz

This will create a new directory php-5.2.6. In order to configure and compile PHP, you must know what configuration options you require. These are likely to be the same as the version of PHP you are currently using, which can be examined with the command:

    php -i | grep Configure

We will assume these options are --enable-bcmath --with-gmp --with-mysql, a configuration which we occasionally use for benchmarking. You are ready to build PHP. When configuring, you must add the --enable-embed option:

    CFLAGS="-O3" ./configure --enable-bcmath --with-gmp --with-mysql --enable-embed
    make

If this command does not succeed, there is a problem with PHP on your sy
77. tual_parameter>
          </AST:Actual_parameter_list>
       </AST:Method_invocation>
    </AST:Eval_expr>

This tells us that the include statement is an Eval_expr node (that was obvious from the fact that we implemented pre_eval_expr). The Eval_expr contains a Method_invocation (we knew that too, although of course a node of type Eval_expr can contain any type of expression). The method invocation has target NULL (it is not invoked on an object or a class), method name include, and a single parameter in the parameter list that contains the name of the file we are interested in. We can construct a pattern that matches this tree exactly:

    class Expand_includes : public Transform
    {
    private:
       Wildcard<STRING>* filename;
       Method_invocation* pattern;

    public:
       Expand_includes()
       {
          filename = new Wildcard<STRING>;
          pattern =
             new Method_invocation(
                NULL,
                new METHOD_NAME(new String("include")),
                new List<Actual_parameter*>(
                   new Actual_parameter(false, filename)
                )
             );
       }

    public:
       void pre_eval_expr(Eval_expr* in, List<Statement*>* out)
       {
          // Check for calls to include
          if(in->expr->match(pattern))
          {
             // Matched! Try to parse the file
          }
          else
          {
             // No match; leave untouched
             out->push_back(in);
          }
       }
    };

Note how the construction of the pattern follows the structure of the tree as output by the XML unparser exactly. The
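The idea behind the pattern above (fix the parts of the tree that must be equal, and put a wildcard where any value is acceptable) can be sketched in isolation. This is a simplified illustration, not the phc implementation: Call, Wildcard_sketch, Call_pattern, match and included_file are hypothetical names, a method invocation is reduced to a name plus one string argument, and the sketch assumes that a successful match leaves the matched value stored in the wildcard.

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily simplified node: a call with a name and one argument.
struct Call {
    std::string method_name;
    std::string argument;
    Call(const std::string& n, const std::string& a)
        : method_name(n), argument(a) {}
};

// Stand-in for a wildcard: accepts any argument and remembers what it matched.
struct Wildcard_sketch {
    bool bound;
    std::string value;
    Wildcard_sketch() : bound(false) {}
};

// Pattern: a fixed method name with a wildcard in the argument position.
struct Call_pattern {
    std::string method_name;
    Wildcard_sketch* argument;
    Call_pattern(const std::string& n, Wildcard_sketch* w)
        : method_name(n), argument(w) {}
};

// Fixed parts must be equal; the wildcard part is captured, not compared.
bool match(const Call& node, const Call_pattern& pattern) {
    if (node.method_name != pattern.method_name)
        return false;
    pattern.argument->value = node.argument;
    pattern.argument->bound = true;
    return true;
}

// Returns the included file name, or an empty string if `node` is not an include.
std::string included_file(const Call& node) {
    Wildcard_sketch filename;
    Call_pattern pattern("include", &filename);
    return match(node, pattern) ? filename.value : std::string();
}
```

With this model, included_file(Call("include", "b.php")) gives back the file name, while a call with any other method name is reported as no match, which corresponds to the two branches of pre_eval_expr in the excerpt.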
78. unction returns. Now, an eval_expr makes a statement from an expression. So, if you want to use an expression where phc expects a statement, you have to use the grammar rule

    Statement ::= ... | Eval_expr ;
    Eval_expr ::= Expr ;

The Difficult Solution

The following plugin is a partial solution to counting the number of statements in a tree. If you do not understand the code, do not worry! We will look at a much easier solution in a second. If you understand the comments, that is enough.

    #include <AST.h>
    #include <pass_manager/Plugin_pass.h>

    int count(AST::Statement_list* in)
    {
       // Every item in "in" is a statement
       int num_statements = in->size();

       // But there can also be statements nested inside any
       // of the statements in "in". We consider each one in turn.
       AST::Statement_list::const_iterator i;
       for(i = in->begin(); i != in->end(); i++)
       {
          // Check if the statement is an if-statement
          if(AST::If* if_stmt = dynamic_cast<AST::If*>(*i))
          {
             num_statements += count(if_stmt->iftrue);
             num_statements += count(if_stmt->iffalse);
          }
       }

       return num_statements;
    }

    extern "C" void load(Pass_manager* pm, Plugin_pass* pass)
    {
       pm->add_after_named_pass(pass, new String("ast"));
    }

    extern "C" void run_ast(AST::PHP_script* in, Pass_manager* pm, String* option)
    {
       int num_statements = count(in->statements);
       cout << num_statements << " statements found" <<
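The recursion in the plugin above can be tried out without a phc installation. The following sketch assumes a tiny hypothetical statement hierarchy (Statement, Echo and If are stand-ins defined here, not the classes from AST.h) and uses the same counting rule: every item in the list counts once, and the two branches of each if-statement are counted recursively. Like the original, it is only a partial solution, since other statements that nest statements are ignored.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the AST classes (NOT the phc definitions).
struct Statement {
    virtual ~Statement() {}
};
typedef std::vector<std::unique_ptr<Statement> > Statement_list;

struct Echo : Statement {};

struct If : Statement {
    Statement_list iftrue;   // statements in the "then" branch
    Statement_list iffalse;  // statements in the "else" branch
};

// Counts the statements in `in`, including those nested in if-statements.
int count(const Statement_list& in) {
    // Every item in `in` is a statement.
    int num_statements = static_cast<int>(in.size());
    for (Statement_list::const_iterator i = in.begin(); i != in.end(); ++i) {
        // Only if-statements are inspected for nested statements.
        if (const If* if_stmt = dynamic_cast<const If*>(i->get())) {
            num_statements += count(if_stmt->iftrue);
            num_statements += count(if_stmt->iffalse);
        }
    }
    return num_statements;
}

// Builds: echo; if (...) { echo; echo; } else { echo; } and counts it.
int count_example() {
    Statement_list top;
    top.push_back(std::unique_ptr<Statement>(new Echo()));
    std::unique_ptr<If> branch(new If());
    branch->iftrue.push_back(std::unique_ptr<Statement>(new Echo()));
    branch->iftrue.push_back(std::unique_ptr<Statement>(new Echo()));
    branch->iffalse.push_back(std::unique_ptr<Statement>(new Echo()));
    top.push_back(std::move(branch));
    return count(top);
}
```

For the script built in count_example, the two top-level statements plus the three nested ones give a total of five.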
79. unt_statements cfc; in->visit(&cfc);

We override a number of methods of the Visitor class to implement the functionality we need; the traversal is then taken care of by phc.

Pre and Post Methods

We need to be precise about the order in which phc calls all these methods. Suppose we have a node Foo (say, an if-statement), which is-a Bar (say, statement), which itself is-a Baz (say, commented node). Then phc calls the visitor methods in the following order:

1. pre_baz
2. pre_bar
3. pre_foo
4. children_foo (visit the children of foo)
5. post_foo
6. post_bar
7. post_baz

Just to emphasise, if all of the visitor methods listed above are implemented, they will all be invoked, in the order listed above. So, implementing a more specific visitor (pre_foo) does not inhibit the more general method (pre_bar) from being invoked. You can run the plugins/tutorials/show_traversal_order.la from the phc distribution to see this in action.

Note (Advanced users): As mentioned above, if you implement pre_if (say), the more general methods such as pre_statement or pre_node will still be invoked. It is possible to override pre_if_chain instead; if you override pre_if_chain, you are responsible for calling the more general methods manually. If you don't, they will not be called at all.

Chapter 4. Modifying Tree Nodes

Now that we have seen in Chapter 3 how to inspect the tree, in this tutorial we wil
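The pre/post call order listed in the excerpt above can be made concrete with a small stand-alone model. This is a sketch under assumptions, not phc code: Trace_visitor, Only_pre_foo, visit_foo and traversal_order are hypothetical names, and the driver simply hard-codes the seven calls for a single Foo node. It shows that overriding only the most specific method (pre_foo) does not stop the more general methods from running.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical visitor for a node Foo that is-a Bar that is-a Baz.
// Each default method only records its own name in `log`.
struct Trace_visitor {
    std::vector<std::string> log;
    virtual ~Trace_visitor() {}
    virtual void pre_baz()      { log.push_back("pre_baz"); }
    virtual void pre_bar()      { log.push_back("pre_bar"); }
    virtual void pre_foo()      { log.push_back("pre_foo"); }
    virtual void children_foo() { log.push_back("children_foo"); }
    virtual void post_foo()     { log.push_back("post_foo"); }
    virtual void post_bar()     { log.push_back("post_bar"); }
    virtual void post_baz()     { log.push_back("post_baz"); }
};

// A visitor that customises only the most specific pre-method.
struct Only_pre_foo : Trace_visitor {
    void pre_foo() { log.push_back("custom pre_foo"); }
};

// Assumed driver for one Foo node: general to specific on the way in,
// then the children, then specific to general on the way out.
void visit_foo(Trace_visitor& v) {
    v.pre_baz();
    v.pre_bar();
    v.pre_foo();
    v.children_foo();
    v.post_foo();
    v.post_bar();
    v.post_baz();
}

// Runs the customised visitor and returns the recorded call order.
std::vector<std::string> traversal_order() {
    Only_pre_foo v;
    visit_foo(v);
    return v.log;
}
```

The recorded order still starts with pre_baz and pre_bar before the customised pre_foo, and ends with post_baz, matching the numbered list in the text.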
