Home

- HSR - Institutional Repository

1. leek 1_Week2 Week3 Week4 Week 5 Week 6 Week7 Week 8 Week 9 Week 10 Week 11 Week 12 Week 13 Week 14 Week 15 Week 16 Week 17 Week 18 Week 19 Week 20 Week 21 Test with big C projects Hook into CDT s static analysis framework Improve error handling Small Updates Add custom marker icons Add IncludePath selection algorithm Improve Annotation handling Optimize AST access Detect circular include dependencies Add dependency graph view impl Directly include referenced declarations Impl replace includes with forward declarations limp automatically manage includes Impl find unsed files Run continuously in background Final release Investigate on SDG Slicing Read Large Scale C Software Design Update Documentation Update documentation focus Documentation Figure C 1 Initial Project Plan Note that the investigation on the task Run continuously in background yielded that keeping the ReDHead data structure up to date on changes in ASTs and indexer would be very complex to achieve and also is not necessary because the ReDHead data structure as it is available now performs very well so that even reconstructing it while a programmer is typing each time works well enough The task Investigate on SDG Slicing resulted in the Chapter 2 of this documentation 111 During the first half of the master thesis the project plan was refined There were several tasks that were added which where necessary so the ReDHead
2. build File Browser 0 xX File Edit View Go Bookmarks Tabs Help Name Vv config build properties v resources eclipse SDK 3 6M5 CDT linux gtk tar gz build sh build xml ae ae JUNIT XSL startXvfb sh test xml gt Figure A 1 Build directory overview The following sections describe how the steps listed in A 3 are accomplished To trigger the whole build process the following command can be executed in the repository s build directory from a command shell This is exactly what the Hudson build server does after having automatically checked out the ReDHead Git repository build sh 95 ou 11 14 17 20 23 26 29 32 The first two and the last step are taken care of by the ReDHead build sh script which can be seen in the following listing The ReDHead build script sets up the environment to run the build xml ant script make sure we re in the current dir cd dirname 0 if f usr bin Xvfb then export display export DISPLAY 2 stanti tua lE cligyolleny a 3 startXvfb sh start else echo running without virtual display fi setup echo cleaning directory results rm rf results mkdir results p unziping echo unzipping eclipse tar xzf resources eclipse SDK 3 6M5 CDT linux gtk tar gz C results Let s go java jar results eclipse plugins org eclipse equinox launcher 1 1 0 v20
3. 78 Algorithm Runtime Project 50 24 00 45 32 48 43 12 00 36 53 80 36 00 00 36 53 80 28 48 00 21 36 00 MM SS 14 24 00 10 4911 09 41 24 09 49 72 13 31 28 09 56 56 07 12 00 09 45 15 04 2333 o418 26 05 15 74 04 18 33 00 00 00 1 2 3 4 5 6 F 8 9 10 11 12 13 14 15 16 17 18 19 20 Figure 9 2 Execution Time Project The measurements on the whole project do only start at index 3 The reason for this is that before the running of the find unused includes algorithm did not terminate successfully due to the already mentioned OutOfMemoryError Memory Usage 2 500 2 000 1 500 MB 1 000 500 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Figure 9 3 Memory Usage All the memory measurement shown in Figure 9 3 do also cover the memory consump tion of the testing Eclipse instance as well The consumption of the Eclipse application is roughly 50 to 80 megabytes As can be seen in the charts the performance gain is fortunately very big To makes this possible several changes have been introduced Firstly a lot of caching was added so that none of the ReDHead object which can be seen in Figure 5 1 are instantiated more than necessary To show how important this is the following list contains some information on this e Instantiations of ReDHeadDeclarationReferences was reduced to 37 e Instantiations of IncludePaths was reduced to less than 1 e Instantiat
4. 3 1 1 Compile Configurations When developing C software one normally uses a build system like make cmake qmake etc These build systems are used to assemble the numerous arguments which will be passed to the compiler that is used to compile the C software These argu ments coming from the build system combined with a compilers own arguments result in aso called compile configuration which is mainly used by the compiler itself However a key feature to successfully run static software analysis is such a compile configuration Without the compile configuration one could not reliably resolve all dependencies con tained in the C software since a lot of ambiguous cases would arise Acquiring such a compile configuration is a very complex process since there are many different C C compilers which normally comes along with one of several different build system How costly it can be to obtain such compile configurations is in detail described in BBCt10 Seen form this point of view it becomes clear that to effectively develop static include analysis for C code a powerful IDE which already provides a suitable compile con figuration is required Since CDT does this I can safe a lot of time in the development process of the ReDHead plugin 3 1 2 AST and Indexer CDT s AST and Indexer functionality already provides probably all the basic information which is needed to statically analyze C code To effectively do so this info
5. 9 3 Algorithm Performance When I first implemented the ReDHead data structure and some of the ReDHead algo rithms my main focus was to develop well designed code that produces correct results on the various but small code samples which form the ReDHead test suite The tests represent the various features and possibilities well The only thing that these small artificial code fragments can not do is representing a real world C project which comprises a large amount of code To test the result of the ReDHead algorithms I used a real C project which is under development at the Institute for Software in Rapperswil The first few runs were a bit disillusioning because the algorithms even when initiated only on a single file performed very badly both in means of memory and execution time A run the find unused includes algorithm on one code file took more than five minutes and used more than one gigabyte of memory When the algorithm was triggered on the whole project after running for TT more than half an hour the algorithm was interrupted by an QutOfMemoryError because the two gigabyte memory which were available for the application were exhausted Clearly the ReDHead plugin was in dire need of some performance optimization Dur ing the following work to make ReDHead perform better memory usage and execution time were regularly measured to judge the achieved advancement Before presenting these measured numbers following some number
6. Figure 5 2 All Available DeclarationReferenceDependencies 3 X x The graphs shown in Figure 5 3 shows all the DeclarationReferenceDependencies which are returned by the method getIncludedDependencies of the class ReDHead 42 DeclarationReference The dependency to the file Xfwdi1 h which is not included was grayed out and is not contained in the collection returned by getIncludedDe pendencies 1X4 1 class X Xfwd1 h 1 class X Declaration Reference Dependency Include Dependency Imain cpp 1 include X h 2 include Xfwd2 h 3 Xx Figure 5 3 Only Inclucded DeclarationReferenceDependencies Method getRequiredDependencies returns only these DeclarationReferenceDepen dencies that are required which is here the one to the declaration of class X in file X h To find out which of all the available DeclarationReferenceDependencies are required the ReDHead data structure evaluates if a forward declaration of class X is sufficient or not In Figure 5 4 x in the file main cpp is neither a pointer nor reference variable so the definition of X is required 1X h 1 class X Xfwd1 h 1 class X Declaration Reference Dependency Include Dependency Imain cpp 1 include X h 2 include Xfwd2 h Coy 3 X x Figure 5 4 Only Required DeclarationReferenceDependencies In Figure 5 5 the class X is in main cpp only used as a pointer variable So a forward declaration is enou
7. The system dependence graph was initially introduced in 1990 by Horwitz et al HRB90 It defines an abstract graph representation for a procedural programming language which supports function calls but neither object orientation macros nor other language constructs of C The difference here to an AST is that an AST is confined to a single code file whereas a SDG is not System dependence graphs were introduced as a instrument to help in the process of slicing Slicing is a graph traversal process which given a program point p and a variable x finds all parts of a program which affect and are affected by x at point p This slicing process can be helpful to perform debugging ADS 93 automatic parallelization BW88 Wei83 or to automatically integrate program variants HPR89 In 1996 Larsen and Harrold introduced a SDG for object oriented languages while using the language C for examples LH96 In Figure 2 1 one can find an example of an SDG which does not yet uses object orientation So Figure 2 1 is a SDG as defined by Horwitz et all HRB90 In the following example images the C source code given in Figure 2 1 will be adapted so it uses object orientation and also polymorphism EO E control flow edge int current_floor int top_floor KOSA data Haw edge int current_direction Si intfloor 5 function call edge S2 current floor 1 ae parameter in edge S3 top floor 10 parameter out edge S4 current_di
8. gorithm but by yielding only one dependency to the base class for all other ReDHead algorithms The second approach is to make the static code coverage algorithm aware of the problem and have it construct additional dependencies to overloading member function 10 2 Unimplemented Features 10 2 1 ReDHead include tag cloud A tag cloud is a cloud of words or terms where each term has a custom font size depending the amount of use of the given file These different sized terms are arranged to form a cloud of words When looking at such a cloud one can get a limited but still nice overview on the focal point of the cloud So an include tag cloud would visualize all the includes which are contained in a project as such a cloud which helps the user to realize within seconds the mainly used files in his project It can also help to detect coupling in physical design 10 2 2 ReDHead graph view A tree view which visualizes declaration reference dependencies or include dependencies is in my point of view a very valuable feature I was not able to spend any time on this feature yet and thus added this point to the Chapter Outlook 10 10 2 3 Implement further algorithms In this project not all of the algorithms proposed in Chapter 4 were implemented One of the major goals in a future work will be to implement further algorithms and enhance the existing ones The following list shows the algorithms that are yet to be implemented e Replace inclu
9. includeGuardName findIncludeGuardNameIn includedFile proposeToSurroundWithIncludeGuard include includeGuardName J 4 8 Finding Optimal Insert Positions Some of the static include analysis algorithms described above propose that new include directives should be added to a code file If a software engineer decides to apply the pro posed changes the questions arises where the optimal position is to insert the proposed include directives 37 This section describes the ReDHead component which calculates insert offset for in clude directives for a given source or header file The simplest answer to this position question is to just insert them at the beginning of the document There are two problems with this solution The first one is that when inserting include directives into a header file the include guards are bypassed The second one is that when inserting into a source file one is strongly disadvised to insert additional include directives in front of the one that includes the header file which correlates by name to the source file s name The reason for this is given on page 110 in Lak96 So a better place to insert includes is after the last of the already contained include guards However this is tricky because one should never insert in the scope of a condi tional compilation block include Onething h ifdef WIN32 7 include Anotherthing h endif HEX aon Sf For the code given i
10. l nn J l r I l l l l l l l main cpp other cpp include A h include A h i void main void foo Y Al ai A2a2 7 A2 a2 A3 a3 Note that the blue relations arrows between the files boxes represent include direc tives in the originating file which includes the target file Red dashed arrows represent dependencies to declared types in header files 67 The result when running the algorithm on the file main cpp and then on the file other cpp is shown in the two following figures include A h R main cpp 8 _ othercpp A An BAL A azn Baan ReDHead Suggestions x int main ReDHead Static Include Analysis Al al The runnded ReDHead static included analysis algorithms produced 2 A2 a2 suggestions L z File main cpp Actions to take 0 Missing include Al h Select the action that will be taken rae i forthe proposed suggestions for _ Missing include A2 h Se eS SE ee Checked Suggestions All Suggestions None IG _ Add markers Add markers Apply Quickfix Apply Quickfix e main cpp othercpp X N h Ah h A1 h P A2 h P A3 h include A h x void foo ReDHead Static Include Analysis A2 a2 The runnded ReDHead static included analysis algorithms produced 2 A3 a3 suggestions File othercpp C Missing include A2 h Missin
11. 112 C 3 Time Schedule In Figure C 3 the blue bars show how many hours I worked each week The yellow line shows the weekly target time of the project In green you can see the average work time of the the past weeks Weekly Worktime 72 00 60 00 T 48 00 E m oe k n S J _ A Ml Week time v 36 00 d d I H n Target time Average of past weeks 24 00 Ti 12 00 E m g snnnhHnHnnhknknenetitht 7 a 0 00 Time Hours Figure C 3 Weekly working hours Figure C 4 shows the accumulated working hours over the whole project period In orange you see the time I totally worked in blue the target time Total Worktime 1080 00 960 00 840 00 720 00 600 00 Time 480 00 Target time Time Hours 360 00 240 00 120 00 0 00 gt Figure C 4 Total working hours 113 C 4 CDT Bug Tickets The following list shows all the CDT bug tickets that were created during the ReDHead master thesis Most of the bugs deal with very special indexer scenarios Indexer bug tickets were processed by the CDT developer Markus Schorn who reacted very quickly and also very competently https bugs eclipse org bugs show_bug cgi id 295424 Calling the quick fix menu in an editor and double clicking on No suggestions available the document was deleted from offset 0 to the offset of the caret https bugs eclipse
12. 2 include foo h 3 include unused h 4 X foo 5 X bar Figure 5 6 Only Required Namespace DeclarationReferenceDependencies The ReDHead graph does not represent the ReDHead data structure in every detail The reason for this is that the code which represents a ReDHeadDeclaration of course contains the name which is declared So the code that yields ReDHeadDeclaration actu ally also always yields a ReDHeadDeclarationReference This means that for each dec laration contained in a ReDHead graph there should also be a DeclarationReference Dependency edge that leads to itself The reason not to show these edges in the ReDHead graph is that this would only be redundant information which would bloat the graph 44 In the ReDHead data structure however this so called self dependency is useful since otherwise the resolution of a DeclarationReference that origins from the code of a def inition would sometimes not yield any DeclarationReferenceDependencies Yielding one DeclarationReferenceDependency to itself is a more dynamic solution As shown in Figures 5 1 one ReDHeadDeclarationReference in some cases return not only one but rather several DeclarationReferenceDependencies The following subsections besides the examples given above show some additional special cases which should be considered to understand he ReDHead data structure 5 3 1 Preprocessor Symbols Normally the ReDHead data structure contains
13. GAT Codan amun Soe he 5 Soe Reet S ey SES Ne a hd ers Ch Se ea 6 4 2 Problem Feedback 0 0 0 0 0 0 0 0 0 0 0 000000084 6 9 Festing stl both tad anh 4 oe A neat Wang Fa i A bd 6 5 1 External Include Directories 004 User Manual 7 1 ReDHead Introduction 0 0 0 0 000000 a fede WAG gt towels oleh aoe dee ee ae ee ea eh es BR di a 7 2 1 CDT Codan Integration 0 2 2 22 2004 7 3 ReDHead Code Analysis Algorithms 7 3 1 Finding Unused Includes 0 0 00 00 0002 0G 7 3 2 Organize Includes 0220002 eee eee 7 3 3 Auto Organize Includes 2 2 ee 7 3 4 Directly Include Referenced Declarations 7 3 5 Finding Unused Files 0 02222004 7 3 6 Static Code Coverage 02 0202 eee ee ee 7 4 How ReDHead Include Analysis Works Market Analysis 8 1 Similar CDT Features sr oa e a al a a a ai ee Challenges 9 1 Adapting the CDT Index aa aaa a 9 2 Synchronism of CDT Index and AST aoaaa aaa a 9 3 Algorithm Performance aasa eee ee ees 9 4 Preprocessor Problems oaa aaa a 10 Outlook 10 1 Improvements 10 2 Unimplemented Features 10 2 1 ReDHead include tag 10 2 2 ReDHead graph view COWES erst 8 fan Bead cpl sell nate bas ean E ARDT 10 2 3 Implement further algorithms 10 2 4 Combine compile configuration results 0000 A Continuous Integration
14. Head s icons include lt vector gt ae include A h int main std string s ReDHead Annotations Annotations are added to the editor when running Static Code Coverage on a C project esr mem Applying Proposed Suggestion Solutions Almost all of the ReDHead suggestions bring along solution proposals called QuickFix which can be applied on a given document to resolve the issue These solution proposals will be shown by putting the cursor on a line containing a marker and pressing ctri 1 int Add include lt string gt 63 After applying a quickfix on a document the document will be adapted according to the current suggestion and the marker will be removed Deleting ReDHead Markers There are two ways to get rid of ReDHead editor markers without applying them To delete one or several specific markers one can use the the Problems view of Eclipse 2i Problems x Tastes g Console E Properties Progress 0 errors 2 warnings 0 others Description vV Wamings 2 items amp Missing include lt string gt usr include c 4 4 string The include statement include lt vector gt is unneeded c Go to B Copy Ctrl C oe eet Select All Ctri A Show In Shift Alt W gt Quick Fix tri 1 Properties Alt Enter To delete all ReDHead markers choose Remove all Static Analysis Markers from the ReDHead menu In addition
15. always also a declaration In the graph a declaration is represented as a green box 71 Declaration Reference A declaration reference represents any C element that references a declaration This can for example be the name of a class a variable name or a function name A declaration reference grants access to the declaration reference dependency element In certain cases a declaration reference gives access to several declaration reference dependencies In the graph a declaration reference is represented as a red circle Declaration Reference Dependency A declaration reference dependency is the link between a declaration reference and a declaration Through the declaration reference dependency ReDHead algorithms can gain access to include path elements Sometimes there are several such elements returned by a declaration reference dependency The declaration reference dependency element is represented in the graph as a red dashed arrow Include Paths An include path is a list of include directives which lead from a declaration reference to a declaration Include paths are not directly visible in the graph above But they can be seen as a combination of several blue arrows Note that the term include path does not refer to what is known as include path that is passed to a C compiler and is actually the name of a directory that contains C header files Example Algorithm Run A ReDHead algorithm starts in a given point which i
16. gt lt property name classname value ch hsr ifs redhead tests ReDHeadTestSuiteAll gt lt ant gt lt target gt lt project gt Important to see here is that the name of the base ReDHead test suite is passed in listing line 12 to the Ant task ui test which is started in line 9 99 A 4 Automated Build of the Documentation On the ReDHead Hudson server there is also an Ant task which is automatically run to build the project documentation which the is published automatically to var www ReD HeadFiles The documentation is then directly accessible from the Trac Wiki where you can find a link to it The Ant build file to build the documentation requires the Ant task latex which comes in a jar that can be downloaded from the Ant webpage from http Ant apache org external html The build file build xml which can be found in the Documentation directory of the repository is short compared to others ou lt project default all gt lt taskdef name latex classname de dokutransdata antlatex LaTeX classpath usr share ant lib ant latex jar gt lt target name doLaTeX gt lt latex latexfile ReDHead tex verbose on clean on pdftex on workingDir gt lt target gt lt target name CopyDoc gt lt copy file ReDHead pdf tofile var www ReDHeadFiles ReDHead pdf gt lt target gt lt target name all depends doLaTeX CopyDoc gt lt project gt 100 12
17. it is now on the user centered features Note that this chapter 1 is readable independently of the rest of this documentation and 2 is also contained in the ReDHead plugin itself as user manual which can be reached under Help gt Help Contents gt ReDHead Static Include Analysis The ReDHead plugin is a static source code analysis tool for C projects with the focus on include structure optimization This help provides information about how the plugin is used about its functionality and how results are presented to the C developer The ReDHead Help contains the following sections e Introduction 7 1 e Usage 7 2 e ReDHead Code Analysis Algorithms 7 3 e How ReDHead Analysis Works 7 4 7 1 ReDHead Introduction The ReDHead plugin is a static source code analysis tool for C projects with the focus on include structure optimization The name ReDHead means Refactor Dependencies of Header Files Target is to support a C developer in the task to design the include structure in a given project The programmer is provided with suggestions which will help him in an easy way to improve the include structure of his project Improving the include structure of a big C project is a very tedious and also delicate task which consumes a lot of time The process of improving and cleaning up of the include structure is crucial to maintain the overview in a project and to minimize compile time A very time consuming task is for example find
18. 18 21 B Eclipse Plugin Samples Here you can get insight into how I adapted the Eclipse user interface to my needs Each Eclipse plugin is configured in its plugin aml file which together with the MANI FEST MF is the entry point to integrate itself into Eclipse B 1 UI Menu Integration To integrate the ReDHead functionality into the Eclipse menu one uses several extension points to register action classes which are instantiated when the menu entry is clicked on Then the run method of the action is executed Depending on what kind of menu is extended the interface for the action class is either IWorkbenchWindowActionDelegate or IO0bjectActionDelegate B 1 1 Extending the Main Menu Bar The following listing which belongs to the plugin zml demonstrates how to extend the main menu bar of Eclipse with an example menu bar entry The extension point that is used here is org eclipse ui actionSets lt xml version 1 0 encoding UTF 8 gt lt eclipse version 3 4 gt lt plugin gt lt extension point org eclipse ui actionSets gt lt actionSet label ReDHead Action Set visible false id ReDHeadActionSet gt lt menu label ReDHead Static Analysis id staticAnalysisMenu gt lt separator name StaticAnalysisSeparator gt lt menu gt lt action class ch hsr ifs redhead ui actions FindUnusedIncludesAction icon icons unusedInclude gif id ch hsr ifs redhead actions FindUnusedIncludesAction label Find unus
19. 85 So a better way of adapting the index would be to trigger the process by the user manually with the help of a menu entry Prevent index adaptation for each project Often a CDT Eclipse workspace contains many projects which themselves depend on each other Each of these projects often depend on the same external libraries During the adaptation of a projects indexer as described in 9 1 each project belonging to one project indexes all these external library header independently if they have already been indexed by the indexer of other projects When indexing the whole standard C and boost library s headers more than ten times a lot of unnecessary time will be wasted when waiting for the re indexing task to complete A good thing here would be to have a common indexer which can be shared among projects which contains information about all these external headers which are con tained in every newly created C project The CDT IIndex instances already work in composite way This means that one can by using the static methods of CCorePlugin request one IIndex instance which actually encloses the indexes for several projects If such a common indexer as I suggested exists one could easily achieve that when re questing the index for one project a composite indexer that combines both the indexer of the project and the common one Make index adaptation configurable Instead of letting the indexer index every include directory known to
20. As can be seen in Figure 5 1 a ReDHeadDeclarationReference can yield several DeclarationReferenceDependencies Actually there are several methods available to retrieve a apart of these available dependencies In the Section 5 3 a detailed description is given on what these methods are and why they are necessary The following sections in this chapter describe noteworthy details about the ReDHead data structure that were important to implement the algorithms described in Chapter 4 5 1 Logical and Physical Design The terms logical and physical design have already briefly been mentioned in Section 2 2 Here I will give a more detailed explanation on the two design approaches In the diagram in Figure 5 1 one can see the logical design components on the left hand side and the physical design components on the right hand side Note that com pared to the ReDHead graph the physical design components were extended by the ReDHeadProject and the RedHeadWorkspace Logical design is what most software engineer understand just as design of a soft ware product Logical design components are for example functions classes structs and namespaces They can be used to organize the functionality of a software product so it gets a useful understandable and does not contain redundant code Some of the men tioned components use others This creates coupling between these components which is represented in the ReDHead data structure with the DeclarationReferen
21. Compared to the other graphs that are present in this paper these graphs also show include paths File Declaring A E Ee File Using A 1 N P O gt Declaration Reference gt Include gt Include Path Figure 4 1 Example Include Path Graph Black rectangle vertices depict a file which contains a declaration reference The letter inside of the rectangle is the name of this declaration reference and not the name of the file A brown diamond shaped vertex illustrates a declaration Also here the contained name describes the name of the declaration and not the one of the file Blue solid edges visualize includes whereas red dashed edges depict declaration reference dependencies And lastly the violet dashed dotted edges implie include paths Note that include paths are as already mentioned in Chapter 1 a set of includes So they always go along one or several blue edges In the case of an include path with a length of two or more it also goes on through files as can be seen in the file vertices A and C 22 ou The last part in this chapter Section 4 8 describes how the best position to insert an include is found This part is used by all the algorithms which propose the insertion of include directives 4 1 Finding Unused Includes Eclipse JDT automatically marks all unnecessary Java import statements which are contained in a Java source file The algorithm described here is the C equivalent to th
22. DeclarationReferenceDependencies independent of whether the file containing the ReDHeadDeclaration is included by the one that contains the ReDHeadDeclarationReference In the case of preproces sor symbols as shown in Figure 5 7 this is not the case In the case in which such a DeclarationReferenceDependency would be available the algorithm find unused in cludes would falsely propose to include file other h which would then practically destroy the effect of conditional compilation which of course we do not want other h i 1 define BAR BAR 1 Declaration File i Jimain cpp maln cpp gt Declaration Reference Dependency 1 define FOO Foo 1 l 2 ifdef FOO gt Include Dependency 3 endif 5 endif Figure 5 7 Preprocessor Symbols 5 3 2 Function like Macros The example Figure 5 8 shows the DeclarationReferenceDependencies that are pro duced for function like macros Important here to understand is that since macro ex pansion is textual replacement the call to the AB macro in file DIRECT_CALL h does not create a DeclarationReferenceDependency to the macro s definition which also originates from DIRECT_CALL h Instead these DeclarationReferenceDependencies originate in files containing calls to that macros which is the file main cpp in the exam ple 45 IAB h 1 define AB arg arg CONCAT_CALL h 1 define CONCAT_CALL first second arg 2 first second arg DIRECT_CA
23. Include Path Selection 4 1 2 Algorithm Enhancements Organize Includes Directly Include Referenced Declarations Find Unused Files Static Code Coverage Replace Includes with Forward Declarations 4 6 1 Refactor Towards iosfwd Introduce Redundant Include Guards Finding Optimal Insert Positions Include Substitution 5 ReDhead Data Structure 5 1 Logical and Physical Design 10 10 11 15 16 20 20 20 20 21 22 23 24 27 27 28 30 32 35 36 36 37 39 40 5 2 Declaration References 2 0 0 00 e a 5 3 Declaration Reference Dependencies 2 2 04005 5 3 1 Preprocessor Symbols 0 0000 eee 5 3 2 Function like Macros 0 0 0 0 0 0 0 0 000000000084 93 9 gt Lemplates 2 20 sea Geek bk Se Bee Pe a OE ae Ped PE ed 5 Alm CLOGS eropa a Got p ore beet eae e ee es ae eh DE we Se 5 5 Inclide Paths wie a he de ee ole Ge ee bh a na ra Rae Goeth Implementation 6 1 Used CDT Functionality 0 2 0 220200 OI AST arr Aara ak fm pares caf ar oi Paai cat tbat a Ged sin psc 62 Indexen oo ees cele ger Ge Se GAS a a Be he Se i d 6 13 INIMES A a 802 Mot Stee Se oa ee a es de ae BSR Oe See 6 2 ReDHead Data Structure 2 0 00000 eee ee 62 1 Data Storessa e ied be as bOI Se eee ak oe p es 6 2 2 CleanUp sunk eas foe ae heel a ae Re ee Ee 8 6 3 Optimization Algorithms 0 000000 2000 6 4 UI Integration into Eclipse 0 02 00 0002 0008
24. Includes e Directly Include Referenced Declarations e Find Unused Files e Static Code Coverage 54 6 4 UI Integration into Eclipse In this section the classes contained in the package redhead ui of Figure 6 1 are in troduced For a user centered description of the ReDHead interface components read Section 7 2 To add user interface components such as menu entries and markers to Eclipse one uses Eclipse s extension points by adding XML extension tags to the plugin xml of the ReDhead plugin ReDHead uses the following extension points e org eclipse help toc to add the ReDHead user manual to the Eclipse help e org eclipse cdt codan core checkers to use the CDT s Codan framework e org eclipse ut actionSets to add menu entries to the main menu bar of Eclipse e org eclipse ut bindings and org eclipse ui commands to bind the shortcut ctrl shift o to the auto organize includes algorithm menu entry e org eclipse ut perspectiveExtensions to bind the ReDhead Static Analysis main menu bar additions to the CDT perspective e org eclipse ut popupMenus to add pop up sub menus to the C Project Project Explorer and Navigator view of Eclipse e org eclipse ut editors annotation Types and org eclipse ui editors marker Annotation Specification To give the ReDheadMarkers a custom appearance e org eclipse cdt ui quickFixProcessors to register the ReDHeadQuickFixProcessor which links ReDHeadQuickFizes to the ReDHeadMarkers e org
25. Point availableIncludePath pickedIncludeList Additional Information List 1 2 3 4 5 6 7 it i In pickMandatoryIncludes include C gets selected since the target file of include path 6 G can only be reached through C include path 7 thus gets removed from availableIncludePathList 1 2 3 4 5 6 C Penalties A 2 3 2 1 25 B 2 4 6 D 2 1 1 3 E 2 2 1 4 B is worst so the include paths 5 and 6 get removed 1 2 3 4 C None of the includes A B D or E is mandatory required so nothing is picked 1 2 3 4 0 Penalties A 2 3 2 1 25 D 2 1 1 3 E 2 2 1 4 E is worst so the include path 4 get removed 1 2 3 0 In pickMandatoryIncludes include A gets selected since the target file of include path 3 H can only be reached through A include paths 1 2 and 3 gets removed from availableIncludePathList because all their target files D D and H are reachable through A C A The iteration is stopped here since availableIncludePathList is empty 26 11 14 4 1 2 Algorithm Enhancements When running this algorithm on a real C source file the file contains quite a lot of declaration references which thus also often yield a lot of duplicate include paths The number of declaration references and include paths handled by the find unused includes algorithm can easily go into the thousands Thus grouping both of them already on construction so that there are no duplicates d
26. S32 if alarm_on while current_floor floor amp amp 43 C33 Elevator go floor 48 E34 main int argc char argv current_floor lt top_floor 44 F 49 Elevator e_ptr add current_floor 1 45 protected 50 S35 if argv 1 else 46 int alarm_on 51 S36 e_ptr new AlarmElevator 10 while current_floor floor amp amp 47 p 52 else current_floor gt 0 53 S37 e_ptr new Elevator 10 add current_floor 1 54 C38 e_ptr gt go 5 55 S39 cout lt lt nCurrently on floor private lt lt e_ptr gt which_floor lt lt n add int amp a const int amp b 56 a a b protected int current_floor Direction current_direction int top_floor Ts Figure 2 7 Comparison SDG and Code for Figure 2 6 19 3 Used CDT and Eclipse Components This chapter describes the environment of the ReDHead plugin in detail There are reasons given for the choice of the IDE and the used software components on which ReDHead will be built upon 3 1 Eclipse CDT The plugin that will be developed in the scope of the ReDHead project will be an Eclipse plugin to be used together with C IDE Eclipse CDT CDT contains a yet still almost empty static analysis framework called Codan which the ReDHead plugin can hook itself into Also CDT provides access to AST structures for C code Furthermore there is the CDT indexer which provides indispensable help when resolving declaration references
27. a current suggestion Type incomplete type names should be resolvable When opening an empty C file and typing mytype t the organize includes algo rithm is not capable to help because the name of the type mytype is incomplete only the code in one of the following listings can make use of the organize includes algorithm using namespace mynamespace mytype t mynamespace2 mytype t It would be very nice if ReDHead was capable of proposing one out of several includes to one out of all types which map the incomplete name mytype The crux here is that the CDT indexer will probably help in solving this problem User problem feedback During traversal of the ReDHead data structure by any ReDHead algorithm there are some kind expected problems which might occur This can for example be includes or declaration references that cannot be resolved to the target file or the declaration Such problems are at the moment reported to the user with the help of the IStatus logging support of Eclipse In the case of such problems after the algorithm finished running a pop up dialog is presented by Eclipse which contains problem messages which were constructed by ReDHead In the case when algorithm results are presented in the ReDHeadSuggestionDialog this Eclipse pop up dialog is only shown after the suggestion dialog has been closed again This is of course suboptimal since the user should get notified about these poten tial problem
28. a project the directories of which all files are added to the indexer should be configurable by the user This would help shorten the adaptation time since the header file for example of the C standard library are often included several time in different version whereof of course only the newest is used Markers should be clickable As described several times before the user needs to press ctri 1 to show available quickfixes for a ReDHead marker It is not possible to just click with the left mouse button to see the available quickfixes Since for makers in the Java editor this is possible it should also be available for ReDHead markers pragma define and undef directives pragma define and undef statements in file A can in certain situations have an impact on all include files which are included after such a statement The bad thing here is this can not only be headers which are included after the statement under consideration in file A itself but also file B which contain an include to A or files that are included into B after the include to A The problem that arises here is that it is practically impossible to decide if adding or removing an include to such a file A has any impact on other files or not A valuable 86 enhancement for ReDHead would be to inform the user about this possible impacts by pointing out that there might be unwanted side effects when adding or removing a certain include directive as proposed by
29. and then executing its tests The crux here is that my plugins contain dependencies to Eclipse and Eclipse CDT components Running tests in Eclipse is easy and can be done through Run AS JUnit Plugin Test This sets up a complete new Eclipse instance which includes all the plugins in the current workspace runs the Eclipse instance and starts the tests through the Eclipse Ant Runner in that instance The results of the test run is shown in the JUnit view of the invoking Eclipse instance To mimic exactly this behavior on the server requires several steps 1 2 5 6 Starting a virtual display which is required to launch the tests Set up a complete new Eclipse instance which contains all required CDT plugins Build the ReDHead plugins while resolving the plugins dependencies against the Eclipse instance set up in the previous step Install the ReDHead plugins into the Eclipse instance Launch the ReDHead tests Stop the virtual display After the execution of all of these steps the result of the launched tests can be recycled by Hudson and presented on the Hudson build server page 94 A 3 1 Build Scripts The ReDHead build environment is fully contained in the ReDHead repository in the folder build which is shown in Figure A 1 This brings the big advantage that after the repository has been checked out on any computer the automated build can be launched without any additional configuration
30. are the blue arrows which lead to A B C D E and K In a second step the algorithm collects all declaration references contained in the given file pseudo code line 3 In the example these are the declaration references D H and G For each 23 declaration reference it finds the correlating declaration and the include paths which leads from the active file to the file containing the declaration pseudo code line 4 In the example graph these are all the include paths in the graph Note here that sometimes it is possible that there is more than one include path that leads to the file containing the declaration as can be seen in the example graph for declaration H Also note that the resulting list availableIncludePathsList in most cases contains more include paths than required So proposing only the include to K to be removed in the example graph would not be accurate The next step is to find all required includes by filtering the list availableInclude PathList line 6 or rather their first path elements This filtering process should only select all these includes in the list availableIncludePathList so that all files which contain used declarations are still reachable at least once This filtering process is rather complex and is thus described in detail in the following subsection 4 1 1 The following step is to remove all required includes form the list includesList which yields a list of remaining includes which are all these that ar
31. availableIncludePathsList the worst one is selected Then all the paths in availableIncludePathsList which start with that element are removed It is now possible that some of the remaining first elements became mandatory So the process of pickMandatoryIncludes and removeWorstChoice is repeated until the availableIncludePathsList is empty The tricky part here which makes the find unused includes algorithm interesting is the decision of determining which one of the remaining includes is the worst one The question that arises here is the one of what makes includes good or bad To measure goodness or badness I use two characteristics of an include The first characteristic is the amount of target files that are reachable through a given include In the example Figure 4 2 the include to A can reach both D and H whereas through the include to E only H can be reached The second characteristic that is used is the amount of all recursively included files The include to E includes two files E H whereas the one to B includes four B F I H Using the two characteristics in the following formula a penalty is calculated for each remaining include The include with the biggest penalty is the one that is selected for removal in removeWorstChoice penalty 2 amountRecusivelyIncludedFiles amountT argetFilesReachable Important to see is that the selection of the include to remove is a heuristic approach The penalty calculation abo
32. be found in Section 5 1 As can be deduced from the list above also the C SDG is not the optimal ab stract code representation to perform static include analysis The following Section 2 3 introduces a graph which is in my point of view optimal to perform static include analysis 2 3 ReDHead Graph This section will introduce the ReDHead graph which is designed to hold all the infor mation that is required to provide the ability to perform static include analysis in a fast and intuitive way The introduction to the ReDHead graph will be accompanied with example graphs that represent the same C code that was already present in the SDGs in Section 2 2 A ReDHead graph basically consists of 5 different elements whereof three are ver tices and two are directed edges A black rectangle represents a source or header file vertex These can be connected with blue solid edges which depict include directives Green rectangles represent declaration vertices Declaration reference are represented by vertices which are red ovals And lastly red dashed edges depict declaration reference dependencies The label of a vertex can be interpreted as following The first part is the name of the correlating declaration or declaration reference In the case of many declaration references with long names the names are replaced with DR which is simply short for Declaration Reference The number following the name of the vertex represents the 15 code
33. before he decides to apply all quickfixes directly in the suggestion dialog A helpful enhancement would be to integrate the created problem messages into the ReDHeadSuggestionDialog Configurability of include substitution The include substitution feature which is described in 4 9 right now only works for the standard C library s headers Since this feature would also be useful for the headers of other libraries it should be adaptable by the user in a ReDHead properties page Add support for polymorphic member function calls to static code coverage The ReDHead data structure does not yield any declaration reference dependencies when resolving a declaration reference in file F1 of a member function M1 which is declared 87 virtual in class C1 to an overridden member function in a deriving class C2 On first sight this looks like a bug It is not though since when for example determining which header files need to be included into F1 it would be wrong to propose an include to the file containing C2 This now yields the problem that the static code coverage algorithm does not mark all overriding member function of M as used but rather only the member function M of the base class I see two approaches to solve this problem The first is to make the ReDhead data structure configurable so it can handle both required situations by yielding declaration reference dependencies to overloading member function for the static code coverage al
34. eclipse core resources markers to define own ReDHead marker types Concrete examples on how these extension points are used can be found in Appendix B In the previous section the ReDHeadOptimizer was introduced This component is used by the ReDHeadOptimizationRunner which is invoked through a ReDHeadAction The ReDHeadAction is the element which was registered by the extension points org eclipse ui actionSets and org eclipse ui popupMenus and is thus bound to the registered menu items which are also registered through these extension points The ReDHeadAction is responsible to find the starting point for the ReDHeadAlgorithms to run This starting point is either a ReDHeadProject or a ReDHeadFile Whenever the ReDHeadOptimizationRunner is triggered through a click on a menu en try and thus a ReDHeadAction it shows the ReDHeadSuggestions produced by the ReD Head algorithms in Eclipse This is either done by displaying the ReDHeadSuggestion Dialog or by adding markers to the CDT editor The ReDHeadSuggestionDialog is introduced in 7 2 If the ReDHeadSuggestionDialog is used or not can be configured by the user To find out more about this read 7 2 55 When the user requests quickfixes for any ReDHeadMarker the ReDHeadQuickFiz Processor is called which returns the ReDHeadQuickFix that can solve the suggestion indicated by a ReDHeadMarker Note that the ReDHead annotations which are introduced in 7 2 and are produced by the static code c
35. example which contains a wide range of such features which make the life of a software engineer much more comfortable These features help a software engineer to procure high quality code in the shortest possible time period Here an incomplete list of features being provided by Eclipse JDT that rely on static code analysis e Refactoring see Fow99 e Open declaration e Auto completion of code while typing e Quickfixes for errors like Implement method Add field Create class Surround with try catch Change type of variable Change method signature e Organize imports e Open type hierarchy e Open call hierarchy e Outline view e Show syntax error while typing e etc 1 6 Feasibility Study In the term project which preceded this master thesis there was a feasibility study done on the topic static include analysis for C Here I would like to refer to the documentation Fel09 of the term project Chapter 2 where the documentation of the feasibility study is found As can be guessed since the project was extended to this master thesis the result of the study was that in general a static include analysis for C is implementable The task however is far from trivial and there are still some minor problems that needs to be approached 1 7 Document Overview Before starting the implementation of ReDHead there was a lot of thinking about what basic structure is required to even be able to run static
36. favorite choice The features already contained in CDT provide a good code base for a static include analysis plugin 1 4 Important Terms In this paper I often talk about static include dependency analysis Let us find out what this exactly means when considering a given C project In the source code one uses C identifiers which here are referred to as declaration references This can for example be the name of a called function a type or a variable These declaration references refer to declarations that can be found somewhere in the current file itself or in any included header file Static include dependency analysis means to find such dependencies and use this information to optimize a project s include structure in any way To better understand what the declaration and declaration reference term means have a look at Figure 1 1 a h i a H Declaration reference dependency m class A Include dependency public H inta lt i b h w include a h a cpp ae SD include a h include b h Figure 1 1 Declaration Reference Illustration Note that text highlighted in green are declarations whereas red text represents a declaration references Red dashed lines implies a dependency between a declaration reference and a declaration which from now on will be called a declaration reference dependency Here I would also like
37. function call type or variable name IDE Integrated Development Environment Include Guard This term denotes conditional inclusion like ifdef ifndef that guard include directives JDT Java Development Tooling Eclb Quickfix A quickfix is the Eclipse construct that comes along with a problem marker and can be applied to quickly solve a given problem denoted by the problem marker at hand Type Type in C means a class struct template or fundamental type VS Visual Studio Mica statements SDG System Dependence Graph HRB90 PDG Program Dependence Graph HRB90 Graph A graphic depicting the relationship between vertices Edge A relation visualization on a graph Vertex A item contained in a graph 117 Physical Design see page 12 in Lak96 Logical Design see page 12 in Lak96 GCC Gnu Compiler Collection Eclipse PDE Eclipse Plugin Development Environment 118 Bibliography ADS93 Apaa Apab BBC 10 BW88 CFS06 cod09 Dial Ecla Eclb Edg Fel09 Fow Fow99 GHJV97 Hiralal Agrawal Richard A Demillo and Eugene H Spafford Debugging with dynamic slicing and backtracking Software Practice and Experience 23 589 616 1993 The Apache Ant Project http ant apache org Apache Ant Homepage The Apache Software Foundation http httpd apache org Learning the Java Language Al Bessey Ken Block Ben Chelf Andy Chou Bryan Fulton Seth Hallem Charles Henri G
38. include analysis on The out come of this can be found in Chapter 2 Chapter 3 describes what choices I made about how to engage the development of the ReDHead plugin Chapter 4 describes the static include analysis algorithms which will help a software engineer when looking at a C project s include structure Note that not all the introduced algorithms have been imple mented yet The basic data structure provided by ReDHead which is used by the static include analysis algorithms is described in detail in Chapter 5 The following Chapter 6 introduces details about the ReDHead plugin s implementation and its components Chapter 7 describes the ReDHead features in a user centered aspect A list of several other tools which also cope with C static analysis is available in Chapter 8 Infor mation about tasks that were especially hard can be found in Chapter 9 In Chapter 10 there is information about plugin features which could not yet be implemented Ap pendix A contains information about the continuous integration server setup followed by useful Eclipse plugin example code in Appendix B Organizational information about the ReDHead project can be found in Appendix C 2 Abstraction Concept for C Source Code Obviously it is not enough to just textually parse source code to do static analysis It is clear that to effectively perform static analysis an abstract representation of the elements which are contained in the C C programmin
39. line number in which the vertex origins The code segments lines are numbered in green on the left hand side of the source code In the case of several declaration references on the same line the line number is followed by a _ and in index number which indicates the position of the declaration reference on that line As an example let us assume there is a document with the code i j on line 15 and a graph containing the vertex labeled DR15_2 Then the vertex depicts the 2nd declaration reference which originates from code line 15 which is j Note that a file contains edge has been omitted since one can plainly see that all the declaration and declaration reference vertices would have one leading to the file vertex that contains them Also note that each declaration and declaration reference vertex correlates exactly to one node of an AST This is very helpful because this will allow to continue the traversal of the ReDHead graph into the AST and also the other way around Figure 2 5 shows the ReDHead graph that correlates to the SDG from Figure 2 1 which is also contained in the Figure 2 5 To let the graphs be compared easily the vertices in the ReDHead graphs are arranged as similar as possible to the arrangement of the SDG Obviously in the SDG there are no vertices for variable definition whereas in the ReDHead graph there are Some nodes in the SDG are represented in the ReDHead graph as several declaration reference These de
40. my RSS feed reader A 2 ReDHead Project Server The build server used in the ReDHead project is a virtual server provided by HSR The following sections describe the software component which are running on the server and provide some notes on the setup A 2 1 Git To maintain the ReDHead repository Git Git is running on the server Setting up Git on the remote side is easy sudo apt get install git core mkdir ReDHead git cd ReDHead git git bare init RAPA A A To initially check out the repository on the client one does sudo apt get install git core git clone lt username gt lt servername gt ReDHead git 90 username is the username of the user on the server in which s home directory the repos itory was created and servername is the ip address or the DNS name of the server A 2 2 Hudson To automate the build of my plugins my documentation and running the tests on the server I use the Hudson continuous integration server Hudson is configurable on its webpage which makes its use very simple It supports automated Git checkout and running of Ant and shell scripts These scripts that are called are responsible to build the ReDHead plugin to run its tests and to deliver the results back to Hudson Installing Hudson is simple sudo echo ndeb http hudson ci org debian binary gt gt etc apt sources list sudo apt get update sudo apt get inst
41. of visu alization is to add makers to CDT editor instances for optimization suggestions and let the user choose how to go on from there The user then can decide if he wants to solve the highlighted problem by applying a proposed quickfix In Eclipse these proposals can be activated by the user when pressing ctri 1 In certain cases only adding markers will not be a favorable solution For such cases an other presentation possibility was found In the ReDHead plugin this is a dialog which lets the user decide on what to do with optimization suggestions 21 4 Dependency Optimization Algorithms In this section I describe static include analysis algorithms which are considerd useful for a C software engineer These algorithms are the main features of the ReDHead plugin They are based on the ReDHead data structure described in Chapter 5 which provides them with the basic functionality to perform static include analysis The following sections contain a pseudo code implementation These pseudo code listings give a good overview of the real algorithm implementations They are not a complete representation of the functionality since this would bloat the pseudo code to much But they should give a good idea on how an algorithm basically works Figure 4 1 shows a simplified version of the ReDHead graph as described in Chapter 2 These kind of figures will be used to support the introduction of the algorithms described in the following sections
42. possible at all 9 2 Synchronism of CDT Index and AST The CDT Codan framework triggers implemented checkers while the user is typing The AST which is passed to the checker to be examined is created with the current content of the open editor This means that the contained node is up to date even if the file that is edited has not yet been saved The problem which here arises is that changes on files which have not been saved are also not yet represented in the CDT indexer This results in the fact that whenever a codan checker makes use of the CDT indexer the information that it contains might be outdated The state of the AST and the indexer thus is not necessarily the same during the Codan checker s work This can for example result in a list of includes returned by the indexer which is bigger or smaller than the list which is returned when accessing the same list trough the AST The result of this effect is that static analysis which is done by also using the index instead of only the AST can yield results which are totally wrong To approach this issue I reported the problem as a bug of CDT Sadly the bug ticket has already been closed as WONTFIX with hint that one must consider the possibility that node position returned by the index might not be up to date and thus the node positions of the AST should be used because they are up to date The possibility of added or removed nodes has somehow skipped the CDT developers attention
43. project int main Static Code Coverage X Delete Static Code Coverage Annotations j Show Analysis Result In Dialog 3 Remove all Static Analysis Markers 60 Or through the pop up menu accompanying a source or header file or a C project in either the C Explorer the Project Explorer or the Navigator Eclipse view C C TestProjec File Edit Source Refactor Navigate Search Project ReDHead Static Analysis Run Window Help ray a a a By amp B OQ ow Fact Projects 33 gt F main cpp X N include lt vec Auto Organize Includes e Organize Includes amp TestProject New gt Directly include referenced declarations D ppl Includes Find unused includes gt e main cpp Go Into Find unused includes in project ReDHead Static Analysis gt 4B Find unused files in project Configure gt Static Code Coverage Properties Alt Enter Delete Static Code Coverage Annotations 3 Remove all Static Analysis Markers As can be seen in the sample screenshot some of the algorithms can not be started on a project These can only be triggered if the focus is either on a source or header file in the C Explorer Project Explorer or Navigator or inside of an open C editor 61 Visualization of ReDHead Suggestions As already mentioned ReDHead algorithms produce suggestions that provide informa tion on the given code These suggestio
44. run table which follows the notation global main cpp can be read as declaration of global in file main cpp Note that each table entry represents an iteration 33 Execution Point of the outer loop in pseudo code line 8 The declaration variable in line 8 is always the first element shown in column activeDeclarationList This means that all the declarations that have already been iterated over are not shown in the column anymore activeDeclaration List main main cpp global main cpp global main cpp A A h fool A h A A cpp A A h fool A h A A cpp bar2 bar h fool A h A A cpp bar2 bar h A A cpp bar2 bar h fool A cpp bar2 bar h fool A cpp fool A cpp bar2 bar cpp bar2 bar cpp unused DclarationList A A h foo 1 A h foo2 A h A A cpp fool A cpp foo2 A cpp bar1 bar h bar1 bar cp foo2 A h barl bar h bar1l bar cp foo2 A h bar1 bar h bar2 bar cp foo2 A h barl bar h bar2 bar cp foo2 A h bar1 bar cp foo2 A h bar1 bar cp foo2 A h bar1 bar cp foo2 A h bar1 bar cp foo2 A h bar2 bar h p bar2 bar cpp 0o1 A cpp foo2 A cpp bar2 bar h p bar2 bar cpp 001 A cpp foo2 A cpp bar1 bar cpp p 001 A cpp foo2 A cpp barl bar cpp p 002 A cpp barl bar h p bar2 bar cpp 002 A cpp barl bar h p bar2 bar cpp 002 A Cpp p barl bar h 002 A Cpp p barl bar h 002 A cpp barl bar
45. types descriptions will make use of the graph in the following image on the right hand side On the left hand side the C source code that produced the graph is given Element labels in the graph that end with square brackets which contain a number e g 7 indicate that the element belongs to this source code line Door h Window h 1 class Door 1 class Window 2 oe ss aT A House h H 1 include Door h ouse h 2 include Window h l 3 cl H 7 o Dej e 5 Door getDoor return door 6 private 7 Door door 8 Window w1 w2 i C Window oe 9 include House h 1 3 House h1 4 Door d1 h1 getDoor 5 return 0 i getDoor 4 6 o _ File A file represents the physical file that is present on the file system In the graph im age above it is represented as a box The file elements provide access to the include dependency and declaration reference elements Include Dependency An include dependency represents an include directive that is a link between an including file and an included file It is represented in the graph as a solid blue arrow The element grants access to both the including and the included file and to the include path element which is described bellow Declaration A declaration element represents a normal C declaration class forward declaration variable declaration typedef class declaration etc Note that a C definition is
46. 100118 lt jar ws gtk arch x86 os linux application org eclipse ant core antRunner stop virtual X display if f usr bin Xvfb then startXvfb sh stop fi The important parts of the script are lines 11 23 26 31 On line 11 the virtual display Xvfb is started Line 23 extracts the Eclipse instance which already contains all required CDT plugins On line 26 the Equinox Ant runner which is contained in the previously extracted Eclipse instance is launched The Ant runner will run the build xml which takes care of the remaining build tasks Here launching a normal Ant runner would not be enough since the Equinox Ant runner provides parameters and tasks which are required to build the ReDHead plugins and to run its tests Line 31 shuts down the virtual display again 96 12 18 21 The script which is responsible to start and stop the virtual display Xvfb is shown in the following listing bin bash start Xvfb 2 nolisten tcp shmem gt gt tmp Xvfb out 2 gt amp 1 amp RETVAL echo return RETVAL stop killall Xvfb return 0 J case i in start start 3 stop stop 3 esac The build process which is launched through the build xml is an Eclipse PDE build process that is capable of building plugins and running tests with minimal configuration effort Since Eclipse products are always distributed as features which are a set of Eclipse Plugin
47. 6_out a_out a data flow edge add int amp a const int amp b F2_in current_direction current_direction_in F7_in b b_in member function call edge a a b F2_out current_direction_out current_direction Al_in a_in current_floor aca paramere in ae fs arameter out edge i d F3_in top_floor top_floor_in Al_out current_floor a_out P i protected ae gt transitive data flow edge litcimrent flooe F3_out top_floor_out top_floor A2_in b_in 1 Diea iren Tienin F4_in l_top_floor l_top_floor_in A3_in b_in 1 AF y w a 3 F5_in floor floor_in int top_floor ee Figure 2 2 Object Oriented System Dependence Graph LH96 Note that the next two graphs contain part of their preceding graphs which means that one must also take into account the code which accompanies these preceding graphs and their the codes line labels to understand the meaning of the current graph Compared to the graph in Figure 2 1 this graph also contains member function edges which are represented ad bold dashed edges which connect the class vertex CE1 to its member function vertices e g E7 12 Shown in Figure 2 3 is the class AlarmElevator which inherits from Elevator O i K TZ oD Paes D gt CE23 class AlarmElevator public Elevator public E24 AlarmElevator int top_floor C25 Elevator top_floor S26 alarm_on 0 E27 void set_alarm Key for Parameter Vertices S2
48. 8 alarm_on 1 Flin current_floor current_floor_in F8_out alarm_on_out alarm_on E29 void reset_alarm Fl_out current_floor_out current_floor Al_in a_in current_floor S30 alarm_on 0 F2_in current_direction current_direction_in A1_out current_floor a_out E31 void go int floor F2_out current_direction_out current_direction A2_in b_in 1 S32 if alarm_on F3_in top_floor top_floor_in A3_in b_in 1 C33 Elevator g0 Floor F3_out top_floor_out top_floor A4_in current_floor_in current_floor F4 in 1_top_floor l_top_floor_in A4_out current_floor current_floor_out hs F5_in floor floor_in A5_in current_direction_in current_direction protected F6_in a a_in A5_out current_direction current_direction_out int alarm_on F6_out a_out a A6_in top_floor_in top_floor F7_in b b_in A6_out top_floor top_floor_out Ji F8_in alarm_on alarm_on_in A8_in _top_floor_in top_floor Figure 2 3 Object Oriented Polymorphic System Dependence Graph LH96 The AlarmElevator s elements are placed in the graph above the horizontal dashed and waved line whereas the base class elements are located bellow This line is crossed in two cases The first is by member function edges since the deriving class inherits these member functions The second case is by member function call edges in the case of overloaded member functions when calling their super implementation In the following Figure 2 4 th
49. 9 135 1983 120
50. 900 working hours in the master thesis and 400 hours in the term project The Eclipse plugin ReDHead which s development was started during the term project adds static includes analysis features to Eclipse CDT This is done in the hope that de veloping C will get more comfortable an less complex The overall aim is to help clean up the include structure of C projects while also reducing compile time To read and understand this paper knowledge about C C Str97 and software engineering is required I will not give any introduction into C C or any common software engineering concepts To comprehend the Continuous Integration Setup Ap pendix A of this paper basic knowledge about the Linux operating systems and Ant Apaa is recommended 1 1 Importance of IDE Features When developing C source code one can choose from a variety of integrated devel opment environments short IDE which try to support developers in doing so All of these IDEs provide support for syntax highlighting Many of them provide enhanced searching features But often additional features that go beyond this range are rare Every feature that helps a developer to find information he is looking for in existing code speeds up software development and thus decreases the complexity of analyzing understanding and enhancing code So the more of these features an IDE provides the more helpful it becomes in the hands of a capable software engineer which instead o
51. DeclarationRefDependency declarationRef declaration findDeclaration declarationRefDependency includePathList findIncludePathsToDeclaration declarationRef declaration if isEmpty includePathList addToList filesToIncludeList fileOf declaration J proposeToInclude filesToIncludeList runFindUnusedIncludesAlgorithm presentFile The pseudo code implementation should be understandable easy enough First all declaration references are collected Then for each of these declaration references first the declaration reference dependency and then the declaration is found and the include paths are retrieved that lead form the declaration reference to the declaration Should the list of retrieved include paths be empty an include to the file that contains the declaration 27 is suggested to be added The last pseudo code line triggers the find unused includes algorithm which is as already mentioned part of the organize includes algorithm The graph in Figure 4 4 illustrates a simple example to organize includes File Defining Type A File Using Type A Declaration Reference Dependency Include Include Path Figure 4 4 Organize Includes Example Graph The declaration references which are gathered in pseudo code line 1 are for the exam ple C D and F When finding include paths in pseudo code line 7 the retrieved list for C is empty and a suggestion for an include to the file containing the declaration
52. E HSR ansas gt z HOCHSCHULE F R TECHNIK INSTITUTE RAPPERSWIL FOR E E _ 0 SOFTWARE HSR University of Applied Sciences Rapperswil Institute for Software Master Thesis ReDHead Refactor Dependencies of C C Header Files Lukas Felber lfelber hsr ch http redhead ifs hsr ch supervised by Prof Peter Sommerlad July 2010 Abstract Even though C belongs to the most widely spread programming languages and is used in many different areas very effectively and also very often it has long ago been outperformed by other languages most notably Java in terms of the power of IDEs and their features There are C IDEs which provide a limited support for some features like for example refactoring But these cannot even come close to what for example Eclipse s Java Development Tools provide In the scope of this master thesis the ReDHead tool is developed which adds addi tional features to the C IDE Eclipse CDT These features provide functionality to statically analyze the include dependencies of C header files and provide suggestions and hints on how the include structure of a C software project can be optimized The aim of these optimizations is to 1 improve code quality 2 reduce code coupling 3 lower compile time and 4 improve the speed of the development process of C software Real C projects often span millions of lines of code which are distributed over several hundred source and head
53. LL h 1 define DIRECT_CALL arg 2 AB arg main cpp 1 include AB h 2 include CONCAT_CALL h l l 3 include DIRECT CALLh I 4 int main y EA 5 int i CONCAT_CALL A B 5 6 inti2 DIRECT_CALL 6 Z Caen 7 inti3 AB 7 8 return 0 Figure 5 8 Function Style Macros 5 3 3 Templates Figure 5 9 demonstrates that ReDHeadDeclarationReferences which originate from template functions or template classes often yield several DeclarationReferenceDep endencies to any ReDHeadDeclaration which might match Important here is that there is no determination if all these ReDHeadDeclarations are really required eventu ally In certain cases this could result in false positives in proposition made by some ReDHead algorithms Jibar h 1 void bar int x 1 void bar float x Declaration reference gt Declaration Reference Dependency main cpp gt Include Dependency limain cpp 1 include bar h 2 template lt typename T gt 3 void foo T x 4 return bar x 5 Figure 5 9 C Templates Example 5 4 Includes Instead of introducing an own class that represents include relations ReDHead uses the existing class of CDT to represent include relations since the required information about the include relation can be acquired there The class in Figure 5 1 which is labeled IIncludeStatement does not exists with that name The reason to call it like this is 46 t
54. List sourceFile Additional Information Iteration 1 A cpp B cpp C cpp A h B h A cpp In the current iteration the files A cpp C h D h E h F h G h H h Ih A h B h E h H h Lh get removed J h K h from unusedFilesList Iteration 2 B cpp C cpp C h D h F h G h B cpp In the current iteration the files B cpp J h K h B h I h H h get removed from unusedFilesList Iteration 3 C cpp C h D h F h G h J h C cpp In this iteration the files C cpp K h get K h removed from unusedFilesList The remaining files in the list unusedFilesList C h D h F h G h J h are marked as unused 31 4 5 Static Code Coverage As already described in Section 4 3 there are often declaration included which are not all used The aim of the static code coverage algorithm is to highlight which parts of a project s code is used and which ones are unused To achieve this goal one searches all declarations in a project which are active Then one finds all used declarations by traversing the ReDHead graph along all the declaration reference dependency edges starting with the previously found active declarations Note that such a traversal of the ReDHead graph is very similar to the slicing process A slicing process as introduced in detail in HRB90 is performed on a graph similar to the SDG introduced in Section 2 2 The process performed here in the static code coverage algorithm is as mentioned only similar A normal prog
55. Plugin 6 1 3 INames Instances of CDT s IName represent C identifiers in C code In CDT there are two basic subtypes of IName These are IASTName and IIndexName The first one can be found by traversing ASTs the second one by querying the CDT indexer ReDHead uses INames when producing the ReDHead data structure 6 2 ReDHead Data Structure This section will describe internal implementation details of the ReDHead data structure For a detailed description of the functionality provided by the ReDHead data structure I refer to Chapter 5 Figure 6 2 shows the ReDHead data structure s starting components for the ReDHead algorithms i iWorkspace ReDHeadProject ICProject ReDHeadFile IFile Figure 6 2 Basic ReDHead Data Structure I decided to implement my own file project and workspace classes to avoid a lot of helpers and wrappers The reason for this decision are experiences of several other Eclipse plugin projects I worked on that also used existing classes which could not be adapted or extended directly These are namely Ruby Refactoring CFS06 and CDT C Refactoring Ins 49 Because I work with Eclipse and CDT the ReDHead data structure is based on the resource classes provided from Eclipse and CDT as shown in Figure 6 2 The ReDHead resource classes add additional functionality which is used by the ReDHead algorithms In order to be able to optimize static include analysi
56. PreprocessorFunctionStyleMacroDefinition or IASTPrepro cessorlfdefStatement contain a getter method for its IASTName the accept method which would be responsible to initiate a visitor to visit the visitName method with that name is not even overloaded It gets even more complicated when it comes to instances of IASTPreprocessorIf Statement and IASTPreprocessorElifStatement These instances are not able to return any contained IASTNames directly The only way I could figure out to get all required IASTNames out of these instances is shown in the following code listing 80 ou 11 14 17 private static void addAl1ChildNamesHack IASTPreprocessorStatement node IASTNodeSelector selector node getTranslationUnit getNodeSelector null int curOffset node getFileLocation getNodeOffset int endOffset curOffset node getFileLocation getNodeLength IASTName name null do name selector findFirstContainedName curOffset endOffset curOffset if mame null foundName name if name getParent instanceof IASTPreprocessorMacroExpansion IASTPreprocessorMacroExpansion expansion lt gt IASTPreprocessorMacroExpansion name getParent for IASTName nestedName expansion getNestedMacroReferences foundName nestedName curOffset name getFileLocation getNodeOffset name getFileLocation getNodeLength while name null I think it is obvious that this
57. RedoOperation Change change IUndoableOperation operation new ChangeOperation change IWorkbench workbench PlatformUl getWorkbench getActiveWorkbenchWindow lt getWorkbench IOperationHistory operationHistory workbench getOperationSupport lt getOperationHistory operation addContext I0perationHistory GLOBAL UNDO CONTEXT try operationHistory execute operation null null catch ExecutionException e throw new ReDHeadException e 109 C Organizational In this chapter you can find organizational information about the ReDHead project You will find a list of tools I used followed by some information about the project plan and the time schedule The last section contains my personal impression C 1 Project Environment The environment consists of two parts This is on the one hand my notebook where I develop and document on and on the other hand the project server which is used to present the project to the public to maintain the project repository and to build and test my code and documentation C 1 1 Development Environment The following list describes all the components which were used to develop the ReDHead plugin and to write this documentation Ubuntu As development platform I used an Ubuntu 9 10 code name Karmic Koala Ubu Git Git Git version 1 6 3 3 serves as the repository for ReDHead so all changes that are done are loged an trackable Eclipse As developme
58. Setup A 1 Continuous Integration Introduction 000 A 2 ReDHead Project Server An QA Git de ig Sear la k A 2 2 Hudson A223 Trie e urio A 2 4 Apache Configuration A 3 Automated Building of the ReDHead Plugin and Its Tests A 3 1 Build Scripts A 4 Automated Build of the Documentation 000 B Eclipse Plugin Samples B 1 UI Menu Integration B 1 1 Extending the Main Menu Bar B 1 2 Extending the Navigator s Pop up Menu B 2 Example Problem Marker B 2 1 Customized Markers B 3 Example Quickfix B 4 Example Codan Checker B 5 Undo Redo Operations C Organizational C 1 Project Environment C 1 1 Development Environment 2 0 4 C 1 2 ReDHead Build Server 0 2200 4 C 2 Project Plan C 3 Time Schedule C 4 CDT Bug Tickets C 5 Personal Impression C 6 Changes Since Term Project D Nomenclature 83 83 88 88 88 88 89 90 90 90 90 91 91 92 94 95 100 101 101 101 102 104 105 106 108 109 110 110 110 111 111 113 114 114 116 117 1 Introduction This paper was written within the scope of my master thesis at University for Applied Science in Rapperswil Switzerland The ReDHead project was initially started as a term project Fel09 and then continued as this master thesis The project covers a time period of about
59. additional filtering criteria in the overloaded method getRequiredDependency This is required to achieve the behavior described in Section 5 3 Figure 5 6 ReDHeadDeclaration A ReDHeadDeclaration encapsulates an IName It is the representation of that AST node to which a ReDHeadDeclarationReference resolves to The IIndexName instance which is necessary to create a ReDHeadDeclaration is returned when resolving the IBinding of the IASTName contained in a given ReDHeadDeclarationReference Resolving and acquiring the IBinding is somehow complicated because there are sev eral different approaches which need to be combined to find the correct IName instances Dependency Classes Dependency classes represent dependencies between other ReDHead classes There is always a origin and a target file involved DeclarationReferenceDependency The DeclarationReferenceDependency class represents a link between a ReDHead DeclarationReference and a ReDHeadDeclaration For its creation an instance of one 51 of the linked classes which are described above is required The DeclarationRefer enceDependency gives access to IncludePaths The DeclarationReferenceDepend ency is the right class for this because it can get access to both the originating and target ReDHeadFile of the SourcePaths IncludePath An IncludePath represents one path of include edges from an include ReDHeadFile to a targetReDHeadFile This can be seen as the physical design eq
60. all hudson sudo etc init d hudson start Now you can connect to http localhost 8080 and configure Hudson from there To run an Ant or shell script automatically just add a job let Hudson check out the Git repository if it was updated and run the Ant or shell script A 2 3 Trac The project management system Trac 0 11 is used as ticketing system and to present the ReDHeadProject to the public After the installation one is able to browse the Git repository source online and the Hudson build status is shown in the Trac timeline Trac is running as an Apache module So the first setup step is to install Apache and Trac sudo apt get install apache2 sudo easy_install Trac apt get install libapache2 mod python libapache2 mod python doc a2enmod mod_python AGARARA Now we create the ReDHead Trac project sudo mkdir var lib trac ReDHead sudo trac admin var lib trac ReDHead initenv sudo chown R www data var lib trac ReDHead To install the Trac plugins which enable Git and Hudson support download the zipped sources from http trac hacks org wiki GitPlugin and http trac hacks org wiki HudsonTracPlugin unzip them and run each ones setup py script The last thing to do is to adapt the Trac configuration which is located at var lib trac ReDHead conf trac ini I will only post the lines that I have added or modified 91 12 18 21 24 27 component
61. an be found in Lak96 on page 82 Here a dummy example main cpp include header h include header h include header h include header h 36 ou 11 14 ou header h ifndef HEADERH define HEADERH class A endif x HEADERH When compiling the code from the previous listings the file header h is opened and closed four times The operation however was only required exactly once By intro ducing redundant include guards into main cpp as can be seen in the following listing results in faster compile time main cpp ifndef HEADERH include header h endif x HEADERH ifndef HEADERH include header h endif x HEADERH ifndef HEADERH include header h endif x HEADERH ifndef HEADERH include header h endif x HEADERH ifndef HEADERH include header h endif x HEADERH Adding redundant include guards as demonstrated above is an annoying time con suming and also error prone task Thus adding automated support in form of a ReD Head algorithm that adds redundant include guards will be a very valuable feature when utilizing redundant include guards The following listing demonstrates the algorithm allIncludesList findAllIncludes currentFile for include in allIncludesList includedFile includedFileOf include if not alreadySurroundedByIncludeGuards include and containsIncludeGuards gt includedFile
62. anks to the indexers capability of indexing certain specified file up front Files to parse up front can be configured as a property of a project s indexer So the solution to make all types known to the indexer that are not located inside of the project itself is to add all the external header files to this list The list of course grows very long since only the C standard library version 4 4 already contains about 4 600 header files Often there are also several other libraries included which lets the list grow again Parsing all these files takes a lot of time of course The good thing here is that the adaptation of the index is persistent so that it is only necessary one for a project So the ReDHead plugin adapts the index of a project by letting the indexer parse all the files which are contained in any include directory or sub directory that are known to a CDT project An instance of ICProject is capable of returning all these directories as instances of IIncludeReferences By recursively traversing each of these directories a 76 list of all contained files is created and provided to the projects index as parse up front list Then the indexer needs to be re indexed once This adaptation of the index does up to now happen implicitly when any ReDHead algorithm is run Re indexing the whole indexer each time an algorithm is run is of course suboptimal Because of that before the index is adapted it is checked if the adaptation is
63. are required which can validate the correct behavior of the ReD Head plugin concerning these external include directories The refactoring test frame work on which the ReDHead testing framework is based on however this did not provide the possibility to specify any such external include directories After spending a lot of time to figure out how the CProject instance that was created in the setup phase of the test could get added such directory paths I found the correct way to achieve what I desired However CDT s approach to solve this problem is quiet strange private void addExternalPaths String paths new String path to first dir path to second dir TestScannerProvider sIncludes array Override protected void tearDown throws Exception TestScannerProvider clear To make the testing indexer also consider the external directories I want to add they do not have to be added to the created instance of ICProject but rather to the globally accessible field TestScannerProvider sIncludes As can be seen in the listing the TestScannerProvider has to be cleared after the test was run in the tearDown method I think do I not have to go into detail here to show that this is not a very nice approach to solve the problem at hand 58 User Manual A detailed description of the features provided by the ReDHead plugins is given here Whereas in Chapter 6 the focus was on the technical implementation details
64. arker on list ing line 7 To create a maker of the newly defined type one can follow the example shown in the previous Section B 2 expect that instead of the term ICModelMarker C_MODEL_ PROBLEM MARKER which also refers to a marker id one can use the new id ch hsr ifs redhead unusedincludemarker The resulting maker can be seen in Figure B 3 ih Ah 8 l include lt string gt 2 Figure B 3 Example Customized Marker B 3 Example Quickfix In Eclipse quickfixes are attached to problem markers They are shown in Eclipse when pressing ctril 1 To get the ReDHeadQuickFixes into Eclipse I needed to add the following to the plugin cml This registers the ReDHeadQuickFixProcessor so it gets asked for quickfixes by Eclipse lt xml version 1 0 encoding UTF 8 gt lt eclipse version 3 4 gt lt plugin gt lt extension point org eclipse cdt ui quickFixProcessors gt lt quickFixProcessor id ReDHeadQuickFixProcessorsExtension name ReDHead gt Quickfix Processor class redhead ui ReDHeadQuickFixProcessor gt lt handledMarkerTypes gt lt markerType id org eclipse myplugin audits gt lt handledMarkerTypes gt lt enablement gt lt with variable projectNatures gt lt iterate operator or gt lt equals value org eclipse cdt core cnature gt lt iterate gt lt with gt lt enablement gt lt quickFixProcessor gt lt extension gt lt plugin gt 106 12 The class attribut
65. athList removeWorstChoice availableIncludePathList pickMandatoryIncludes pickedIncludeList availableIncludePathList return pickedIncludeList The process of finding all the required includes is achieved by picking includes until all the includes which are required are in the list pickedIncludesList Picking includes 24 happens in pickMandatoryIncludes in pseudo code line 4 and 7 For each path of which the target file cannot be reached through any other path the first element in the path is thus mandatory to satisfy some declaration reference dependencies in the active file In pickMandatoryIncludes all these first elements are added to the list pickedIncludesList When an include is picked all the include paths which start with the picked include are not needed anymore because their target file is now reachable through the picked include This means that not only the include paths itself but also all others which have the same targets can be removed from the list availableIncludePathsList Now one checks if the list availableIncludePathsList is not empty If it is there is nothing to do anymore and the list of picked includes can be returned At the moment after all the mandatory includes have been picked of all the first elements of the remaining include paths in availableIncludePathsList must at least one be unnecessary So next in removeWorstChoice of all remaining first elements of the include paths in
66. ause the task would be very time consuming and error prone when executed manually This algorithm is intended to be run on a full project and not on a single file When it is triggered in 30 the context of a single file the find unused files algorithm is triggered on the enclosing project instead The following pseudo code implementation introduces the algorithm 1 unusedFileList findAl1FileInProject presentProject for sourceFile in findAllSourceFilesIn presentProject 4 removeFromList unusedFileList sourceFile for include in recursivlyGetIncludes sourceFile includedFile findIncludedFile include 7 removeFromList unusedFileList includedFile J 10 markFilesAsUnused unusedFilesList Note that the find unused files algorithm as it is described above is also capable of finding several headers which are unused but which include each other This can also be seen on the example run demonstrated bellow on the example project in Figure 4 7 Note that all the node labels in this graph do not represent declaration and declaration reference names as usual Here they just represent the name of the file Header FileA h Source File A cpp Declaration Reference Dependency Include Include Path Figure 4 7 Example for Finding Unused Files Each line in the following table represents one iteration of the outer loop in the pseudo code implementation line 3 Execution Point unusedFiles
67. bject construction but on first access which prevents the evaluation of unnecessary elements and ensures that evaluation is only done once 6 2 1 Data Stores During the phase of optimizing the ReDHead data structure as described in 9 3 sev eral store classes were created to prevent the unnecessary recreation of data structure elements These were created to store e ReDHeadDeclarations e IncludePaths e IncludeSubstitutions as introduced in Section 4 9 e IIndexIncludes e ReDHeadSuggestions 6 2 2 Clean Up When setting up data structures which allocate a lot of memory one needs to make sure that as much of the occupied memory is released as fast as possible again In Java releasing memory is of course achieved by the garbage collector however this is only done with objects that are not referenced anymore So an important task after the ReDHead plugin performed static include analysis is for the ReDHead data structure to be cleaned up completely This implies that all information which is to outlast the cleanup process cannot contain parts of the ReDHead data structure This means that ReDHeadSuggestion instances cannot keep members such as ReDHeadFiles AST nodes or ReDHeadDeclarationReferences But still they must be capable to create ReD Head QuickFizxes that propose solutions for the suggested problems So what now remains in ReDHeadSuggestions are names of files and offset locations of nodes which are required to generate ReDHea
68. ceDependency class instances Physical design on the other hand is something very similar in the task of designing But one could say it is a second dimension of the design process Here the question is how to organize a system s physical units These units are files and directories and the relation between them the include directives It also belongs to the tasks of the physical design to put the components of the logical design into the physical units in a smart manner Obviously the two ways to design are not completely independent of each other since the coupling which exists in the logical design influence the include structure of a project which belongs to the physical design And the logical design s components are distributed in the physical design s units the files To develop a good software product both the task of logical and physical design should get enough attention If one is neglected the software products code quality is decreased Al 5 2 Declaration References ReDHeadDeclarationReferences can be obtained from any ReDHeadFile Often many ReDHeadDeclarationReferences contained in one file refer exactly to the same ReDHead Declaration To prevent the repeated execution of the same operation on similar ReDHeadDeclarationReferences only one of these similar ReDHeadDeclarationRef erences is returned by the class ReDHeadFile So the set returned contains only as many ReDHeadDeclarationReferences as there are dif
69. claration reference are enveloped by a black dotted oval which is labeled with the name of the correlating vertex name in the SDG The graph shown in Figure 2 6 depicts the object oriented SDG which was shown in Figure 2 4 In the figure one can also see the include dependencies between different source and header files The program s source code and the SDG of Figure 2 4 including additional green line numbers necessary to comprehend the ReDHead graph in Figure 2 6 can be found in Figure 2 7 2 4 Conclusion As was explained the Abstract Syntax Tree the System Dependence Graph and also the ReDHead graph each are very useful to perform static code analysis When one focuses on static include code analysis though the ReDHead graph combined with additional information from ASTs is the optimal provider for the required information 16 ne current f current oor 2 2 top_fioor 3 top_fioor 3 3 Rast current _ current direction 4 4 foor 5 5 er gt Declaration Reference Dependency I he D om include Dependency E0 main int current_floor int top_floor int current_direction S1 int floor 5 S2 current_floor 1 S3 top_floor 10 S4 current_direction UP S5 if current_direction UP o S6 while current_floor floor amp amp current_floor lt top_floor 11G7 add amp current_floor 1 12 else 13 58 while current_floor floor amp amp current_
70. components change during refactoring or maybe even are removed the docuemtation changes along all over This results in a big loss of time which could have been spent on other tasks The optimal way to document during a software project thus is in my point of view to document early and continuously but rather than writing full text passages which can be considered finished at the time of writing while only updating the outline and writing text in a sketchy style Like this all the facts which are important to end up in the documentation will be available while the loss of time when re processing the documentation will be small because the real docuemtation text will only be written during the ending phase of the project and thus only once Discipline and Motivation When one works completely alone except for the weekly meetings with the supervisor work discipline is bound to falter from time to time I think I could master this difficulty quite well but if I could choose I would definitely prefer a team project because the obligation towards and the help from other team members makes this barrier much smaller Overall I think this master thesis was a great experience to add to my growing collection of software engineering projects that I worked on At this point I want to thank my advisor Prof Peter Sommerlad for all the guidance I received and for the motivation he helped to raise 115 C 6 Changes Since Term Project Some sect
71. ctive to lt string gt B main cpp 52 Multiple markers at this line Missing include lt string gt usr include c 4 4 string The include statement include lt vector gt is unneeded std string s 7 3 3 Auto Organize Includes The Auto Organize Imports proposes the same suggestion as the Organize Imports 7 3 2 algorithm But it does not add markers or shows the ReDHead Suggestion Dialog Instead of that it automatically applies all the solution proposals to the generated suggestions To each automatically applied change an info marker is added so the user can track the applied changes and also revert them if he likes To use this feature as fast as possible it is also bound to the shortcut ctril shift o 66 7 3 4 Directly Include Referenced Declarations Directly including referenced files will suggest to add an include directive to each header file that defines a type that is used in a file under consideration This will alter the include structure of a project in favor of the reduction of include dependencies coupling and compile time A detailed description why directly directly including referenced files is useful can be found in the book Large Scale C Software Design Lak96 by John Lakos on page 113 As an example consider the following C project graph A3 h include A1 h include A2 h include A3 h be oe oe om om oe l l l
72. dQuickFizes 6 3 Optimization Algorithms Figure 6 4 shows the class diagram that demonstrates how the ReDHeadOptimizer works The ReDHeadOptimizer uses one or several ReDHadAlgorithms which s task is to collect 53 ReDHeadSuggestions These suggestions each give a hint on anything that is not optimal when looking at the include structure of the given C C code As a sample the class diagram contains the find unused includes algorithm redhead cxxelements a lt lt package gt gt optimizer ReDHeadOptimzier x ReDHeadAlgorithm A ReDHeadSuggestion FindUnusedincludesAlgorithm redhead dependency Figure 6 4 Optimization Algorithms Class Diagram Each of the ReDHeadAlgorithms looks at the ReDHead data structure and tries to lo cate potential points that could be optimized The ReDHeadOptimizer gathers all these points in the form of ReDHeadSuggestions The UI component which makes use of the gathered suggestions is the ReDHeadOptimizationRunner which is described in the fol lowing section Often ReDheadSuggestions also bring along a solution proposal in the form of a ReDHeadQuickFix that can fix the problem which the suggestion points out The following list shows the algorithms which have been implemented during the ReD Head project There will not be any further description on them since their functionality was already described in Chapter 4 in detail e Finding Unused Includes e Organize
73. data structure worked precisely and also fast enough Note also that due to some of these added tasks others have been postponed to the end of the project The decision to postpone these tasks was taken either with the approval or even because of the advise of my supervisor The resulting project plan can be seen in Figure C 2 Task leek 1_ Week 2 Week 3 _Week4 Week 5 Week 6 Week7 Week amp Week9 Week 10 Week 11 Week 12 Week 13 Week 14 Week 15 Week 16 Week 17 Week 18 Week 19 Week 20 Week 21 Organizational Test with big C projects Hook into CDT s static analysis framework Improve error handling Small Updates Add custom marker icons Add IncludePath selection algorithm Improve Annotation handling Optimize AST access Detect circular include dependencies Impl Directly include referenced declarations Prorolase for ACCU Optimize datastructure amp algorithms Finetune amp bufix datastructure Impl organize includes Impl find unsed files Fintune Insertion of Includes Finetune Removal of Includes Finetune Static Code Coverage Run continuously in background Final release Investigate on SDG Slicing eae tee Read Large Scale C Software Design B aaa Update Documentation Impl replace includes with forward declarations Add dependency graph view User Manual Update documentation focus i a a a Documentation Fe a ae Se a ee NT NT ey ey ee Backup Figure C 2 More Detailed Project Plan
74. des with forward declarations 4 6 e Introduce redundant include guards 4 7 which is described in detail also in Lak96 on page 85 88 10 2 4 Combine compile configuration results The rather vague idea that is described here is to instead of running static analysis with only one compile configuration see 3 1 1 run it with several ones Compile configura tion specific ReDHead data structures could then be compared to gain a more detailed insight into the include structure of a given project 89 A Continuous Integration Setup In this section you can find a detailed description how the ReDHead continuous integra tion environment is set up A 1 Continuous Integration Introduction When changing existing software one should always make sure that one does not break any existing features while adding or changing others To not put ones luck to the test using continuous integration Fow is crucial In continuous integration the whole project is built periodically and automated tests are run on the server to make sure that a software project is running properly This task is mastered by the build server Hudson Sun which observes the Git repos itory and builds the ReDHead plugins and this documentation automatically when the repository was updated Then it triggers the automated ReDHead project tests The result is published on the ReDhead webpage http redhead ifs hsr ch hudson I myself am notified about success or failure through
75. development of the ReDHeadPlugin some tasks were especially hard to complete The most noteworthy of these tasks are described in this chapter 9 1 Adapting the CDT Index Normally the CDT indexer is aware of all the C code which is somehow reachable through any of the source files which are contained in a project This means that all the headers which are included into any source file are represented in the CDT index However headers which are not included in any way are not represented in the indexer This is even true for header files that are contained in the current CDT project itself For the ReDHead plugin this brings forth the problem that when using a type in any file even if a type is declared inside of a header file in a given project there will be no suggestion generated to include this header file because the relation from the usage of the type to its declaration is not possible due to the indexer s ignorance of the header file For header files present in the current CDT project this issue can be solved easily because the indexer can be configured to also index all header files However this does not remove the whole problem since the ReDHead plugin is also supposed to be able to propose includes to external header files This can be header files of the C standard library or any other header which is contained in one of the include directories that is linked to the CDT project Also the remaining problem can be solved th
76. dhead feature gt lt copy gt lt copy todir buildDirectory plugins gt lt fileset dir basedir Software gt lt include name ch hsr ifs redhead gt lt include name ch hsr ifs redhead tests gt lt include name ch hsr ifs redhead codan gt lt fileset gt lt copy gt lt target gt lt target name zips depends init init eclipse props gt lt ant antfile build xml dir pde build scripts gt lt property name builder value basedir config gt lt ant gt lt target gt lt target name tests depends init unless hasErrors gt lt record name testReportDir testsLog txt action start loglevel verbose gt lt unzip src redhead featureZip dest buildDirectory gt lt ant antfile test xml gt lt property name eclipse home value eclipse home gt lt property name library file value eclipse home plugins org eclipse test13 310 tibrary n xm lt property name 0s value baseos gt lt property name ws value basews gt lt property name arch value basearch gt lt ant gt lt xslt style basedir JUNIT XSL in testReportDir ch hsr ifs redhead tests ReDHeadTestSuiteAll xml out testReportDir junits html gt lt record name testReportDir testsLog txt action stop gt lt target gt lt target name publish test results gt lt copy file testReportDir junits html tofile var www ReDH
77. e e Access ReDHead dependency classes 4 Check the results on correctness 5 Clean up the project that was tested on 56 11 ou 11 14 17 Luckily Eclipse CDT already contains refactoring tests which proceed in a very similar way as described in the list above It provides an elegant solution which covers points 1 2 and 5 from the list above The ReDHead testing framework thus is an adaptation of the refactoring testing framework To write a new test one has to provide the source code that will be added to the project to test on Additionally a test class is needed that runs the method under test and validates the results The source code of the testing project is provided in one file in a special file format which is then interpreted by the testing framework Here the file CoverageTest2Many Files rts as an example CoverageTest2ManyFiles ch hsr ifs redheadtests coveragetests CoverageTest2ManyFiles A cpp include Unused h include Used h int main int argc charx x argv Cae oe J QUnused h class Unused Used h clases LT Note that the example file contains code segments which represent three small C files The comment A cpp for example signals the start of file A cpp The first line of the listing is the test s name and the second one denotes the test class which will be instantiated The test class CoverageTest2ManyFiles java looks like this public c
78. e g A1_out formal in e g F1_in and formal out e g F1_out vertices have been added to the graph These can be mapped to source code with the help of the Key for Parameter Vertices agenda For a precise description of the graph I refer to LH96 and HRB90 The following Figure 2 2 introduces the class Elevator which splits te beforehand introduced elevatro program into member functions Note that there is not yet a main program available in the figure It will follow in Figure 2 4 class Elevator public Elevator int _top_floor current_floor 1 current_direction UP top_floor _top_floor Fa virtual Elevator i void up current _direction UP void down current_direction DOWN sae i N int which_floor ET i AN if a i return current_floor E wo 7 i AA a ee Direction direction 2_in gt 2_oub V a NY ye ae kedai V7 ALD AiD Aloud AIID AZID Aloud return current_direction Gs a ee y virtual void go int floor ee ST if current_direction UP TS while current_floor floor amp amp current_floor lt top_floor add current_floor 1 else a while current_floor floor amp amp S22 current_floor gt 0 add current_floor 1 Key for Parameter Vertices gt member function edge Fl_in current_floor current_floor_in F6_in a a_in control flow edge private F1_out current_floor_out current_floor F
79. e main program is shown which uses either the Alarm Elevator or the Elevator class Note the vertex C38 which calls the overloaded member function go connects to the artificial vertex P1 which again leads to both the imple mentations of the go member function This is due to the fact that the evaluation of which member function is really called is only possible at runtime 13 535 S36 S37 C38 S39 main int argc char argv Elevator e_ptr if argv 1 e_ptr new AlarmElevator 10 else e_ptr new Elevator 10 e_ptr gt go 5 cout lt lt nCurrently on floor lt lt e_ptr gt which_floor lt lt n member function edge control flow edge data flow edge member function call edge parameter in edge parameter out edge transitive data flow edge Key for Parameter Vertices current_floor current_floor_in current_floor_out current_floor current_direction current_direction_in current_direction_out current_direction top_floor top_floor_in top_floor_out top_floor _top_floor _top_floor_in floor floor_in a a_in a_out a b b_in alarm_on alarm_on_in alarm_on_out alarm_on Alin Al_out A2_in A3_in A4_in A4_out A5_in A5_out A6_in A6_out AT_in A7_out A8_in A9_in A10_in All_in a_in current_floor current_floor a_out b_lin 1 b_in 1 current_floor_in current_floor current_floor cu
80. e of Full IncludePath The FullIncludePath contains all the path elements starting with the first included file To construct a FullIncludePath the whole include structure starting on the originating file is recursively traversed to find the target file The traversal functionality is provided by the CDT indexer To minimize construction time of a FullIncludePath the recursive traversal to find the target file in the case of the target file is part of the current project and the traversal leaves the scope of the current project skips further traversing outside of the project This is done under the assumption that header files which are not part of the project also do not again include headers that themselves are inside of the project The second subclass of IncludePath is the FirstLastElementIncludePath The clue here is that it only contains the IIndexInclude to the first included file and the one to the last included file Most of ReDHead s algorithm do not care at all about the elements in between the first and the last one The CDT indexer can without any recursive traversal deliver all the files which are transitively included through a given 52 originating file It is clear that path construction here takes much less time in the case of FirstLastElementIncludePath Lazy Loading For all the above described classes contained fields which can be retrieved through getter methods are lazy loaded So fields are not initialized on o
81. e of the quickFixProcessor defines the name of the QuickFix Processor class which is shown here public class ReDHeadQuickFixProcessor implements IQuickFixProcessor public ICCompletionProposal getCorrections final IInvocationContext context final IProblemLocation locations throws CoreException ArrayList lt ICCompletionProposal gt proposalList new ArrayList lt lt ICCompletionProposal gt proposalList add new ReDHeadDummyCompletionProposal return proposalList toArray new ICCompletionProposal 0 public boolean hasCorrections final ITranslationUnit unit final int problemId lt return true Here the RedHead ICCompletionProposal implementation is used You can see a dummy implementation of this class here public class ReDHeadDummyCompletionProposal implements ICCompletionProposal public String getIdString return myDummyID public int getRelevance return 0 public void apply final IDocument document apply changes here x public String getAdditionalProposaliInfo return My accurate description public String getDisplayString return ReDHead dummy QuickFix public IContextInformation getContextInformation return null public Image getImage return null public Point getSelection final IDocument document return null I should probably mention here that the quickfix in CDT is only shown when invoked through ctrl 1 and not by clicking on
82. e one of Java One should be aware that Java imports are not transitive whereas C includes are which makes the task much more difficult than the one in Java This algorithm is file based which means that its starting position is one given C source or header file The algorithm is also available for a whole project In this case the algorithm is run for each C file that is contained in the given project To give an overview the algorithm is first described in pseudo code includeList findListOfIncludes presentFile declarationRefList findAllDeclarationReferences presentFile availableIncludePathList findListOfIncludePaths declarationRefList pickedIncludeList unusedIncludeList filterAvailableIncludePaths availableIncludePathList removeAllRequiredIncludes includeList pickedIncludeList markIncludesAsUnused unusedIncludeList To help understanding the textual description which follows the example in Figure 4 2 was added here The active file on which the algorithm is run is the one represented as black box labeled D H G in the bottom of the graph gt File Defining Type A File Using Type A Declaration Reference Dependency Include Include Path Figure 4 2 Find Unused Includes Example The first step of the algorithm gathers all present include directives in the file pseudo code line 1 This information can be retrieved by the the active file In the example figure these includes
83. e unneeded line 7 The component which decides which of all available include paths contained in the active file list includePathList is best to be left in the document the routine filter AvailableIncludePaths are described in the following section 4 1 1 Optimal Include Path Selection The reason for the presence of this section is that a declaration which is required through a declaration reference in the current file can be included through several include paths If there are several possible paths available to navigate through include directives to a declaration and these paths have different starting include elements one needs to decide which one of them is the best one because seen from the given declaration reference only one of these starting includes is required to be included in the current document To make a good decision however one does not only need to look at the in include paths leading from one declaration reference to one declaration but at all of these relations This means that given all the include paths leading out of the current document one can remove some paths until all the paths are necessary so that it is still possible to reach all the declarations which are used through declaration references in the current document filterAvailableIncludePaths availableIncludePathList pickedIncludeList EmptyIncludeList pickMandatoryIncludes pickedIncludeList while hasIncludeChoice availableIncludeP
84. eDeclarations presentProject usedDeclarationList emptyList removeAll unusedDclarationList activeDeclarationList addAll usedDeclarationList activeDeclarationList for declaration in activeDeclarationList containedDeclarationRefList findContainedDeclarationRefs declaration 10 for declarationRef in containedDeclarationRefList newDeclaration findeDeclaration declarationRef appendToList activeDeclarationList newDeclaration 13 addToList usedDeclarationList newDeclaration removeFromList unusedDclarationList newDeclaration 16 markUsed usedDeclarationList markUnused unusedDclarationList Note that in pseudo code line 12 a declaration is appended to the list active DeclarationList The loop in line 8 is at that moment iterating over exactly this list I assume here that 1 this does not break the execution of the pseudo code and 2 that the appended element is included into the iteration on line 8 This rather strange behavior is here used to keep the pseudo code as small and as simple as possible Figure 4 8 will demonstrate the algorithm in the given sample run int bar1 class A int bar2 ic d A void fool a ear void foo2 Include int bar1 Ah int bar2 zsA void A fool A a new A void A fo0o02 a gt fool int global bar2 Figure 4 8 Static Code Coverage Example In the sample
85. eadFiles junits html failonerror false gt lt target gt lt project gt 98 10 13 Building the ReDHead plugins first requires to copy the source code of the plugins to the expected location This is done by the Ant target install redhead source Building the ReDHead feature itself is achieved in the Ant target zips Note that the property builder which is passed there refers to the config directory which contains the build properties file which is listed above The Ant task tests extracts the previously built ReDHead feature into the Eclipse instance and launches the Ant file test cml which is listed bellow Executing the tests results in the file ch hsr ifs redhead tests ReDHeadTestSuiteAll czml which contains the test results This file is used by the Hudson build server to display as result The file is then also XSLT transformed into a HTML file which can be found on the ReDhead web site The following listing shows the tezt rml file that is responsible to start the testing instance of Eclipse and also run the ReDHead tests lt project name ReDHead Automated Tests default redhead basedir gt lt property name redhead loc value eclipse home redhead tests data dir gt lt target name redhead gt lt ant target ui test antfile library file dir eclipse home gt lt property name data dir value redhead loc gt lt property name plugin name value ch hsr ifs redhead tests
86. ecreases their number a lot Besides a huge performance gain the algorithm is not affected at all through this change 4 2 Organize Includes Eclipse JDT contains an organize imports feature that can be used to automatically import all needed classes into a Java class The target of the Organize Includes algorithm is to do the same thing in C code with includes Organizing includes basically does two different things The first thing is to finds includes that are unused and to propose them for removal This part is already covered by the finding unused includes algorithm described in 4 1 A small difference here is that organizing includes only makes sense in the scope of a single file and not in the scope of a whole project The second thing that organize includes does is to add missing includes which are required to successfully compile the active file To achieve this goal the ReDHead plugin needs a possibility to resolve declaration references to any declarations in a project s scope This includes for example also the declarations of the C Standard Library or any other headers which are available in the project The following pseudo code implementation gives a good overview of the task that is fulfilled by the organize includes algorithm indAllDeclarationReferences presentFile declarationRefList ae EmptyIncludeList filesToIncludeList for declarationRef in declarationRefList declarationRefDependency find
87. ed includes menubarPath staticAnalysisMenu StaticAnalysisSeparator tooltip Finds unused includes and marks them gt lt enablement gt lt or gt lt objectClass name org eclipse cdt core model ITranslationUnit gt lt objectClass name org eclipse jface text ITextSelection gt lt and gt lt objectClass name org eclipse core resources IResource gt lt gt 101 10 13 16 19 lt or gt lt objectState name extension value cpp gt lt objectState name extension value c gt lt objectState name extension value h gt lt objectState name extension value hpp gt lt or gt lt and gt lt or gt lt enablement gt lt action gt lt actionSet gt lt extension gt lt extension point org eclipse ui perspectiveExtensions gt lt perspectiveExtension targetID org eclipse cdt ui CPerspective gt lt actionSet id ReDHeadActionSet gt lt perspectiveExtension gt lt extension gt lt plugin gt The sample action which is added here will show up in the menu as can be seen in Figure B 1 Search Run Project MOGERIET Ec Win Find unused includes Igy Gy a o 7 C Kk Figure B 1 Example Menu Entry The visible label of the menu bar entry is set in listing line 7 The one of the menu entry itself can be found in line 12 The action class which will be instantiated and executed when the menu entry is clicked is specified in line 11 T
88. eferencedFile removeFromList includesToRemoveList referencedFile J proposeToInclude filesToIncludeList proposeTolRemove includesToRemoveList The algorithm s pseudo code actually covers a special case which arises if the present file is an implementation file of a C class e g Something cpp that is defined in the header file which correlates by file name to the present file e g Something h In this case all the files which are included in this header are treated as if they would be contained in the present file pseudo code line 4 7 Besides the effect of decreased coupling which was already described above the algo rithm has the positive effect that even if the include list might grow significantly there are no more unused declarations included than possible This might result in a slightly optimized compile time One should also be aware that this algorithm sometimes stands in conflict with the find unused includes algorithm When including all used declarations directly it might happen as can be seen in the example in Figure 4 6 that some of them are in the resulting include structure included indirectly through several include paths which then will cause the find unused includes algorithm to propose the removal of some includes 4 4 Find Unused Files Find unused files is an algorithm that informs the user about files in his project which contain dead code In a big C project this might come in very handy bec
89. elp section contains the following subsections e Find Unused Includes 7 3 1 e Organize Includes 7 3 2 e Auto Organize Includes 7 3 3 e Directly Include Referenced Declarations 7 3 4 e Find Unused Files 7 3 5 e Static Code Coverage 7 3 6 7 3 1 Finding Unused Includes The Find Unused Includes algorithm analyzes either a whole C project or a single source file to find all include statements which are not necessarily required Consider the following example include lt vector gt include lt string gt int main std string s Clearly the first include directive is not needed here When running the Find Unused Includes algorithm on the sample code it will suggest the removal of the first include as shown in the following screenshot 65 main cpp 5g include lt vector gt include lt string gt int main std string s 7 3 2 Organize Includes The Organize Includes algorithm analyzes a single file and tries to make all necessary suggestion so that the file will then compile without giving any warning of the type XY was not declared in this scope In addition it will suggest the removal of all unnecessary includes Consider the following example include lt vector gt int main std string s When running the Organize Includes algorithm on the sample code it will suggest the removal of the currently existing include and the insertion of an include dire
90. er files Also initially well designed projects develop a complex net of include dependencies over the years which is often almost impossible to understand and manage As a side effect compile time rises significantly Hence the possibility to approach such design issues supported by an automated static include analysis tool is a crucial advantage The name ReDHead origins from the idea to support a software engineer while Refactor Dependencies of C Header Files Following a list of features provided by the ReDHead tool e Organize includes e Find unused includes e Directly include referenced files e Find unused files e Static code coverage The last feature mentioned static code coverage is actually a special type of code slicing feature which is very helpful to understand a program s design and expose unused code parts Contents 1 Introduction 1 1 1 2 1 3 1 4 1 5 1 6 1 7 Importance of IDE Features Static Include Analysis Choice of IDE Important Terms Static Analysis in General Feasibility Study Document Overview 2 Abstraction Concept for C Source Code 2 1 2 2 2 3 2 4 AST System Dependence Graph ReDHead Graph Conclusion 3 Used CDT and Eclipse Components 3 1 3 2 Eclipse CDT 3 1 1 Compile Configurations 3 1 2 AST and Indexer UI Elements 4 Dependency Optimization Algorithms 4 1 4 2 4 3 4 4 4 5 4 6 4 7 4 8 4 9 Finding Unused Includes 4 1 1 Optimal
91. f spending hours on scanning code manually can focus on engineering better software 1 2 Static Include Analysis Compared to the import mechanism of Java the mechanisms that C brings along with include directives can in bigger projects become quite tedious and add a lot of complexity Because the include mechanism works transitively it is more complicated and thus adds inadequately used a lot of unnecessary complexity to C projects A whole project of C code files depicts a complex construct of source and header files of which often many rely on each other A project which was not well designed from the start is often hard to refactor when it comes to cleaning up and maintaining an existing include structure Often nobody dares to even try to remove existing include directives because the effect it will have can be very unpredictable Because of the amount of information we are talking about is often huge it makes sense to approach this issue with automated features which can process the information far quicker than a human could when browsing and analyzing code manually This is where the motivation for a static include analysis plugin origins 1 3 Choice of IDE The reason to choose Eclipse CDT to implement a plugin for was simple The IDE already provides good support for code navigation and some refactoring support I do not want to push any commercial solution to develop C code so the open source IDE Eclipse CDT was the
92. ferent ReDHeadDeclarations referenced This optimizes the runtime of the algorithms which use the ReDHead data structure 5 3 Declaration Reference Dependencies A ReDHeadDeclarationReference is able to return several DeclarationReference Dependency instances Note here that to improve the readability this was neglected in the pseudo code implementations of the ReDHead algorithms in Chapter 4 A ReDHeadDeclarationReference is also able to filter this collection of dependencies according to certain criteria The following example ReDHead graphs in this section introduce these filtering functions of the ReDHeadDeclarationReference in detail Figure 5 2 contains four code files one contains the definition of the class X two others each contain a forward declaration of the class X The last file contains a ReDHead DeclarationReference to class X The class ReDHeadDeclarationReference contains the method getAllDependencies which returns the complete list of available Decla rationReferenceDependencies which here are all the red dashed edges Note that the list also include a dependency to the forward declaration which is not included in the file main cpp 1X 1 class X Declaration Declaration reference X 1 gt Declaration Reference Dependency Xfwd1 h 1 class X Xfwd2 h 1 class X Include D d Xfwd2 h I xI gt Include Dependency Imain cpp 1 include X h 2 include Xfwd2 h ay
93. fferent modes In the main mode where a main function is present in the project only the main function and global variables are used as starting point In the library mode where a main function is not available all declarations in source files are used as starting point The Static Code Coverage algorithm can decide for itself in which mode it will run depending on the presence of a main function Consider the following example code void bar int foo return 42 J void nothing int main bar aime 52 FOO 2 Since a main function is available here the algorithm will run in the main mode This means the starting point consists of the main function and the definition of the int x The functions foo and bar are reachable through the starting point whereas nothing is not So the result as can be seen below marks all the declarations expect the nothing function as used main cpp 53 l void bar void foo return 42 nothing int main bar int x foo 70 7 4 How ReDHead Include Analysis Works All the ReDhead static include analysis algorithms are based on the ReDHead data struc ture which is constructed with code information obtained from the CDT AST Abstract Syntax Tree and the CDT Indexer The ReDHead data structure contains several types of elements which are introduced in the following subsections Some of the element
94. fferent ways with scores of loops if then else statements casts and much more to get only one other IASTNode After spending a lot of time to find out all of this I was able to do the resolution I required to fulfill my test s requirement It was not necessary to re implement all of these ways that I found This can be positive and also negative Either some of the CDT code is old and was 81 not yet removed because no one was sure if it really can be removed or I will find other scenarios in the future which will force me to re implement also all or some of these remaining ways 82 o 10 Outlook In the future the development of the ReDHead Eclipse plugin will go on There is a good chance that the plugin might get a commercial product distributed by a partner company However the ReDHead plugin is far from finished The following sections in this chap ter list tasks that have not yet been achieved in the ReDHead master thesis The tasks are grouped into features that can be improved and other ones that are still completely missing 10 1 Improvements Improve accuracy of static code coverage The static code coverage algorithm is not precise enough when it comes to the instanti ation of types Regard the following example code listing structair x X int i s U2 X pX new X 7x6 Running the static code coverage algorithm on the sample given above it will mark all the constructors and also their im
95. floor gt 0 14 C9 add amp current_floor 1 15 S10 _ printf d current_floor Sr ae ee 16 17 E11 add int a int b 18 S12 a a b 19 Figure 2 5 Simple Example for a ReDHead Graph 17 main cpp iostream cout ostream tcc ar sh za r Ne Declaration j M t_al 37 E gt Declaration Reference Dependency l C gt Include Dependency gt oy Elevator h AlarmElevator h N Ne ae ma T a as if i ALA x Y aa Elevoatr_7 x os eae I ce 4 ie ee a RR ae Figure 2 6 Complex Example for a ReDHead Graph 18 S000 Oy Ol Geer 10 t EN 12 B13 S14 EIS S16 ole C18 s O19 C20 Bal yf ee class Elevator public Elevator int _top_floor current_floor 1 current_direction UP top_floor _top_floor 32 CE23 class AlarmElevator public Elevator virtual Elevator 33 public a dich 08s 34 E24 AlarmElevator int top_floor paa w oe 35 C25 Elevator top_floor VOLET CEES 36 S26 alarm_on 0 current_direction DOWN int which_floor return current_floor Direction direction 37 E27 void set_alarm 38 S28 alarm_on 1 39 E29 void reset_alarm return current_direction 40 S30 alarm_on 0 virtual void go int floor 41 E31 void go int floor if current_direction UP 42
96. g include A3 h Actions to take Select the action that will be taken _ forthe proposed suggestions for _ Checked Suggestions None Add markers Apply Quickfix ReDHead Suggestions All Suggestions O None Add markers Apply Quickfix 68 lt 7 3 5 Finding Unused Files The Find Unused Files algorithm always analyzes a whole project for files that are not used at all Files that are not in use can eihter directly be removed with the help of the ReDHead Suggestion Dialog or marked as unused in the corresponding files As an example consider the following C project T CIC Projects 3 N A main cpp 53 ih Unused h h Used h a ae include Used h YV tS TestProject int main D a Includes do something gt main cpp gt R Unused h When running the Find Unused Files algorithm the file Unused h will be proposed to be removed as can be seen in the screenshot E main cpp on Unused h A h Used h ifndef UNUSED H define UNUSED H_ a class Unused 69 10 13 7 3 6 Static Code Coverage The Static Code Coverage algorithm analyzes a whole C project and helps the pro grammer gain insight into which parts of his code are effectively used and which are not Compared with other code coverage tools ReDHead does not need to run the code itself to find unused parts The Static Code Coverage algorithm can run in two di
97. g language is required Since I am not the first person which is in need of such an abstract code representation I had a look at existing concepts which could be helpful in performing static include analysis 2 1 AST The first abstract code representation I want to talk about which is used often for code analysis is the Abstract Syntax Tree short AST The AST is a tree data structure which contains nodes that represent C C language constructs This can for example be statements class declarations variable names etc An AST is a very capable construct to find certain node types This is one thing that is needed when performing static analysis However this is only the first step to perform static include analysis Imagine that when one encounters the following statement in an AST MyClass myClass The instantiation contains the name MyClass which refers to a class declaration that is located somewhere else So MyClass is a declaration reference To perform detailed static analysis it is required to find the declaration of exactly that class Finding this declaration reference dependency however goes beyond the possibilities of an AST In the following Section 2 2 we will be looking for a solution of that problem 10 2 2 System Dependence Graph To fill the gap revealed in the previous section one can use the code representation of the system dependence graph short SDG which will be introduced in this section
98. gh This results that in this case getRequiredDependencies of the class ReDHeadDeclarationReference will return the dependency to the forward decla ration in the file Xfwd2 h Note here that this dependency to the forward declaration is preferred over the one in the file Xfwd1 h because Xfwd1 h is not included into main cpp 43 1X h 1 class X Xfwd1 h 1 class X Declaration Reference Dependency Xfwd2 h gt Include Dependency 1 class X imain cpp 1 include X h 2 include Xfwd2 h 3 X x Figure 5 5 Only Required DeclarationReferenceDependencies Namespaces in C are a somewhat special construct because they can be defined several times Initially this leads to the problem that the algorithms find unused includes proposed to include all the files which contain a definition of a used namespace which of course is wrong The solution of the problem in the ReDHead data structure is that only these definitions are referenced which are also used What used here means can be seen in Figure 5 6 The one dependency to file unused h is not returned because it is also not used ilbar h 1 namespace X 2 void bar 3 foo h gt Declaration Reference Dependenc X 1 foo 2 Us y gt Include Dependency Iifoo h 1 namespace X 2 void foo 3 unused h 1 namespace X 2 void unused 3 aa pes 4 eH wo I ear imain cpp 1 include bar h
99. h bar1 bar cp p usedDeclarationList main main cpp global main main cpp global main cpp main cpp A A h fool A h A A cpp main main cpp global bar2 bar h main main cpp global bar2 bar h main main cpp global main main cpp global main main cpp global main cpp A A h fool A h A A cpp main cpp A A h fool A h A A cpp main cpp A A h fool A h A A cpp bar2 bar h fool A cpp main cpp A A h fool A h A A cpp bar2 bar h fool A cpp main cpp A A h fool A h A A cpp bar2 bar h fool A cpp bar2 bar cpp main main cpp global main cpp A A h fool A h A A cpp bar2 bar h fool A cpp bar2 bar cpp main main cpp global main cpp A A h fool A h A A cpp bar2 bar h fool A cpp bar2 bar cpp Additional Information Function main uses A A h fool A h A A cpp These are removed from unusedDclarationList and added to activeDeclarationList and usedDeclarationList Definition global uses bar2 bar h which is removed from unusedDclarationList and added to activeDeclarationList and usedDeclarationList Class declaration A does not refers anything so nothing is added or deleted Member function declaration foo1 refers fool A cpp which is removed from unusedDclarationList and added to activeDeclarationList and usedDeclarationList Constructor definition A does not refers anything so nothing is added o
100. hat there are actually two classes which are used to represent include directives in CDT which can both be used for my cause Sadly they do not share any common interface or super class Also the Figure 5 1 is not completely correct here because it makes the reader believe that the IIncludeStatement gives access to the included ReDHeadFile instance dashed arrow edge which of course is not true since CDT has no relations to ReDHead Instead of directly giving access to the included ReDHeadFile a helper class method takes care of that task 5 5 Include Paths Include paths are as first mentioned in 1 4 an ordered set of include relations Very important here is the fact that in the ReDHead data structure an include path is defined as the path that leads out of any file to a target file The originating file however is no part of the path So the IncludePath instances in Figure 5 10 are all considered equal This allows for example to easily compare include paths independent of the file which contains them Q File Defining Type A File Using Type A Declaration Reference Dependency Include Include Path Figure 5 10 Equal Include Paths Example 47 6 Implementation This chapter describes implementation details which are worth mentioning To give an overview of the whole plugin the class diagram of the important classes is shown in Figure 6 1 For detailed description of the classes shown in the diagram read the fol
101. he menubar path defined in line 15 binds the menu entry to the menu which is defined in line 7 and 8 The enablement which starts in line 17 represents a binary expression and causes the menu entry to be disabled if the current selection should not be in one of C Editor or on any C file or project The second extension which can be seen on line 36 causes the menu to only be shown when the C perspective is active B 1 2 Extending the Navigator s Pop up Menu The XML extension point which adds a menu to the pop up which shows up when right clicking on items takes effect on all of the following Eclipse views e C Explorer e Project Explorer e Navigator 102 12 18 21 24 27 30 33 36 39 42 Demonstrated in the following listing is the extension point org eclipse ui popupMenus lt xml version 1 0 encoding UTF 8 gt lt eclipse version 3 4 gt lt plugin gt lt extension point org eclipse ui popupMenus gt lt objectContribution id ch hsr ifs redhead ui popupMenu objectClass java lang Object gt lt menu id popupReDheadMenu label ReDHead Static Analysis path additions gt lt separator name StaticAnalysisSeparator gt lt menu gt lt action class ch hsr ifs redhead ui actions FindUnusedIncludesAction enablesFor 1i icon icons unusedInclude gif id ch hsr ifs redhead actions FindUnusedIncludesAction label Find unused includes menubarPath popupReDheadMe
102. ibed in the pseudo code in Section 4 3 the directly include referenced declarations algorithms should propose includes which become useless after applying all suggestion solutions as removable Make find unused includes compatible to directly include referenced declarations The problem here is demonstrated on the graphs in Figure 10 1 When applying the directly include referenced declarations on the graph in the left hand side of the figure one is suggested to add an include to the file labeled C After applying the suggestions solution one gets the graph in the right hand side of the figure When now running the find unused includes algorithm on the graph on the right hand side it proposes to remove the just added include to the file labeled C again 84 I I l Declaration Reference Dependency Include Include Path Figure 10 1 Compatibility Problem Both the algorithms work perfectly right Thus both the suggestions to once remove and once add the include are no false positives when only considering the task of each of the applied algorithms Nevertheless find unused includes should be configurable to not propose includes which directly include a referenced type to be removed Organize includes should also propose forward declarations The organize includes algorithm suggests to add additionally required include directives Whenever it is enough it suggests to include a file that only contains a forward decla ration
103. ing unused includes in source and header files In complex systems it is practically impossible to detect if an include directive is still in use or not The aim of ReDHead is to assist the user in this challenge and heavily reduce time consumption 59 The help section Usage 7 2 describes how the ReDHead plugin can and should be used and how that is achieved The section ReDHead Code Analysis Algorithms 7 3 gives a detailed description about the available algorithms and the suggestions they propose The last section How ReDHead Analysis Works 7 4 gives a general insight into what the ReDHead Plugin does and how static include code analysis is achieved in Eclipse CDT 7 2 Usage This section of the ReDHead help shows how a programmer can make use of the include structure optimization algorithms provided by the ReDHead plugin The description includes information on how to trigger a ReDHead algorithm and on how the outcome is presented to the user Running a ReDHead Algorithm A ReDHead algorithm can be started in two different places Either through the main menu bar entry which is visible when the C perspective is active C C TestProject main cpp Eclipse SDK rch Project WGE UBS ler ries Run Window Help G Q Auto Organize Includes Shift Ctri O Organize Includes ka Directly include referenced declarations o NE includes main cpp 3 include Find unused includes in project Find unused files in
104. ion of figures in this documentation which for the integrity of the documenta tion could be set aside are either completely or in a reformulated also contained here The following two list identifies such sections e Adapted Paragraph describing Figure 1 1 and the figure itself e Unchanged JDT feature list in Section 1 5 e Adapted Section 3 1 e Adapted and extended Section 4 6 e Unchanged First three paragraphs of Section 6 2 e Adapted Section 6 3 e Unchanged Section 6 5 extended with Subsection 6 5 1 e Unchanged Section Vast Amount of CDT and Eclipse Code and Resolving Decla ration References in Chapter 9 e Adapted Section 10 2 2 e Unchanged Appendix Section A 1 and A 2 and A 4 e Adapted Appendix Section A 3 e Unchanged Appendix Section B 2 and B 3 e Unchanged Appendix Section C 1 116 D Nomenclature Annotation An annotation as we use the expression here is highlighted text within an editor Note that there are also Java Annotation What I refer to as annotations here are not Java Annotations An Annotation can not only highlight text with a background color but also with an icon or by underlining the text with sinuous lines AST Abstract Syntax Tree CDT C Development Tooling Ecla Completion proposal Other term for quickfiz Declaration The term declaration refers to C type function or variable declarations and so on Declaration Reference A declaration reference denotes for example be a
105. ions of ReDHeadDeclarations was reduced to less than 30 Further optimal improvements have been achieved by optimizing the iterations during the creation and traversal of the ReDhead data structure To give an example ReDHead now often uses iterators of Java collection which allow to remove the current element instead of collecting elements to remove in order to prevent a ConcurrentModification Exception Also a lot of bugs have been fixed among which the most severe one caused the memory consumption to raise exponentially Also noteworthy here is that all the measurements have been taken directly after the the testing Eclipse instance was started Due to caching in the CDT indexer if one re runs an algorithm after a first launch the execution time is lowered again by 35 to 50 percent 9 4 Preprocessor Problems While creating ReDHeadDeclarationReferences of a given file ReDHead needs to lo cate all IASTName instances which can be found in the AST of the file This process proved to be more complex than I initially thought of because it is quite complicated to get all these IASTNames Most names can be found easily enough by using a visitor to traverse the AST When it comes to preprocessor nodes which also are represented inside of the AST the visiting behavior of CDT becomes a bit strange because preprocessor related names for example macro function calls or preprocessor symbols are not visited Some instances like for example IAST
106. is a very strange way to offset crawl a given IASTPrepro cessorStatement to reach the contained IASTNames Noteworthy here is that the IASTNames returned by getNestedMacroReferences in listing line 11 are not found by the selector object and thus need to be considered as shown in the listing Vast Amount of CDT and Eclipse Code To find certain information it could not be avoided that I had to browse the vast amount of CDT s and Eclipse s implementation In general this code is well designed and also easy to read and understand The crux however lies in its extent There is so much code that one can spend hours trying to find a certain thing Often one then discovers that had one looked at the other code section first it would only have taken some minutes to find Such incidences can probably not be avoided but can still be very unnerving Resolving Declaration References Resolving a declaration reference to its declaration means in the CDT code to resolve one IASTName to another IASTName To do so in the normal cases is very simple I however soon discovered that there are many different ways to do so and that for many different scenarios another approach is required to resolve the dependency After inspecting a lot of CDT code I found that the method runOnAST in the class OpenDeclarationsJob was the right spot to begin searching What I discovered there was that the steps performed to find the target IASTName is a huge mess of di
107. it is also possible to remove all static code coverage annotations from a project Find unused files in project N Static Code Coverage XX Delete Static Code Coverage Annotations N Show Analysis Result In Dialog XX Remove all Static Analysis Markers Codan Integration In the subsection CDT Codan Integration 7 2 1 you can find a description on what the Codan framework is on the integration of ReDHead into the Codan framework and how the integration can be configured 7 2 1 CDT Codan Integration CDT Codan cod09 which stands for code analysis is the CDT framework for static code analysis If provides other Eclipse plugins such as ReDHead with the possibility to analyze C code while the user is typing The Codan framework cod09 allows to analyze C code while the user is typing ReDHead brings along an Organize Includes implementation for the Codan framework So as long as the ReDHead Organize Includes 64 Codan implementation is active markers which propose to add and remove include directives will appear automatically when C programmer is typing Configuring Codan The Codan Organize Includes can be enabled and disabled under Window gt Preferences gt C gt Code Analysis gt ReDHead Organize Includes 7 3 ReDHead Code Analysis Algorithms This help sections describes each ReDHead algorithm provided in a subsection To find out how to use ReDHead algorithms read section Usage 7 2 This ReDHead H
108. k it as used Both latter solutions of course are suboptimal and would slow the ReDhead algorithms execution Make ReDHeadSuggestionStore persistent Suggestions that are created by a ReDhead algorithm get stored in the ReDHeadSug gestionStore At the current time these stored suggestions get lost when Eclipse is shut down A useful enhancement thus would be to make these suggestions persistent so that possible suggested solutions could still be applied after Eclipse was restarted Check for Includes to add when proposing to remove one The problem at hand here was already explained in the the description of the algorithm directly include referenced includes 4 3 Suppose a code file A relies upon B to include C In the case that the remove unused includes algorithm proposes the include to C to be removed because it is not required anymore there A will not have an include path to C anymore So an improvement for the find unused includes algorithm would be to also propose to add an include to in file A alongside of the proposal to remove the include to C from B Complete directly include referenced declarations The directly include referenced declarations algorithms proposes to add include direc tives to all the files which contain referenced declarations When applying all these suggestions the situation might arise that some of the already existing includes in the file under consideration are not needed anymore So as already descr
109. lass CoverageTest2ManyFiles extends ReDHeadTest public CoverageTest2ManyFiles final String name final Vector lt TestSourceFile gt files super name files Override protected void runTest throws Throwable StaticCoverageAnalysisAlgorithm algorithm new StaticCoverageAnalysisAlgorithm List lt ReDHeadSuggestion gt suggestions runAlgorithms algorithm assertEquals 3 suggestions size String expectedTextInUse This declaration is in use through the file A cpp String expectedTextNotInUse This declaration is not in use through the file A cpp assertSuggestion suggestions get 0 A cpp expectedTextInUse 39 41 assertSuggestion suggestions get 1 Used h expectedTextInUse 0 12 assertSuggestion suggestions get 2 Unused h expectedTextNotInUse 0 17 The shown test first instantiates the algorithm StaticCoverageAnalysisAlgorithm which it wants to test Then it runs the algorithm and retrieves the proposed ReDHead Suggestion In the second part of the test attributes of these suggestions are tested on their correctness 57 6 5 1 External Include Directories Per default testing projects do not contain any references to external include directories This also includes the C standard library This makes sense because the testing project would become machine dependent since there might be different library version which are installed in different locations Nevertheless tests
110. lerPreferenceValue true presentationLayer 3 textStylePreference Value NONE symbolicIcon warning icon icons unusedInclude gif label Unused Include Directive textPreferenceValue true textPreferenceKey indexResultIndication verticalRulerPreferenceKey indexResultIndicationInVerticalRuler overviewRulerPreferenceKey indexResultIndicationInOverviewRuler showInNextPrevDropdownToolbarActionKey lt isIndexResultInNextPrevDropdownToolbarAction showInNextPrevDropdownToolbarAction true isGoToNextNavigationTargetKey isIndexResultGoToNextNavigationTarget lt isGoToNextNavigationTarget false isGoToPreviousNavigationTargetKey isIndexResultGoToPreviousNavigationTarget isGoToPreviousNavigationTarget false gt lt specification gt lt extension gt lt extension point org eclipse ui editors annotationTypes gt lt type markerType ch hsr ifs redhead unusedincludemarker name ch hsr ifs redhead unusedincludeannotation gt lt type gt lt extension gt lt plugin gt Creating a customized marker requires three things The first is the definition of a new marker listing line 5 The second thing is the definition of a the marker s look 105 12 18 line 12 The last thing is to link the new marker look to the new marker definition line 38 The new icon which is shown with our new maker can be found on line 23 In the marker definition above there is a marker id defined for the new m
111. lowing sections org eclipse IWorkspace ICProject IFile ae lt lt package gt gt e lt package gt gt redhead ui redhead dependency ReDHeadAction 2 IncludePath 1ASTPreprocessorincludeStatement IIndexinclude Figure 6 1 ReDHead Class Diagram 6 1 Used CDT Functionality The ReDHead data structure which was introduced in Chapter 5 gets constructed with information which is delivered from CDT components The important part of these components are introduced here 6 1 1 AST Abstract Syntax Trees in CDT are instances of IASTTranslationUnit These grant access to IASTNode that represent declarations includes macro definitions and so on To find certain node type one can use node visitors to traverse AST s Instances of 48 IASTTranslationUnit can be constructed with the help of a filename and the CDT helper class CoreModelUtil 6 1 2 Indexer The CDT indexer is a very important CDT component since it provides the ReDHead data structure with the functionality to find the dependencies between declaration ref erences to declarations Concretely it can resolve IBindings which for example can be acquired by the AST nodes of type IASTName as described in 6 1 3 Also the CDT indexer can produce lists of all transitively included files that are included into one given file IIndex instances can be acquired through the CDT plugin class CCore
112. mas Reps Integrating non interfering versions of programs ACM Transactions on Programming Languages and Systems 11 345 387 1989 Susan Horwitz Thomas Reps and David Binkley Interprocedural slicing using dependence graphs ACM Transactions on Programming Languages and Systems 12 26 60 1990 Institute for Software HSR University of Applied Sciences http r2 ifs hsr ch cdtrefactoring CDT C Refactoring Homepage British Standards Institute The C Standard Incorporating Technical Corrigendum No 1 John Wiley amp Sons Hoboken NJ USA 2003 John Lakos Large scale C software design Addison Wesley Longman Publishing Co Inc Redwood City CA USA 1996 L Larsen and M J Harrold Slicing object oriented software Software En gineering International Conference on 0 495 1996 Microsoft Corporation http www microsoft com visualstudio en us Mi crosoft Visual Studio Homepage Microsoft Corporation http office microsoft com en us visio Visio Homepage Bjarne Stroustrup The C Programming Language Third Edition Addison Wesley Longman Publishing Co Inc Boston MA USA 1997 Sun Microsystems Inc http hudson ci org Hudson Homepage Texlipse sf net community http texlipse sourceforge net TeXlipse Home page Ubuntu Community http www ubuntu com Ubuntu Homepage Mark Weiser Reconstructing sequential behavior from parallel behavior pro jections Inf Process Lett 17 3 12
113. n login gt AuthType Basic AuthName trac AuthUserFile etc apache2 trac passwd Require valid user lt Location gt The rest of the config shown here allows access to the resource files which are published to var www ReDHeadFiles by the automated build lt Location ReDHeadFiles gt SetHandler file lt Location gt lt VirtualHost gt Since Hudson runs an independent application it needs its own port My Hudson runs on port 8080 which is not reachable from the outside To make Hudson accessible from the outside I use Apache s mod_proxy as a reverse http proxy The following listing shows the mod_proxy configuration located in etc apache2 mods enabled prozy conf lt IfModule mod_proxy c gt ProxyPass hudson http localhost 8080 hudson ProxyPassReverse hudson http localhost 8080 hudson ProxyRequests Off ProxyPreserveHost On lt Proxy gt Order deny allow Allow from all lt Proxy gt The crux here is to get Hudson itself running not on but on hudson To ac complish this the argument prefiz hudson needs to be added the to the hudson script located in etc default hudson 93 HUDSON_ARGS webroot var run hudson war prefix hudson A 3 Automated Building of the ReDHead Plugin and Its Tests To build and test my plugins on the ReDHead server automatically with Hudson a script is needed which is capable of first building the ReDHead plugins
114. n the listing above the correct insert position is not after the second include directive but after the endif statement However that one should never insert inside of a conditional compilation block is not true in a single case This is the case with include guards ifndef A_FILE_H define A_FILE_ H include Qnething h include Anotherthing h class AClass endif Inserting after the endif here is wrong because it belongs to the include guard The listing above brings us to the next restriction One of course should never insert after declaration here AClass So the point to insert in the listing above would be is obvious place after the last include directive and before the AClass declaration In the case that there is an include guard but no include directives then the point to insert further includes is directly after the definition of the include guard s symbol A sample of the conflicting case can be found in the following listing ifndef A_FILE_H define SOME OTHER THING 7 include QOnething h include Anotherthing h class AClass endif In this case there is no valid include guard which means that there is only a conditional compilation block Firstly we want to insert after that conditional compilation block 38 But secondly we cannot insert after the first declaration in the code The tradeoff which is taken here is to insert at the beginning of the document 4 9 I
115. nclude Substitution In the case of certain external library header files the process of algorithms which propose to add includes is sometimes to explicit I will demonstrate the problem with an example from the C Standard Library Imagine the following C code int main std vector lt int gt v return 0 When running find unused includes adding include lt bits stl_vector h gt would be suggested This suggestion is not wrong at all since the class vector is declared in exactly that file But still a software engineer would be surprised by the suggestion since one expects the proposal of include lt vector gt To solve the problem at hand ReDHead algorithms do before suggesting to insert includes check a given list of include substitutions which in cases as the one mentioned above yields the correct include proposal This list was not created by hand but is rather generated automatically from the given C standard library header directory 39 5 ReDhead Data Structure The algorithms described in the previous Chapter 4 are as already mentioned based on the the ReDHead data structure of which the functionality is describe in the current chapter in detail The ReDHead data structure is an implementation of the ReDHead graph that was described in Section 2 3 Basically the ReDHead data structure gives access to the vertices and edges that were described with the ReDHead graph Also this chapter will co
116. ns which are used in a given file This strategy is closely related with Java s 28 L gt File Defining Type A File Using Type A Declaration Reference Dependency Include Include Path Figure 4 5 Problematic Example Graph File Defining Type A File Using Type A Declaration Reference Dependency Include Include Path Figure 4 6 Unproblematic Example Graph 29 12 18 way of importing classes since every class that is used needs to be imported directly in Java s code files To refactor towards this goal one the one hand one needs to add include directives for every not yet directly included declaration On the other hand one needs to remove include directives which do not include any used declarations directly The following listing shows the algorithm s pseudo code includeList findListOfIncludes presentFile includesToRemoveList includeList if existsFileNameCorrelatingHeaderFile presentFile fileNameCorrelatingHeaderFile getFileNameCorrelatingHeaderFile presentFile addToList includeList findListOfIncludes fileNameCorrelatingHeaderFile declarationRefList filesToIncludeList findAllDeclarationReferences presentFile emptyList for declarationRef in declarationRefList declaration findeDeclaration declarationRef referencedFile getFile0fDeclaration declaration if not containedInList includeList referencedFile addToList filesToIncludeList r
117. ns will be visualized either in the ReDHead suggestion dialog or they will be shown in the C editor in the form of markers or annotations ReDHead Suggestion Dialog The ReDHead suggestion dialog will show up with almost all algorithms but only if the Show Analysis Result in Dialog checkbox menu entry is enabled Note that this checkbox menu entry is only available through the main menu bar entry first screenshot above fc main cpp z include lt vector gt int main st N ReDHead Suggestions x i ReDHead Static Include Analysis The runnded ReDHead static included analysis algorithms produced 2 suggestions File main cpp Actions to take 0 Missing include lt string gt usr include c 4 4 string Select the action that will be taken for the proposed suggestions for C The include statement include lt vector gt is unneeded Checked Suggestions All Suggestions O None None O Add markers Add markers Apply Quickfix Apply Quickfix Cancei OK The ReDHead suggestion dialog will not show up when running Static Code Coverage or Auto Organize Includes In the suggestion dialog one can choose how to proceed There are options to apply to all the suggestions or only to these that are checked in the suggestion tree on the left side 62 ReDHead Suggestion Markers ReDHead suggestion makers show up in the editor as normal markers with one of ReD
118. nt which can be accessed through static methods of the Eclipse class ResourcesPlugin Access to ReDHeadFiles is granted by ReDHeadProject instances which themselves can be found with the help of the ReDHeadWorkspace The ReDHeadWorkspace is ac 50 cessible through the Singleton instance of the ReDHead plugin see Singleton pattern in GHJV97 Each ReDHeadFile contains the AST that represents the file This AST is used either by the file itself to construct ReDHeadDeclarationReference and to return includes or for individual analysis by any ReDHead algorithm C Element Classes The classes in the package redhead cxxelements encapsulates a certain CDT AST or index element Information that is required by ReDHead algorithms considering these classes can be accessed there ReDHeadDeclarationReference A ReDHeadDeclarationReference can be seen as an abstract AST node representation It represents a C C identifier that can be resolved to a ReDHeadDeclaration which makes the AST node that it contains an instance of IASTName ReDHeadDeclaration References are accessible through a ReDHeadFile which creates ReDHeadDeclaration Reference instances by traversing the file s AST for certain JASTName instances The ReDHeadDeclarationReference to a collection of DeclarationReferenceDepend encies A ReDHeadDeclarationReference which represents a C namespace name is in stantiated as the subclass NamespaceDeclarationReference which uses an
119. nt editor I used the well known Eclipse Platform Eclb CDT refactoring test editor To write down automated testing C C code I used the refactoring test editor which was developed by Emanuel Graf TeXlipse Eclipse Plugin As ATX editor the Eclipse plugin TeXlipse Tex was used texlive The TexLive Ubuntu packages are required to compile this ST f X documentation Visio To draw a part of the code diagrams I used Microsoft Visio Micb Dia All class diagrams were drawn with the open source tool Dia Dia as well as some other of the diagrams contained in this documentation 110 C 1 2 ReDHead Build Server The ReDHead project server can be found at http redhead ifs hsr ch The fol lowing components are running on the server Ubuntu The server operating system is a Ubuntu 8 04 3 LTS Ubu Git Git Git version 1 6 4 3 runs on server side to store the repository Apache Apache version 2 2 is used as web server Apab Trac As a project management environment I use Trac version 0 11 Edg texlive The TexLive packages are required to compile the AT X documentation Tex Hudson The build server in use is Hudson version 1 323 Sun Ant Building of the ReDHead plugin is done with the help of Ant tasks Apaal C 2 Project Plan I this section you can find two versions of the ReDHead project plan of this master thesis Figure C 1 shows the initial project plan that was created in the frist week of the master thesis
120. ntain ReDHead graph images which are used to describe special code scenarios Note that this chapter gives no internal implementation details of the ReDHead data structure More information about internal implementation details can be found in Section 6 2 in the following Chapter 6 The following class diagram in Figure 5 1 shows an overview of the of the important classes that are contained in the ReDHead data structure lt lt package gt gt redhead Logicaldesign ReDHeadDeclaration ReDHeadDeclarationReference i lt lt package gt gt redhead Logicaldesign dependency i DeclarationReferenceDependency i lt lt package gt gt redhead physicaldesign ReDHeadWorkspace ReDHeadProject ReDHeadFile lt lt package gt gt org eclipse cdt core H IIncludeStatement lt lt package gt gt redhead physicaldesign dependency IncludePath Figure 5 1 ReDHead Data Structure Each of the components in the redhead logicaldesign package grant access to the component s name as well as to location information which consists of file offset line number line offset and length The components in the redhead physicaldesign pack age can each yield information about its location meaning a file or folder path 40 How the ReDHead data structure can be used to traverse the graphs which were introduced in Section 2 3 and Chapter 4 can be seen in Figure 5 1 by following the diagram s edges
121. nu StaticAnalysisSeparator tooltip Finds unused includes and marks them gt lt action gt lt visibility gt lt or gt lt objectClass name org eclipse cdt core model ITranslationUnit gt lt and gt lt objectClass name org eclipse core resources IProject gt lt objectState name nature value org eclipse cdt core cnature gt lt and gt lt objectClass name org eclipse cdt core model ICProject gt lt and gt lt objectClass name org eclipse core resources IResource gt lt or gt lt objectState name extension value cpp gt lt objectState name extension value c gt lt objectState name extension value h gt lt objectState name extension value hpp gt lt or gt lt and gt lt or gt lt visibility gt lt objectContribution gt lt extension gt lt plugin gt The sample action which is added here will show up in the pop up menu as can be seen in Figure B 2 The label of the menu that is added can be found in code line 11 together with the menu that is defined in line 9 The menu is added to the separator group extentions which by default already exists The label of the added menu entry together with the icon definition and the class which will be instantiated and run when the menu entry is clicked on can be found inside of the action definition starting on line 15 In line 21 103 10 13 10 6 TestProject b ia Includes New 7 D main cpp Go Int
122. o ReDHead Static Analysis M XK Find unused includes Configure gt Properties Alt Enter Figure B 2 Example Pop up Menu Entry the action is coupled to the menu defined in line 9 and its contained separator line 13 Note that the visibility which starts in line 25 defines a boolean expression This expression is used to determine if the added menu is visible for a given selected item B 2 Example Problem Marker The example shown here creates and removes a dummy marker into the file opened in the active CDT editor To create a marker put the following code into the run method of an action class See Section B 1 to find out how to add actions to Eclipse menus public void run final IAction action CEditor ceditor CEditor PlatformUI getWorkbench lt getActiveWorkbenchWindow getActivePage getActiveEditor IResource resource ceditor getInputCElement getResource try IMarker marker resource createMarker ICModelMarker C_MODEL_PROBLEM MARKER marker setAttribute IMarker MESSAGE myDescription marker setAttribute IMarker SEVERITY IMarker SEVERITY_ERROR marker setAttribute IMarker CHAR_START 3 marker setAttribute IMarker CHAR_END 5 catch CoreException e System err printin Ups exception with sample marker e printStackTrace Removing all existing markers can be achieved with the following code also in the run method of an Action class public void run final IAc
123. odan checker class is registered listing line 7 Its impelen tation is shown in the following listing Beside the checker itself additionally a problem type is registered code line 11 public class ExampleChecker extends AbstractIndexAstChecker private static final String ER_ID ch hsr ifs redhead codan problems ExampleProblem Override public void processAst IASTTranslationUnit ast reportProblem ER_ID ast getDeclarations 0 This declaration is a problem reportProblem ER_ID ast getDeclarations 2 This one as well The example checker shown above calls the reportProblem method twice each time reporting a given AST node as a problem This will cause Codan to mark these two nodes in the editor 108 B 5 Undo Redo Operations When ReDHead applies a quickfiz normally TextChangees are applied on IDocument instances These operation automatically add IUndoableOperations to Eclipse which allows the user to undo and redo the applied text change with the help of the Edit menu If one wants to add such undo and redo behavior for any custom operation this can be done as shown in the following example listing The custom operation which shall be undoable and redoable in the listing could for example be a change which deletes an other file In this case the Change argument passed into addUndoRedoOperation of in the listing would be a DeleteFileChange instance private void addUndo
124. of C is added in pseudo code line 10 Why the include to A is proposed for removal in code line 15 can in detail be read in the description of the find unused includes algorithm in Section 4 1 4 3 Directly Include Referenced Declarations The directly include referenced declarations algorithms is based on the idea one should not rely on included header files to again include others To clarify this an example is given In the graph on the left hand side in Figure 4 5 one sees that both the files labeled A B C and A use the declarations B and C If for any reason the file labeled A now looses the dependency to C and the include to C gets thus removed graph in Figure 4 5 on the right hand side there will be no available include path to C which results in compiler error in the file labeled A B C This happens even if the file labeled C has not been changed at all In Figure 4 6 the same situation is given again but with additional includes from the file labeled A B C to B and C Here the problem at hand will never arise when the include from A to C gets removed So by directly including the referenced declarations B and C the transitive include coupling to A can be reduced The C code at hand will get less error prone to changes in the include structure This idea which was introduced above can also be found described in Lak96 on page 113 So the current algorithm allows the user to automatically suggest the inclusion to declaratio
125. of a referenced type Instead of suggesting this it would be even better to just propose to add the forward declaration itself instead of including one It is obvious that this would not reduce the coupling of the file current file Organize includes should also propose the removal of forward declarations The last section suggested to improve the organize includes algorithm by the capability to propose the insertion of forward declarations of types Furthermore it would also be useful if it could propose forward declarations to be removed in the case when it is not used anymore Once it does one should consider to change the algorithms name because it does not exactly organize only includes anymore Not adapt the CDT indexer automatically During the testing phases of the ReDHead plugin it became clear that adapting the CDT indexer automatically as described in 9 1 when a ReDHead algorithm is run is not the optimal solution The machine on which I developed ReDHead has a solid state hard drive SSD which is about tree time faster than a normal hard drive Due to this I underestimated the time which is required to 1 rebuild the index and 2 check if a index adaptation is even necessary When testing on an older notebook with projects that link to include directories of several external libraries it came clear that adapting the index takes far more than a minute Also testing if an adaptation is necessary alone took almost up to a minute
126. org bugs show_bug cgi id 296906 Sometimes it was not pos sible to resolve the operator lt lt for the indexer This bug reported the problem https bugs eclipse org bugs show_bug cgi id 306819 The problem describe here is to find all AS TNames also inside of ASTPreprocessorStatements which was already described in 9 4 Note that this problem was and will not be fixed https bugs eclipse org bugs show_bug cgi id 312399 This bug was reported be cause when clicking on a name after an undef statement a NullPointerException was thrown https bugs eclipse org bugs show_bug cgi id 316931 Assumed there is a function that takes an int argument The function is first declared where the argument s name is j and then defined where the name is i When asking the indexer to resolve the declaration s name j the result was that it returned a reference to i in the definition but not to j itself as it normally would https bugs eclipse org bugs show_bug cgi id 316836 This bug is the one that has already been discussed in 9 2 that concerns the synchronism of AST and indexer in the case of modified not yet saved files This problem has not been solved C 5 Personal Impression My overall personal impression on the ReDHead project is very positive I was able to gain even more experience on how to work in software projects The big difference between this project and predecessor project like my diploma thesis CFS06 was that I
127. ossible where an incomplete type is suf ficient More information about incomplete typess can be found in the C Standard Ins03 The following listing shows a pseudo code implementation of the algorithm declarationRefList findAllDeclarationRefs presentFile declarationSet findAllReferencedDeclarations declarationRefList for declaration in declarationSet refsToDeclarationList findRefsToDeclaration declarationRefList declaration if canReplaceWithForwardDeclaration declaration refsToDeclarationList includesToRemove findRemovableIncludesTo declaration normally only 1 proposeForRemoval includeToRemove proposeToAddForwardDeclaration declaration Note that the declaration which are found in pseudo code line 2 are added to a set which implies that even if there are several different declaration references to the same declaration only one of them will be contained in the set This is enough to represent the relation to the declaration 35 10 The function canReplaceWithForwardDeclaration checks several conditions which needs to be satisfied so that an include can be replaced with a forward declaration These conditions are introduced in the following listing canReplaceWithForwardDeclaration refsToDeclarationList declaration if not areUsedInOnlyDeclaration refsToDeclarationList and not areOnyPointerAndRefs refsToDeclarationList return false includePathList findPath
128. overage algorithm are in real normal Eclipse markers The only difference to others is that annotations have a more customized appearance So instead of adding marker icons they just highlight text 6 4 1 Codan Above the extension point org eclipse cdt codan core checkers was mentioned This ex tension point is used to register the organize includes algorithm as a Codan checker This causes the algorithm to also be triggered while the user is typing 6 4 2 Problem Feedback Sometimes ReDHead algorithms encounter problems during execution Such problems are logged and also presented to the user by using the logging functionality of Eclipse To log such problems IStatus instances are created which are passed to this logging functionality These statuses are then 1 presented to the user after the ReDHeadOptimization Runner stopped running and 2 added to the Error Log view of Eclipse 6 5 Testing In modern software engineering one uses automated tests which are run repeatedly when adding or changing code During development the tests are expanded so they also cover newly implemented functionality In this section I describe the ReDHead testing framework and show how a sample test looks like A ReDHead test normally runs the following steps 1 Setup a C project to test on 2 Add some testing source and header files 3 Run a method under test which might be e Run an algorithm e Access elements of the ReDHead data structur
129. plementation as used Of course this is not completely correct here The reason for this is that the the CDT AST does not contain a IASTName node which refers to the constructor calls One could argument that such a IASTName is obviously not be required because the name of the second constructor is not used in the term x 42 Nevertheless without this IASTName there is no way to let the CDT indexer resolve the relation to the second constructor which clearly exists in the mentioned term The IASTName instance which is created in the initialization new X 7 6 however refers to the correct constructor The static code coverage algorithm now just treats all the constructors and also the destructor if one exists as used because otherwise all of these would be marked as unused which in almost every case leads to wrong results This approach is for certain scenarios not accurate enough The listing given above without the last code line is such a case 83 I see three possible solutions for this problem The first is to file a CDT bug and hope that they will introduce the additional mentioned IASTName to the CDT AST The second solution is to extend the ReDHead data structure that it manually finds such problems when constructing DeclarationReferenceDependencies The last solution would be to add special checking functions to the static code coverage algorithm which are aware of this problem and navigate manually to the required constructor to mar
130. r deleted Function declaration bar2 refers bar2 bar cpp which is removed from unusedDclarationList and added to activeDeclarationList and usedDeclarationList Member function definition foo1 does not refers anything so nothing is added or deleted Function definition bar2 does not refers anything so nothing is added or deleted Loop iteration ends here since activeDeclarationList is empty The result of the algorhtim run is that the declaration and definition for both foo2 and bari are unused and the rest of the project is in use 34 6 4 6 Replace Includes with Forward Declarations If a type gets used in a file and is only used as pointer or reference type there is no requirement to actually include the full declaration of that type Instead a forward declaration is enough include A h Example with an include directive class B pubiic void foo const amp A a J class A Example with a forward declaration class B public void foo const amp A a J The example given above demonstrates this The advantage of the second listing over the first one is that the coupling between the two involved files in the first listing is removed and that the compile time is diminished The idea behind the replace include with forward declaration algorithm is to find scenarios similar to the first listing and refactor them so they become like the second listing Note that forward declarations are always p
131. ram slice is defined as parts of a program with respect to program point p and variable x It consists of all statements and predicates of the program that might affect the value of x at point p To produce a slice a SDG is traversed along its control flow and data flow edges The static code coverage algorithm starts to slice with a given set of program point that are considered as active Here the focus is not on the data flow and the control flow edges but rather only on the function call edges as in Figure 2 1 As already shown in the Section 2 3 what is a call edge in an SDG is represented in the ReDHead graph as a declaration reference dependency So one might say that the ReDHead static code coverage algorithm is a more coarse grained slicing algorithm To find all active program points one must first see if the project under consideration produces an executable which implies that it has a main function or if it does not which makes the project as it will be called hence a library project In the case that there is a main function this function combined with all the public variables and static member variables make the collection of active program points In the case of a library project all declarations contained in any source but not header files are taken as active 32 Here the algorithm in pseudo code 1 unusedDclarationList recursiveFindAllIncludedDeclarations presentProject activeDeclarationList findActiv
132. rection UP gt transitive data flow edge S5 if current_direction UP S6 while current_floor floor amp amp current_floor lt top_floor CT add amp current_floor 1 else Key for Parameter Vertices S8 while current_floor floor amp amp o i current_floor gt 0 Al_in a_in current_floor C9 add amp current_floor 1 A2_in b_in 1 e _ pes Al_out current_floor a_out S10 printf d current_floor A3 int bins 1 Fl_in a a_in E11 add int a int b F2_in b b_in add int a in a S12 a a b Fl_out a_out a Figure 2 1 Elevator Example System Dependence Graph with Function Calls LH96 To basically understand the graph it is important to map vertices to code statements by using the labels on the left side of the source code e g EO on the first code line Solid edges in the graph depict possible program execution control flow Here I would like to specially mention the two function call vertices C7 and C9 which are connected by a function call edge to the function definition vertex E11 Dashed graph edges represent a program s data flow S1 for example is connected by a data flow edge to S6 because 11 S10 Ell S12 E13 S14 E15 S16 S17 C18 S19 C20 522 the value of the variable floor that is defined in S1 is used in S6 To understand the data flow in and out of a function call actual in e g Alin actual out
133. rmation 20 will be used to set up a powerful data structure on which static include analysis will be performed Each C project in Eclipse CDT already provides what was before referred to as compile configuration In the project properties the user can set preprocessor symbols manually if he wants to do so Since this information is by default used to build CDT ASTs this CDT compile configuration is in my point of view a very good point to start The include analysis of the file or project could also be done on the basis of a completely user defined compile configuration which can be set in a ReDHead properties page Concretely this would mean that the user can define if a preprocessor symbol is set or not set what include directories shall be considered and so on ReDHead would then analyze only these code parts which are active in means of the preprocessor symbols in the compile configuration Such a properties page would mean additional effort and does something very similar to what CDT already provides Hence I will not implement such a properties page 3 2 Ul Elements Invoking a static include optimizer is achieved through menu entries As an alterna tive for certain optimizers also the automatic invocation while the user is typing is a promising feature used by ReDHead This alternative can be achieved by using the CDT Codan framework cod09 The best point to start with static include analysis results in the perspective
134. ros Asya Kamsky Scott McPeak and Dawson Engler A few billion lines of code later using static analysis to find bugs in the real world Communications of the ACM 53 2 66 75 2010 Lee Badger and Mark Weiser Minimizing communication for synchronizing parallel dataflow programs In ICPP 2 pages 122 126 1988 Thomas Corbat Lukas Felber and Mirko Stocker Refactoring Support for the Eclipse Ruby Development Tools http r2 ifs hsr ch trac 2006 Codan CDT Code Analysis http wiki eclipse org CDT designs StaticAnalysis 2009 Dia Community http projects gnome org dia Dia Homepage Eclipse Foundation http www eclipse org cdt Eclipse CDT Eclipse Foundation http www eclipse org Eclipse Homepage Edgewall Software http trac edgewall org Trac Homepage L Felber ReDHead Refactor Dependencies of C C Header Files http redhead ifs hsr ch ReDHeadFiles ReDHead_EndOfTermProject pdf 2009 Martin Fowler Continuous Integration Martin Fowler Martin Fowler Refactoring Improving the Design of Existing Code Addison Wesley Boston MA USA 1999 Erich Gamma Richard Helm Ralph Johnson and John Vlissides Design Patterns Addison Wesley Professional July 1997 119 Git HPR89 HRB90 Ins Ins03 Lak96 LH96 Mica Micb Str97 Sun Tex Ubu Wei83 The Git Project http git scm com Learning the Java Language Susan Horwitz Jan Prins and Tho
135. rrent_floor_out current_direction_in current_direction current_direction current_direction_out top_floor_in top_floor top_floor top_floor_out alarm_on_in alarm_on alarm_on alarm_on_out _top_floor_in _top_floor floor_in 5 top_floor 10 l_top_floor 10 Figure 2 4 Main Function LH96 14 When taking into account the introduced graphs above we now have a SDG for object oriented C which allows us to solve the problem of resolving declaration reference dependencies by traversal However there are many points which mark the C SDG as suboptimal Unneeded edges Several edge types given in the C SDG are not important for static include analysis This includes the control flow the data flow the parameter in and the parameter out edges Costly graph traversals it will be very time consuming to traverse a whole graph for only trying to resolve one declaration reference dependency Vertices are not distinct A single vertex in the C SDG can contain several declara tion references Missing Physical Design Elements To perform static include analysis it is required that the vertices for physical design elemements namely source and header files are also represented in the graph This allows then to also introduce graph edges which represent include directives which are also required The C SDG only contains logical design elements at the moment A more detailed description on logical and physical design can
136. s hudsontrac hudsontracplugin hudsontracplugin enabled tracext git enabled git cached repository true git_bin usr bin git persistent_cache true shortrev_len 6 header_logo alt IFS Logo height 1 link http redhead ifs hsr ch src site LOGO_IFS_TRANSPARENT_RIGHT_COLOR _ENGLISH png width 1 hudson alternate_success_icon true feed_url http localhost hudson rssAll main_page http redhead ifs hsr ch hudson trac A a repository_dir home felu ReDHead git repository_type git Note that site refers to the path var lib trac ReDHead htdocs A 2 4 Apache Configuration The Apache web server is reachable on port 80 from the internet Apache redirects requests to Trac Hudson and delivers ReDHead resources The following listings show the apache config located in etc apache2 sites enabled trac 92 ou 11 14 17 20 ou The first part of the config shows the typical Trac apache config lt VirtualHost gt ServerAdmin lfelber hsr ch ServerName redhead ifs hsr ch DocumentRoot var www ErrorLog var log apache2 error trac log CustomLog var log apache2 access trac log combined lt Location gt SetHandler mod_python PythonInterpreter main_interpreter PythonHandler trac web modpython_frontend PythonOption TracEnv var lib trac ReDHead PythonOption TracUriRoot PythonOption PYTHON_EGG_CACHE tmp lt Location gt lt Locatio
137. s additional information is needed An overview over the components which provide this information is given in Figure 6 3 lt lt package gt gt redhead resources ReDHeadWorkspace j ReDHeadProject eee lt lt package gt gt redhead cxxelements ple _ KE NamespaceDeclarationReference ee ae DeclarationReferenceDependency IncludePath lt A FullincludePath i lt lt package gt redhead dependency ES TPreprocessorinciudeStatement IIndexinclude _ IName 1ASTTranslationUnit FirstLastElementincludePath A IIndexName IASTName e ReDHeadPlugin Figure 6 3 Complete ReDHead Data Structure Note that the dashed diagram relations require a helper class method in order to find the related component Also note that package names all begin with redhead This is short for ch ifs hsr redhead which is the base name of all ReDHead packages The following paragraphs describe the components contained in the class diagram The description contains explanation on what other components are required to construct a component and where ReDHead gets access to it Resource Classes Classes contained in the resource package redhead resources represent files or fold ers used in the ReDHead plugin All these classes are initiated with their correlating Eclipse equivale
138. s there is also a ReDHead testing feature which contains the ReDHead the ReDHead Codan and the ReDHead test plugins The PDE build process is configured in the file config build properties The values I had to set can be seen in the following listing topLevelElementId ch hsr ifs redhead tests feature buildDirectory results baseLocation results eclipse The original build properties file is a template which can be found in the extracted Eclipse instance in the eclipse plugins org eclipse pde build_ templates headless build directory It is quite big but setting the three properties shown above is sufficient 97 12 15 18 21 24 27 30 33 36 39 42 45 48 51 The following listing shows a shortened version of the build xml file which is responsible for building the ReDHead plugins installing them into Eclipse and running the tests lt project default build gt lt target name build depends init setup gt lt antcall target install redhead source gt lt antcall target zips gt lt antcall target tests gt lt antcall target publish test results gt lt target gt lt target name init gt lt sets many path file properties gt lt target gt lt 1 lt target name install redhead source gt lt copy todir buildDirectory features ch hsr ifs redhead feature gt lt fileset dir basedir Software ch hsr ifs re
139. s either a C file or project and scans the ReDHead data structure for certain parts of information it is looking for It could for example look for declaration reference dependencies which do no yield any include paths which means that we are missing an include directive In the end the algorithm assembles a collection of suggestions which are then presented to the user The necessary information on where an Eclipse editor marker should be placed or where a solution should be inserted can be obtained form the collected data structure elements 72 8 Market Analysis This chapter gives an overview of the market of all C static code analysis tools Note The list ranges form academical to commercial that only a part of these tools mentioned in the following table compete with the Eclipse ReDHead plugin Most of them do not provide static include analysis features but rather other static analysis features open source etc W amp L S S xX a o o gt SAAHA HAA HA HAA oH a a a e W amp L W amp L W amp L W amp L W amp L W amp L W amp L W amp L W amp L W amp L W amp L W amp L 73 n g g g o EI zie e o ia S g pj g y Sl a g g o S 5 amp Kol g S 9 T f A 2 fo a S T a S D Fy ge a A a 5 a 5 s F k E g S 5 S io o 5 2 a g o E v a 9 S g a ie 5 g 2 5 fe S g A 5 s ba gt amp O O A 8 O x x x x x x W v x v v x x x v x W amp L v x v v x x x v x W amp L x x v V x x x x x W v x v 7 onl
140. s on the previously mentioned C project Measurements were performed either on one specific single file or on the whole project The file contains 200 line of C code while skipping comments and empty lines The whole project consists of about 190 source and header files totaling to about 30 000 lines of C code The following tree charts give an overview on the performed measurements The first two Figures 9 1 and 9 2 show the execution first of the single file and then on the whole project The third Figure 9 3 shows the memory usage of both the file and project runs Each dot on a chart line represents a measurement Sometimes measurements where only taken for either file or project In this case some dots are missing These remaining spaces on the line are necessary so that the chronological order between file and project measurement is still evident Algorithm Runtime Single File 05 45 60 5 11 86 05 02 40 04 19 20 03 36 00 02 52 80 MM SS 02 09 60 01 26 40 00 43 20 0 43 59 00 00 00 1 2 3 4 5 6 ni 8 9 10 11 12 13 14 15 16 17 18 48 20 Figure 9 1 Execution Time Single File The peak which arises from measurement 13 and 14 was the adaptation toward first last element include pahts based on IIndexs which was introduced in 6 2 This adapta tion including adequate caching mechanisms was finished at measurement 15 When comparing measurement 9 and 15 it becomes obvious that the adaptation was prosper ous
141. sTo declaration normally only 1 path includeList firstElementsOf includePathList if not areAllIncludesOnlyUsedToResolve includeList refsToDeclarationList return false return true Note here that the areUsedInOnlyDeclaration function tests if the refsToDec larationList are only used in declarations which are not themselves definitions 4 6 1 Refactor Towards iosfwd The standard C library contains a file named iosfwd which contains forward decla rations for common input and output stream types As an algorithm extension to the one which replaces includes with forward declarations one could in the case of streams add an include directive to iosfwd instead of forward declaring the used stream types 4 7 Introduce Redundant Include Guards Every C developer knows that include guards are necessary to prevent redefinition of symbols mostly types Include guards achieve that each file is only included exactly once So during compilation the same include files is opened and closed in numbers only to find that there is nothing to do with it because it was already included earlier In big software project this results in a considerable amount of time that is wasted by opening and again closing files The open and close operations executed by the compiler however can be prevented by introducing redundant include guards in the file which includes a header file A more detailed description of redundant include guards c
142. the problem marker s icon However it seams that this is also true for all the other CDT problem markers I will not investigate further on this issue but this should maybe be looked into 107 12 18 B 4 Example Codan Checker The CDT Codan framework allows to check the content of documents also while typing The components that can be added to the Codan framework are called Codan checkers When a file gets changed in a C editor each Codan checker is called to analyze the document and report problems These problems will then be shown as markers in the editor Given some useful checkers this is a very nice feature that helps to produce C code even more easy The following listing shows the additions which have to be added to plugin xml to implement a Codan checker lt xml version 1 0 encoding UTF 8 gt lt eclipse version 3 4 gt lt plugin gt lt extension point org eclipse cdt codan core checkers id org eclipse cdt codan core internal checkers gt lt checker class ch hsr ifs redhead codan checkers ExampleChecker id ch hsr ifs redhead codan checkers ExampleChecker name Find Unused Includes Checker gt lt problem category org eclipse cdt codan core categories ProgrammingProblems defaultSeverity Warning id ch hsr ifs redhead codan problems ExampleProblem messagePattern 0 name Unused Include gt lt checker gt lt extension gt lt plugin gt In the listing above the C
143. tion action CEditor ceditor CEditor PlatformUI getWorkbench lt getActiveWorkbenchWindow getActivePage getActiveEditor IResource resource ceditor getInputCElement getResource try resource deleteMarkers null true IResource DEPTH_INFINITE catch CoreException e1 System err printin deleting markers failed e1 printStackTrace 104 ou 11 14 17 20 23 26 29 32 38 Al 44 B 2 1 Customized Markers In certain cases one wants to give a marker which is added custom properties This can for example be to add an own marker icon to define a background color etc The following listing shows how such a marker definition looks It defines a mew marker type which instead of showing the normal problem icon shows a customized ReDHead icon lt xml version 1 0 encoding UTF 8 gt lt eclipse version 3 4 gt lt plugin gt lt extension point org eclipse core resources markers id ch hsr ifs redhead unusedincludemarker gt lt super type org eclipse core resources problemmarker gt lt persistent value true gt lt extension gt lt extension point org eclipse ui editors markerAnnotationSpecification gt lt specification colorPreference Value 254 155 0 annotationType ch hsr ifs redhead unusedincludeannotation verticalRulerPreferenceValue true colorPreferenceKey indexResultIndicationColor contributesToHeader false overviewRu
144. to highlight that in C a definition is always also a decla ration Hence when the term declaration is used if not specified otherwise it refers to either a definition or a declaration An important term which also needs clarification here is the term include path Normally people think of include paths as an argument passed to a C compiler which then is used to locate required header files As an example GCC accepts an include path as value of the option I Actually the argument passed is the path of a directory which would make the term include directory path more accurate The term include path in the scope of this documentation and also the ReDHead plugin refers not to these compiler include paths but rather to an ordered set of include dependencies which forms a path form one C code file to a target C header file An include dependency in Figure 1 1 is depicted as a blue arrow As an example in Figure 1 1 one can see two different include paths form file a cpp to file a h The first one is just gt a h the second one is gt b h gt a h So the ordered set of the first path contains only one element the one of the second include path contains two 1 5 Static Analysis in General One should be aware that optimizing the include structure of a project is only a small part of possible features that can be implemented with the help of static code analysis As already mentioned Eclipse JDT is a very noteworthy
145. uivalent of a DeclarationReferenceDependency which itself a logical design element see 5 1 for information about the two designs The small difference here is that as described in 5 5 IncludePath does not contain the originating ReDHeadFile as starting element but starts with the file that is included As can be seen in the Figure 6 3 the elements contained in IncludePaths are not files but rather IIndexInclude These can be used to find both the including and the included file Important to know here is that ReDHead only makes use of the included file This gives the advantage that IncludePath instances can be reused and compared independent of the file that contains the first include So as an example each IncludePath generated by the code include lt vector gt independent of the file that contains it is considered equals Note also here that instead of IIndexIncludes one could also use I ASTPeprocessor IncludeStatements They can basically provide the same information as the IIndex Include In spite of the same functionality as already mentioned in Chapter 5 they do not share any common interface or base class IncludePaths are always constructed as a collection since there might always exists several paths from a first included file to a target file To explain the construction of IncludPath instances I must first explain the two subclasses of IncluePath Up to now what was referred to as a include path is actually an instanc
146. ve is not a perfect metric to find the worst include This means that in very special cases it can happen that the find unused includes algorithm proposes not as many includes for removal as possible The heuristic approach here was chosen so that the algorithm is able to deliver suggestions in the shortest possible period of time An optimal calculation would with big C files take minutes if not hours to complete 25 Example Run To help understand the process of the filtering of the available include paths which was described in 4 1 1 the process will be illustrated here with an example Figure 4 3 is a repetition of Figure 4 2 additionally containng numbers indexing the include paths to make the description steps bellow easier Also the pseudo code given in Section 4 1 1 is repeated again so one does not have to leaf back and forward continuously File Defining Type A File Using Type A Declaration Reference Dependency Include Include Path Figure 4 3 Find Unused Includes Example with Numbered Include Paths filterAvailableIncludePaths availableIncludePathList pickedIncludeList EmptyIncludeList pickMandatoryIncludes pickedIncludeList availableIncludePathList while hasIncludeChoice availableIncludePathList removeWorstChoice availableIncludePathList pickMandatoryIncludes pickedIncludeList availableIncludePathList return pickedIncludesList Execution
147. was completely on my own The following two lists show positive and negative points of these projects Positive Points Only my own code All the code which was written in this project is completely my own This raises the overview I have over the project code Like this I can make sure there is no unknown code duplication 114 Effectively implemented design Based on the improved overview of the code I can de sign and also re design more effectively because I have all the knowledge thats is needed for this task Supervisor When working alone the person in the supervising role gets much more important because he is the sole person you can go to with problems Early documentation I started documenting very early in the term project This is a very good thing to do because one does not need to reconstruct precise knowledge one had at a past time Extensive Performance Testing Even due to the fact that I used a bigger real C project to test how the ReDHead plugin performs I think that I should have put more focus on this issue The fact that I found out that parts of the developed components do not perform fast enough to be used in a release highlights this fact Negative Points Early documentation Even though early documentation is as mentioned a good thing to write down detailed knowledge at a given time one has to re process these parts again and again because the circumstances change while the project proceeds When software
148. y x x x x W Vv x x includes v x x x x v W amp L x vA x v includes x x x v W amp L vA x x as tree Note that in the content of the column Platform W refers to Windows and L refers to Linux and Unix like platforms Also note that there are projects lists which can be used to perform static analysis but do only provide directly usable features This means that before being able to use the user has to provide some kind of script which then is used to perform customized static analysis Because of that such features to not have a check mark in the the column Static Code Analysis non include The list given above can also be found in the ReDHead wiki http redhead ifs hsr ch wiki SimilarTools The list given there also contains links to the tool s home pages For a list of tools which directly compete with the features of ReDHead I refer to the Chapter 5 in Fel09 8 1 Similar CDT Features Also CDT itself contains some limited static code analysis itself These are on the one side the build output which is parsed and added to the CDT editor in form of makers Additionally to this the newest version of CDT also brings along some rather simple Codan checkers These are listed in the following list e StatementHasNoEffect Checker e CatchByReferenceChecker e NamingConventionFunctionChecker 74 e NonVirtualDestructor e SuggestedParenthesisChecker e AssignmentInConditionChecker 75 9 Challenges During the

- HSR - Institutional Repository

Contents

Download Pdf Manuals

Related Search

Related Contents