Home

The FITTEST Tool Suite for Testing Future Internet Applications

1. 7 12 Bauersfeld S de Rojas A Vos T Evaluating rogue user testing in industry an experience report Technical Report UU CS 2014 011 Utrecht University 2014 67 68 69 70 71 Zeller A Isolating cause effect chains from computer programs In 10th ACM SIGSOFT symposium on Foundations of Software Engineering FSE 2002 1 10 Elyasov A Prasetya I Hage J A N Reduce first debug later In to appear in the proceedings of ICSE 2014 Workshops 9th International Workshop on Automation of Software Test AST 2014 Washington DC USA ACM IEEE 2014 Naish L Lee H J Ramamohanarao K A model for spectra based software diagnosis ACM Trans Softw Eng Methodol 20 3 2011 Prasetya I S W B Sturala A Middelkoop A Hage J Elyasov A Compact traceable logging In 5th International Conference on Advances in System Testing and Validation VALID 2013 Tonella P Marchetto A Nguyen C D Jia Y Lakhotia K Harman M Finding the optimal balance between over and under approximation of models inferred from execution logs In Proc of the Fifth IEEE International Conference on Software Testing Verification and Validation ICST 2012 21 30
2. Proceedings Third IEEE International 1998 254 261 Garvin B Cohen M Dwyer M Evaluating improvements to a meta heuristic search for constrained interaction testing Empirical Software Engineering 16 1 2011 61 102 Calvagna A Gargantini A A formal logic approach to constrained combinatorial testing Journal of Automated Reasoning 45 2010 331 358 Jia Y Cohen M B Harman M Petke J Learning combinatorial interaction testing strategies using hyperheuristic search Technical Report Technical Report RN 13 17 Department of Computer Sciences University of College London 2013 Harman M Burke E Clark J Yao X Dynamic adaptive search based soft ware engineering In Proceedings of the ACM IEEE international symposium on Empirical software engineering and measurement ESEM 712 2012 1 8 Burke E K Gendreau M Hyde M Kendall G Ochoa G Ozcan E Qu R Hyper heuristics a survey of the state of the art Journal of the Operational Research Society 2013 to appear Bauersfeld S Vos T E J GUITest a Java library for fully automated GUI ro bustness testing In Proceedings of the 27th IEEE ACM International Conference 50 ol 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 on Automated Software Engineering ASE 2012 New York NY USA ACM 2012 330 333 Bauersfeld S Vos T E A reinforcement learning approach to automated gui
3. strategy in evolving the models e g based on some assumption on expiration time of the models 10 this is left as future work Our CSTP tool can be used to inject artificial mutations on service responses and then used to rank test cases based on their sensitivity to the mutations This is useful in regression testing to prioritize test cases when time and budget are limited In the case study at IBM the resulting prioritization is as good as a manual prioritization made by an expert 58 In a separate case study just using 10 of the test cases in total 159 ranked by CSTP we can detect four out of five injected faults 23 We have improved the combinatoric testing tool CTE XL with new features e g ability to prioritize its test cases generation based on weight assigned to elements of the used classification tree 35 and ability to generate sequences of elementary test cases 36 The latter is also essential for Internet applications which are often event based An execution of such an application is a sequence of user events each may have parameters So a test case for such an application is also a sequence The case studies at IBM SOFTEAM and Sulake indicated that test cases generated by CTE can find errors that were not discovered by manual test cases 63 59 However CTE test cases also left some features uncovered 65 In a way this is as expected A domain expert would know how to activate a given set of target features
4. 241 250 11 Nguyen C D Tonella P Automated inference of classifications and dependencies for combinatorial testing In Proceedings of the 28th IEEE ACM International Conference on Automated Software Engineering ASE 2013 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Ernst M D Perkins J H Guo P J McCamant S Pacheco C Tschantz M S Xiao C The daikon system for dynamic detection of likely invariants Sci Com put Program 2007 35 45 Elyasov A Prasetya I S W B Hage J Guided algebraic specification mining for failure simplification In accepted in the 25th IFIP International Conference on Testing Software and System ICTSS 2013 Elyasov A Prasetya I S W B Hage J Log based reduction by rewriting Tech nical Report UUCS 2012 013 Utrecht University 2012 Prasetya I S W B Hage J Elyasov A Using sub cases to improve log based oracles inference Technical Report UUCS 2012 012 Utrecht University 2012 Anon The daikon invariant detector user manual groups csail mit edu pag daikon download doc daikon html 2010 Nguyen C D Marchetto A Tonella P Combining model based and combi natorial testing for effective test case generation In Proceedings of the 2012 International Symposium on Software Testing and Analysis ACM 2012 100 110 Tonella P FITTEST deliverable D4 3 Test data generation
5. Flash the ITE also includes a bytecode instrumentation tool called abci 3 to instrument selected methods and instruction blocks for logging thus allowing much more detailed logs to be produced Abci is implemented with the AG attribute grammar framework 4 allowing future extensions such as adding more instrumentation types to be programmed more abstractly 2 2 Model Inference Model inference is a key activity in continuous testing 5 6 Based on execution traces it infers models that describe structure and behaviour of a SUT using either event sequence abstraction or state abstraction 7 9 One of the most frequently used models is the Finite State Machine FSM In an FSM nodes represent states of the SUT and transitions represent an application event ac tion e g a method call an event handler that can change the application state if executed Additionally guards and conditions can enrich the model to capture the context in which events and actions are executed In the ITE models are automatically inferred from logs using either event sequence abstraction or state abstraction The outcome are FSM models that can be fed to the successive phases of the ITE cycle specifically event sequence generation and input data generation Models are updated continuously and incrementally 10 as the software evolves and new behaviours appear in the execution logs The events in the model often have input parameters From the log we
6. and UML2 profile 2013 Fraser G Arcuri A EvoSuite automatic test suite generation for object oriented software In Proceedings of the 13th conference on Foundations of Software En gineering ESEC FSE New York NY USA ACM 2011 416 419 Rothermel G Harrold M J A safe efficient regression test selection technique ACM Trans Softw Eng Methodol 6 2 1997 173 210 Rothermel G Untch R H Chu C Harrold M J Prioritizing test cases for regression testing IEEE Transactions on Software Engineering 27 2001 929 948 W3C Web service description language WSDL http www w3 org tr wsdl20 Technical report Accessed December 2010 Nguyen D C Marchetto A Tonella P Change sensitivity based prioritization for audit testing of webservice compositions In Proc of the 6th Int Workshop on Mutation Analysis co located with ICST 2011 357 365 Ludwig H Keller A Dan A King R Franck R A service level agreement language for dynamic electronic services Electronic Commerce Research 3 2003 43 59 10 1023 A 1021525310424 W3C XML path language XPath http www w3 org tr xpath Technical report 1999 W3C XML schema http www w3 org xml schema Technical report Accessed December 2010 Cohen M B Snyder J Rothermel G Testing across configurations implications for combinatorial testing SIGSOFT Softw Eng Notes 31 2006 1 9 Kuhn D R Wallace D R Gallo A M Software fault interact
7. from classification trees In Proceedings of ICST 2012 Workshops ICSTW 2012 Montreal Canada 2012 Kruse P M Lakhotia K Multi objective algorithms for automated generation of combinatorial test cases with the classification tree method In Symposium On Search Based Software Engineering SSBSE 2011 2011 Ferrer J Kruse P M Chicano J F Alba E Evolutionary algorithm for prior itized pairwise test data generation In Proceedings of Genetic and Evolutionary Computation Conference GECCO 2012 Philadelphia USA 2012 Prasetya I S W B Amorim J Vos T Baars A Using Haskell to script combi natoric testing of web services In 6th Iberian Conference on Information Systems and Technologies CISTI IEEE 2011 Cohen D M Dalal S R Fredman M L Patton G C The AETG system an approach to testing based on combinatorial design IEEE Transactions on Software Engineering 23 7 1997 437 444 Cohen M B Gibbons P B Mugridge W B Colbourn C J Constructing test suites for interaction testing In Proceedings of the 25th International Confer ence on Software Engineering ICSE 03 Washington DC USA IEEE Computer Society 2003 38 48 Hnich B Prestwich S Selensky E Smith B Constraint models for the covering test problem Constraints 11 2006 199 219 Lei Y Tai K In parameter order a test generation strategy for pairwise testing In High Assurance Systems Engineering Symposium 1998
8. robustness testing In 4 th Symposium on Search Based Software Engineering 2012 7 Edelstein O Farchi E Goldin E Nir Y Ratsaby G Ur S Framework for testing multithreaded java programs Concurrency and Computation Practice amp Experience 15 3 5 2003 485 499 Juristo N Moreno A Vegas S Reviewing 25 years of testing technique exper iments Empirical Softw Engg 9 1 2 2004 7 44 Hesari S Mashayekhi H Ramsin R Towards a general framework for eval uating software development methodologies In Proc of 34th IEEE COMPSAC 2010 208 217 Vos T E J Mar n B Escalona M J Marchetto A A methodological frame work for evaluating software testing techniques and tools In 12th International Conference on Quality Software Xi an China August 27 29 2012 230 239 Vos T E J Evolutionary testing for complex systems ERCIM News 2009 78 2009 Vos T E J Continuous evolutionary automated testing for the future internet ERCIM News 2010 82 2010 50 51 Nguyen C Mendelson B Citron D Shehory O Vos T Condori Fernandez N Evaluating the fittest automated testing tools An industrial case study In Empirical Software Engineering and Measurement 2013 ACM IEEE Interna tional Symposium on 2013 332 339 Nguyen C Tonella P Vos T Condori N Mendelson B Citron D She hory O Test prioritization based on change sensitivity an industrial case study Technical Re
9. with multiple clients Using the new record and replay functionality we can first record events and messages and then replay them at desired levels of intensity to generate load that can expose concurrency bugs at a network level When an execution fails it may still be non trivial to find the root cause of the failure This is especially true for Internet applications which are often event based An execution of such an application can be long driven by a sequence of top level events When the execution fails simply inspecting the content of the call stack as we usually do when debugging may not reveal the root cause since the stack only explained what the last top level event did In the FITTEST project we have also investigated this problem In 14 we use the pattern based oracles inferred by the continous testing tools to reduce the log belonging to a failing execution thus making it easier to be inspected The reduction tries to filter out events irrelevant to the failure and preserves the last logged abstract state of the execution where the failure is observed The approach works off line it is fast but is inherently inaccurate We then investigated if combining it with an on line approach such as delta debugging 67 will give us a better result Our preliminary investigation shows a promising result 68 Ultimately the performance depends on how well the chosen state abstraction used by the logger is related to the failure Dec
10. 0 have been established as well as part of the FITTEST project For all generation techniques available the use of search based techniques 37 has been considered as part of controlled benchmarks 38 4 2 Focus CTD The IBM Focus tool is a comprehensive tool for test oriented system modeling for model based test planning and for functional coverage analysis Its system modeling and test planning are based on computing Cartesian products of system or test aspects with restrictions applied to the models and henceforth to the computation The generated test plans are combinatorial thus placing the IBM Focus tool in the family of combinatorial test design tools 32 39 including CTE The Focus tool also provides advanced code coverage analysis capabilities It imports code coverage data from other tools and analyzes various coverage as pects In particular it excels in identifying system parts which are poorly covered and in planning tests to optimally cover those system parts Focus provides advanced reports independent of the programming language and the platform The Focus graphical user interface GUI allows manual acti vation viewing and manipulation of its modeling and combinatorial test design capabilities and of its coverage analysis Additionally the GUI can be bypassed to facilitate automatic or semi automatic test planning and coverage analysis The results of the tool and its reports can be exported into HTML spreadsheet an
11. EST ConTest was enhanced with bug isolation capabilities which allows learning which component or components of the system contain the bug This is of paramount importance in internat applications which are componentized and distributed Another FITTEST addition to ConTest is a high level record and replay ca pability which enables debugging the suspicious component in isolation while maintaining the communication patterns that manifest the bug This again is very important in internet applications where communication is commonly cen tral to the system yet there is a need to debug specific components in isolation Yet another FITTEST extension to ConTest is an orange box facility The orange box allows replay of the series of messages and events which took place just ahead of a concurrency bug manifestation This provide much help to con currency debugging of FI applications Deadlock analysis and lock history were also added to the FITTEST version of ConTest to further improve concurency testing and debuggiing capabilities The ConTest flow is depicted in Figure 9 Given an SUT and a set of test cases pre existing or generated by a test generation tool ConTest first instruments the SUT to allow noise delay to be injected to relevant program points The test cases are ordered using a heuristic and then run For each test case different schedules will be tried In case that a bug was exposed it is fixed externally to ConTest and
12. TTEST approach to test case prioritization for audit testing of services called Change Sensitivity Test Prioritization CSTP is based on the idea that the most important test cases are those that can detect mutations of the service responses when such mutations affect the semantics while leaving the WSDL syntax unchanged More details about CSTP including its empirical validation are available in a previous publication 23 Let s be the service under consideration Snew be a candidate that can sub stitute Snew is the subject of audit testing Let T S be the set of available test cases that are selected as candidate for audit testing These are the test cases whose executions result in the invocation of s TS is used in audit testing of spew with respect to the composition under consideration In practice execution of the full suite TS might involve too much time or might require a big monetary budget if only a limited number of invocations of Snew is available for testing purposes i e invocations in test mode or if the invocation of spew requires payment Hence the service integrator has to minimize and prioritize the test cases from T S that are actually run against Snew during audit testing The goal is then to prioritize the test cases in such a way that issues if any are detected by an initial small subset of the ranked test cases CSTP determines the change sensitivity of the test cases and uses this met rics to rank the te
13. The FITTEST Tool Suite for Testing Future Internet Applications Tanja E J Vost Paolo Tonella Wishnu Prasetya Peter M Kruse Onn Shehory Alessandra Bagnato and Mark Harman Universidad Polit cnica de Valencia Spain Fondazione Bruno Kessler Trento Italy 3 Universiteit van Utrecht Utrecht Netherlands Berner amp Mattner Berlin Germany IBM research Haifa Israel Softeam Paris France University College London UK Abstract Future Internet applications are expected to be much more complex and powerful by exploiting various dynamic capabilities For testing this is very challenging as it means that the range of possible behavior to test is much larger and moreover it may at the run time change quite frequently and significantly with respect to the assumed behavior tested prior to the release of such an application The traditional way of testing will not be able to keep up with such dynamics The Future Internet Testing FITTEST project a research project funded by the European Comission grant agreement n 257574 from 2010 till 2013 was set to explore new testing techniques that will improve our capacity to deal with the challenges of testing Future Internet applications Such techniques should not be seen as replacement of the traditional testing but rather as a way to complement it This paper gives an overview of the set of tools produced by the FITTEST project implementing those techniques 1 I
14. a w oO w Domain Input HUMAN ORACLES a Specifications pes ae e r PATTERN BASED H ed ORACLES e PROPERTIES BASED ORACLES ee ae z MODEL BASED fe MANUALLY ORACLES 5 EXECUTE oooO Q TEST CASES FREE ORACLES X lt Los eee if b ORACLES it i E EVALUATE TEST E CASES E a a gt iu 5 TEST Ww RESULTS z Fig 1 Activities performed in each testing cycle a logger is first attached to it if necessary by instrumenting the SUT Logs are then collected and then analyzed to infer a behavior model of the SUT in the form of a finite state machine as well as properties to be used as test oracles Fresh logs from the current cycle are used for this analysis but also logs from previous cycles up to some time in the past which we consider to still represent the current SUT Test cases are then generated from the model by traversing it according to proper criteria such as transition coverage Each test case is basi cally a path of transitions through the finite state machine Since self adaptation or other forms of dynamic changes will be visible on the logs the inferred model can keep up with them However using a model inferred like this to test the SUT is only meaningful if the generated test cases do not simply replay the logged executions We therefore apply e g a combinatoric approach over the parameters of the execution in order to trigger fresh executions Test cases generated from a model are typically stil
15. and share their in world experi ences Some quick Habbo facts from 2011 11 language versions Customers in over 150 countries Registered users 218 000 000 Unique visitors more than 11 000 000 per month 90 of the users between the age of 13 18 Combinatorial testing for a system such a Habbo is a challenging task since there exists a wide variety of operating systems and browsers and their dif ferent versions used by players Currently at Sulake testing new features is planned using high level feature charters to support exploratory testing and automated regression tests are designed for most critical use cases identified during exploratory testing Teams make sure that the developed features have automated regression tests Besides feature coverage test engineers and their teams are provided user information that contains for example of users using each browser operating system and flash player version This information is used to take combinatorial aspects into account and de sign the tests in such a way that user variables that cover most users setups are tested For example when a test is being designed that needs more than one browser e g a friend request from user A to B this user information is used to make sure that two different user set up one for A and one for B are configured in such a way that most popular user configurations are tested Habbo is built on highly scalable client server technology The ActionScr
16. cifying simple oracles based on the state of the GUI of the SUT For example we can enter a regular expression that describes those messages that you consider to be related to possible errors The RU will apply this expression to each title of each widget on the screen If it matches any widgets title the RU will report an error and save the sequence for later inspection Moreover this tab allows us to configure the freeze time The RU is able to detect crashes automatically because it realizes when the SUT is not running anymore However if the SUT does not really crash but just freezes is unresponsive for a long time then the RU does not know whether it is just carrying out heavy computations or hangs If the SUT is unresponsive for more than the set freeze time the RU will consider it to be crashed and mark the current sequence as erroneous The filter tab provides the tester an easy way to specifiy actions that should not be execucted e g because they are undesirable or dangerous or processes that can be killed during test generation i e help windows document viewers etc The time tab provides a means to define Action Duration The higher this value the longer the execution of actions will take Mouse movements and typing become slower so that it is easier to follow what the Rogue User is doing This can be useful during Replay Mode in order to replay a recorded sequence with less speed to better understand a fault Tim
17. d text files The main functionality of the Focus tool is depicted in Figure 4 Given a system under test SUT system attributes and attribute values are extracted 3 EXTRACT ATTRIBUTES SUT eo Domain experts EXISTING TEST g sa DEFINE ae k RESTRICTIONS NS ay oe Attribute Values 2 GENERATE y Restrictions d ne T MODEL E GENERATE TES lt S PLAN A S Combinatorial Model GENERATE TES SY i C CASES SF Test Suite YES ALAN GOOD ENOUGH S OPTIMIZED TEST PLAN Raw Test Su i Plan ee N EM z OPTIMIZE AND 7AREFINE TEST PLAN No Test Cost Requirements Quality amp Priority Requirements Fig 4 Focus flow Additionally restrictions are placed on the attributes their values and the de pendencies among them Based on these a combinatorial model of the system is generated by Focus Focus may be stopped at this stage however the com mon usage proceeds to test plan generation The tool can generate a plan from scratch or alternatively start from an existing plan Either way the plan is op timized to meet coverage cost and functional requirements Manual changes are allowed via the GUI supporting what if style modifications Once the test plan is optimized and meets user requirements Focus can automatically generate the needed test cases 4 3 Hyperheuristic Search Algorithm M
18. default assertions being that the appli cation should not crash or raise exceptions Since the client lacks controllability and observability over the service and SLA Service Level Agreement 24 con cerns only high level quality contracts e g performance response time the developer of the client service integrator has to make assumptions about tech nical details regarding the format of the responses We call these assumptions as service integrator s assumptions For example the integrator might expect a list of phone numbers separated by commas from the service when searching for a phone number Changes in the data format of the response e g using colon instead of comma as phone number separator may break the assumptions of the service integrator which may lead the client to misbehave thus making the test cases not to pass It is worth noticing that we focus on data changes e g format range etc Changes regarding the structure of the service responses can be easily detected because they require the interface definition of the service written for instance in the Web Service Definition Language WSDL 22 to be changed Adopting a new service interface involves rebinding and recompiling and the compiler is able to detect syntactic incompatibilities What the compiler cannot detect is subtle semantic incompatibilities such as the change of a list item separator This requires regression audit testing and test prio
19. e and checking of oracles use separate tools To execute the generated test cases the ITE can also control a version of the SUT that runs in a dedicated testing environment Tools and libraries for logging and test case execution de pend on the technologies used by the SUT Currently the ITE supports PHP and Flash applications 2 1 Logging and Instrumentation Loggers intercept events representing top level interactions with the application under test this way programmers do not need to explicitly write any logging code The used technology determines what constitutes a top level interaction For a PHP application top level interactions are the HTTP requests sent by the client to the application For a Flash application these are GUI events such as the user filling a textfield or clicking on a button The produced logs are in the text based FITTEST format 2 The format is compact but deeply structured allowing deep object serialization into the logs In the Flash logger we can register serialization delegates They specify how instances of the classes they delegate are to be serialized Using such a delegate one can also specify an abstract projection of the application s concrete state and log it Although the delegates have to be manually specified they are transparent the application is not even aware them The PHP logger does not currently log state information hence models in ferred from PHP logs are purely event based For
20. e evaluative case studies should involve realistic systems and subjects and not toy programs and students as is the case in most current work 52 53 be done with thoroughness to ensure that any benefit identified during the evaluation study is clearly derived from the testing technique studied ensure that different studies can be compared In order to simplify the design of case studies for comparing software test ing techniques while ensuring that the many guidelines and check list for doing empirical work have been met we have defined a general methodological frame work in FITTEST that can be found in 54 The framework we have developed has evolved throughout the past years by doing case studies to evaluate testing techniques The need to have a framework as described in this paper emerged some years ago during the execution of the EU funded project EvoTest IST 33472 2007 2009 55 and continued emerging during the EU funded project FITTEST ICT 257574 2010 2013 56 Both these are projects whose objec tives are the development of testing tools that somewhere in the project need to be evaluated within industrial environments Searching in the existing literature to find a framework that could be applied in our situation did not result in anything that exactly fit our need a methodological framework that is specific enough for the evaluation of software testing techniques and general enough and not make any assumptio
21. e to wait after execution of an action This is the time that the Rogue User pauses after having executed an action in Generation Mode Usually this value is set to 0 However sometimes it can make sense to give the GUI of the SUT more time to react before executing the next action If this value is set to a value gt 0 it can greatly enhance reproducibility of sequences at the expense of longer testing times SUT startup time This is the time that the Rogue User waits for the SUT to load Large and complex SUTs might need more time than small ones Only after this time has expired the Rogue User will start sequence generation Maximum test time in seconds The RU will cease to generate any sequences after this time has elapsed This is useful for specifying a test time out e g 1 hour one day one week Use Recorded Action Timing during Replay This option only affects Replay Mode If checked the RU will use the action duration and action wait time that was TEST CASES SUT St RE ORDER yo Q coe mm pp ORDERING INSTRUMENT A SUM ooo gt a i 3 EXECUTION INSTRUMENTED i gt TRACE A RuN o SUT NEW ae NG NOT ENOUGH SUCCESS CONCURRENCY COVERAGE FAILURE ENOUGH C FIKBUG No we s te Fig 9 The ConTest flow used during sequence generation If you uncheck the option you can specify your own values 6 Concurrency Testing Many concurrency bug
22. erated by Evosuite require a straightforward transformation to be usable to test our SUT 3 http www cte xl professional com 4 http seleniumhq org 2 5 Test evaluation The generated test cases are executed and checked against oracles If an oracle is violated the corresponding test case has then failed Further investigation is needed to understand the reason behind the failure Support for automatic debugging is currently beyond our scope As depicted in Figure 1 various oracles can be used for test evaluation The most basic and inexpensive oracles detect SUT crashes the most effective but expensive are human oracles The ITE offers several other types of oracles that are automatically inferred from the logs or the models Model based oracles The test cases produced by the ITE are essentially paths generated from the FSM models If these models can be validated as not too much under approximating then any path through them can be expected to be executable it should not the SUT to crash or become stuck in the middle which can be easily checked In Figure 1 this is called model based oracles Properties and pattern based oracles The executions of the test cases are also logged The ITE then analyzes the produced logs to detect violations to the inferred pre and post conditions as well as the pattern based oracles The checking of the latter is done at the suite level rather than at the test case level E g to check bb b the ITE
23. essentially looks for a pair of prefix sequences obb and ob in the logs produced by the test suite that contradicts the equation 3 Regression Testing Audit testing of services is a form of regression testing 20 21 that aims at checking the compliance of a new service including a new version of an existing service or a newly discovered service from a new provider with a FI System Under Test SUT that integrates the service and currently works properly In such a context test case prioritization has the goal of giving an order to the test cases so that the key ones e g those that reveal faults are executed earlier Audit testing of services differs from traditional regression testing because the testing budget is typically much more limited only a very small subset of the regression test suite can be executed and because the kind of changes that are expected in the service composition is known and quite specific In fact audit testing is used to check the proper functioning of a service composition when some external services change Some adaptations of the service composition that are required to make the composition work are trivially detected at the syntactical level by the compiler which will immediately report any interface change that needs adaptation on the composition side It is only those semantic changes that do not affect the service API description for example WSDL 22 that may go undetected and require audit testing The FI
24. estriction types and their corresponding mutation operators showed in Table 1 The Enumeration mutation operator randomly generates a new item added to the finite set of items admitted in the response according to the service as sumption The Length mutation operator changes the size of a response ele ment so as to make it differ from the integrator s assumptions Similarly the MazLength and MinLength mutation operators make the size of a response ele ment respectively greater than or lower than the limit admitted in the integra tor s assumptions When the numeric value of a response element is constrained to be within boundaries the MinInclusive MazInclusive MinExclusive Maz Exclusive mutation operators produce values that lie beyond the integrator s assumed boundaries The RegEx mutation operators can be used when the con tent of a response element is supposed to follow a pattern specified by means of a regular expression Such regular expression is mutated e g by replacing a constant character with a different one by making a mandatory part of the expression optional by concatenating an additional subexpression and mutated responses are generated by means of the mutated regular expression For exam ple the regular expression specifying a list of phone numbers as a list of comma separated digit sequences can be mutated by replacing the constant character with Taking a data restriction specified in a service as
25. extract concrete input values for such parameters We have implemented a data mining technique to infer input classes for combinatorial testing used later in test generation for these parameters 11 2 3 Oracle inference The tool Haslog and Lopi are used to infer oracles from logs Hashlog can infer pre and post conditions for each type of high level event it uses Daikon 12 at the back end In Figure 1 these are called properties based oracles If the logged abstract state consists of many variables simply passing them all to Daikon will cause Daikon to try to infer all sorts of relations between them most of which are actually not interesting for us The ITE allows groups of variables to be specified using regular expressions and constrains Daikon to only infer the oracles for each group separately Lopi infers so called pattern based oracles 13 These are algebraic equations of common patterns over high level events e g a or bb b saying that on any state executing the event b twice will lead to the same state as executing just one b Additionally such equations are also useful to reduce a failing log log produced by a failing execution when we diagnose the failure 14 Finally through instrumentation we can also log the entry and exit points of selected methods allowing pre and post conditions to be inferred for them Furthermore so called sub cases oracles can also be inferred 15 This requires instruction
26. follows 1 Obtain the GUI s state i e the visible widgets and their properties like position size focus The RU can determine the current GUI state of the SUT in the form of a widget tree A widget tree is a hierarchical composition of the widgets currently visible on the screen together with the values of associated widget attributes Figure 7 shows an example of such a tree for some user interface 2 Derive a set of sensible default actions Enabled buttons icons and hyperlinks can be tapped text fields can be tapped and filled with text the screen type Screen type Button text Contacts type Button text Game Center type Button text Settings type Button text Photos type Button text Safari WIDGET TREE USER ACTIONS SUT Oracle be F PEDESE EES p TEAN romisin SELECT Domain Experts Action PROMISING Factors PROMISING Definitions ACTIONS ACTION m A STOP SUT Fig 8 The Rogue User testing tools simple interface scrollbars and sliders may be dragged etc RU allows to simulate simple clicks keystrokes as well as complex actions drag and drop operations handwriting and other gestures etc and can thus drive even sophisticated GUIs Select and execute an action 4 Apply an oracle to check whether the state is valid If it is invalid stop se quence generation and save the suspicious sequence to a dedicated di
27. hether current testing practices could be improved or complemented by using some of the new testing techniques that were introduced by the FITTEST EU project For this purpose the IBM Research team has used the FITTEST tools and compared the results with the testing practices cur rently used during the initial steps of the Systems Verification Test SVT Only this level of tests was considered since the next stage of the SVT test ing is conducted elsewhere in IBM and so is beyond the case studies Finally since the IMP is a mature system and in order to be able to measure fault finding capability several faults were injected into it within the simu lated environment to mimic potential problems that had can be surfaced in such a system Details and results of these studies can be found in 59 57 58 SOFTEAM is a private software vendor and engineering company with about 700 employees located in Paris France Three case studies are conducted in this company 1 automated test cases generation with the FITTEST ITE 61 2 combinatorial testing with the CTE 62 63 and 3 rogue user testing 64 The studies were done within the development team responsible for Modelio Saas a rather new SOFTEAM product The team is composed of 1 project manager 2 software developers and 3 software analysts Modelio SaaS is a web application written in PHP that allows for the easy configuration of distributed environments It runs in virtualized en
28. iations will have to evolve even quicker than what now already is Each time we integrate components we face a critical decision of either to use the new components or to simply stay with the old ones knowing that they might have issues risks and lim ited support Unfortunately more often than not the latter is the preferred choice since the incurred regression testing activities are too time consum ing since there are too many tests We therefore propose to prioritizate the tests according to the available time and budget The FITTEST regression testing approach aims at automatically prioritizing a set of test cases based on their sensitivity to external changes The key idea is to give priority to the tests that can detect a high number of artificial changes Large Feature configuration Space FI applications will be highly customis able it offers a whole range of features that can be configured as well as configurations that depend on the user s context and environment As the result the space of possible comfigurations is combinatorically large To deal with this we developed and or improved three different approaches for com binatorial testing 1 The CTE XL Professional by Berner amp Mattner is a context sensitive graphical editor and a very powerful tool for the system atic specification and design of test cases using a combinatorial approach based on Classification Trees 2 The IBM Focus tool is a comprehensive tool for tes
29. iding what information to be included in the state abstraction is not trivial and is left as future work The above approaches help us in figuring out which top level events are at least related to the failure A more refined analysis can be applied e g spectra analysis 69 if we also know e g which lines of code are passed by the failing execution This requires tracing but when applied on a production system we should also consider the incured overhead In 70 we proposed a novel tracing approach with very low overhead at least for single threaded executions Extending the approach to multi threads setup is future work Acknowledgments This work has been funded by the European Union FP7 project FITTEST grant agreement n 257574 The work presented in this paper is due to the contributions of many researchers among which Sebastian Bauersfeld Nelly O Condori Urko Rueda Arthur Baars Roberto Tiella Cu Duy Nguyen Alessandro Marchetto Alex Elyasov Etienne Brosse Alessandra Bagnato Kiran Lakhotia Yue Jia Bilha Mendelson Daniel Citron and Joachim Wegener References 1 Vos T Tonella P Wegener J Harman M Prasetya I S W B Ur S Testing of future internet applications running in the cloud In Tilley S Parveen T eds Software Testing in the Cloud Perspectives on an Emerging Discipline 2013 305 321 2 Prasetya I S W B Elyasov A Middelkoop A Hage J FITTEST log format version 1 1 Technica
30. ions and implica tions for software testing IEEE Transactions on Software Engineering 30 2004 418 421 Grochtmann M Grimm K Classification trees for partition testing Softw Test Verif Reliab 3 2 1993 63 82 Kruse P M Bauer J Wegener J Numerical constraints for combinatorial inter action testing In Proceedings of ICST 2012 Workshops ICSTW 2012 Montreal Canada 2012 Grochtmann M Wegener J Test case design using classification trees and the classification tree editor cte In Proceedings of the 8th International Software Quality Week San Francisco USA 1995 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 Lehmann E Wegener J Test case design by means of the CTE XL In Proceed ings of the 8th European International Conference on Software Testing Analysis amp Review EuroSTAR 2000 Kopenhagen Denmark Citeseer 2000 Nie C Leung H A survey of combinatorial testing ACM Comput Surv 43 2011 11 1 11 29 Kruse P M Luniak M Automated test case generation using classification trees Software Quality Professional 13 1 2010 4 12 Kruse P M Schieferdecker I Comparison of Approaches to Prioritized Test Generation for Combinatorial Interaction Testing In Federated Conference on Computer Science and Information Systems FedCSIS 2012 Wroclaw Poland 2012 Kruse P M Wegener J Test sequence generation
31. ipt 3 client communicates with a Java server cluster to support tens of thousands of concurrent interactive users The server cluster is backed by a MySQL database and intensive caching solutions to guarantee performance under heavy load The game client is implemented with AS3 and takes care of room rendering user to user interactions navigation purchasing UIs etc Client performance is challenging as it has to handle up to 50 moving danc ing talking characters with disco balls fireworks and user defined game logic running real time in AS3 The game server is a standalone JavaSE with game logic fully implemented in server side The server handles 10K messages second and Habbo runs as highly distributed system with dedicated role based clusters the largest instances having over 100 JVMs Habbo website and tools are built on Java Struts2 with AJAX CSS etc and run on Tomcats Details and results of this study can be found in 65 Clavei is a private software vendor from Alicante Spain which specializes in Enterprise Resource Planning ERP systems One of their main products is called ClaveiCon and is used in small and medium sized companies within Spain Due to their many clients it is of fundamental importance to Clavei to thoroughly test their application before releasing a new version Currently this is done manually Due to the complexity and size of the application this is a time consuming and daunting task Therefore Clavei
32. is eager to investigate alternative more automated approaches to reduce the testing burden for their employees and hence participated in an evaluation of the FITTEST Rogue User Testing tool The company not being part of the FITTEST consortium expressed explicit interest in the FITTEST Rogue User Tool and requested to carry out a trial period to investigate the applicability of the tool for testing their ERP products details and results of this study can be found in 66 8 Conclusion and future work With our continous testing tools it is possible to log the execution of an appli cation and from the gathered logs to infer a behavior model of the application and properties that can be used as test oracles Combinatoric test sequences can be generated from the model to be used as test cases Strategies e g a search based approach can be applied if these sequences need non trivial reachability predicates to be solved The usage of these tools can in principle be scripted so that they work in unattended cycles However the challenge is then to evolve the previously obtained models and properties when we advance to the new cycle rather than to just discard them This is not just a matter of efficiency The new cycle may not cover all functionalities covered in the previous cycles we lose information if we simply discard previous models On the other hand it is also possible that some old functionalities have been removed This calls for some
33. l Report UUCS 2012 014 Utrecht University 2012 3 Middelkoop A Elyasov A Prasetya I S W B Functional instrumentation of ActionScript programs with asil In Proc of the Symp on Implementation and Application of Functional Languages IFL Volume 7257 of LNCS 2012 4 Swierstra S D et al UU Attribute Grammar System www cs uu nl foswiki HUT AttributeGrammarSystem 1998 5 Dias Neto A C Subramanyan R Vieira M Travassos G H A survey on model based testing approaches a systematic review In 1st ACM Int ws on Empirical assessment of software engineering languages and technologies NY USA ACM 2007 31 36 6 Shafique M Labiche Y A systematic review of model based testing tool support Technical Report Technical Report SCE 10 04 Carleton University Canada 2010 7 Marchetto A Tonella P Ricca F Reajax a reverse engineering tool for ajax web applications Software IET 6 1 2012 33 49 8 Babenko A Mariani L Pastore F AVA automated interpretation of dynam ically detected anomalies In proceedings of the International Symposium on Software Testing and Analysis 2009 9 Dallmeier V Lindig C Wasylkowski A Zeller A Mining object behavior with ADABU In proceedings of the International Workshop on Dynamic Systems Analysis 2006 10 Mariani L Marchetto A Nguyen C D Tonella P Baars A I Revolution Automatic evolution of mined specifications In ISSRE 2012
34. l abstract They are then refined to concrete test cases and then executed Since the used test oracles are also inferred from logs they are not guaranteed to be sound nor complete Therefore violations reported also need to be inspected to check if they really represent errors After all errors are fixed the current cycle ends After some period of time or if we have a reason to believe that the SUT has changed the next cycle is started Importantly ev ery cycle is mostly automatic and it is except for the errors analysis and bugs fixing part Such a log based approach may not be as strong as it would be if we have the ground truth which we don t but at least now we do not leave the dynamic parts of the SUT completely untested In FITTEST we have developed a set of tools to support the activities in Fig ure 1 Most of these tools are integrated in an Integrated Testing Environment ITE This ITE is a distributed test environment with the main component in charge of model oracle inference and test case generation running at the tester s workstation Through remote agents it monitors and collects data from the SUT running in its production environment The SUT may in turn consist of a server part and users clients running the application within a Web browser I TE s agents are deployed to monitor them The inference of models from logs can be done from the ITE as well as the generation and execution of test cases The inferenc
35. n allows the tester to optimize a test suite by selecting subsets and therefore to reduce test efforts The weights can also be used for statistical testing e g in fatigue test The other new generation technique goes beyond conventional functional black box testing by considering the succession of individual test steps The automated generation of test sequences uses a multi agent system to trigger the system under test with input values in a meaningful order so that all possible system states are reached 36 as seen in the bottom center part of Figure 3 Ss GENERATE TEST SPECIFICATIONS l CRITICALITY OF z SYSTEM GENERAL SPECIFICATION s Peor rest K Fe SELECT TEST 7 TEST PLAN OF SUT CRITERION _ l ay S S _ g g Domain experts PRIORTIIZATION REQUIRED YES e CLASSIFICATION TREE ae i QUANTIFY ALL NY YES S SYSTEM CLASSES IN THE CONTINUOUS Ke TREE i PERFORM N SPECIFY ALLOWED Ce CLASSIFICATION G PRIORITIZED TEST TRANSITION BETWEEN CONVENTIONAL CONKENTIONAL TREE WITH _ CASE GENERATION S CLASSES E TESTCASE euE WEIGHTS lt a GENERATION wz optimize TEST SUITE BERRORMITES iam cs g a H eae GENERATION SMALL TEST OPTIMIZED TEST SEQUENCE maton as rhanamons ONS Fig 3 The CTE workflow Coverage criteria for both generation techniques are also available New mechanisms for describing numerical dependencies 3
36. ned customisation and dynamic re configurations Services and components could be dynamically added by customers and the in tended use could change significantly Moreover these self adaption and dynamic changes can be expected to happen frequently while the application is alive The range of possible classes of behavior is thus much larger and it is just very hard to anticipate them all during the testing which traditionally happens before we release the software The resulting traditional testwares might thus be already inadequate for a specific executing instance of the system Indeed some parts of such applications may remain fairly static so tradi tional testing will still play an important part in the overall testing of an FI application But the testwares of the dynamic parts of the application will need to evolve together with the system new test cases may need to be added some to be removed and some to be adapted to the changed functionalities We therefore propose to complement traditional testing with continous testing Continous testing is carried out in cycles throughout the lifetime of the cur rent release of the System Under Test SUT until at least its next release Each cycle consists of the activities shown in Figure 1 Before the SUT is deployed nttps code google com p fittest o lt oO o e l ne Log based _ oracles zzl 5 Domain experts a i wW z a
37. ns about the testing technique that is being evaluated nor about the subjects and the pilot projects We needed a framework that can be instantiated for any type of treatment subject and object and simplifies the design of evaluative studies by suggesting relevant questions and measures Since such a framework did not exist we defined our own making sure that the guide lines and checklist that can be found in the literature are satisfied We have successfully used the framework for the various case studies executed during EvoTest and during FITTEST 7 2 The case studies IBM Research Lab in Haifa Three case studies are executed at IBM Re search Lab in Haifa 1 automated test cases generation with the FITTEST ITE 57 2 regression testing 58 and 3 combinatorial testing with the CTE 59 The studies were done amongst the research team responsible for building the testing environment for future developments of an IT Manage ment Product IMP similar to 60 a resource management system in a networked environment At IBM Research Lab the developers conduct limited amount of testing the testing itself is conducted by this designated research team It is working to enable the testing of new versions of the IMP by developing a simulated environment in which the system is executed The testing activities related to the studies have been done on IMP but in a simulated testing environment The objective of the team was to iden tify w
38. ntroduction The Future Internet FI will be a complex interconnection of services appli cations content and media running in the cloud In 1 we describe the main challenges associated with the testing of applications running in the FI There we present a research agenda that has been defined in order to address the test ing challenges identified The Future Internet Testing FITTEST project a research project funded by the European Comission grant agreement n 257574 from 2010 till 2013 to work on part of this research agenda A whole range of techniques were developed within the project which are then implemented as tools this paper gives an overview of these tools These tools and the kinds of challenges that they can mitigate are listed below 1 nttp crest cs ucl ac uk fittest Dynamism and self adaptation The range of behaviours of an FI applica tion with such properties is very hard to predict in advance We propose to complement traditional testing with continous testing where the testwares are evolved together with the application We have a set of tools to support this These tools automatically infer behavioural models and oracles from monitored executions and uses these models to automate test case gener ation This can be run in cycles and unattended between the traditional pre release testing rounds Large scale and rapid evolution of components To meet rapid technol ogy and business changes components of FI appl
39. port UU CS 2014 012 Utrecht University 2014 Shehory O Citron D Kruse P M Fernandez N C Vos T E J Mendelson B Assessing the applicability of a combinatorial testing tool within an industrial environment In Proceedings of the 11th Workshop on Experimental Software Engineering ESELAW 2014 CiBSE 2014 http pic dhe ibm com infocenter director pubs index jsp topic 2Fcom ibm director vim helps doc42Ffsd0_vim_main htm1 Brosse E Bagnato A Vos T N C Evaluating the FITTEST automated testing tools in SOFTEAM an industrial case study Technical Report UU CS 2014 009 Utrecht University 2014 Kruse P Condori Fernandez N Vos T Bagnato A Brosse E Combinato rial testing tool learnability in an industrial environment In Empirical Software Engineering and Measurement 2013 ACM IEEE International Symposium on 2013 304 312 Condori Fernndez N Vos T Kruse P Brosse E Bagnato A Analyzing the applicability of a combinatorial testing tool in an industrial environment Technical Report UU CS 2014 008 Utrecht University 2014 Bauersfeld S Condori Fernandez N Vos T Brosse E Evaluating rogue user an industrial case study at softeam Technical Report UU CS 2014 010 Utrecht University 2014 Puoskari E Vos T E J Condori Fernandez N Kruse P M Evaluating appli cability of combinatorial testing in an industrial environment A case study In Proc JAMAICA ACM 2013
40. pplied to a wide range of com binatorial problem instances regardless of their structure and characteristics There are two subclasses of hyperheuristic algorithms generative and se lective 48 Generative hyperheuristics combine low level heuristics to generate new higher level heuristics Selective hyperheuristics select from a set of low level heuristics In our work we use a selective hyperheuristic algorithm Selective hy perheuristic algorithms can be further divided into two classes depending upon whether they are online or offline We use online selective hyperheuristics since we want a solution that can learn to select the best combinatorial heuristic to apply unsupervised as it executes The overall workflow of our hyperheuristic algorithm is depicted in Figure 5 The algorithm takes a combinatorial model generated from system under test as a input It outputs a covering array model which can be converted to a test suite with the help of domain experts The hyperheuristic algorithm contains a set of lower level heuristics and two layer of heuristic search The first or outer layer uses a normal metaheuristic search to find solutions directly from the solution space of the problem The inner layer heuristic searches for the best candidate lower heuristics for the outer layer heuristics in the current problem state As a result the inner search adaptively identifies and exploits different strategies according to the characteristic
41. ps see Figure 2 1 executing the SUT on the regression test suite T S 2 monitoring the service requests and collecting the response messages 3 generating mutated responses by means of mutation operators based on service assumptions 4 for each test case running the SUT and comparing the behaviour of the SUT when receiving the original response and when the mutated ones are received Then 5 the sensitivity mutation score is calculated and 6 test cases are ranked by mutation score EXTERNAL SERVICES WSDL SA REGRESSION L ae TestSuiteTS Va RUN SUT 5 COLLECT O l a _ i P RESPONSES SUT i SERVICE MUTATION i ASSUMPTIONS MUTATE el OPERATORS 5 XML P RESPONSES X A x H P A zi i i O ii DEVELOPER a a aa l ar eee RuNSuT pesos INCREMENT RANK TS RANK TS BY gt Pai Sei A MUTATION MUTATION S Ranked TS gt w SCORE A SCORE _ X l MUTATION e SCORES Fig 2 Activities performed when applying CSTP 3 1 Change Sensitivity Measure In the context of Webservices service clients communicate with services via mes sage passing clients send requests to services and receive responses During an audit testing session at the client side test inputs are provided to the client system which then sends requests to services and receives and processes the responses Finally the outputs from the system are evaluated the test oracle takes usually the form of assertions the
42. reacting to the mutated responses is compared to its behaviour with the original response Any deviation observed i e different outputs as sertions violated crash or runtime exceptions reported implies the change is detected or the mutant is killed Change sensitivity of a test case is measured as the proportion of mutants that are killed by each test case The change sensitivity metrics is then used to prioritize the test cases Only those at higher priority will be run against the real external services during online audit testing The monetary and execution costs associated with such kind of testing make it extremely important to prioritize the test cases such that the top ranked ones are also the most important in terms of capability of revealing actual problems due to service changes that are not managed at the purely syntactic level We conducted a case study to assess the effectiveness of our prioritization method in terms of fault revealing capability under the assumption that the resources for online audit testing are extremely scarce hence the fault revealing test cases should be ranked very high in the prioritized list produced by our method in order for them to have a chance of being actually executed during real audit testing Results show that using only a subset of around 10 of the available test cases those top ranked by our approach most injected faults can be detected 23 Moreover CSTP outperformed coverage based tes
43. rectory for later replay 5 Ifa determined given amount of sequences has been generated stop sequence generation else go to step 1 ow The tool offers the user a simple user interface see Figure 8 there are mainly four buttons which start the RU into its four main modes 1 Start in Spy Mode This mode does not execute any actions It will start the System under Test SUT and allows you to inspect the GUI Simply use the mouse cursor to point on a widget and the Rogue User will display everything it knows about it The Spy Mode will also visualize the set of actions that the Rogue User recognizes so that you can see which ones will be executed during a test 2 Start in Generation Mode This mode will start the SUT and execute a full test on the SUT 3 Start in Replay Mode This mode replays a previously recorded sequence The Rogue User will ask you for the sequence to replay 4 Start in View Mode The View Mode allows you to inspect all steps of a previously recorded sequence Contrary to the Replay Mode it will not ex ecute any actions but only show you the screenshots that were recorded during sequence generation This is ideal if a sequence turns out not to be reproducible Then there are various tabs that allow the tester to configure the tool The general settings tab enables the tester to specifiy where the SUT is how many sequences to generate the maximum length of the sequences etc The Oracle tab helps in spe
44. ritization In CSTP service integrators specify their service assumptions explicitly in order to simplify and automate audit testing of integrated services For the spec ification of the integrator s assumptions we propose an XML based assumption specification language A service assumption consists of an XPath reference 25 to an element in the response under consideration and it can include data re strictions regarding that element Data restrictions have the same format as those defined in the W3C XML Schema 26 Service assumptions specify what a client expects from a service for its own purposes Therefore the data restrictions specified by one service integrator may differ from those in the service definition e g in the WSDL interface of the service or from those specified by another integrator The mutation operators that we propose use service assumptions to inject artificial changes into service responses The changed responses also called mu tated responses and the original ones are then used to measure change sensi tivity of test cases In the following sections we discuss mutation operators and how change sensitivity is measured 3 2 Mutation Operators Based on a survey of service assumptions commonly made on Web service re sponses and on manual inspection of those parts of the service interface which is described only informally usually by annotations or comments inside the WSDL document we identified 9 main data r
45. s blocks to be instrumented as well for each method transitions between these blocks form the method s control flow graph Visit patterns over these blocks can be specified and used as a form of splitters 16 yielding stronger oracles 2 4 Test case generation Two methods for input data generation are available classification trees and search based input generation Classification trees require one additional input a partitioning of the possible inputs into equivalence classes represented as clas sification trees When such classification trees are available the tool combines event sequence generation with pairwise coverage criteria to produce concrete executable test cases combinatorial model based test generation The search based input data generation method can be used when classification trees are unavailable or as a complementary method with respect to classification trees It uses evolutionary search based testing to automatically produce concrete test cases including input values which achieve transition coverage of the model inferred through dynamic model inference Combinatorial model based test generation This method 17 starts from a finite state model and applies model based testing to generate test paths that represent sequences of events to be executed against the SUT Several algorithms are used to generate test paths ranging from simple graph visit algorithms to advanced techniques based on maximum diversi
46. s of the problems it faces A full explanation and evaluation of the algorithm can be found in our technical report 46 5 Rogue User Testing Graphical User Interfaces GUIs represent the main connection point between a software s components and its end users and can be found in almost all modern applications Vendors strive to build more intuitive and efficient interfaces to guarantee a better user experience making them more powerful but at the same time much more complex Especially since the rise of smartphones and tablets this complexity has reached a new level and threatens the efficient testability of applications at the GUI level To cope with this challenge it is necessary to automate the testing process and simulate the rogue user In the FITTEST project we have developed the Rogue User tool RU a Java application which allows to write automated robustness tests for FI applications at the User Interface level and from the user s perspective 49 50 In Figure 6 the approach that the RU tool uses is depicted Q SCAN GUI OBTAIN DERIVE SET OF WIDGET TREE USER ACTIONS a Domain Experts SELECT PROMISING ACTION LEARN PROMISING ACTIONS No DETERMINE Yes STOP SUT SEQUENCE DESIRED Replayable FITNESS LENGTH Erroneous Sequences promising Actions Definitions EXECUTE ACTION Fig 6 The Rogue User testing approach Basically the Rogue User tool works as
47. s originate from the need for shared resources e g local memory for multi threaded applications or network storage for FI applications Shared memory requires access protection Inadequate protection results in data corruption or invalid data reads races The protection mechanisms themselves can lead to further bugs notably deadlocks Other bugs result from broken assumptions about order of actions or about completion time of actions The IBM Concurrency Testing tool ConTest in short facilitates various aspects of handling concurrency bugs of FI applications and is capable of iden tifying concurrency bugs 51 We have improved this tool We have extended ConTest with a number of new capabilities Whereas before it works on mono lithic software it is now significantly extended to support the special concurrency testing and debugging needs of internet based applications Such applications are typically distributed componentized interact in high level protocols such as http and in times are composed in an ad hoc manner The main idea is that concurrency bugs either local ones or distributed across an internet wide system can be expressed in terms of event or message ordering Changes in the ordering of messages and events can expose bugs which are otherwise dormant This is exactly what ConTest does to expose concurrency bugs While execut ing the application with multiple orderings can expose bugs it does not isolate them Within FITT
48. semantic purpose classifications can be grouped into compositions Ad ditionally all classes can be refined using new classifications with classes again In Figure 3 this activity is called Specification of Test Frame and to be found in the upper part Afterwards the specification of constraints or dependency rules as they are called in the classification tree method 30 follows Definition of test cases Having composed the classification tree test cases are defined by combining classes of different classifications For each classifica tion a significant representative class is selected Classifications and classes of classifications are disjoint The Classification Tree Editor CTE is a graphical editor to create and maintain classification trees 31 During the FITTEST project the CTE XL Professional by Berner amp Mattner has received substantial improvement Started as a graphical editor 32 for combinatorial test design 33 the tool now features the following leading edge functionalities Prioritized test case gen eration using weights in the classification tree on the combinatorial aspects allows automatically generating test suites ordered by the importance of each test case easing the selection of valuable subsets 34 35 lower left part of Figure 3 Im portant classes or class tuples are combined into early test cases while less important classes or tuples are used later in the resulting test suite The pri oritizatio
49. st cases from the most sensitive to service changes to the least sensitive one Change sensitivity measures how sensitive a test case is to changes occurring to the service under consideration The rationale underlying this ap proach is that new versions of services may produce service responses that are still compliant with the service interface WSDL but violate some assumptions made often implicitly by the service integrator when building the service com position Thus test cases that are more sensitive to these changes are executed first Specifically we have defined a set of new mutation operators and we apply them to the service responses to inject artificial changes Our mutation operators are based on a survey of implicit assumptions commonly made on Web service responses and on manual inspection of those parts of the service API which is described only informally usually by comments inside the WSDL document After applying our mutations we measure change sensitivity for each test case by comparing the outputs of each test case in two situations with the original response and with the mutated responses A change is detected if the behaviour of the service composition differs when the original service is used as compared to the mutated response Similarly to the mutation score the change sensitivity score is the number of changes detected mutants killed over all changes The measuring of change sensitivity is divided into six ste
50. sumption as an input the corresponding mutation operator can generate new data that are not acceptable Operator Description Enumeration Length MaxLength MinLength MinInclusive MaxInclusive MinExclusive MaxExclusive RegEx Enumeration restriction limits the content of an XML element to a set of acceptable values Using this restriction Enumeration opera tor generates a new value which is not accepted by the restriction to replace the original one Length restriction limits the precise length of the content of an XML element Using this restriction Length operator generates a new content having its length differs from the required one Length restriction limits the length of the content of an XML el ement to be inferior than a specific value Using this restriction MazLength operator generates a new content having its length greater than the allowed one Length restriction requires the length of the content of an XML ele ment to be greater than a specific value Using this restriction Min Length operator generates a new content having its length smaller than the allowed one These restrictions are specified on numeric types e g double inte ger The corresponding mutation operators generate new numeric numbers that are smaller or greater than the acceptable minimum or maximum values Regular expression restriction requires the content of an XML el ement to follow a specific pattern RegEx based opera
51. t case prioritization 4 Combinatorial Testing Combinatorial testing also called combinatorial interaction testing 27 or com binatorial test design is a technique that designs tests for a system under test by combining input parameters For each parameter of the system a value is chosen This collection of parameter values is called test case The set of all test cases constitutes the test suite for the system under test Combinatorial testing can be a good approach to achieve software quality The t wise combination is a good compromise between effort of testing and fault detection rate 28 The most common coverage criterion is 2 wise or pairwise testing that is fulfilled if all possible pairs of values are covered by at least one test case in the result test set 4 1 The classification tree method The classification tree method 29 offers a well studied approach for combi natorial testing Applying the classification tree method involves two steps designing the classification tree and defining test cases Designing the classification tree In the first phase all aspects of inter ests and their disjoint values are identified Aspects of interests also known as parameters are called classifications their corresponding parameter values are called classes Each classification can have any number of disjoint classes describing the occurrence of the parameter All classifications together form the classification tree For
52. t oriented system modelling for model based test planning and for functional coverage analysis Its system modelling and test planning are based on computing Cartesian products of system or test aspects with re strictions applied to the models and henceforth to the computation 3 The Hyper Heuristic Search Algorithm that uses a hyperheuristic search based algorithm to generate test data Hyperheuristics are a new class of Search Based Software Engineering algorithms the members of which use dynamic adaptive optimisation to learn optimisation strategies without supervision Low observability FI applications have low observability their underlying internal states are complex but we are limited in how we can inspect them This makes it problematical for a test case to infer that an application has done something incorrect However quite often an incorrect state when interacted on can eventually trigger an observable failure enabling us to at least observe that something has gone wrong Our rogue user testing tool is a fully automatic testing framework that tests FI applications at the GUI level It uses the operating systems Accessibility API to recognize GUI controls and their properties and enables programmatic interaction with them It derives sets of possible actions for each state that the GUI is in and automatically selects and executes appropriate ones in order to drive the GUI and eventually crash it After starting the target applica
53. the system can later be re tested under ConTest The ordering is saved and every execution produces a trace Both are used to replay a failing test case to the point where the bug is exposed In case that no bug was exposed the ordering is changed by ConTest to cause new interleavings Thes cycle is repeated until some concurrency relevant coverage is reached or until we run out of time 7 Case studies Within the FITTEST project we have carried out case studies at four different companies comprising of in total eight studies in order to evaluate all the tools presented in the previous sections 2 6 The section gives an overview of the case studies for the details of these studies we will refer to their respective re ports The companies IBM SOFTEAM Sulake and Clavei indicated different needs for our tools as shown below IBM SOFTEAM Sulake Clavei continous testing tools J J regression testing tool J combinatoric testing tools y J J rogue user testing tool J y concurrency testing tool J We did not manage to complete the evaluation of the concurrency tool at Sulake as the company left the project before we could finish the study 7 1 Evaluation framework To be able to evaluate the FITTEST testing tools in such a way as to assure that the resulting body of evidence can yield the right guidelines for software testing practitioners about which tool fits his or her needs and which does not th
54. tion it proceeds to scan the GUI in order to obtain the state of all control elements on the screen from which it generates a hierarchical representation called a widget tree This tree enables it to derive a set of sensible actions According to the Rogue User s internal search state it now selects promising actions and executes these to test the system with rogue user behaviour Asynchronous time and load dependent behaviour Testing the concur rent part of an application is made difficult by dependency on factors like communication noise delays message timings and load conditions The re sulting behavior is highly non deterministic Errors that only happen under specific schedules are very difficult to trigger We have enhanced IBM s Con currency Testing tool ConTest which is based on the idea of selectively inserting noises in the application to increase the likelihood to trigger con currency errors The following sections 2 6 will describe these tools they can be down loaded from FITTEST software site Effort has also been taken to evaluate the tools against a number of industrial case studies These case studies are listed in Section 7 with references to the papers or reports that describe the studies in more details 2 Continuous Testing We envisage FI applications to be much more dynamic They are capable of self modifications context and environment dependent autonomous behaviours as well as allowing user defi
55. tors change slightly the original regular expression and generates new values based on the mutated expression CSTP implements six RegEx based operators 23 Table 1 Mutation operators to inject changes in service responses according to the service assumption This is to simulate the fact that when we plan to substitute s with spew Snew may have data that violate the current service integrator s assumptions 3 3 Measuring Change Sensitivity Let us assume that the original responses are monitored and collected locally see Figure 2 at the next step we apply the mutation operators discussed in previous section based on the service integrator s assumption specifications to generate mutated responses For each response and for each assumption we select a mutation operator based on the type of restriction specified The XPath is used to query the elements in the response to be injected with new contents generated by the mutation operator Eventually we obtain a set of N mutated responses each of them containing one change Then each test case in the regression test suite is executed offline against the N mutated responses Instead of querying s to get the response the system receives a mutated response and processes it In this way we can conduct test pri oritization without any extra cost for accessing s if applicable and without any time delay due to the network connection with external services The behaviour of the system
56. ty of the event frequencies and semantic interactions between successive events in the sequence Such paths are then transformed to classification trees using the CTE XL format enriched with domain input classifications such as data types and parti tions Then test combinations are generated from those trees using t way combi natorial criteria Finally they are transformed to an executable format depend ing on the target application e g using Selenium 4 Note that although these test cases are generated from models learned from the SUT s own behaviour the combinatorial stage will trigger fresh behaviour thus testing the SUT against the learned patterns Thanks to the integration with CTE XL other advanced analysis can be performed on the classification trees For example we can impose dependencies on the input classifications and filter the test combinations that violate the dependencies to remove invalid tests Search model based test generation In this approach 18 the ITE trans forms the inferred models and the conditions that characterize the states of the models into a Java program It then uses evolutionary search based testing to automatically produce test cases that achieve branch coverage on the Java repre sentation Specifically we use Evosuite 19 for this task Branch coverage solved by Evosuite has been theoretically shown to be equivalent to transition coverage in the original models Finally the test cases gen
57. uch research in the literature of combinatorial testing has been focused on the development of bespoke algorithms tailored for specific combinatorial prob lems For example some algorithms have been designed for solving unconstrained problems 40 43 while others have been tuned for constrained interaction prob lems 44 45 This leaves tester many different algorithms which have very dif ferent properties Problem occurs when testers are not familiar with these tech niques because an algorithm designed for one type of combinatorial problem may perform poorly when applied to another To address this problem we introduce a simulated annealing hyperheuris tic search based algorithm 46 Hyperheuristics are a new class of Search Based Software Engineering algorithms the members of which use dynamic adaptive optimisation to learn optimisation strategies without supervision 47 48 Our Extract Extract Configuration lt Sy Combinatorial I Model sut Define E LG D Outer search Reinforcement r Heuristic 1 earning Agent Refined armas Solution Heuristic 2 m lL Heuristic _ ee 2 C 4 i I i Heuristic 3 Fitness Evaluation Hyper Heuristic Algorithm Coverino Se ioe experts Y Test Suite Fig 5 The Hyperheuristic Search Algorirhtm workflow hyperheuristic algorithm learns the combinatorial strategy to apply dynamically as it is executed This single algorithm can be a
58. viron ments on different cloud platforms presenting a high number of configura tions and hence presents various challenges to testing 9 We focus on the Web administration console which allows administrators to manage projects created with the Modelio modeling tool 10 and to specify allowed users for working on projects The source code is composed of 50 PHP files with a total of 2141 lines of executable code Currently at SOFTEAM Modelio SaaS test cases are designed manually The process is based on a series of specified use cases to support exploratory testing As indicated before the objective of test design is to maximize use case coverage Each test case describes a sequence of user interactions with the graphical user interface Details and results of these studies can be found in 62 63 61 64 Sulake is a Finnish company that develops social entertainment games and communities whose main product is Habbo Hotel The case study executed at Sulake was related to combinatorial testing with the CTE of Habbo is the world s largest virtual community for teenagers Localized Habbo commu nities are visited by millions of teenagers every week all around the world 65 Habbo Hotel can be accessed via the local Habbo sites and through Face book where all 11 of Habbo Hotel s language versions are also available Through Facebook Connect Habbo users around the world can easily find their Facebook friends in the virtual world
59. whereas CTE would do better in systematically exploring patches in the search space A possible future work is to use available manual test cases as directives when generating combinatoric test cases Generating the specified kinds of combinations can however be non trivial in particular when there are constraints on which combinations are valid There are various heuristics to do this but their performance also depends on the kind of problem we have at hand Our Hyperheuristic Search based tool generates combinations by learning the right composition of its lower level heuristics A study has been carried out showing that this approach is general effective and is able to learn as the problem set changes 46 This approach is however not yet incorporated in either CTE nor Focus CTD this is future work Our rogue user tool can do robustness testing on Internet applications from their GUIs The tool is fully automatic it explores the GUI to try to crash the target application or to make it violates some pre specified oracles In the case studies at Clavei and SOFTTEAM the tool was able to find critical errors that were not found before 64 66 For concurency testing we have improved ConTest with a number of impor tant features it can now be used to test at the system level it can generate load and it can record and replay It is now thus suitable to test Internet appli cations which often form distributed systems e g multiple servers

The FITTEST Tool Suite for Testing Future Internet Applications

Contents

Download Pdf Manuals

Related Search

Related Contents