Home

Chapter 3 Algorithms Design

1. 51 52 and Cy S1 S3 However cardinality of the best solution can be found eliminating Sj because it is a subset of S3 This prune is implemented using a simple cycle for each set remaining in the list of unprocessed sets For every set not added to the solution the algorithm will select subsets of the set and will add them to the list of subsets that are not in the solution The function uses the same parameters explained in the previous section 3 2 2 Unique element subset This pruning rule adds necessary subsets to the solution Lemma 3 2 Let lt X F gt be an instance of set cover Let S and Sa be subsets of the base set X S S2 C F Let x E X be an element of the base set Let furthermore x E S If there is no subset Sg such that x Sy and S So then can be said that S is necessarily part of the solution in this branch because x must part of the union of subset which will be covered by the solution The proof of this is also trivial A corollary is defined as in the previous section Corollary 3 2 Let C C F be a collection of subsets which are necessarily removed from the solution Let X C X be the set of elements covered by C After this restriction the algorithm could have unique elements which belong to only one subset of the remaining subsets in F The same rule of Lemma 3 2 can be applied to these subsets A proof of this is analogous to a proof of the referred lemma The coro
2. all the implementations use a bitmap array to present the solution The reason for this is that it is easier to check whether a set is inside the solution outside the solution or just has not been processed Other advantage is that the positions in the array representing the sets can be directly accessed there is no need of a sequential access Greedy function returns a number representing the maximum cardinality of the so lution The recursive function uses this size when it adds a new set knowing beforehand which branches have not optimal solution The recursive function previously outlined receives the following parameters Sol It comes from solution A value of 1 is set in this variable if the function got a solution in this iteration e Next It is a pointer to the next subset that has not been processed in the solution e Bsets Bitmap of subsets These are arrays where the best solutions are saved It has an efficient access to any position e Subu Union of subsets that has been already included in the solution It will help to avoid recursive calls when a solution has been found e MaxS Maximum size of the solution Since in the second part of the algorithm CHAPTER 3 ALGORITHMS DESIGN 22 a set is added to the solution this bound has to be checked in order to reduce some recursive calls When a better solution is found this bound is updated e Size It has the cardinality of the partial solution in this i
3. implemented in C As the problem is NP complete there are no different complex algorithms An algorithm with a general branch and bound is used to solve set covering It is outlined in 23 This strategy assures the exactness of the algorithm A recursive strategy is used in order to reduce the complexity of the implementation The main idea of the algorithm is to eliminate a set and recursively find the minimum size Later it adds this set to the solution and gets the minimal size Both sizes are CHAPTER 3 ALGORITHMS DESIGN 20 compared and the function returns the set with the best solution It can be shown that this algorithm is On The time grows exponentially con sequently the time to solve a big input will be excessive In order to reduce it some strategies are used to avoid recursive calls to the function The first prune is to determine a starting maximum bound with the help of a greedy approximation algorithm in order to follow only those branches that promise better results This bound is updated as soon as a better solution is found during the run Also on each iteration two intuitive pruning rules are applied These prunes will be explained below They are used until no further pruning can be used In this case recursion is applied The algorithm works as follows fontadjust mathescape function integer set over collection Xi collectionwitness setpartial et unique lement rune Xi doXi lt X odi fied X odi f
4. set It is Ont remember that greedy approach is On and combining all the subsets in pairs is also On Intuitively it gives a better warranty than greedy algorithm However sometimes when the exact solution is an odd number this algorithm has some trouble on getting it as a consequence of selecting one set instead of two The greedy algorithm there works better than greedy by pairs The problem outlined can be easily solved if both strategies are combined In this way there are no disadvantages of using a greedy approach by pairs Other strategy that can be used to solve this problem is to check redundancy on sets or use pruning rules It helps if pruning is used only the first time but some instances get worst solutions if the pruning functions are CHAPTER 3 ALGORITHMS DESIGN 28 applied at every iteration 3 4 Dynamic programming approach The knapsack problem is one of the best solved NP complete problems It is explained and analyzed in 23 There is an algorithm to solve it that gets the best solution in a reasonable time for the average case Problems of this kind are called semi polynomial because they are polynomial according to other variables of the input The algorithm uses a recursive function when it is outlined At this point correctness can be checked This would represent a problem however the partial solutions of the function can be calculated previously converting the time of solution from exponential to polynomia
5. Chapter 3 Algorithms Design An exact algorithm that uses a branch and bound technique was implemented to solve the set covering problem Two polynomial algorithms were also designed and tested In this chapter design of these algorithms will be presented A DLV specification will be presented in the first section The exact algorithm will be explained in Section 3 2 Next two sections detail main ideas about the polynomial algorithms The first polynomial algorithm uses a greedy approach and the other uses a dynamic programming technique Finally in last section some other details about implementation are explained 3 1 DLV specification In this section a specification of the problem using DLV is shown DLV is a disjunctive datalog system It is useful to specify properties of a problem and to allow the inference engine to solve it NP complete problems are in general efficiently solved with logic programming Some variables will be defined to explain the program Let lt X F gt be an instance of the set covering problem Let X be the set of elements and F be a collection of subsets S F where S C X 17 CHAPTER 3 ALGORITHMS DESIGN 18 First it is need to define a range of constants that have the property of being subsets of the set S this is the indexes of S i_subs N N gt 1 number_of_subsets X N lt X int N Then two subsets are defined one of all the elements x X and the other of the elements th
6. at have been already included in the solution toti X in R s R X tot X i_subs R s R X It is asked for the inclusion of a variable subset R in the solution using a disjunctive clause in R V in R i_subs R Then it is defined that there is no solution if all elements x are not included in it tot X not tot1 X Finally the best solution is looked for the minimum number of sets int It means that DLV has to look for a solution with a minimum number of predicates in in other words with a minimum number of subsets in the solution This program is a small specification and it has many advantages It is easy to find mistakes on it and it can be checked in efficiency and effectiveness against the implemented algorithms The input has the next format s S E Where S is the number of subset E is the number of element and s is the predicate which means E belongs to S Example 3 1 Let X be the set a b c d e f Let F 51 S2 53 S4 Let S a b c Se a d S3 b e and S4 c f CHAPTER 3 ALGORITHMS DESIGN 19 This example is coded in DLV as follows s 1 a s 1 b s 1 c s 2 a s 2 d s 3 b s 3 e s 4 c s 4 f The output will be Best model in 2 in 3 in 4 Cost Weight Level lt 3 1 gt It means that sets Sy S3 and S4 are an optimal solution with a cardinality of 3 elements 3 2 Exact algorithm An exact algorithm was
7. ch S in F do card cardinality S if cardjmax Temp S max card The subset with maximum cardinality is added to the solution and remove it from the collection of subsets C C U Temp F F Temp return C As it can observed if the sets have the same size greedy algorithm selects the first set found Therefore it breaks ties with alphanumeric order The greedy algorithm was implemented to define a maximum limit of sets in order to reduce recursion calls The first implemented polynomial algorithm is a greedy variant It uses two different strategies which will be explained throughout this section The first strategy is the use of an exact algorithm at the lower level It increments the exactness of the greedy at the end where more ties can be found in the moment of selecting the set with the highest cardinality Therefore the general algorithm stays as follows 1 Apply general greedy in order to find the minimal bound 2 It is defined a minimum limit in order to call the exact algorithm 3 If the limit is bigger than some constant k apply a greedy strategy on each iteration until this limit is reached If it is not go to the next step 4 Apply the exact algorithm to the rest of sets Let n be the number of subsets obtained in the initial greedy It is defined a constant c 2 x logan For example if there are 500 subsets obtained the exact algorithm after c 2 x log 500 is called this is 18 elements In or
8. der to use improve exactness of the algorithm the constant c is defined as c maz k c where k is a number the CHAPTER 3 ALGORITHMS DESIGN 27 number of subsets such that exact algorithm still finds the solution in a reasonable time This constant was defined in order to conserve the property of the algorithm of being polynomial since branch and bound is exhaustive and 0 2 Therefore this algorithm is O n 2 l082 O n this is n Note that if the constant that multiplies the logarithm is increased the order of this algorithm also increases For example if it has 2 82 the algorithm increases to On Although it is still polynomial it was preferred to have at most the same complexity This approach improves the greedy algorithm solutions However it has of course some error margin compared against the exact algorithm Hence there will be presented the second strategy used to reduce error in the polynomial algorithm It consists into taking pairs of sets instead of single sets on each iteration Let X be a collection of subsets Sp C S Pairs of this subsets are selected this is lt Si Sj gt This pairs are elements of S x S Greedy algorithm is applied over this pairs This means that it will select the pair with the highest cardinality until the set S is completely covered This algorithm also breaks ties using alphanumeric order The strategy of using pairs diminishes the probability of selecting a local best
9. e algorithm The program stops when it is in the set N and the set saved in the table is the same than universal set The column j is returned as the result M A table is used for saving the results By the nature of the function only the previous line of results is necessary Therefore the table has only two rows and uses a module 2 operation to access the rows This has the purpose of saving memory space because sometimes the input can be really big A complete table of the cardinalities is used however in order to help in the recovery of the best sets defined in the solution The witness recovery is just a pointer to the matrix that backs down the rows It checks if there is any difference between two rows 2 and i 1 of the matrix If they are different it means that the row 7 was important and was included in the solution It is CHAPTER 3 ALGORITHMS DESIGN 31 important then to follow the correct path The column to start is the column M called j of the algorithm For example if the algorithm returns a best solution of 5 elements and the number of sets is 20 it starts jumping back and it stops when it arrives to the first column 3 5 General structure of the implementation The software will be divided in different sections with the goal of integrating these modules and exchange them to do the proofs 3 5 1 Methodology To develop the software a simple cascade method of software engineering consisting on four step
10. he list of element of the set N S2 is defined as the elements of N 1 this is when the set has not been added yet S3 is defined as the maximum set of elements until N 1 taking M 1 sets S4 is the union of S1 and S3 this is the largest set including N Therefore Out is defined as the largest set between S2 and 4 It can be easily mapped to the function 3 1 It is important the definition of the function which gets the maximum between two sets maxc S1 S2 S1 length S1 N length S2 M N M maxc S1 S2 S1 length S1 N length S2 M N gt M maxc S1 S2 S2 length S1 N length S2 M M gt N CHAPTER 3 ALGORITHMS DESIGN 30 The sets operation of union has also to be defined union X X 1 union X X union X Y Z X W not member X Z union Y Z W union X Y Z W member X Z union Y Z W The input is the same than in DLV program It is defined as s N E where N is the number of set and E is the element belonging to this subset XSB is a language that manages queries In this case the query would be dsc 4 3 Out The output format is the maximum set found which means that it is a list of elements Out a b c d e However this program is used only for some small examples where the number M of minimal subsets is known or at least there is an approximate idea The real program is not that simple It fills a table using 7 7 as the parameters n and m are used in th
11. ied subset runing X1 Xi Xmodi fied Xmc unique lement rune X 1 partial et witness while X 0di fied lt gt Xi set Selection set S selected et X odi fied Eliminate the set from the list of missing sets Xi Xi S Add the set and check if solution is found partial et partial etU S if S C partial et returnpartial ize 1 Find the best solution without this set sizel setCover Xi partial et Si S partial ize ind the best solution with this set partial ize partial ize lif partial ize 1 gt max Size size2 set over Xi witness partial et S maxSize partial ize if size2 sizel witness witness U 5 return size2 else return sizel The functions subset_pruning and unique_element_prune are explained in Sections 3 2 1 and 3 2 2 respectively The function selected_set selects a subset to be proved This set can be selected CHAPTER 3 ALGORITHMS DESIGN 21 by different strategies randomly selected the first one or the last one of the list It can use a more sophisticated strategy for instance the set that has more elements or more intersections with other sets In the implementation the first set of the list is selected A more sophisticated strategy was not implemented because exact algorithm is going to be used for small inputs First of all a function that gets the greedy solution of the problem is called The function returns a bitmap array of the selected sets Actually
12. in the exact algorithm It is useful also in prunes As the main program there is a module that can call the algorithms once or many times in order to run many inputs and to work with them The output is the running time of the entries and some other statistics like prunes ties in greedy algorithm and some other stuff useful to find some properties and conclusions of the algorithms All the algorithms and libraries were programmed in C A library of set manipulation was taken from 2 This library implements set operations in a very efficient way It uses bit level operations and logic operations It was used as a module of the system The advantage of bit operations is that logic operations meaning set operations increased the velocity of the set operations crucial for a good performance of the algorithms The operations found in the library were intersect Finds an intersection between two sets difference Finds the difference between two sets unite Finds the union between two sets add_element Adds an element to a given set remove_element Removes an element from a given set CHAPTER 3 ALGORITHMS DESIGN 33 clear_set Removes all elements from a set print_set Prints the elements of a set to the standard output isin Tests if an element belongs to a given set Some other functions were added to the library using the same structure of char array and bits cardinality Finds the number of elements of a given set i
13. is a union operation Case 2 If X U and X R then X is not unique so X must not be in U at the end of the iteration It can be seen as in the previous case that X remains belonging to U2 After step 1 X U but after step 2 X U X U2 so X R by step 3 Therefore X U after step 4 and 5 Case 3 If X U and X ER a X is a unique element Then X R until step 4 After it X U and by step 5 X U2 too b X is not a unique element this is X U2 Then after step 3 X R so X U but X U2 as it was supposed to be Now it is necessary to find the subsets containing unique elements x U These sets will be added to the solution Pruning functions use also the parameters for exact algorithm However they use two more flags e EndF A value of 1 indicates that the brand has ended this means that the algorithm has found a solution or there is no solution e Sol This flag indicates if there is or not a solution 3 3 Polynomial Algorithm using a Greedy Approach Greedy algorithm is one of the best polynomial approaches found for set covering Its approximation ratio is bounded by the function ln m Inlnm 1 The following CHAPTER 3 ALGORITHMS DESIGN 26 pseudocode is the general algorithm The function cardinality S gets the number of elements in a set fontadjust mathescape Greedy Set Covering set X collection F C while C X max 0 forea
14. l An algorithm using the same approach was designed to solve set covering It is defined with to variables n and m where n is the total number of subsets of a list and m is the number of subsets supposed to be minimal The goal is to find a function that maximizes the number of elements using m subsets This function is similar to that one of the knapsack problem Whether the set is added or not is decided in each function Let dsc be the function for the algorithm Let card be a function that obtains the cardinality of any set It can be defined as follows 0 ifm 0 orn 0 dsc n m 4 max dsc n 1 m 3 1 card n Udsc n 1 m 1 Otherwise where function max is defined as a ifa gt b max a b 3 2 b ifb gt a CHAPTER 3 ALGORITHMS DESIGN 29 The problem is solved in polynomial time However there is an error bound over the optimal value One of the reasons in addition to the NP complete nature of the problem is that the maximum value is not always precise This algorithm was first defined in XSB language The complete set is defined seti X L setof Y s X Y L Then the base cases of the function are defined setc N 0 setc 0O M Then the recursive function is defined setc N M Out seti N S1 N1 is N 1 setc N1 M S2 N gt 0 M gt O M1 is M 1 setc N1 M1 S3 union S1 83 S4 maxc S2 S4 Out It means Maximal output of N sets using M sets is Out S1 is defined as t
15. llaries give the possibility of applying the pruning rules at any level of the algorithm as it was defined in the exact algorithm proposed in the beginning of Section 3 2 The algorithm used for this pruning rule is presented here CHAPTER 3 ALGORITHMS DESIGN 24 fontadjust mathescape unique ement rune collectionX i setpartial et U 0 U2 For each X Xi R X partial et SteplU U R Step2R R U2 Step 3 U R U U Step 4 U2 RU U2 Step 5 Lemma 3 3 Letn be the number of elements If the algorithm is applied to every set S E Xi it can be proven that at the end of any iteration U2 is the set of all elements which appear at least once Moreover U is the set of unique elements elements that have appeared exactly once Proof It will be proven by induction over the cardinality of Xi Base case Following the steps of the algorithm Ugo U2 1 Re S1 0 R S1 2 U 3 ROR 4 U R U eel 5 U2 R U 82 U and U2 have the properties enunciated before An element X with different properties will be taken to demonstrate that properties remain Case 1 If X U and X R then X is unique so X must be in U and U2 at the end of the iteration X U It can be seen that as X R X U after steps 2 and 4 CHAPTER 3 ALGORITHMS DESIGN 25 X U2 If X U then by hypothesis of induction X U2 in the beginning U2 is modified in step 5 however no elements are removed because it
16. s was followed 1 The first step was to analyze the problem This is the shortest step because the typing problem and the conversion to set covering had already been done On the other side set covering is a well known problem and lots of literature about its description definition and even algorithms and proofs can be found This part was outlined in Chapter 1 2 The second part was to select out and design algorithms of those that have proven to be good 3 The third step was the implementation of the algorithms These steps 2 and 3 are explained in Chapter 3 4 Finally they were tested It was necessary to define the variables to be observed In this case the most important parameters are time size of input and exactness CHAPTER 3 ALGORITHMS DESIGN 32 in the case of polynomial algorithms There were two kinds of input The first one was the random input generated using a program that creates instances of set covering The second type is input from a real database of instances of typing sequences which helps to validate the results and to observe how different algorithms work on this kind of information There are different modules for the program For example it is important to have modules for reading input as its format could vary There is also used a module for set operations like union and intersection There is also a set of variables called state of solution defined for every iteration in greedy or
17. s_empty Tests if a given set is the empty set 0 subset Tests if a set is subset of another given set copy_to Returns a set with the same elements than a given set As it the problem was designed using a structured approach instead of object ori ented programming The diagrams and Documentation the software can be seen in Appendix A where Data Flow Diagrams are presented as well as function call dia grams and a class diagram The class diagram only presents the general structure of the software for it only shows how the libraries of the methods are used User s Manual of the software is found in Appendix B 3 5 2 Input and output of algorithms The input read has to be transformed to a set of character arrays Up to now it has been implemented only in one format First two numbers are read setsize and nofssets the total number of elements this is X and the number of subsets The sets are read as pairs of numbers lt x y gt where x is the number of set 0 lt x lt nofssets and y is the number of element 0 lt y lt setsize The output is the minimal number of subsets found by the algorithm The list of subsets selected as part of the solution is also printed this is a list of numbers z 0 lt x lt nofssets representing the number assigned to the subsets
18. teration The output of the function is the minimum size found taking the partial solution as described before 3 2 1 Subsets pruning It is explained now how this pruning rule works Lemma 3 1 Let lt X F gt be an instance of set cover Let S and Sj be subsets of the base set X This is S 5 C F It can be shown that if Si C S then S can be eliminated as if it were not part of an optimal solution This is because if S were part of any solution there would be other solutions equal or better with the subset S Proof of this is trivial Corollary 3 1 Let C C F be a collection of subsets which are necessarily added to the solution Let X C X be the set of elements covered by C Notice that this is not exactly a set covering problem because there is a set of subsets which is fixed However Lemma 3 1 can still be applied as follows It can be shown that if S X S X then again S 1s not part of the solution The proof of this is similar to a proof of Lemma 3 1 It is important here to remark that this does not mean that S could not be part of an optimal solution Therefore if all the optimal solutions are serached follow a path where 5S is part of the solution is needed For example Example 3 2 Let lt F 5S 52 53 S4 X a b c gt be an instance of set covering It is defined S a b 52 c and S3 b c There are two optimal CHAPTER 3 ALGORITHMS DESIGN 23 solutions C1

Chapter 3 Algorithms Design

Contents

Download Pdf Manuals

Related Search

Related Contents