Home

root parent child leaf node edge - TAMU Computer Science Faculty

1. e search and delete O n Worst case is if all n ele ments hash to same location CPSC 211 Data Structures amp Implementations c Texas A amp M University 270 Good Hash Functions for Chaining Intuition Hash function should spread out the data evenly among the M entries in the table More formally any key should be equally likely to hash to any of the M locations Impractical to check in practice since the probability distribution on the keys is usually not known For example Suppose the symbol table in a compiler is implemented with a hash table The compiler writer cannot know in advance which variable names will ap pear in each program to be compiled Heuristics are used to approximate this condition try something that seems reasonable and run some exper iments to see how it works CPSC 211 Data Structures amp Implementations c Texas A amp M University 271 Good Hash Functions for Chaining cont d Some issues to consider in choosing a hash function e Exploit application specific information For sym bol table example take into account the kinds of variables names that people often choose e g x1 Try to avoid collisions on these names e Hash function should depend on all the information in the keys For example if the keys are English words it is not a good idea to hash on the first letter since many words begin with S and few with X CPSC 211 Data Structures amp Implementations c
2. Bottom up testing Start with the bottom level meth ods and classes those that don t call or rely on any others Test them thoroughly with drivers Then progress to the next level up those methods and classes that only use the bottom level ones already tested Use a driver to test combinations of the bottom two layers Proceed until you are testing the entire program CPSC 211 Data Structures amp Implementations c Texas A amp M University 305 Kinds of Testing cont d Top down testing proceeds in the opposite direction making extensive use of stubs Reasons to do top down testing e to allow software development to proceed in parallel with several people working on different compo nents that will then be combined you don t have to wait until all levels below yours are done before testing it e if you have modules that are mutually dependent e g X uses Y Y uses Z and Z uses X You can test the pieces independently CPSC 211 Data Structures amp Implementations c Texas A amp M University 306 Other Approaches to Debugging In addition to testing another approach to debugging a program is to visually inspect the code just look through it and try to see if you spot errors A third approach is called a code walk through a group of people sit together and simulate or walk through the code seeing if it works the way it should Some companies give your group s code to ano
3. And in fact how do you know that your requirements correctly captured the customer s intent However testing still serves a worthwhile pragmatic purpose CPSC 211 Data Structures amp Implementations c Texas A amp M University 303 Test Cases Plans and Logs Run the program on various test cases Test cases should cover every line of code in every method in cluding constructors More specifically e test on valid input over a wide range e test on valid inputs at the limits of the range e test on invalid inputs Organize your test cases according to a test plan it 1s a document that describes how you intend to test your program Purposes e make it clear what a program should produce e ensure that testing is repeatable Results of running a set of tests is a test log should demonstrate that each test produced the output pre dicted by the test plan After fixing a bug you must rerun your ENTIRE test plan Winder and Roberts calls this the Princi ple of Maximum Paranoia CPSC 211 Data Structures amp Implementations c Texas A amp M University 304 Kinds of Testing Unit testing test a method M all by itself using e adriver program that calls M with the arguments of interest and e stubs for the methods that M calls a stub returns some hard coded value without doing any real cal culations Integration testing test the methods combined with each other Two approaches to integration testing
4. Bubble x up the tree until finding a correct place if x lt parent s key then swap keys and continue Time O log n To implement the priority queue operation remove Tricky part is how to remove the root without messing up the tree structure 1 Swap the key in the root with the key call it y in the rightmost node on the bottom level 2 Remove the rightmost node on the bottom level 3 Bubble down the new root s key y until finding a correct place if y gt at least one child s key then swap y with smallest child s key and continue Time O log n CPSC 211 Data Structures amp Implementations c Texas A amp M University 227 Using a Heap to Implement a PQ cont d a Pas Pa ge ge a 2 2 Pr o pa y G d O n o No longer have the severe tradeoffs of the array and linked list representations of priority queue By keep ing the keys semi sorted instead of fully sorted we can decrease the tradeoff between the costs of insert and remove CPSC 211 Data Structures amp Implementations c Texas A amp M University 228 Heap Sort Recall the sorting algorithm that used a priority queue 1 insert the elements to be sorted one by one into a priority queue 2 remove the elements one by one from the priority queue they will come out in sorted order If the priority queue is implemented with a heap the running time is O n log n This is much bett
5. Texas A amp M University 272 Average Case Analysis of Chaining Define load factor of hash table with M entries and n keys to be A n M How full the table is Assume a hash function that is ideal for chaining any key is equally likely to hash to any of the M loca tions Fact Average length of each linked list is n M A The average running time for chaining e Insert O 1 same as worst case e Unsuccessful Search O 1 A O 1 time to com pute h k A items on average in the linked list are checked until discovering that k is not present e Successful Search O 1 2 O 1 time to com pute h k on average key being sought is in middle of linked list so A 2 comparisons needed to find k e Delete Essentially the same as search For these times to be O 1 A must be O 1 son cannot be too much larger than M CPSC 211 Data Structures amp Implementations c Texas A amp M University 273 Open Addressing With this scheme there are no linked lists Instead all elements are stored in the table proper If there is a collision you have to probe the table check whether a slot table entry is empty repeatedly until finding an empty slot You must pick a pattern that you will use to probe the table The simplest pattern is to start at h k and then check h k 1 h k 2 h k 3 wrapping around to check 0 1 2 etc if necessary until finding an empty slot This is called line
6. k gt the key of x rightchild x insert rightchild x k return x endif Insert called on node z returns node z unless z is null in which case insert returns a new node As a result a child of a node is only changed if it is originally non existent Running Time O d O n CPSC 211 Data Structures amp Implementations c Texas A amp M University 246 Inserting into a BST cont d 6 ww iG CPSC 211 Data Structures amp Implementations c Texas A amp M University 247 Finding Min and Max in Binary Search Tree Fact The smallest key in a binary tree is found by following left children as far as you can Sh d sY Sa Running Time O d O n Guess how to find the largest key and how long it takes Min is 4 and max is 27 CPSC 211 Data Structures amp Implementations c Texas A amp M University 248 Printing a BST in Sorted Order Cute tie in between tree traversals and BST s Theorem Inorder traversal of a binary search tree vis its the nodes in sorted order of the keys Inorder traversal on previous tree gives 4 10 13 16 17 19 20 22 26 27 Proof Let s look at some small cases and then use induction for the general case Case 1 Single node Obvious Case 2 Two nodes Check the two cases Case n Suppose true for trees of size 1 through n 1 Consider a tree of size n CPSC 211 Data Structures amp Implementations c Texas A amp M University 249 Pr
7. endwhile return minkey minChild returns index of child w smaller key min Lchild if Rchild lt n amp amp A Rchild lt A Lchild then node has a right child and it is smaller min RChild endif return min CPSC 211 Data Structures amp Implementations c Texas A amp M University 233 Binary Tree Traversals Now consider any kind of binary tree with data in the nodes not just leftmost binary trees 99 In many applications we need to traverse a tree visit each node exactly once When the node is visited some computation can take place such as printing the key There are three popular kinds of traversals differing in the order in which each node is visited in relation to the order in which its left and right subtrees are visited e inorder traversal visit the node IN between vis iting the left subtree and visiting the right subtree e preorder traversal visit the node BEFORE visit ing either subtree e postorder traversal visit the node AFTER visit ing both its subtrees In all cases it is assumed that the left subtree is visited before the right subtree CPSC 211 Data Structures amp Implementations c Texas A amp M University 234 Binary Tree Traversals cont d preorder x if x is not empty then visit x preorder leftchild x preorder rightchild x inorder xX if x is not empty then inorder leftchild x visit x inorder rightchild x postorder x
8. if x is not empty then postorder leftchild x postorder rightchild x visit x a we Say E r e preorder abdecfghi e inorder debafchgi e postorder edbfhigce CPSC 211 Data Structures amp Implementations c Texas A amp M University 235 Binary Tree Traversals cont d These traversals are particularly interesting when the binary tree is a parse tree for an arithmetic expression e Postorder traversal results in the postfix representa tion of the expression e Preorder gives prefix representation e Does inorder give infix No because there are no parentheses to indicate precedence kK ai wt YN 3 5 2 l e preorder 53 21 e inorder 53 21 e postorder 5 3 2 1 CPSC 211 Data Structures amp Implementations c Texas A amp M University 236 Representation of a Binary Tree The most straightforward representation for an arbi trary binary tree is a linked structure where each node has e key e pointer to right child e pointer to left child Notice that the array representation used for a heap will not work because the structure of the tree is not necessarily very regular class TreeNode Object data data in the node TreeNode left ye Nest ee et TreeNode right f rigbt child constructor goes here void visit what to do when node is visited CPSC 211 Data Structures amp Implementations c Texas A amp M University 237 Representation of
9. only child Case 3 x has two children Use the same strategy as binary heap Instead of removing the root node choose another node that is easier to remove and swap the data in the nodes 1 Find the successor or predecessor of x call it y It is guaranteed to exist since z has two children 2 Delete y from the tree Since y 1s the successor it has no left child and thus it can be dealt with using either Case 1 or Case 2 3 Replace x s key with y s key Running Time O d O n CPSC 211 Data Structures amp Implementations c Texas A amp M University 255 Deleting a Node from a BST cont d S A B after deleting 13 then 26 then 10 so d o amp CPSC 211 Data Structures amp Implementations c Texas A amp M University 256 Balanced Search Trees We would like to come up with a way to keep a binary search tree balanced so that the depth is O log n and thus the running time for the BST operations will be O log n instead of O n There are a number of schemes that have been devised We will briefly look at a few of them They all require much more complicated algorithms for insertion and deletion in order to make sure that the tree stays balanced The algorithms for searching finding min max prede cessor or successor are essentially the same as for the basic BST Next few slides give the main idea for the definitions of the trees but not why the definitions give O
10. since keys are stored directly in the table Assume that there is always at least one empty slot Assume that the hash function ensures that each key is equally likely to have each permutation of 0 1 M 1 as its probe sequence Average case running times e Unsuccessful Search O h e Insert Essentially same as unsuccessful search e Successful Search O s In a where In is the natural log base e 2 7 e Delete Essentially same as search The reasoning behind these formulas requires more so phisticated probability than for chaining CPSC 211 Data Structures amp Implementations c Texas A amp M University 281 Sanity Check for Open Addressing Analysis The time for searches should increase as the load factor increases The formula for unsuccessful search is O h e As n gets closer to M X gets closer to 1 e so 1 A gets closer to 0 SO zh gets larger At the extreme when n M 1 the formula ay M meaning that you will search the entire table before discovering that the key is not there CPSC 211 Data Structures amp Implementations c Texas A amp M University 282 Sorting e Insertion Sort Consider each element in the array starting at the beginning Shift the preceding already sorted elements one place to the right until finding the proper place for the current element Insert the current element into its place Worst case time is O n7 e Treesor
11. value playGame throw getValue Player Cup name dice score throwDice playTurn getValue getScore This is a diagram of the classes not the objects Ob ject diagrams are trickier since objects come and go dynamically during execution Double check that the class diagram is consistent with requirements scenarios CPSC 211 Data Structures amp Implementations c Texas A amp M University 297 Object Oriented Analysis and Design cont d While fleshing out the design after identifying what the different methods of the classes should be figure out how the methods will work This means deciding what algorithms and associated data structures to use Do not fall in love with one particular solution such as the first one that occurs to you Generate as many different possible solutions as is practical and then try to identify the best one Do not commit to a particular solution too early in the process Concentrate on what should be done not how until as late as possible The use of ADTs assists in this aspect CPSC 211 Data Structures amp Implementations c Texas A amp M University 298 Verification and Correctness Proofs Part of the design includes deciding on or coming up with new algorithms to use You should have some convincing argument as to why these algorithms are correct In many cases it will be obvious e trivial action such as a table lookup e using a well known algorit
12. 9 Then store elements in an array A with 100 entries Initialize all entries to some empty indicator e To insert x with key k A k x e To search for key k check if A k 1s empty e To delete element with key k A k empty All times are O 1 0 1 2 99 x0 x2 x99 1 keyis0 keyis2 key is 99 But this idea does not scale well CPSC 211 Data Structures amp Implementations c Texas A amp M University 266 Hash Functions Suppose e elements are student records e school has 40 000 students e keys are social security numbers OO0 00 0000 Since there are billion possible SSN s we need an array of length 1 billion And most of it will be wasted since only 40 000 1 000 000 000 1 25 000 fraction 1s nonempty Instead we need a way to condense the keys into a smaller range Let M be the size of the array we are willing to provide Use a hash function A to convert each key to an array index Then h maps key values to integers in the range Oto M 1 CPSC 211 Data Structures amp Implementations c Texas A amp M University 267 Simple Hash Function Example Suppose keys are integers Let the hash function be h k k mod M Notice that this always gives you something in the range 0 to M 1 an array index e To insert x with key k Alh k x e To search for element with key k check if A h k is empty e To delete element with key k set Alh k to empty All tim
13. CPSC 211 Data Structures amp Implementations c Texas A amp M University 221 Trees Important terminology child _ leaf Some uses of trees e model arithmetic expressions and other expressions to be parsed e model game theory approaches to solving problems nodes are configurations children result from dif ferent moves e aclever implementation of priority queue ADT e search trees each node holds a data item CPSC 211 Data Structures amp Implementations c Texas A amp M University 222 Trees cont d Some more terms e path sequence of edges each edge starts with the node where the previous edge ends e length of path number of edges in it e height of a node length of longest path from the node to a leaf e height of tree height of root e depth or level of a node length of path from root to the node e depth of tree maximum depth of any leaf Fact The depth of a tree equals the height of the tree height depth 4 0 es obs i 2 2 0 1 r 3 0 4 CPSC 211 Data Structures amp Implementations c Texas A amp M University 223 Binary Trees Binary tree a tree in which each node has at most two children PRA LL IS Complete binary tree tree in which all leaves are on the same level and each non leaf node has exactly two _o AK Important Facts e A complete binary tree with L levels contains ee nodes e A complete binary tree with n nodes has approxi mately l
14. M University 285 Small Scale vs Large Scale Programming Programming in the small programs done by one person in a few hours or days whose length is just a few pages typically under 1000 lines of code Programming in the large projects consisting of many people taking many months and producing thousands of lines of code Obviously the complications are much greater here The field of software engineering is mostly oriented toward how to do programming in the large well How ever the principles still hold although simplified for programming in the small It s worth understanding these principles so that e you can write better small programs and e you will have a base of understanding for when you go on to large programs CPSC 211 Data Structures amp Implementations c Texas A amp M University 286 Object Oriented Software Engineering Software engineering studies how to define design implement and maintain software systems Object oriented software engineering uses notions of classes objects and inheritance to achieve these goals Why object oriented e use of abstractions to control complexity focus on one subproblem at a time e benefits of encapsulation to prevent unwanted side effects e power of inheritance to reuse sofware Experience has shown that object oriented software en gineering e helps create robust reliable programs with clean de signs and e promotes the development of programs by
15. a Binary Tree cont d class Tree TreeNode root other information void preorderTraversal preorder root preorder TreeNode t if t null stopping case for recursion kevisit user defined visit method preorder t Jere preorder ts TIGNE But we haven t yet talked about how you actually MAKE a binary tree We ll do that next when we talk about binary SEARCH trees CPSC 211 Data Structures amp Implementations c Texas A amp M University 238 Dictionary ADT Specification So far we ve seen the abstract data types e priority queue with operations insert remove min e stack with operations push pop queue with operations enq deq e list with operations insert delete replace Another useful ADT is a dictionary or table The abstract state of a dictionary is a set of elements each of which has a key The main operations are insert an element e delete an arbitrary element not necessarily the high est priority one e search for a particular element Some additional operations are e find the minimum element e find the maximum element e print out all elements in sorted order CPSC 211 Data Structures amp Implementations c Texas A amp M University 239 Dictionary ADT Applications The dictionary or table ADT is useful in database type applications For instance student records at a university can be kept in a dictionary
16. ar probing 0o 1 2 3 4 5 6 7 8 F FF F F F F If h k 7 the probe sequence will be 7 8 0 1 2 3 F means full CPSC 211 Data Structures amp Implementations c Texas A amp M University 274 Clustering A problem with linear probing clusters can build up A cluster is a contiguous group of full slots If an insert probe sequence begins in a cluster e it takes a while to get out of the cluster to find an empty slot e then inserting the new element just makes the clus ter even bigger To reduce clustering change the probe increment to skip over some locations so locations are not checked in linear order There are various schemes for how to choose the incre ments in fact the increment to use can be a function of how many probes you have already done CPSC 211 Data Structures amp Implementations c Texas A amp M University 275 Clustering cont d 0 1 2 3 4 5 6 7 F SE ue ES 2b By ob If the probe sequence starts at 7 and the probe incre ment is 4 then the probe sequence will be 7 2 6 Warning The probe increment must be relatively prime to the table size meaning that they have no common factors otherwise you will not search all locations For example suppose you have table size 9 and incre ment 3 You will only search 1 3 of the table locations CPSC 211 Data Structures amp Implementations c Texas A amp M U
17. are needed CPSC 211 Data Structures amp Implementations c Texas A amp M University 311 Software Reuse and Bottom up Programming The bottom line from section C 7 in Standish is e the effort required to build software is an exponen tial function of the size of the software e making use of reusable components can reduce the size of the software you have to build So it makes lots of sense to try to reuse software Of course there are costs associated with reuse e economic costs of purchasing the reusable compo nents e adapting or constraining your design so that it is compatible with the reusable components Using lots of reusable components leads to more bottom up rather than top down programming Or perhaps more appropriately middle out as mentioned last time CPSC 211 Data Structures amp Implementations c Texas A amp M University 312 Design Patterns As you gain experience you will learn to recognize good and bad design and build up a catalog of good solutions to design problems that keep cropping up Why not try to exploit other people s experience in this area as well A design pattern captures a component of a complete design that has been observed to recur in a number of situations It provides both a solution to a problem and information about them There is a growing literature on design patterns espe cially for object oriented programming It is worth while to become familiar with
18. combin ing existing components CPSC 211 Data Structures amp Implementations c Texas A amp M University 287 Object Oriented Software Engineering cont d Solutions to specific problems tend to be fragile and short lived any change to the requirements can result in Massive revisions To minimize effects of requirement changes capture general aspects of the problem domain e g student record keeping at a university instead of just focusing on how to solve a specific problem e g printing out all students in alphabetical order Usually the problem domain is fairly stable whereas a specific problem can change rapidly If you capture the problem domain as the core of your design then the code is likely to be more sta ble reusable and adaptable More traditional structured programming tends to lead to a strictly top down way of creating programs which then have rigid structure and centralized control and thus are difficult to modify CPSC 211 Data Structures amp Implementations c Texas A amp M University 288 Object Oriented Software Engineering cont d In OO analysis and design identify the abstractions needed by the program and model them as classes Leads to middle out design e go downwards to flesh out the details of how to implement the classes e go upwards to relate the classes to each other in cluding inheritance relationships and use the classes to solve the problem This approach tend
19. ction of notations and conventions that supposedly help people with the analysis design and implementation process Particularly aimed at large scale projects CPSC 211 Data Structures amp Implementations c Texas A amp M University 295 An Example OO Analysis Method CRC Class Responsibility Collaboration It clearly identifies the Classes what the Responsibilities are of each class and how the classes Collaborate interact In the CRC method you draw class diagrams e each class is indicated by a rectangle containing name of class list of attributes instance variables list of methods e if class 1 is a subclass of class 2 then draw an arrow from class 1 rectangle to class 2 rectangle e if an object of class 1 is part of an instance variable of class 2 then draw a line from class 1 rectangle to class 2 rectangle with a diamond at end e if objects of class 1 need to communicate with ob jects of class 2 then draw a plain line connecting the two rectangles The arrows and lines can be annotated to indicate the number of objects involved the role they play etc CPSC 211 Data Structures amp Implementations c Texas A amp M University 296 CRC Example To model a game with several players who take turns throwing a cup containing dice in which some scoring system is used to determine the best score Game Die players
20. data structure e When a new student enrolls an insert is done e When a student graduates a delete is done e When information about a student needs to be up dated a search is done using either the name or ID number as the key Once the search has located the record for that stu dent the data can be updated e When information about student needs to be retrieved a search is done The world is full of information databases many of them extremely large imagine what the IRS has When the number of elements gets very large efficient implementations of the dictionary ADT are essential CPSC 211 Data Structures amp Implementations c Texas A amp M University 240 Dictionary Implementations We will study a number of implementations Search Trees e basic binary search trees e balanced search trees AVL trees binary red black trees binary B trees not binary e tries not binary Hash Tables open addressing e chaining CPSC 211 Data Structures amp Implementations c Texas A amp M University 241 Binary Search Tree Recall the heap ordering property for binary heaps each node s key is smaller than the keys in both chil dren Another ordering property is the binary search tree property for each node z e all keys in the left subtree of x are less than the key in z and e all keys in the right subtree of x are greater than the key in z A binary search tree BST
21. e idea is to make sure that all root to leaf paths have exactly the same length and allow variation in the number of children The definition of a B tree uses a parameter m e every leaf has the same depth e the root has at most m children e every non root node has from m 2 to m children Keys are placed into nodes like this e Each non leaf node has one fewer keys than it has children Each key is between two child pointers e Each leaf node has between m 2 1 and m 1 keys in it unless it is also the root in which case it has between and m 1 keys in it e The keys within a node are listed in increasing order CPSC 211 Data Structures amp Implementations c Texas A amp M University 260 B Trees cont d And we require the extended search tree property e For each node z the 2 th key in z is larger than all the keys in x s 2 th subtree and is smaller than all the keys in x s i 1 st subtree 13 2 3 6 7 10 11 12 14 16 18 19 22 23 25126 B trees are extensively used in the real world for in stance database applications In practice m is very large such as 512 or 1024 Theorem The depth of a B tree tree is O log n Insert and delete algorithms are quite involved CPSC 211 Data Structures amp Implementations c Texas A amp M University 261 Tries In the previous
22. er than O n7 This algorithm is called heap sort CPSC 211 Data Structures amp Implementations c Texas A amp M University 229 Linked Structure Implementation of Heap To implement a heap with a linked structure each node of the tree will be represented with an object containing e key data e pointer to parent node e pointer to left child e pointer to right child To find the next available location for insert or the rightmost node on the bottom level for remove in con stant time keep all nodes on the same level in a linked list Thus each node will also have e pointer to left neighbor on same level e pointer to right neighbor on same level Then keep a single pointer for the whole heap that points to the rightmost node on the bottom level Q oe CPSC 211 Data Structures amp Implementations c Texas A amp M University 230 Array Implementation of Heap Fortunately there s a nifty way to implement a heap using an array based on an interesting observation If you number the nodes in a leftmost binary tree starting at the root and going across levels and down levels you see a pattern 2 ae io e FN 4 5 6 7 IN 8 9 e Node number 2 has left child 2 2 e Node number has right child 2 7 1 e If 2 2 gt n then has no left child elf2 2 1 gt n then has no right child e Therefore node number 2 is a leaf if 2 2 gt n e The parent of node 2 is 2 2 as long as i gt 1 e Next available locati
23. es are O 1 assuming the hash function can be computed in constant time O 1 2 99 X t key is k and h k 2 The key to making this work is to choose hash function h and table size M properly they interact CPSC 211 Data Structures amp Implementations c Texas A amp M University 268 Collisions In reality any hash function will have collisions when two different keys hash to the same value h k h kg although ky F ko This is inevitable since the hash function 1s squashing down a large domain into a small range For example if h k k mod M then ky 0 and ky M collide since they both hash to 0 0 mod M is 0 and M mod M is also 0 What should you do when you have a collision Two common solutions are 1 chaining and 2 open addressing CPSC 211 Data Structures amp Implementations c Texas A amp M University 269 Chaining Keep all data items that hash to the same array location in a linked list 0 e 1 e 2 e all have keys that hash to 1 M I e to insert element x with key k put x at beginning of linked list at A h k e to search for element with key k scan the linked list at A h k for an element with key k e to delete element with key k do search if search is successful then remove element from the linked list Worst case times assuming computing A is constant o insert O 1
24. ft child is also an ancestor of x 1 e follow parent pointers from x until reaching a key larger than z s Path to find successor of 17 is 17 16 10 19 If you never find an ancestor that is larger than x s key then x was already the maximum and has no succes Sor Path to try to find successor of 27 1s 27 26 22 19 Running Time O d O n CPSC 211 Data Structures amp Implementations c Texas A amp M University 253 Finding Predecessor in a BST The predecessor of a node x in a BST is the node whose key immediately precedes x s key in sorted or der To find it do the mirror image of the algorithm for successor Case 1 If x has a left child then the predecessor of zx is the maximum in the left subtree follow x s left pointer then follow right pointers until there are no more Case 2 If x does not have a left child then find the lowest ancestor of x whose right child is also an an cestor of x l e follow parent pointers from x until reaching a key smaller than z s If you never find an ancestor that is smaller than z s key then x was already the minimum and has no pre decessor Running Time O d O n CPSC 211 Data Structures amp Implementations c Texas A amp M University 254 Deleting a Node from a BST Case l x is a leat Then just delete x s node from the tree Case 2 x has only one child Then splice out x by connecting x parent directly to x s
25. hm such as heap sort But sometimes you might be coming up with your own algorithm or combining known algorithms in new ways In these cases it s important to check what you are doing CPSC 211 Data Structures amp Implementations c Texas A amp M University 299 Verification and Correctness Proofs cont d The Standish book describes one particular way to prove correctness of small programs or program fragments The important lessons are e It is possible to do very careful proofs of correctness of programs e Formalisms can help you to organize your thoughts e Spending a lot of time thinking about your program no matter what formalism will pay benefits e These approaches are impossible to do by hand for very large programs For large programs there are research efforts aimed at automatic program verification i e programs that take as input other programs and check whether they meet their specifications Generally automatic verification is slow and cumber some and requires some specialized skills CPSC 211 Data Structures amp Implementations c Texas A amp M University 300 Verification and Correctness Proofs cont d An alternative approach to program verification is prov ing algorithm correctness Instead of trying to verify actual code prove the cor rectness of the underlying algorithm e Represent the algorithm in some convenient pseu docode e then argue about what the algorit
26. hm does at higher level of abstraction than detailed code in a particular programming language Of course you might make a mistake when translat ing your pseudocode into Java but the proving will be much more manageable than the verification CPSC 211 Data Structures amp Implementations c Texas A amp M University 301 Implementation The design is now fleshed out to the level of code e Create a Java class for each design class e Fix the type of each attribute e Determine the signature type and number of pa rameters return type for each method e Fill in the body of each method As the code is written document the key design de cisions implementation choices and any unobvious aspects of the code Software reuse Use library classes as appropriate e Stack Vector Date HashTable Kinds of reuse use as is e inherit from an existing class e modify an existing class if source available But sometimes modifications can be more time con suming than starting from scratch CPSC 211 Data Structures amp Implementations c Texas A amp M University 302 Testing and Debugging The Limitations Testing cannot prove that your program is correct It is impossible to test a program on every single input so you can never be sure that it won t fail on some input Even if you could apply some kind of program verifi cation to your program how do you know the verifier doesn t have a bug in it
27. inting a BST in Sorted Order cont d eater than r less than r L contains at most n 1 keys and R contains at most n keys Inorder traversal e prints out all keys in R in sorted order by induction e then prints out key in r which is larger than all keys in R e then prints out all keys in L in sorted order by in duction All these keys are greater than key in r L Running Time O n since each node is handled once CPSC 211 Data Structures amp Implementations c Texas A amp M University 250 Tree Sort Does previous theorem suggest yet another sorting al gorithm to you Tree Sort Insert all the keys into a BST then do an inorder traversal of the tree Running Time O n since each of the n inserts takes O n time CPSC 211 Data Structures amp Implementations c Texas A amp M University 251 Finding Successor in a BST The successor of a node z in a BST is the node whose key immediately follows x s key in sorted order Case I If x has a right child then the successor of x is the minimum in the right subtree follow z s right pointer then follow left pointers until there are no more c of p db S w a Path to find successor of 19 is 19 22 20 CPSC 211 Data Structures amp Implementations c Texas A amp M University 252 Finding Successor in a BST cont d S F bye Case 2 If x does not have a right child then find the lowest ancestor of x whose le
28. is a binary tree that satis fies the binary search tree property TE re a CPSC 211 Data Structures amp Implementations c Texas A amp M University 242 Searching in a BST To search for a particular key in a binary search tree we take advantage of the binary search tree property search x k x is node where search starts SSS Sa eSS k is key searched for if x is null then stopping case for recursion return not found else if k the key of x then return x else if k lt the key of x then search leftchild x k recursive call else k gt the key of x search rightchild x k recursive call endif The top level call has x equal to the root of the tree In the previous tree the search path for 17 1s 19 10 16 17 and the search path for 21 is 19 22 20 null Running Time O d where d is depth of tree If BST is a chain then d n 1 CPSC 211 Data Structures amp Implementations c Texas A amp M University 243 Searching in a BST cont d Iterative version of search search x k while x null do if k the key of x then return x else if k lt the key of x then x Lleftchild x else k gt the key of x x rightchild x endif endwhile return not found As in the recursive version you keep going down the tree until you either find the key or hit a leaf The comparison of the search key with the node key tells you at each level whether to conti
29. it For instance search the WWW for design pattern and see what you get Thus design patterns support software reuse
30. log n depth and not how the algorithms for insertion and deletion work CPSC 211 Data Structures amp Implementations c Texas A amp M University 257 AVL Trees An AVL tree is a binary search tree such that for each node the heights of the left and right subtrees of the node differ by at most 1 a P4 CoO Ko TOL Theorem The depth of an AVL tree is O log n When inserting or deleting a node in an AVL tree if you detect that the AVL tree property has been vio lated then you rearrange the shape of the tree using rotations CPSC 211 Data Structures amp Implementations c Texas A amp M University 258 Red Black Trees A red black tree is a binary search tree in which e every real node is given 0 1 or 2 fake NIL children to ensure that it has two children and e every node is colored either red or black s t every leaf node is black if a node is red then both its children are black every path from a node to a leaf contains the same number of black nodes From a fixed node all paths from that node to a leaf differ in length by at most a factor of 2 implying Theorem The depth of an AVL tree is O log n Insert and delete algorithms are quite involved CPSC 211 Data Structures amp Implementations c Texas A amp M University 259 B Trees The AVL tree and red black tree allowed some varia tion in the lengths of the different root to leaf paths An alternativ
31. near probing Consider this sequence o Insert kj where h k 1 3 at location 3 o Insert k2 where h k 3 at location 4 o Insert k3 where h k3 3 at location 5 e Delete k from location 4 by setting location 4 to empty e Search for k3 Incorrectly stops searching at loca tion 4 and declares ks not in the table Solution when an element is deleted instead of mark ing the slot as empty it should be marked in a special way to indicate that an element used to be there but was deleted Then the search algorithm needs to continue searching if it finds one of those slots CPSC 211 Data Structures amp Implementations c Texas A amp M University 279 Good Hash Functions for Open Addressing An ideal hash function for open addressing would sat isfy an even stronger property than that for chaining namely Each key should be equally likely to have each per mutation of 0 1 M 1 as its probe sequence This is even harder to achieve in practice than the ideal property for chaining A good approximation is double hashing with this scheme e Let M be prime then let h k k mod M and let ho k 1 k mod M 2 Generalizes the earlier example CPSC 211 Data Structures amp Implementations c Texas A amp M University 280 Average Case Analysis of Open Addressing In this situation the load factor A n M is always less than 1 there cannot be more keys in the table than there are table entries
32. niversity 276 Double Hashing Even when non linear probing is used it is still true that two keys that hash to the same location will follow the same probe sequence To get around this problem use double hashing 1 One hash function h4 is used to determine where to start probing 2 A second hash function h is used to determine the probe sequence If the hash functions are chosen properly different keys that have the same starting place will have different probe increments CPSC 211 Data Structures amp Implementations c Texas A amp M University 277 Double Hashing Example Let h k k mod 13 and ho k 1 k mod 11 0 1 2 3 4 5 6 7 8 9 10 1I 12 79 69 98 72 15 50 e To insert 14 start probing at 14 mod 13 1 Probe increment is 1 14 mod 11 4 Probe sequence is 1 5 9 0 e To insert 27 start probing at 27 mod 13 1 Probe increment is 1 27 mod 11 6 Probe sequence is 1 7 0 6 e To search for 18 start probing at 18 mod 13 5 Probe increment is 1 18 mod 11 8 Probe sequence is 5 0 not in table CPSC 211 Data Structures amp Implementations c Texas A amp M University 278 Deleting with Open Addressing Open addressing has another complication e to insert probe until finding an empty slot e to search probe until finding the key being sought or an empty slot which means not there Suppose we use li
33. nue the search to the left or to the right Running Time O d O n CPSC 211 Data Structures amp Implementations c Texas A amp M University 244 Searching in a Balanced BST If the tree is a complete binary tree then the depth is O log n and thus the search time is O log n Binary trees with O log n depth are considered bal anced there is balance between the number of nodes in the left subtree and the number of nodes in the right subtree of each node You can have binary trees that are approximately bal anced so that the depth is still O log n but might have a larger constant hidden in the big oh As an aside a binary heap does not have an efficient search operation Since nodes at the same level of the heap have no particular ordering relationship to each other you will need to search the entire heap in the worst case which will be O n time even though the tree is perfectly balanced and only has depth O log n CPSC 211 Data Structures amp Implementations c Texas A amp M University 245 Inserting into a BST To insert a key k into a binary search tree search start ing at the root until finding the place where the key should go Then link in a node containing the key insert x k if x null then make a new node containing k return new node else if k the key of x then return null key already exists else if k lt the key of x then leftchild x insert leftchild x k return x else
34. ogs n levels CPSC 211 Data Structures amp Implementations c Texas A amp M University 224 Binary Trees cont d Leftmost binary tree like a complete binary tree except that the bottom level might not be completely filled in however all leaves at bottom level are as far to the left as possible TA EY RY RP AR Important Facts e A leftmost binary tree with L levels contains be tween 24 and 2 1 nodes e A leftmost binary tree with n nodes has approxi mately logs n levels CPSC 211 Data Structures amp Implementations c Texas A amp M University 225 Binary Heap Now suppose that there is a data item called a key inside each node of a tree A binary heap or min heap is a e leftmost binary tree with keys in the nodes that e satisfies the heap ordering property every node has a key that is less than or equal to the keys of all its children Do not confuse this use of heap with its usage in memory management Important Fact The same set of keys can be orga nized in many different heaps There is no required order between siblings keys ao a an T AH DE bgd p Ca amp CPSC 211 Data Structures amp Implementations c Texas A amp M University 226 Using a Heap to Implement a Priority Queue To implement the priority queue operation insert z 1 Make a new node in the tree in the next available location marching across from left to right 2 Put x in the new node 3
35. on for insert is index n 1 e Rightmost node on the bottom level is index n CPSC 211 Data Structures amp Implementations c Texas A amp M University 231 Array Implementation of Heap cont d Representation consists of e array A 1 max ignore location 0 e integer n which is initially 0 holding number of elements in heap To implement insert x ignoring overflow n n 1 make a new leaf node Aln x new node s key is initially x cur n Start Dubblang x lt UP parent cur 2 while parent 0 amp amp A parent gt A cur do current node is not the root and its key has not found final resting place swap A cur and A parent cur parent move up a level in the tree parent cur 2 endwhile CPSC 211 Data Structures amp Implementations c Texas A amp M University 232 Array Implementation of Heap cont d To implement remove ignoring underflow minkKey A 1 smallest key to be returned A 1 A n replace root s key with key in a rightmost leaf on bottom level n n l delete rightmost leaf on bottom level cur 1 start bubbling down key in root Lchild 2 cur Rchild lt 2 cur 1 while Lchild lt n amp amp A minChild lt A cur do current node is not a leaf and its key has not found final resting place swap A cur and A minChild cur minChild move down a level in the tree Lchild 2 cur Rchild 2 cur 1
36. put in some jumps spaghetti code to try to avoid this etc Eventually it may be better to replace the program with a new one developed from scratch CPSC 211 Data Structures amp Implementations c Texas A amp M University 309 Measurement and Tuning Experience has shown e 80 of the running time of a program is spent in 10 of the code e predicting where a program will be inefficient can be surprisingly difficult These observations suggest that optimizing your pro gram can pay big benefits but that it is smarter to wait until you have a prototype to figure out WHERE and how to optimize How can you figure out where your program is spend ing its time e use a tool called an execution time profiler or e insert calls in your code to the operating system to calculate CPU time spent CPSC 211 Data Structures amp Implementations c Texas A amp M University 310 Measurement and Tuning cont d Things you can do to speed up a program e find a better algorithm e replace recursion with iteration e replace method calls with in line code e take advantage of differences in speed of machine instructions e g integer arithmetic vs double pre cision arithmetic Don t do things that are stupidly slow in your program from the beginning On the other hand don t go overboard in supposed optimizations that might hurt readability unless you know for a fact based on experimental measurements that they
37. re engineering than for non object oriented CPSC 211 Data Structures amp Implementations c Texas A amp M University 293 Object Oriented Analysis and Design Main objective identify the classes that will be used and their relationships to each other Analysis and design are two ends of a spectrum Anal ysis focuses more on the real world modeling while design focuses more on the programming aspects For large scale projects there might be a real distinc tion for example several programming level classes 9 might be required to implement a real world level class For small scale projects there is typically no distinc tion between analysis and design we can directly fig ure out what classes to have in the program to solve the real world problem CPSC 211 Data Structures amp Implementations c Texas A amp M University 294 Object Oriented Analysis and Design cont d To decide on the classes e Study the requirements brainstorming and using com mon sense Look for nouns in the requirements names en titites things e g courses GPAs names roles e g student even strategies and abstractions These will probably turn into classes e g student course and or instance variables of classes e g GPA name See how the requirements specify interactions be tween things e g each student has a GPA each course has a set of enrolled students e Use an analysis method a colle
38. s node s is string to search for if length s 0 then if x holds a complete key then return x else return null s is not in the trie else C first character in s if no outgoing edge from x is labeled with c then return null s is not in the trie else x child of x reached by edge labeled c s result of removing first character from s search x S endif endif Start the recursion with the root To search for art and bee oa ij AN CPSC 211 Data Structures amp Implementations c Texas A amp M University 264 Hash Table Implementation of Dictionary ADT Another implementation of the Dictionary ADT is a hash table Hash tables support the operations insert an element e delete an arbitrary element e search for a particular element with constant average time performance This is a significant advantage over even balanced search trees which have average times of O log n The disadvantage of hash tables is that the operations min max pred succ take O n time and printing all elements in sorted order takes O n log n time CPSC 211 Data Structures amp Implementations c Texas A amp M University 265 Main Idea of Hash Table Main idea exploit random access feature of arrays the i th entry of array A can be accessed in constant time by calculating the address of A 1 which is offset from the starting address of A Simple example Suppose all keys are in the range 0 to 9
39. s to lead to decentralized control and programs that are easier to change For instance when the requirements change you may have all the basic abstractions right but you just need to rearrange how they interact Aim for a core framework of abstract classes and interfaces representing the core abstractions which are specialized by inheritance to provide concrete classes for specific problems CPSC 211 Data Structures amp Implementations c Texas A amp M University 289 Software Life Cycle inception obtain initial ideas requirements gather information from the user about the intended and desired use e elaboration turn requirements into a specification that can be implemented as a program analysis identify key abstractions classes and relationships design refine your class model to the point where it can be implemented identify reuse locate code that can be reused e implementation program and test each class integrate all the classes together make classes and components reusable for the future e testing e delivery and maintenance CPSC 211 Data Structures amp Implementations c Texas A amp M University 290 Software Life Cycle cont d Lifecycle is not followed linearly there is a great deal of iteration An ideal way to proceed is by iterative prototyping e implement a very simple minimal version of your program e review what has been achieved e decide what
40. search trees each key is independent of the other keys in the tree except for their relative positions For some kinds of keys one key might be a prefix of another tree For example if the keys are strings then the key at is a prefix of the key atlas The next kind of tree takes advantage of prefix rela tionships between keys to store them more efficiently A trie is a not necessarily binary tree in which e each node corresponds to a prefix of a key and e prefix for each node extends prefix of its parent The trie storing a ale ant bed bee bet an a CPSC 211 Data Structures amp Implementations c Texas A amp M University 262 Inserting into a Trie To insert into a trie insert x s x is node s is string to insert if length s 0 then mark x as holding a complete key else c first character ins if no outgoing edge from x is labeled with c then create a new child node of x label the edge to the new child node with c put the edge in the correct sorted order among all of x s outgoing edges endif x child of x reached by edge labeled c s result of removing first character from s insert x Ss endif Start the recursion with the root To insert an and beep O a ij AN CPSC 211 Data Structures amp Implementations c Texas A amp M University 263 Searching in a Trie To search in a trie search x s x i
41. t Insert the n items one by one into a binary search tree Then do an inorder traversal of the tree For a basic BST worst case time is O n but average time is O n log n For a balanced BST worst cast time is O n log n although code is more complicated CPSC 211 Data Structures amp Implementations c Texas A amp M University 283 Sorting cont d e Heapsort Insert the n items one by one into a heap Then remove the minimum element one by one Elements will come out in sorted order Worst case time is O n log n e Mergesort Apply the idea of divide and conquer Split the input array into half Recursively sort the first half Recursively sort the second half Then merge the two sorted halves together Worst case time is O n log n like heapsort how ever it requires more space CPSC 211 Data Structures amp Implementations c Texas A amp M University 284 Object Oriented Software Engineering References e Standish textbook Appendix C e Developing Java Software by Russel Winder and Graham Roberts John Wiley amp Sons 1998 ch 8 9 Outline of material e Introduction e Requirements e Object oriented analysis and design e Verification and correctness proofs e Implementation e Testing and debugging e Maintenance and documentation e Measurement and tuning e Software reuse CPSC 211 Data Structures amp Implementations c Texas A amp
42. ther group whose job is to try to make your code break CPSC 211 Data Structures amp Implementations c Texas A amp M University 307 Maintenance and Documentation Maintenance includes e fixing bugs found after delivery e correcting faulty behavior e g due to misunder standing of requirements e supporting the program when the environment changes e g anew operating system e modifications and new features due to changes in requirements Most often the person or people doing the mainte nance are NOT the one s who originally wrote the program So good documentation is ESSENTIAL There are at least two kinds of documentation both of which need to be updated during maintenance e internal documentation which explains how the pro gram works and e external documentation which explains what the pro gram does 1 e user s manual CPSC 211 Data Structures amp Implementations c Texas A amp M University 308 Maintenance and Documentation cont d In addition to good documentation a clean and eas ily modifiable structure 1s needed for effective mainte nance especially over a long time span If changes are made in ad hoc kludgey way either be cause the maintainer does not understand the underly ing design or because the design is poor the program will eventually deteriorate sometimes called software rot Trying to fix one problem causes something else to break so in desperation you
43. to do next e proceed to next iteration of implementation e continue iterating until reaching final goal This supports exploring the design and implementation incrementally letting you try alternatives and correct mistakes before they balloon CPSC 211 Data Structures amp Implementations c Texas A amp M University 291 Requirements Decide what the program is supposed to do before start ing to write it Harder than it sounds Ask the user e what input is needed e what output should be generated Involve the user in reviewing the requirements when they are produced and the prototypes developed Typically requirements are organized as a bulleted list Helpful to construct scenarios which describe a se quence of steps the program will go through from the point of view of the user to accomplish some task CPSC 211 Data Structures amp Implementations c Texas A amp M University 292 Requirements cont d An example scenario to look up a phone number 1 select the find phone number option 2 enter name of company whose phone number is de sired 3 click search 4 program computes to find desired number do NOT specify data structure to be used at this level 5 display results Construct as many scenarios as needed until you feel comfortable and have gotten feedback from the user that you ve covered all the situations This part of the software life cycle is no different for object oriented softwa

root parent child leaf node edge - TAMU Computer Science Faculty

Contents

Download Pdf Manuals

Related Search

Related Contents