Home
Part 1: Introduction
Contents
1. Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 11 The Task of DB Design 4 The three basic design errors are e I here are situations in the real world which do not correspond to a database state Data that actually occur cannot be entered e A legal question about the real world cannot be formulated as a query to the database The needed information is missing in the database e Database states are possible which do not corre spond to a legal state in the real world Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 12 Constraints 1 e I wo kinds of errors must be distinguished Entering wrong data i e the DB state corre sponds a different situation of the real world than the actual one E g 8 points given for Homework 1 in the DB vs 10 in the real world Then the DB state is wrong but not the schema What can be done to guard against such errors Entering data which do not make sense or are illegal E g 5 points for some homework or two entries for the same student and same exercise If such impossible data can be entered the DB schema is wrong design error Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 13 Constraints 2 e If the DB contains illegal meaningless data it be comes inconsistent with our general understanding of the real world e If a programmer assumes that the da
2. e But still there is the problem that programs eva luating the data cannot understand these remarks so they will simply ignore them Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 19 Flexibility and Computers 2 e Computers are stupid Every possible situation must be anticipated when developing programs or database schemas e This is what makes database design difficult e On the other hand computers are very fast very precise act 100 according to the given rules and do not complain about stupid tasks e Ihe decreased flexibility is not necessarily fatal if the users can accept them Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 20 Interpretation of the Data 1 e Ihe formal database schema describes only data e Data becomes information by interpretation e This interpretation i e the mapping between da tabase states and situations in the real word must be documented as part of the database design task It must be clear what the stored data actually means and what the users of the system are supposed to enter in the table columns e Ihe task of database design is certainly not com plete when only a set Of CREATE TABLE statements is delivered Additional documentation is needed Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 21 Interpretation of the Data 2 e Part of the task c
3. It is more object oriented than might be good for relational systems E g it has no built in notion of keys e Oracle Designer does not support UML but Oracle JDeveloper and Sybase PowerDesigner do Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 66 Overview 1 The Task of Database Design 2 Users Application Programs Data 3 Phases of Database Design 4 System Development Lifecycle 5 Summary Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 67 Oracle CASE Method Strategy User Documentation Barker 1990 Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 68 Strategy Phase 1 The purpose of the strategy phase is to develop a plan for information systems development The planned system must serve the organization s current and future needs The plan must also take into account organizatio nal financial and technical constraints Of course management wants to know what will we get and how much will it cost before the project goes into the next phase Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 69 Strategy Phase 2 e The strategy phase results in a contract e However even at the end of the strategy phase estimates about the cost will not be very reliable e Ihe contract can state that the price is o fixed Then the p
4. conceptual design phase This builds on the results of the strategy phase e The final ER diagrams are developed including all attributes and business rules constraints e The function hierarchy business process diagrams are further developed Dataflow and entity usages are analyzed Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 75 Analysis Phase 2 e Legacy systems must be carefully analyzed and a Strategy for transition and data migration must be developed Don t underestimate the effort of data migration from the old system into the new system How will data be handled that violates the constraints Is data cleaning possible e Describe required interfaces with other software e Collect information about the expected data volu mes function frequencies and performance expec tations Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 76 Analysis Phase 3 Collect security requirements Collect requirements for backup recovery It is not possible to meet a user s need that was never discovered Koletzke Dorsey A thorough requirements document can easily fill several thousand pages Koletzke Dorsey Describe what is needed but do not yet think too much about how it should be done The focus is still on the user not on the system Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 7
5. the schema will be in BCNF 4NF etc This is related to non redundancy and sufficient generality e Stability Flexibility Extensibility The schema can be easily adapted to changing requirements e Simplicity and Elegance A solution with fewer more generic schema elements might be pre ferable to a larger schema e Making good use of data model constructs E g EER constructs reduce the need for general constraints Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 101 Schemas Quality Criteria 4 e Communication Effectiveness Self Documenting Names of schema elements should be chosen well to make the inter pretation of data clear Terms should be familiar to business specia lists e Readability Diagrams should be drawn in a grid line crossings should be mini mized symmetric structures should be emphasized related concepts Should be near in the diagram e Uniformity Style naming conventions abbreviations should be uniform Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 102 DB Design is not Easy 1 The designer must learn about the application do main Domain experts often don t bother to say the obvious obvious to them or mention rare exceptions They use technical terms which the database designer must learn Exceptions The real world is very flexible One must somehow extrapolate from the single state or the few st
6. Datenbanken I Universit t Halle 2003 1 Introduction 1 56 Entity Relationship Model 4 e There is no commercial entity relationship DBMS Object oriented database management systems are somewhat similar but they are only used for non standard applications e g CAD tools e hus a schema transformation into another data model is necessary e Many variants extensions of the ER model have been proposed Several different graphical notati ons are used If you know one notation it is easy to learn another one since the basic concepts are the same Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 57 Entity Relationship Model 5 e The ER model is called a semantic data model because it more closely resembles the real world than e g the relational model For instance In the ER model persons are modelled In the re lational model only names numbers are found The ER model is an abstraction of the real world whereas the relational model is an abstraction of files on a computer In the ER model there is a distinction between entities and relationships In the relational mo del both are represented by relations Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 58 Entity Relationship Model 6 e Since ER schemas can be translated into relational schemas the expressiveness semantical richness of the ER model is not n
7. will later pay off by less phone calls of users who need help less time lost by the users for trying to find a way to do what they need to do a better impression by the users about the soft ware quality easier cheaper maintainance modifications e A user manual can even say This is no bug this is a feature and the users might accept that Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 86 Documentation 4 e Few people read a big manual before they start using the software e There should be a short introduction lt 20 pages e After that a good table of contents a good index and good cross references are essential It should be possible to understand a section without reading all the previous ones However a few users do want to read more than the introduction in a sequential manner Repeating again and again the same things is not nice for them Sequential readers also can expect that concepts are defined before they are used Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 87 Documentation 5 e Manuals are always missing when they are needed Thus there should be a good online help system Documentation Should be available in electronic form e Documentation might also include the preparation of training courses e Also a web site might be developed that contains an FAQ and a list of bugs and other pr
8. you should be able to e explain correctness and quality criteria for database schemas explain difficulties and risks e enumerate what else besides the mere schema de Sign needs to be done during a database project e explain the relationship between application pro grams and database design e explain the three phases of database design Why does one not directly start with a relational design e explain a system development lifecycle Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 3 Overview 2 Users Application Programs Data 3 Phases of Database Design 4 System Development Lifecycle 5 Summary Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 4 Basic Database Notions 1 e The main task of a database system DBS is to answer certain questions about a subset of the real world domain of discourse e g Which homework Database has Ann Smith Succi completed y e This is done by selecting aggregating and combi ning information that was previously entered e Ihe system must know the structure of the data to support powerful queries Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction Basic Database Notions 2 e In the relational data model the data is structured in form of tables relations 1 5 e Each table has a name sequence of named columns attribut
9. Relationship Model 1 ER Schema in Graphical Notation e This mini world contains students and homework exercises entities objects e Students have a name and an email address attributes properties data about objects Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 54 Entity Relationship Model 2 e Students are identified by their name key e Exercises have a number and a number of points They are identified by their number e Students solve exercises relationship e A student can solve between O and arbitrarily many exercises and an exercise can be solved by O or any number of students many to many relationship However between each student and each exercise at most one connec tion can exist relationships are sets of pairs e Students get points for their solution Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 55 Entity Relationship Model 3 e Proposed by Peter Pin Shan Chen 1976 An International Conference on the Entity Relationship Approach has occurred almost every year since 1979 e Standard tool for conceptual design Every professional DB designer must Know it well e The graphical notation helps to establish a better overview to see the structure of the data It is also useful for communicating with the future users This notation was probably an important success factor Stefan Brass
10. an exact specification of the applications similar to blueprints of an architect which are given to a contractor for building a house Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 80 Build Phase 1 e In the Build phase the working system is created e E g tables views procedures triggers and other database objects are created the final decisions of physical design are made Storage parameters for tables including the partitioning among table spaces disks indexes clusters etc e he database should be filled with example data of the same size as the production database will be Only in this way performance can be tested and tuned Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 81 Build Phase 2 e The application programs are developed hopefully many programs can be generated with a tool like Oracle Designer out of specifications developed du ring the Design phase e Of course testing the developed programs is man datory First every developer will test his her program in isolation But then also other people including real users must test it and the integration with other programs must be tested A test plan should be developed during the design phase Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 82 Build Phase 3 e Whenever systems are built apparently small constraints and lim
11. direct SQL access the checking can be done in the application programs But this means that the programs must do the user management instead of letting the DBMS do this work Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 98 Schemas Quality Criteria 1 e Syntactical Correctness The schema is legal with respect to the given data model e Completeness The necessary data can be stored All the given questions about the real world can be answered from the database e Enforcement of Business Rules Illegal updates are rejected Data violating the given business rules cannot be entered Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 99 Schemas Quality Criteria 2 e Sufficiently general All situations that are possible in the real world can be represented in the database I e all data that does not violate the business rules can be stored e Preciseness The relation to the real world is exactly documented The intention interpretation of schema elements must be clear e Non Redundancy Every relevant aspect of the real world should be represented only once The schema should be minimal i e no schema element can be remo ved without violating the completeness requirement Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 100 Schemas Quality Criteria 3 e Normality If translated into the relational model
12. example scenarios develop more prototypes Of course one can also hire more experi enced data modellers There is a tradeoff between risk and money but sometimes relatively little money or simply doing things in a different way can significantly reduce the risk e The requirements change Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 108 Risks Risk Management 2 e Ihe software is not ready on time The budget is insufficient e The software does not work correctly it might ac tually destroy data or enter incorrect data e The DBMS is down not available E g because of hardware faults software bugs bad adminstration this includes the case that a disk is suddenly full e The system does not deliver the required perfor mance Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 109 Risks Risk Management 3 J The DBMS vendor goes bankrupt and and the soft ware is no longer supported The DBMS vendor changes the licensing terms and the system gets more expensive at least updates e A disk fails There is a fire in the computer room Although it might be possible to restore the latest DB state this might takes hours downtime e The DBA accidentally deletes an important table Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 110 Risks Risk Management 4 e The programmers or the DBA do not
13. pro grams Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 16 Constraints 5 Why specify constraints Some protection against data input errors Constraints document knowledge about DB states Enforment of laws company standards Protection against inconsistency if redundant data is stored Queries programs become simpler if the program mer is not required to handle the most general cases i e cases where the constraint is not satisfied E g if columns are known to be not null no indicator variable Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 17 Constraints 6 Constraints and Exceptions e Constraints cannot have any exceptions e A good DBMS will reject any attempt to enter data which violates a specified constraint e One can expect that eventually there will be excep tional situations in which the DBS seems unflexible because of the specified constraints e Only conditions that are unquestionable should be defined as constraints Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 18 Flexibility and Computers 1 e The introduction of a computerized system always changes the real world e With the old paper based forms it was always pos Sible to scribble something at the border or bottom of the form e In a database a field for notes or remarks must be added to the table
14. sufficiently know the DBMS software e Important people leave the project e Employees users of the system do not know it well enough to use it correctly and efficiently e Employees accidentally enter incorrect data e Employees accidentally delete important data Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 111 Risks Risk Management 5 e A hacker tries to access or damage the data e Somebody who leaves the company takes informa tion from the database with him her In the extreme case an export file of the entire database e The employees do not like the new system The worker s union protests against it e Ihe system violates data privacy laws Or the company gets a bad reputation because of questionable practice regarding personal data Stefan Brass Datenbanken I Universit t Halle 2003
15. the overall development cost isertainly bigger e Certain tasks e g copying data between systems might need to be done manually extra work pos sible errors Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 91 Transition Phase 4 Gradual Transition Continued e One can get an impression of the software quality and the transition problems first for a smaller part of the company But this might be able to paralyze the rest of the company e Users who already switched to the new system may help in training users which still have to switch Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 92 Overview The Task of Database Design Users Application Programs Data Phases of Database Design System Development Lifecycle Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 93 Business Rules 1 e Business rules are similar to constraints but they refer to the real world not to the DB Constraints can only be specified after the structure of the databa se state is defined e g tables columns Business rules describe restrictions in the real world they are more general They not only restrict states in the real world but also who is allowed to do what and temporal constraints and procedures that must be followed if the invoice is not paid after 30 days a letter is sent to remi
16. yet submit a homework is there an entry in the results table with O points or is there no entry for this combination of student and homework Should the student grade be calculated as the average of 2 homeworks 95 85 and ignore the missing homework or should the grade be the average of 3 95 85 0 Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 27 Interpretation of the Data 8 e Need for historial information E g it should be stored who borrowed a book from the library Can the information be deleted when the book is returned Does a counter suffice how often the book was borrowed Or is the complete history needed If the same student borrowed the same book several times does this have to be stored Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 28 Overview 1 The Task of Database Design 2 Users Application Programs Data 3 Phases of Database Design 4 System Development Lifecycle 5 Summary Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction Data vs Programs 1 The beginning of a DB project is the understanding that there are specific real world tasks which need to be supported by computerization 1 29 e he tasks require that data be collected and com piled so that they can be analyzed or summerized e Programs need to be created to facilitate the col lection
17. 1 Introduction 1 1 Part 1 Introduction References Ramez Elmasri Shamkant B Navathe Fundamentals of Database Systems 3rd Ed Ch 16 Practical Database Design and Tuning Toby J Teorey Database Modeling amp Design 3rd Edition Morgan Kaufmann 1999 ISBN 1 55860 500 2 Graeme C Simsion Graham C Witt Data Modeling Essentials 2nd Edition Coriolis 2001 ISBN 1 57610 872 4 459 pages Robert J Muller Database Design for Smarties Using UML for Data Modeling Morgan Kaufmann 1999 ISBN 1 55860 515 0 ca 40 Peter Koletzke Paul Dorsey Oracle Designer Handbook 2nd Edition ORACLE Press 1998 ISBN 0 07 882417 6 1075 pages ca 40 Martin Fowler Kendall Scott UML Distilled Second Edition Addison Wesley 2000 ISBN 0 201 65783 X 185 pages Grady Booch James Rumbaugh Ivar Jacobson The Unified Modeling Language User Guide Addison Wesley Longman 1999 ISBN 0 201 57168 4 482 pages Carlo Batini Stefano Ceri Shamkant B Navathe Conceptual Database Design Benjamin Cummings 1992 ISBN 0 8053 0244 1 470 pages Richard Barker CASE Method Tasks and Deliverables Addison Wesley 1990 ISBN 0201416972 ca 69 Rauh Stickel Konzeptuelle Datenmodellierung in German Teubner 1997 Udo Lipeck Skript zur Vorlesung Datenbanksysteme in German Univ Hannover 1996 Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 2 Objectives After completing this chapter
18. 7 Design Phase 1 e Ihe focus now shifts from the user to the system e The relational database design is developed based on the given ER model x logical design phase Probably denormalization should already be considered if really ne cessary but other physical design decisions e g indexes can be deferred until the build phase When defining the tables you should work together with an experienced DBA preferable the one who has later to live with the design e Functions are mapped into modules application programs and manual procedures Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 78 Design Phase 2 e The Design phase is where the blueprints are drawn for building the system Every detail should be laid out before generation Koletzke Dorsey e Design standards must be set This includes the development of screen concept prototypes All programs should have the same look and feel User documentation should have a similar structure Programming styles should be uniform naming standards Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 79 Design Phase 3 e Design is complete when the design documents could be handed over to another team to build with each application having its own screen or report design list of detailed functionality and create retrieve update delete CRUD report Koletzke Dorsey This is
19. a Independence 4 e It would be bad if one had to change all application programs when the data organization is changed When programs directly access files this is of course necessary e In relational DBMSs indexes can be added or dele ted without any change to an application program e SQL is a declarative language One specifies only which conditions the result must satisfy but not how it should be computed e Ihe query optimizer automatically uses indexes Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 38 Data Independence 5 e I his has led to the distinction between two schema levels The Conceptual Schema describes the logical in formation contents of the database E g in the relational model The Internal Schema or Physical Schema des cribes the way the data is actually stored E g relations plus indexes disks and many storage parameters e Users can refer in SQL queries only to the con ceptual schema Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 39 Data Independence 6 e The query optimizer translates the SQL query into an internal query program which is evaluated on the actually stored instance of the internal schema e In most systems the storage parameters are de fined as part of the CREATE TABLE statement and most have a CREATE INDEX command in their SQL Theoreticians would have wished a cl
20. an be solved by choosing good names for the schema elements e g tables and columns e Names should be self documenting understanda ble without additional explanation but also not too long e g max 18 characters and not similar to names elsewhere in the schema e Choosing good names needs some thought But the invested time will later pay off Discussing the names with other people might help The DB designer must talk about the schema with the future users customers domain experts and programmers Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 22 Interpretation of the Data 3 e Abbreviations and other naming conventions should be documented and used consistently especially important when developing in a team E g table names Some designers use the singular form of a noun some the plural form E g placement of underscores capitalization e The documentation might include a small example DB state e I here should be some explanation for every schema element e g tables and columns These can be used later in the help files for input fields Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 23 Interpretation of the Data 4 e Ihe data inside the tables needs interpretation meaning of specific codes elements of enume ration types units for physical measures format of strings that contain several pieces of data this shou
21. ates one observes to all possible states e Size Database schemas can be very big Despite the name miniworld schemas with more than 100 entity types are still quite normal Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 103 DB Design is not Easy 2 e I he solution is usually not unique e Sometimes there is no perfect solution one can choose only between two bad things e Existing software or data might reduce the choices e Such a project brings changes into the company but the users might fear changes only manage ment wants it Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 104 DB Design is not Easy 3 e It is a mistake to assume that once you know the syntax of the ER model you can work as DB desi gner for large projects e What else is needed besides experience Translation into the relational model reverse en gineering Normal form theory and the intuition behind it redundancy constraints Having seen many DB designs knowning typical patterns Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction DB Design is not Easy 4 e Things a DB designer should know continued 1 105 Interviewing techniques Basic business knowledge Form analysis text analysis view integration schema condensation Business process modeling CRUD analysis CASE tools s
22. ational model and vice versa Automatic generation of software prototypes Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 63 UML 1 e Currently the Unified Modeling Language UML is gaining more and more acceptance e UML is a system of notations for visualizing diffe rent aspects of an object oriented software design e UML 1 1 was adopted as a standard by the OMG Object Management Group on Nov 14 1997 Current version 1 3 The UML project started in 1994 when Grady Booch Ivar Jacob son and James Rumbaugh authors of previously competing object oriented design methods joined their efforts Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 64 UML 2 e The UML has nine common types of diagrams Class Diagram Object Diagram Use Case Diagram Sequence Diagram Collaboration Diagram Statechart Diagram Activity Diagram Component Diagram Deployment Diagram e UML class diagrams are similar to ER diagrams The ER model is certainly not outdated by UML only extended and again the notation is slightly changed Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 65 UML 3 e One of the CASE tools for UML is Rational Rose The three UML inventors work for own the company Rational e Probably many future database projects will use UML But Its goal is software design not DB design
23. chema elements e E g the old homework results database consisted of three tables and four programs Program Registration Change Password View Results Import Points Stefan Brass Datenbanken I STUDENTS RESULTS EXERCISES Universit t Halle 2003 1 Introduction 1 33 Data vs Programs 5 e It is difficult to specify what application programs have to do without refering to a DB schema e The database schema determines already a large part of the needed programs Basically for every table a program is needed to enter display the data One program may do this for a small set of tables Lookup tables don t need programs fixed after DB creation e Ihe database schema is smaller than the complete Specification of the needed programs It can be understood as a concise representation of the essential functions of a large subset of the required programs Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 34 Data Independence 1 e Before databases were widely used data were sto red in files directly managed by programs e Ihe programs were considered the main thing the re was no independent documentation for the data The data was organized in a way which was well suited only for the single program which used the file e It was difficult to use the data for other purposes than the one for which they were orginally collected It is frustrating if one Knows
24. compilation and querying of the data e DB design and application development are of equal importance neither are subordinate to the other Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 30 Data vs Programs 2 e In normal software engineering projects the pro grams are seen as the main goal and the data only as a means of implementation e Database projects are special There are usually many programs that access the Same database The same data may be used by future programs Ad hoc SQL queries and even updates can be used on the data Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction Data vs Programs 3 e I he specification of programs amp data is intertwined The data must meet the information needs of the programs no data is missing No unnecessary data i e data not needed by any Current or forseen program should be collected Programs are needed to insert modify the data 1 31 e As ad hoc queries and updates in SQL are possible the second and third condition can have exceptions It is important for a DB project to know whether there will be users knowing SQL or whether the goal is closed system Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 32 Data vs Programs 4 e CRUD analysis Matrix that shows which program creates retrieves updates deletes data for which s
25. dependence 9 External Schema 1 Conceptual Schema Internal Schema ANSI SPARC 1978 Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 43 Tasks of a DB Project 1 Development of the database schema Including physical parameters and external views Development of the application programs Including interfaces to other systems May require a redesign of business processes This may actually help the business Migration of old data cleaning old data The old data might not fully satisfy the new constraints The neces sary data cleaning can take a lot of time Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 44 Tasks of a DB Project 2 Entering buying additional data Ensuring that performance requirements are met Defining access rights Writing documentation Training users In the transition phase many questions must be answered Developing procedures for backup and recovery A knowledge transfer to the DBA might be a project goal Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 45 Overview 1 The Task of Database Design 2 Users Application Programs Data 3 Phases of Database Design 4 System Development Lifecycle 5 Summary Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 46 Database Design Phases 1 e There are usually th
26. design This ensures that the conceptual design is not invalidated if a diffe rent DBMS is later used e In the conceptual schema non standard datatypes for the attributes can be used Of course this makes the logical design more difficult But object relational systems do have an extensible type system Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 49 Database Design Phases 4 e Only the physical design should depend on sizes of database objects invocation frequency for each program performance of the hardware quality of the DBMS query optimizer e These parameters will change over time e If the logical design depends on them it must be changed which means that application programs must be changed too Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 50 Database Design Phases 5 e Don t accept compromises in the logical schema for the sake of performance Unless experiments prove that the current design cannot deliver the required performance e Old DB designs are often heavily denormalized which makes changes difficult and expensive Each piece of redundant data that is not completely managed by the DBMS like e g an index makes application programs more difficult and inconsistencies possible Denormalization also means that certain pieces of information can only be stored together which makes the schema les
27. earer separation But since the internal schema normally must repeat the information in the concep tual schema and add its own parameters this is not very practical e However these are not part of the SQL standard and highly system dependent Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 40 Data Independence 7 e The independence of programs from the data or ganization is called physical data independence e Often additional application programs are develo ped for an existing DB These programs access the existing data but might also need additional data e E g columns must be added to existing tables e It would be good if the existing application pro grams do not have to be changed in such cases logical data independence Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 41 Data Independence 8 This is reached by adding a third schema level the external schemas In this model each application program or group of programs users has a schema of its own Of course the data for this schema are not actually stored but consist of views virtual tables that are computed from the conceptual schema In this way it might be possible to keep the exi sting external schemas stable when the conceptual schema has to be changed Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 42 Data In
28. eeded to satisfy the informa tion requirements of the applications e But it makes the correspondence between the DB schema and the real world clearer like a comment e Extended ER models have e g specialization sub classes which is a very useful feature The transla tion into the RM is possible but often less elegant Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 59 Entity Relationship Model 7 e Unfair comparison C is translated into assembler but one prefers to write programs in C The comparison is unfair because the language level difference bet ween C and assembler is much greater than between the ER model and the relational model In the end most entity types correspond to tables and vice versa Conceptual models that are as high above the relational model as C is above assembler still have to be defined Also portability is very important for C whereas assembler depends on the machine type A bit this also appears in conceptual design since an ER schema does not depend on the features of a specific DBMS But again the effect is smaller than in the C Assembler case unless one also translates into OODBMS XML etc Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 60 Entity Relationship Model 8 e Since there is anyway no implementation one can extend the ER notation if necessary e However the ER notation acts as a communicati
29. es and a set of rows tuples Solved Student Homework Points l Ann Smith Ann Smith Michael Jones P Michael Jones DB Schema Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 6 Basic Database Notions 3 e In SQL schemas are declared by means of CREATE TABLE statements CREATE TABLE Solved Student VARCHAR 40 NOT NULL Homework NUMERIC 2 NOT NULL CHECK Homework gt 0 Points NUMERIC 2 NOT NULL CHECK Points gt 0 PRIMARY KEY Student Homework e Schemas with 20 100 tables are medium size tables with gt 100 columns are common Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 7 Basic Database Notions 4 e I he information is stored in the database state Query Answer 1 Current State SELECT Homework FROM Solved 2 WHERE Student Ann Smith e Entering modifying or deleting information changes the state Old BEINE New State INSERT INTO Solved State VALUES Ann Smith 3 10 Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 8 The Task of DB Design 1 e Database design is the process of developing a da tabase schema for a given application e The requirements for a DB project can be specified by listing all of the questions which the DBS must be able to answer Specific example values or variable names can be used e During DB design a for
30. its get introduced during the build stage I can t imagine them ever needing more than 255 The biggest one I ve ever seen had only seven line items I think I ll code those codes directly into the pro gram to make it work faster Barker 1990 Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 83 Documentation 1 e Documentation should be an ongoing process oc curring throughout the system development pro cess It should accompany the first prototype the user sees and every other software deliverable Koletzke Dorsey e We all Know the nightmare stories of developers who come in to modify an existing system for which there is no documentation Koletzke Dorsey Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 84 Documentation 2 By preparing careful system and user documenta tion throughout the life cycle of the project devel opers are not left with a major task at the end In addition frequently no client money is left at this point to pay to extend the development process further Koletzke Dorsey System documentation will be mainly developed during the Design phase User documentation and the help system can only be developed when the design is complete Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 85 Documentation 3 e Time and money invested in good documentation
31. ld anyway be avoided unusual ambigous meanings for specific values E g 1 point student should resubmit Really bad style meaning of null values Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 24 Interpretation of the Data 5 Examples of possible misunderstandings e General concept vs concrete instances E g the course INFSCI 2711 Database Analy sis and Design vs this course given in a specific term key CRN e Null values and strings of zero length should not be allowed for the same column too difficult to distinguish e Ihe two ends of recursive relationships parent_of actually contains the ID of the child Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 25 Interpretation of the Data 6 e Entities in a context vs globally unique entities If one student is in two classes that I teach do I list him her as one student or two separate instances of a student Am I counting total bodies in each of my classes or the number of unique students in all of my classes If two patients have bronchitis is this counted as two different health problems or two instances of the same problem Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 26 Interpretation of the Data 7 e Non existence of a relationship vs existence with value O If a student did not
32. mal model of some aspects of the real world a mini world must be built The information which is needed to answer the required questions must be available Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 9 The Task of DB Design 2 e Building a model is done by abstraction Details which are irrelevant for the given application are left out Any model is a simplification of reality Classification On some abstraction level objects are the same Aggregation Objects are seen as a unit individual identity left out Generalization The same object on different abstraction levels e A model needs to be structured Though text may contain all of the necessary information it cannot be used by a computer for answering questions ex cept with higher Al Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 10 The Task of DB Design 3 e When a database schema is defined too closely to existing paper forms text fields which can only be printed but which cannot be used in statistical eva luations may result Suppose the goal is to determine from how many different countries there are students in this course If the country is defined as a text field entries might be e g Germany Federal Republic of Germa ny Fed Rep Germany FRG It will be necessary to eliminate synonyms manually e One must ask What do I want to do with the data
33. nd the customer e Business rules are what prevents the business from chaos not everybody can do what he she wants Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 94 Business Rules 2 e Like constraints business rules cannot have any ex ceptions This might be difficult for business people to understand but fle xible business rules are basically not relevant for database design They might be useful for programs default values warnings e It is also important which business rules are likely to change in future and which ones are very stable I watched a large insurance company struggling to introduce a new product The hold up was the time required to develop a supporting information system Meanwhile one of the company s competitors was able to introduce a similar product making use of an existing information system and win a major share of the market Simsi on Witt 2001 Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 95 Business Rules 3 During DB design business rules are transformed into e Structural elements of the schema E g if every student can have only one contact email address for this course it can be stored as an attribute of the STUDENTS table Otherwise an extra table is needed e Constraints If each student must have an email address this attribute must be NOT NULL If there cannot be two studen
34. oblems that are currently being resolved A good website might mean that less support help desk people are needed at the telephones Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 88 Transition Phase 1 Big Bang vs Gradual Phased Transition e One one day all tasks are switched to the new system e Clean solution no development effort into tempo rary interfaces e Risky What happens if the software does not quite work Developers will always promise that it works tomorrow and only minor details are missing 99 effect When do you switch back to the old system Can you switch back to the old system Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 89 Transition Phase 2 Big Bang continued e Needs a lot of training Even with training it will look different when the employees have to do real work with it In the days after the switch there might be not enough staff to answer all questions And the development team will be busy removing real errors e Companies can go bankrupt this way The productivity will go down for a while There must be financial reserves to survive this Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 90 Transition Phase 3 Gradual Phased Transition e Temporary interfaces between new parts and old parts are needed These will be thrown away in the end Thus
35. oftware engineering techniques SQL current database software programming knowledge Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 106 DB Design is not Easy 5 Data quality is almost certainly the biggest problem you re going to have in a legacy bound project If your schedule doesn t include a big chunk of time for analyzing fixing and testing the data from the legacy system your schedule is wrong Muller 1999 If the system you re proposing to build is an order of magni tude greater in size than the ones you have built previously it is a good bet your culture isn t capable of doing it Muller 1999 There was a reuse organization in yet another building that needed to have a say in making sure everything was reusable or was at least contributing to the concept of reuse The head of this organization did not like the head of the application organization so nothing ever got done Muller 1999 Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 107 Risks Risk Management 1 Consider possible risks and what to do about them e The collected requirements are wrong or not com plete In order to reduce this risk one can invest more time and money into the requirements analysis One can do more interviews study more existing standard solutions have more thorough presentations and discussions of the solution play through More
36. on tool between designers programmers customers This is endangered if one does arbitrary changes to the notation e Of course if one uses a CASE tool for managing ER diagrams one has to stick to the notation sup ported by the tool Modern CASE tools have some support for user defined extensions Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 61 CASE Tools 1 e CASE Computer Aided Software Engineering In general CASE tools support the development of software e g by managing design documents enforcing syntax rules performing con sistency style checks translating between different views of a system and supporting project management and team work e here are special CASE Tools for database pro jects e g Oracle Designer ERwin PowerDesigner ER Studio e A specialized graphical editor for ER diagrams is a Standard component of such tools Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 62 CASE Tools 2 e Standard features of database CASE Tools Support for different kinds of diagrams e g ER diagrams diagrams of relational schemas business process diagrams Repository for storing all design documents This should include version management and consistency checks Normally many ER diagrams must be managed A single one would be too big could only be used as wallpaper Automatic translation from the ER model into the rel
37. ree schema design phases Conceptual Database Design produces the initial model of the real world subset in a conceptual data model like the Entity Relationship Model Logical Database Design transforms this schema into the data model supported by the DBMS often the relational model Physical Database Design aims at improving the performance of the final system E g indexes and storage parameters are selected Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 47 Database Design Phases 2 Why multiple design phases e Reduction in complexity If not all design decisions depend mutually on one another problems can be separated and attacked one after the other e Protection against changes If design decisions do not depend on specific input parameters they are not invalidated by changes to those parameters e Different tasks need different tools techniques e Milestones Ceremony accepted method Easier to track the project s progress vs the schedule Project can celebrate or submit bills for each milestone Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 48 Database Design Phases 3 e E g during conceptual design there is no need to worry about limitations of a specific DBMS Focus is on producing a correct model of the real world e DBMS features do not influence conceptual design and only partially influence the logical
38. rice might be higher than necessary the developers take the whole risk and the finished product might be not very good just satisfy the requirements in the contract an hourly rate Then there is no incentive to ever finish the product a mixture of both or hourly rate paid at the end Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 70 Strategy Phase 3 e Already in this phase ER diagrams and function hierarchy business process diagrams should be developed e hey do not yet have to be very detailed but they Should cover the whole area of the planned system E g attributes might not yet be needed But definitions descriptions of all entities might be very useful The more of the analysis that can be done in the strategy phase the better but time money is limited e What will be the architecture of the proposed sy stem Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 71 Strategy Phase 4 e It is important to understand the company e g Business objectives Critical success factors Strengths Weaknesses Opportunities Threats Key performance indicators e Existing systems legacy systems and required in terfaces must be understood and documented e Develop good working relationships with the people involved and understand the political environment Many projects are bound to fail because of the political environmen
39. s flexible Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 51 Database Design Phases 6 View Integration e The conceptual design step is much more compli cated than the other two The logical design step can be largely automatic and the physical design has a relatively limited set of options e Often it is not possible to create the complete ER Schema in one step because this is very large e Then one starts with small ER schemas which de scribe only the data necessary for one application or user or a small group of related applications For each application user one such schema is developed Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 52 Database Design Phases 7 e In this case one starts with the design of the exter nal schemas before the conceptual one Of course this is also done in the entity relationship model It is possible that not only the conceptual schema but also the external schemas are translated into the relational model during the logical design and actually exist as views in the final database But this is a design decision One can also treat them as only temporary sketches for collecting requirements e These schemas must then be integrated to get the complete enterprise data model i e the conceptual schema of the database Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 53 Entity
40. t Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 72 Strategy Phase 5 e Develop a timeline for the development project plan and an estimate of the needed resources Time and money are important resources But also the access to Stakeholders and users interview partners is an important resource The valuable time of people within the company is critical to the project but must be listed as a project cost Also access to hardware and to the data must be discussed e Is the project feasible in the given limits e Prioritize the project goals Not everything that would be nice to have is worth the effort If it should turn out later that time or budget is insufficient What can be sacrificed and what is essential Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 73 Strategy Phase 6 e Think about risks to the project and what can be done to manage them e Develop a cost benefit analysis and provide suffi cient motivation as to why the proposed project is worth the effort Quantify the impacts on the business e One method to estimate the complexity of a project is the Function Point Method See Software engineering textbooks http www ifpug org Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 74 Analysis Phase 1 e In the analysis phase all system requirements are gathered in complete detail
41. ta fulfills some condition but it actually does not this can have all kinds of strange effects including the loss of data E g the programmer assumes that a certain column cannot contain null values So he she uses no indicator variable when fetching data As long as there are no null values this works But if the schema does not prevents this after some time somebody will enter a null value Then the program will crash with a user unfriedly error message Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 14 Constraints 3 e Given only the structural definitions e g tables columns column datatypes there are usually still many database states which do not correspond to States of the real world e Additional conditions which database states have to satisfy should be specified In this way invalid States are excluded e Such conditions are called integrity constraints Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 15 Constraints 4 e Each data model has special support for certian common kinds of constraints e g the relational model and SQL offer Keys Unique identification of rows Foreign keys Dynamic domain defined by a key NOT NULL Entries for a column cannot be empty CHECK Conditions that refer only to single rows e Arbitrary conditions can be specified as constraints in natural language logic as SQL queries
42. that the information is in there but the new evaluation would be too difficult to program or even require manual analysis Good from the data privacy standpoint Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 35 Data Independence 2 e Data often lives longer than the programs New versions of programs are developed relatively frequently but the data collected with the old program cannot be thrown away it has to be migrated to the new system might require considerable effort e hus data must be seen independent from a spe cific program e Vice versa programs should not depend on the way the data is stored data organisation file format Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 36 Data Independence 3 e It might be necessary to change the data organisa tion from time to time e g because the number of rows in a table has grown so much that a sequential scan of all rows to find one with a specific value takes too long An index must be added e g a B tree A indes over attribute A of relation R speeds up queries that search for rows in R with A where c iS a constant plus possibly other queries certain application programs are executed so of ten that a single disk cannot support the required number of accesses per second Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 37 Dat
43. ts with the same first and last name these two attributes form a key e View Definitions The weighting of points for a course is 30 homeworks 35 project 35 final exam Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 96 Business Rules 4 Results of Business Rule Transformation continued e Programs If first and last name are unique they can be used to identify students in program inputs Otherwise a more complicated selection procedu re iS necessary Programs can also be used to check more general constraints than can be declared in current database systems e Triggers procedures that are executed at certain updates E g if the quantity on hand is smaller than 5 the item is reordered Stefan Brass Datenbanken I Universit t Halle 2003 1 Introduction 1 97 Business Rules 5 Results of Business Rule Transformation continued e Programs that are executed in certain time inter valls e g every day at the end of each month E g if the homework is not submitted one week after the deadline the student gets O points for it e Database Accounts Access rights Views E g if there is a global homework results database for all courses of the department but each professor may see only the results of his her students there probably will be accounts for every professor who wishes SQL access and a view that selects the data the current user may see Without
Download Pdf Manuals
Related Search
Related Contents
詳しくは、こちらをご覧下さい。 User Manual - PrimusDanmark.dk PYLE Audio PWMA30 User's Manual Copyright © All rights reserved.
Failed to retrieve file