Home

Manual - Mariposa - University of California, Berkeley

image

Contents

1. relname The name of the class refname The name used to refer to the class in the query relid The OID of the class fralogicalid The OID shared by all copies of this fragment frastoreid The OID for this copy of this fragment frapages Size of fragment in pages fratuples Size of fragment in tuples port TCP port not used address Network address of storage site hostid Hostid of storage site Table 11 Entries in rtable variable made available to Bidder 3 6 3 3 The subcontract Command In some cases a bidder will be asked to bid on some operation that it cannot perform It can refuse to bid as mentioned earlier or it can subcontract out some or all of the work to another processing site For example if a bidder is asked to perform a join between two classes A and B and it has A but not B it may choose to subcontract out the sequential scan of B to another site We have added the command subcontract to the Tcl provided with Mariposa The format of the subcontract command is subcontract plan contract The subcontract returns a list of five elements response price delay staleness and accuracy as described in Table 8 The plan passed to subcontract is any plan variable It can be the entire string that was passed into the bidder or a part of the string representing a subplan The contract passed to subcontract is the global variable made
2. Without a qualification the delete command deletes all instances of the given class leaving it empty The system will not request confirmation before performing this command 2 1 8 Using Aggregate Functions As in most other query languages POSTGRES supports aggregate functions However in the current implementation of POSTGRES the usage of aggregate functions is limited Specifically while there are aggregates to compute such functions as the count sum average maximum and minimum over a set of instances aggregates can only appear in the target list of a query and not in the qualification i e the where clause For example SELECT int4max ON_HAND as MAX_ON_HAND FROM WIDGETS MAX ON HAND 11000 However this query won t be accepted by POSTGRES SELECT PART_NO LOCATION ON_HAND FROM WIDGETS WHERE ON HAND int4max ON HAND As mentioned in Section 2 1 3 aggregates are commonly used with GROUP BY clauses Mariposa User Manual v 1 0 15 2 2 ADVANCED POSTGRES SOL FEATURES This section discusses those features that distinguish POSTGRES from conventional data managers These features include inheritance time travel and non atomic data values i e array and set valued attributes Examples in this section can be found in advance sql in the tutorial directory Refer to the introduction of the previous chapter for details 2 2 1 Inheritance
3. CREATE OPERATOR leftarg complex rightarg complex procedure complex_add commutator To create unary operators just omit one of leftarg for left unary or rightarg for right unary from the binary operator example Comparison operators such as lt gt and are defined the same as other binary operators The type returned by a comparison operator must be boolean TRUE or FALSE All of the POSTGRES operators are defined in this way a function definition and an operator based on the function For example the operator is defined for int4 using a built in function int4eq for char16 using charl6eq for float8 using float 8eq and so on If you give the system enough type information it can automatically figure out which operators to use SELECT a b AS c FROM test_complex E 5 2 6 05 133 42 144 95 2 7 EXTENDING SQL AGGREGATES Aggregates in POSTGRES are expressed in terms of state transition functions That is an aggregate can be defined in terms of state that is modified whenever an instance is 5 8 1024 8192 bytes In fact the type must be considerably smaller than 8192 bytes since the POSTGRES tuple and page overhead must also fit into this 8KB limitation The actual value that fits depends on the machine architecture Mariposa User Manual v 1 0 30 processed Some state functions look at a particular value
4. Table 7 Fragment Information for Data Broker The data broker may use the information in any way it wants Here is a small data broker that buys fragments it doesn t already have the second time they re accessed TEREEREHREEERREEEREREEREREEERRERERARREETAESETAHSAHdAHEEAAT EHE EE ERE E ERE databroker tcl FE EHE AE EE FE AE E TE EE FE AE FE FE E AE FE HE FE TE AE FE TE HE AE TE AE FE AE AE AE TE FE FE AE AE FE E FE AE E E E E E E E E E EE E E E E set fragmentsAccessed proc DataBroker global fragments global hostid global fragmentsAccessed dE de de OSE Hb b de Gb Go through each fragment in the list of fragments accessed If a fragment has been accessed twice then buy it Use the fragment s logicalid to keep track of which fragments have been accessed Logicalid s are unique across sites but are the same for copies of the same fragment foreach fragment fragments set fraStorageSites lindex fragment 5 If we don t already have a copy of the fragment at this site if lsearch fraStorageSites Shostid 1 set fralogicalid lindex fragment 1 set listPos lsearch fragmentsAccessed S fralogicalid If we ve already seen this fragment before remove it from the list of fragments accessed and bring it to this site using takefragment Otherwise record it Mariposa User Manual v 1 0 47 if SlistPos gt 1 set fragmentsAccessed lreplace fragmentsAccessed SlistPos listPos
5. ANGUAGE sql p o SELECT clean EMP ignore this E 2 4 2 Programming Language Functions This section describes programming language functions on base types and on composites 2 4 2 1 Programming Language Functions on Base Types Internally POSTGRES regards a base type as a blob of memory User defined functions over a user defined base type define the way that POSTGRES operates on the type That is POSTGRES will only store and retrieve the data from disk It will use the user defined functions to input process and output the data Base types can have one of three internal formats e pass by value fixed length e pass by reference fixed length e pass by reference variable length By value types can only be 1 2 or 4 bytes in length even if your computer supports by value types of other sizes POSTGRES itself only passes integer types by value You should be careful to define your types so that they will be the same size in bytes on all architectures For example the long type is dangerous because it is 4 bytes on some machines and 8 bytes on others whereas the int type is 4 bytes on most UNIX machines though not on most personal computers A reasonable implementation of the int4 type on UNIX machines might be 4 byte integer passed by value typedef int int4 Mariposa User Manual v 1 0 24 On the other hand f
6. rm rf usr local postgres95 data base foo Make a symbolic link from usr local postgres95 data base to the new directory Data for certain classes may be stored elsewhere if a nonstandard storage manager was specified when the classes were created Use of nonstandard storage managers is an experimental feature that is not supported outside of Berkeley Mariposa User Manual v 1 0 63 9 ln s new place foo usr local postgres95 data base foo 1 Restartthe postmaster 4 2 3 Updating Databases POSTGRES is a research system In general POSTGRES may not retain the same binary format for the storage of databases from release to release Therefore when you update your POSTGRES software you will probably also have to modify your databases This is a common occurrence with commercial database systems as well Unfortunately unlike commercial systems POSTGRES does not come with user friendly utilities to make your life easier when these updates occur In general you must do the following to update your databases to a new software release e Extensions such as user defined types functions aggregates etc must be reloaded by re executing the SOL CREATE commands See Appendix A for more details e Data must be dumped from the old classes into ASCII files using the COPY command the new classes created in the new database using the CREATETABLE command and the data reloaded from the ASCII files e Rules and views
7. 3 iami 10000 5000 8000 2 ew York 3000 0 1000 m ew York 500 1500 300 3 ew York 1800 200 750 6 ew York 1800 5000 1500 5 ew York 2500 2000 2000 4 ew York 3200 0 0 To group records together use GROUP BY GROUP BY is generally used with aggregates SELECT PART_NO int4sum ON_HAND as TOTAL_ON_HAND int4sum ON_ORDER as TOTAL_ON_ORDER int4sum COMMITTED as TOTAL_COMMITTED FROM WIDGETS GROUP BY PART_NO ORDER BY PART_NO PART_NO TOTAL_ON_HAND TOTAL ON ORDER TOTAL_COMMITTED 1 500 1500 300 2 12300 700 6000 3 11800 5200 8750 4 11700 0 200 5 2500 2000 2000 6 12800 5000 4500 To find out more about aggregates see Section 2 1 8 2 1 4 Redirecting SELECT Queries Any select query can be redirected to a new class SELECT INTO TABLE temp from WIDGETS This query implicitly creates a new class temp with the attribute names and types specified in the target list of the SELECT INTO command Thus you can perform operations on the resulting class as well as on other classes SELECT FROM temp Mariposa User Manual v 1 0 13 PART NO LOCATION ON_HAND ON_ORDER COMMITTED 1 ew York 500 1500 300 2 ew York 3000 p oO 1000 3 iami 100
8. POSTGRES supports arrays of user defined types as well When you define a type POSTGRES automatically provides support for arrays of that type For historical reasons the array type has the same name as the user defined type with the underscore character _ prepended to it Composite types do not need any function defined on them since the system already understands what they look like inside Mariposa User Manual v 1 0 29 A Note About Large Objects The types discussed to this point are all small objects that is they are smaller than 8KB in size If you require a larger type for something like a document retrieval system or for storing bitmaps you will need to use the POSTGRES large object interface 2 6 EXTENDING SQL OPERATORS POSTGRES supports left unary right unary and binary operators Operators can be overloaded or reused with different numbers and types of arguments If there is an ambiguous situation and the system cannot determine the correct operator to use it will return an error and you may have to typecast the left and or right operands to help it understand which operator you meant to use The following example shows how to create an operator for adding two complex numbers First you need to create a function to add the new types Then you can create the operator with the function CREATE FUNCTION complex add complex complex RETURNS complex AS SPWD obj complex so LANGUAGE c
9. complex in char str double x y Complex result if sscanf str 1f 1f amp x amp y 2 elog WARN complex in error in parsing return NULL Mariposa User Manual v 1 0 28 result Complex palloc sizeof Complex result gt x x result gt y y return result The output function can simply be char complex_out Complex complex char result if complex NULL return NULL result char palloc 60 sprintf result g 9g complex gt x complex gt y return result e Try to make the input and output functions inverses of each other If you do not you will have severe problems when you need to dump your data into a file and then read it back in say into someone else s database on another computer This is a particularly common problem when floating point numbers are involved To define the complex type you need to create the two user defined functions complex inand complex out before creating the type CREATE FUNCTION complex in opaque RETURNS complex AS usr local postgres95 tutorial obj complex so LANGUAGE c CREATE FUNCTION complex out opaque RETURNS opaque AS usr local postgres95 tutorial obj complex so LANGUAGE c CREATE TYPE complex internallength 16 input complex in output complex out As discussed earlier POSTGRES fully supports arrays of base types Additionally
10. ooo SWHA KF N This query lists all binary operators SELECT o oprname AS binary_op left typname AS left_opr right typname AS right_opr result typname AS return_type FRO pg_operator o pg type left pg type right result WHERE o oprkind b binary and o oprleft left oid and o oprright right oid and o oprresult result oid ORDER BY left opr right opr 65 pg type This query returns the name number of arguments parameters and return type of all user defined C functions The same query can be used to find all built in C functions if you change the C to internal or all SQL functions if you change the C to postquel SELECT p proname p pronargs t typname FROM pg_proc p pg_language 1 pg_type t WHERE p prolang l oid and p prorettype t oid and l lanname c ORDER BY proname This query lists all of the aggregate functions that have been installed and the types to which they can be applied count is not included because it can take any type as its argument SELECT a aggname t typname FROM pg aggregate a pg type t WHERE a aggbasetype t oid ORDER BY aggname typname This query lists all of the operator classes that can be used with each access method as well as the operators that can be used with the respective operator classes SELECT am amname opc opcname opr oprname
11. restrict eqsel join eqjoinsel Notice that five operators corresponding to less less equal equal greater and greater equal are defined The final step is to update the pg_amop relation To do this you need the following attributes amopid the OID of the pg_am instance for B tree 403 see above amopclaid the OID of the pg opclass instance for int4 abs ops whatever you got instead of 17314 see above amopopr the oids of the operators for the opclass which we ll get in just a minute amopselect cost functions amopnpages The cost functions are used by the query optimizer to decide whether or not to use a given index in a scan Fortunately these already exist btreesel which estimates Mariposa User Manual v 1 0 35 the selectivity of the B tree and btreenpage which estimates the number of pages a search will touch in the tree You need the OIDS of the operators you just defined To find them look up the names of all the operators that take two int 4s and pick yours out SELECT o oid AS opoid o oprname INTO TABLE complex_ops_tmp FROM pg_operator o pg_type t WHERE o oprleft t oid and o oprright t oid and t typname complex which returns oid oprname 17321 lt 17322 lt 17323 17324 gt 173257 1 8 Again some of your OID numbers will almost certa
12. set fraclassid lindex fragment 0 set frastoreid lindex fragment 2 set frahostid lindex fraStorageSites 0 takefragment fraclassid frastoreid frahostid else lappend fragmentsAccessed fralogicalid We have added two Tcl commands for the data broker takefragment and movefragment The syntax for these commands is takefragment classid storeid fromhostid movefragment classid lt storeid gt lt tohostid gt As the names suggest takefragment takes a fragment from the remote host indicated by lt fromhostid gt and installs it locally movefragment moves a fragment to the remote host lt tohostid gt 3 6 QUERY PROCESSING IN MARIPOSA As described in Section 1 3 when a user submits a query to Mariposa it passes through several modules including a parser optimizer fragmenter and so on The behavior of some of these modules can be affected by the user to a greater or lesser extent either by using special Mariposa commands or in the case of the bidder by providing the Tcl script that defines its behavior This section describes these modules and how a Mariposa user can change their behavior and thus the behavior of the system 3 6 1 The Fragmenter After the Mariposa system accepts a query from a client process parses it and performs optimization the system hands the query in the form of a query plan to the fragmenter The Mariposa fragmenter module takes a plan that only references whole classes
13. usr local mariposa include You should always include I usr local mariposa include on your cc command lines Sometimes you may find that you require header files that are in the server source itself In those cases you may need to add one or more of I usr local postgres95 src backend I usr local postgres95 src backend include I usr local postgres95 src backend port lt PORTNAME gt I usr local postgres95 src backend obj where lt PORTNAME gt is the name of the port e g alpha or sparc e When allocating memory use the POSTGRES routines palloc and pfree instead of the corresponding C library routines malloc and free The memory allocated by palloc will be freed automatically at the end of each transaction preventing memory leaks Mariposa User Manual v 1 0 27 e Always zero the bytes of your structures using memset or bzero Several routines such as the hash access method hash join and the sort algorithm compute functions of the raw bits contained in your structure Even if you initialize all fields of your structure there may be several bytes of alignment padding holes in the structure that may contain garbage values e Most of the internal POSTGRES types are declared in postgres h so it s usually a good idea to include that file as well e Compiling and loading your object code so that it can be dynamically loaded into POSTGRES always requires special flags See Appendix A for a detailed explanation of h
14. Defining a function like this can be tricky Some of the more important considerations are e The target list order must be exactly the same as that in which the attribu appear in the CREATE TABLE statement or when you execute a query tes e You must typecast the expressions using very carefully or you will see the following error WARN function declared to return type EMP does not retrie EMP ve e When calling a function that returns an instance you cannot retrieve the entire instance You must either project an attribute out of the instance or pass the entire instance into another function Mariposa User Manual v 1 0 23 im SELECT name new emp AS nobody nobody one Because the parser doesn t understand the other dot syntax for projection when combined with function calls you should use the function syntax for projecting attributes of function return values SELECT new_emp name AS nobody WARN parser syntax error at or near wr Any collection of commands in the SQL query language can be packaged together and defined as a function The commands can include updates i e insert update and delete as well as select queries However the final command must be a select that returns whatever is specified as the function s return type CREATE FUNCTION clean EMP RETURNS int4 S DELETE FROM EMP WHERE EMP salary lt 0 ECT 1 AS ignore _ this
15. Mariposa 1200 Mariposa 1320 The default beginning of a time range is the earliest time representable by the system and the default end is the current time thus the above time range can be abbreviated as noz 2 2 3 Non Atomic Values Arrays One of the tenets of the relational model is that the attributes of a relation are atomic POSTGRES does not have this restriction attributes can contain subvalues that can be accessed from the query language For example you can create attributes that are arrays of base types With POSTGRES attributes of an instance can be defined as fixed length or variable length multi dimensional arrays Arrays of any base type or user defined type can be created To illustrate this first create a class with arrays of base types On UNIX systems this is always midnight January 1 1970 GMT Mariposa User Manual v 1 0 17 CREATE TABLE SAL EMP name text pay by quarter int4 schedule chari6 i The above query creates a class named SAL_EMP with a text string name a one dimensional array of int 4 pay by quarter which represents the employee s salary by quarter and a two dimensional array of char16 schedule which represents the employee s weekly schedule To insert values into an array use the INSERT statement Note that when appending to an array enclose the values within braces and separate them by commas This is not unlike the syntax for i
16. WHERE amop amopid am oid FROM pg am am pg amop amop pg opclass opc pg operator opr Mariposa User Manual v 1 0 and amop amopclaid opc o and amop amopopr opr oid ORDER BY amname opcname id oprname 66 Mariposa User Manual v 1 0 67 Appendices A Sample Bidder Script This bidder script is in SPGDATA base files bidder tcl It gives an idea of how a cost based bidder might be constructed It looks at each node in the query plan passed in and charges a fixed amount at each node per page and or per tuple It multiplies the cost element of the bid by the current load average HEH HE HE HH FE FE E HE HH EEE FE FE EE EE FE FE EE HE FE FE FE HE HE TE EE EE HE EH HE EE HE TE HE E FE HE HE HE E E E E E E bidder tcl Input plan tree represented as a string Output list containing response cost delay staleness accuracy response BID if all data fragments references in the query are local REFUSETOBID otherwise cost Based on the per tuple and per page charge for each node in the query plan delay Based on the per tuple and per page delay for each node in the query plan staleness accuracy ignored Recursively descends the plan tree keeping track of the number of pages and number of tuples generated and adding up the cost and delay until the root is reached At this point the total cost and total delay have been calculated Multiplies cost by the current load average Ignores staleness and
17. and aggregates Standard relational systems store information about databases tables and columns in what are commonly known as system catalogs Some systems call this the data dictionary Although the DBMS stores its internal bookkeeping within the system catalogs this information is typically not available to users One key difference between POSTGRES and standard relational systems is that POSTGRES stores much more information in its catalogs than relational systems do not only information about tables and columns but also information about its types functions access methods and so forth These classes can be modified and extended by the user thereby extending the built in capabilities of POSTGRES By comparison conventional database systems can only be extended by changing hardcoded procedures within the DBMS or by loading modules specially written by the DBMS vendor POSTGRES is also unlike most other data managers in that the server can incorporate code written by users through dynamic loading That is a user can specify an object code file e g a compiled o file or shared library which implements a new type or function and POSTGRES will load it as required Code written in SQL is even easier to add to the server 2 3 1 The POSTGRES Type System POSTGRES types are divided into two categories base types and composite types Base types are those like int 4 which are implemented in a programming language such as C Th
18. 0 32 an operator e g gt or lt and tell what kind of comparison it is In fact some access methods don t impose any ordering at all For example R trees express a rectangle containment relationship whereas a hashed data structure expresses only bitwise similarity based on the value of a hash function POSTGRES needs some consistent way of taking a qualification in your query looking at the operator and then deciding if a usable index exists This implies that POSTGRES needs to know for example that the lt and gt operators partition a B tree POSTGRES uses strategies to express these relationships between operators and the way they can be used to scan indices Defining a new set of strategies is beyond the scope of this discussion but we ll explain how B tree strategies work because you ll need to know that to add a new operator class In the pg am class the amstrategies attribute is the number of strategies defined for this access method For B trees this number is 5 These strategies correspond to less than less than or equal greater than or equal E 2 equal 3 E 5 greater than The idea is that you ll need to add procedures corresponding to the comparisons above to the pg amop relation see below The access method code can use these strategy numbers regardless of data type to figure out how to partition the B tree compute selectivity and so on Don t worry about the details of a
19. 1 is up to an hour out of date If Site 2 generated writes frequently and users at Site 1 required more accurate information a smaller update interval might have been appropriate The next two sections describe the Mariposa replica system and name service in more detail Section 3 5 describes the Mariposa Data Broker Section O explains distributed query processing in Mariposa in more detail 3 3 THE MARIPOSA REPLICA SYSTEM Mariposa permits the replication of data fragments In the current implementation a replica is created from one other replica which we will refer to as its parent Replicas created from a parent are called its children Each replica periodically receives updates from its parent This allows Mariposa to use one replica in the place of another improving availability during host crashes and network failures and improving performance There are two types of Mariposa replicas A read only replica receives all updates from its parent but cannot process updates A read write replica propagates its updates to its children as well as receiving updates from its parent if it has one We use the term update to mean any tuple insertion deletion or modification Mariposa User Manual v 1 0 43 Updates are sent from a parent to a child in an update stream Update streams are initiated by the site manager of the parent site at regular intervals The update interval is specified at the time a copy is created There are two update
20. 2 Dropping a Copy Dropping a copy in Mariposa removes the contents of the copy from the database and cancels the update contract with the parent site Note The system will not warn you if you try to drop the last copy of a fragment The syntax for the drop copy command is DROP COPY fragname where fragname is the name of the copy 3 3 3 Moving a Copy Since copies are fragments with associated update stream contracts the MOVE FRAGMENT command is used to move copies as well as fragments In the case of copies the system takes care of renogotiating the update stream contracts between the other copy holders and the new copy holder See Section 3 2 3 for information on moving fragments 3 4 MARIPOSA NAME SERVICE The purpose of name service is to supply client sites with the necessary information to run queries on remote data In order to process a query on remote data Mariposa needs the queried tables metadata at various stages Mariposa User Manual v 1 0 44 e During parsing the syntactic correctness of the query statement has to be verified This requires information about the queried tables attributes and their types about operators used in the query etc e The fragmenter needs information about the fragmentation of remote tables e The query broker needs to know the location of the remote fragements For local tables this information is stored in the site s local database catalogs For remote tables a name serve
21. POSTGRES create table command is the same command used to create a table in a traditional relational system However POSTGRES tables classes have properties that are extensions of the relational model 2 1 2 Populating a Class with Instances The insert statement is used to populate a class with instances INSERT INTO WIDGETS VALUES 1 New York 500 1500 300 The copy command is used to load large amounts of data from flat ASCII files on the client to the POSTGRES server For example the command COPY WIDGETS FROM src tutorial widgets txt will copy the entries in the text file src tutorial widgets txt into the WIDGETS table Mariposa User Manual v 1 0 11 2 1 3 Querying a Class The WIDGETS class can be queried with normal relational selection and projection queries An SQL select statement is used to do this The statement is divided into a target list 1 e the part that lists the attributes to be returned and a qualification 1 e the part that specifies any restrictions For example to retrieve all the rows of WIDGETS type SELECT FROM WIDGETS and the output will be PART NO LOCATION ON_HAND ON_ORDER COMMITTED 1 ew York 500 1500 300 2 ew York 3000 0 1000 3 iami 10000 5000 8000 4 iami 8500 p 0 200 5 ew York 2500 2000 2000 3 ew York 1800 200 750 2 ia
22. The following examples illustrate inheritance in POSTGRES The statements below create the class cities as well as the capitals class which contains all the state capitals The capitals class inherits from the cities class CREATE TABLE cities name text population float8 altitude int4 in ft CREATE TABLE capitals state char2 INHERITS cities In this case an instance of capitals inherits all attributes ie name population and altitude from its parent cities The type of the attribute name is text a built in POSTGRES type for variable length strings The type of the attribute population is float8 the POSTGRES built in type for double precision floating point number The type for the altitude attribute is int4 a built in POSTGRES type for regular four byte integer numbers The capitals class has an extra attribute state which contains the capital s state In POSTGRES a class can inherit from zero or more other classes In addition a query can reference either all instances of a class or all instances of a class plus all of its descendants For example the following query finds all the cities that are situated at an attitude of 500 feet or higher SELECT name altitude FROM cities WHERE altitude 500 name altitude Las Vegas 2174 Mariposa 1953 On the other hand to find the names of all cities including state capital
23. a Mariposa Class In Mariposa classes are created at a site just as in a single site database However the CREATE TABLE command in Mariposa has been extended In Mariposa a user can specify how the table is to be partitioned should the table ever be split into fragments Note that when a table is split it is always split into two fragments Each of the resulting fragments can be split again and so on There are four ways to partition a table in Mariposa shown in Table 5 Partition Explanation Mode Random Records are placed in one fragment or the other at random Round Robin Half the records are placed in one fragment half in the other Key Based The records are partitioned by value based on one of the attributes of the class and a split value supplied by the user Hash Based The records are partitioned by the comparing the result of a function over one of the attributes to a value The function the attribute and the split value are all supplied by the user Table 5 Mariposa Partitioning Strategies The SQL CREATE TABLE statement has been extended in Mariposa to include a PARTITION clause The syntax for the PARTITION clause for the four partitioning modes is in Table 6 Partition PARTITION Clause Syntax Mode Random RANDOM Round Robin RONDROBIN Key Based PARTITION ON lt attribute gt USING lt function gt Hash Based PARTITION ON lt f
24. an example of an aggregate that requires both a function to compute the running sum and a function to compute the running count When all of the instances have been processed the final answer for the aggregate is the running sum divided by the running count You can use the int4pl and int4inc routines you used previously as well as the POSTGRES integer division routine int4div to compute the division of the sum by the count CREATE AGGREGATE my average sfuncl int4pl sum basetype int4 stypel int4 sfunc2 int4inc count stype2 int4 finalfunc int4div division initcondl 0 initcond2 0 un 4 ECT my_average salary as emp_average FROM EMP Mariposa User Manual v 1 0 31 emp average 1640 2 8 INTERFACING EXTENSIONS TO INDICES The procedures described to this point enable you to define a new type new functions and new operators However you have not yet seen how to define a secondary index such as a B tree R tree or hash access method over a new type or its operators Look back at Figure 2 The right half shows the catalogs that you must modify in order to tell POSTGRES how to use a user defined type and or user defined operators with an index ie pg am pg amop pg amproc and pg opclass Unfortunately there is no simple command to do this This section demonstrates how to modify these catalogs through a running example a new operator cla
25. available to the bidder and described in Table 9 The Tcl bidder script needs to pass it back to identify the larger plan to which the subplan belongs When a bidder uses the subcontract command the subplan is passed to the query broker at the bidder s site which contacts potential processing sites gets their bids and returns the best one to the bidder The bidder can then add the subcontracted price and delay into its own bid and return a completed bid to the query broker that contacted it originally If the bidder site is awarded the bid the site manager automatically sends out the subcontracted part of the plan to the appropriate site 3 6 3 4 Sample Bidder Script A sample bidder script is included in Appendix A The first procedure Get QueryBid must be included in all bidder scripts as mentioned above This bidder script includes one procedure for each node type GetQueryBid calls Cost BasedBid to calculate the cost and delay this bidder script ignores staleness and accuracy CostBasedBid takes the first element in the plan string passed in which is the node type and calls the procedure of the same name This bidder formulates a bid by recursively visiting the nodes of the plan tree and assigning a cost and delay to each node The final bid returned by CostBasedBid is the sum of the bids for all the nodes in the tree GetQueryBid multiplies the cost element in the bid returned by Cost BasedBid by the load average In addition to the p
26. cost lindex subBid 1 set delay lindex subBid 2 return list response cost delay 0 0 0 0 Input left subtree Output bid Charges a fixed price per tuple and per page proc SORT nodeNum leftTree junk junk2 global BID REFUSETOBID global nTuples global nPages global rtable global hostid set perTupleCharge 001 set perTupleDelay 000400 set leftSubBid CostBasedBid leftTree set leftTuples nTuples if SleftTuples 0 set leftTuples 10000 set nTuples 10000 set cost expr leftTuples perTupleCharge set delay expr leftTuples SperTupleDelay set bid SBID cost delay 0 0 0 0 set bid CombineBids leftSubBid bid return bid Mariposa User Manual v 1 0 Input nodeNum children nodes Output bid Charges a fixed price per tuple for the merge proc MERGE nodeNum subTreeList junk junk2 global BID REFUSETOBID global nTuples global nPages global rtable global hostid set perTupleCharge 001 set perTupleDelay 000400 set bid list BID 0 0 0 0 0 0 0 0 set mergeTuples 0 set mergePages 0 For each child node get the bid for the subplan and combine it with the current bid Keep track of the number of tuples and pages in the children nodes foreach subPlan subTreeList set bid CombineBids bid CostBasedBid subPlan incr mergeTuples nTuples incr mergePages nPages set cost expr mergeTuples SperTupleCharge set delay
27. in the instance when computing the new state sfuncl in the create aggregate syntax while others only keep track of their own internal state sfunc2 If you define an aggregate that uses only sfuncl you are defining an aggregate that computes a running function of the attribute values from each instance Sum is an example of this kind of aggregate Sum starts at zero and always adds the current instance s value to its running total The following example uses the int4pl that is built into POSTGRES to perform this addition CREATE AGGREGATE complex sum sfuncl complex add basetype complex stypel complex initcondl 0 0 SE im ECT complex sum a FROM test complex complex sum 34 53 9 If you define only sfunc2 you are specifying an aggregate that computes a running function that is independent of the attribute values from each instance Count is the most common example of this kind of aggregate Count starts at zero and adds one to its running total for each instance ignoring the instance value Here you can use the built in int4inc routine to do the work for you This routine increments adds one to its argument CREATE AGGREGATE my count sfunc2 int4inc add one basetype int4 stype2 int4 initcond2 0 SELECT my count as emp count from EMP emp count Ls Average is
28. relation SEQSCAN NODENUM 1 LEFT TREE SORT Sort SORT NODENUM LEFT TREE MERGE Merge MERGE NODENUM CHILD CHILD CHILD XIN Exchange In XIN NODENUM LEFT TREE AGG Aggregate such AGG NODENUM LEFT TREE as count GROUPBY Group By node GROUPBY NODENUM LEFT TREE UNKNOWN Mariposa will UNKNOW ODENUM LEFT TREE RIGHT TREE fill in UNKNOWN Table 10 Plan Nodes in plan variable made available to Bidder The rtable variable is a list of information about the tables referred to in the query plan rtable is short for range table which comes from the SQL syntax range of E is EMP The range table provides information about the size location and fragmentation information of the tables referred to in the query plan It is in the form where each rangeTb1 relnam rangeTblEntry rangeTblEntry Entry is a list refnam relid fragInfo and fragInfoisalistin which each entry is in the form and finally storagesites is a list in which each entry is of the form port address hostid These entries are described in Table 11 fralogicalid frastoreid frapages fratuples storagesites These are added to plan trees in between nodes processed at different sites An Exchange In node accepts a tuple stream from a remote site and feeds it into the next node Mariposa User Manual v 1 0 56 Entry Name Description
29. report of the user defined attributes and their types for all user defined classes in the database SELECT c relname a attname t typname FROM pg class c pg attribute a pg type t WHERE c relkind r no indices and c relname pg no catalogs and c relname Inv no large objects and a attnum 0 no system att s and a attrelid c oid and a atttypid t oid ORDER BY relname attname This query lists all user defined base types not including array types EC WHERE T u usename t typname pg_type t pg_user u u usesysid int2in int4out t typowner typrelid 0 oid no complex types typelem 0 oid no arrays usename lt gt postgres ER BY usename typname Mariposa User Manual v 1 0 This query lists all left unary post fix operators SELECT o oprname AS left unary right typname AS operand result typname AS return type FROM pg operator o pg type right pg type result WHERE o oprkind 1 left unary a a O nd o oprright right oid nd o oprresult result oid RDER BY operand This query lists all right unary pre fix operators ELECT o oprname AS right_unary eft typname AS operand esult typname AS return_type ROM pg_operator o pg_type left pg_type result HERE o oprkind r right unary nd o oprleft left oid nd o oprresult result oid RDER BY operand
30. toto EERS 37 3 2 A4 DISTRIBUTED BXAMBPLE 55503 PIER ERE eee nan EKSE donnee ESEE E OA suudeovs sues sods Rs e aere Fotos 38 3 2 1 Creating a Mariposa Class Ai e 39 3 2 2 Spliiting a Class into Fragments siere eds eee a is 40 3 2 3 Moving Fragments apes entente kr 41 3 2 4 Copying a Fragment iei tes resa dee ee are rie are ER e e E SERT 42 3 3 THE MARIPOSA REPLICA SYSTEM esee etae ree et EAR E REER EEES E PERS 42 3 3 1 Credting dCOpy ise ie i pe RR LE E ERI Ua RYE ERE E ERR ERI E E 43 3 3 2 Dropping a Copy sie enini e Ri IIT 43 3 3 3 Moving d CODy eiusd cedi OA 43 S4 MARIPOSA NAME SERVICE 5 nina rtr ERR nO ERREUR einen IS EU CAPERE ibid 43 34 1 Setting Up Name Services titine rare der ede ee eae SENE 44 Mariposa User Manual v 1 0 4 3 4 2 Specifying A Primary Name Server eee ede i is 45 3 5 THE MARIPOSA DATA BROKER eee sent nnhnn eret eese nennen nnne nee enna 45 3 6 QUERY PROCESSING IN MARIPOSA id nennen rne rennen nennen nen 47 3 0 1 The Fragmenter ie desti edente d tadas reia 47 3 6 1 T Prasmented Query Plans it week 47 3 0 2 The Query Broker e epe pi pie ren nn UR REPRE US 49 30 2 1 Bid Curve MEC RON 50 3 6 2 2 Plan Chunks 5 rte estera eee a eti e e o interne tentent deb SR 51 3 6 2 3 Bid Protocols En Me oet eet ett EE Far sp HR Fu P Fare eu mu oua deudas 52 3 6 2 3 1 The Short Protocol net tete b tede al 52 3 62 32 The Lone Prot
31. variable rtable SEQSCAN uses this information to tell whether the scan is over a base relation or a temporary relation If the scan is over a base relation the information contains fragment storage locations and number of tuples and number of pages in the fragment If SEOSCAN is called for a temporary relation it behaves like the other procedures it calculates the cost and delay then passes its child node to GetQueryBid If SEOSCAN is called for a base relation it first checks to see if there is a copy of the fragment stored locally If so it gets the number of tuples and number of pages in the fragment from the range table If there is no local copy SEQSCAN calls subcontract Subcontract is a Tcl extension added for Mariposa and is discussed in the next section Mariposa User Manual v 1 0 58 4 ADMINISTERING POSTGRES AND MARIPOSA This section explains how to run the Mariposa processes create a Mariposa database add and delete users and perform other administrative functions This section assumes that you have already installed Mariposa on each machine on which it will run If you have not installed Mariposa refer to the Installation and Setup Guide Even if you are not the administrator of your database you will find it useful to be familiar with many of these tasks 4 1 Frequent Tasks This section discusses frequently performed administrative tasks 4 1 1 Starting the Site Manager If you did not install PO
32. with a query of this form the POSTGRES query optimizer will tend to consider far more choices than it should and may run out of memory e Jf you are really desperate to see what query plans look like you can run the postmaster with the d option and then run monitor with the t option The format in which query plans will be printed is hard to read but you should be able to tell whether any index scans are being performed 4 2 Infrequent Tasks At some time or another every POSTGRES site administrator has to perform all of the following actions 4 2 1 Cleaning Up After Crashes The postgres server and the postmaster run as two different processes They may crash separately or together The housekeeping procedures required to fix one kind of crash are different from those required to fix the other The message you will usually see when the backend server crashes is FATAL no response from backend detected in Mariposa User Manual v 1 0 61 This generally means one of two things there is a bug in the POSTGRES server or there is a bug in some user code that has been dynamically loaded into POSTGRES You should be able to restart your application and resume processing but there are some considerations POSTGRES usually dumps a core file a snapshot of process memory used for debugging in the database directory usr local postgres95 data base database core on the server machine If you don t want to try to debug the problem o
33. 00 5000 8000 4 lami 8500 p 0 200 5 ew York 2500 2000 2000 3 ew York 1800 200 750 2 iami 9300 700 5000 4 ew York 3200 p oO 0 6 ew York 1800 5000 1500 6 iami 11000 p 0 3000 2 1 5 Joins Between Classes So far this section has shown queries that access one class at a time Queries can access multiple classes at once or access the same class in such a way that multiple instances of the class are being processed at the same time A query that accesses multiple instances of the same or different classes at one time is called a join query For example to find the widgets are on hand in greater quantity in Miami than in New York SELECT W1 PART_NO W1 O0N HAND as MIAMI W2 ON_HAND as NY FROM WIDGETS W1 WIDGETS W2 WHERE W1 LOCATION Miami and W2 LOCATION New York and W1 PART NO W2 PART_NO and W1 ON_HAND gt W2 ON_HAND PART NO MIAMI xv 2 9300 3000 3 10000 1800 4 8500 3200 6 1100 1800 In this case both W1 and W2 are surrogates for an instance of the class widgets and both range over all instances of the class In relational database systems W1 and W2 are known as range variables In addition a query can contain an arbitrary number of class names and surrogates 2 1 6 Updates To update existing instances use the update command For example to ref
34. 00 o 200 5 ew York 2500 2000 2000 3 ew York 1800 200 750 E iami 9300 700 5000 4 ew York 3200 0 0 6 ew York 1800 5000 1500 6 iami 11000 o0 3000 Each fragment can also be queried individually SELECT FROM WIDGETS_MI PART_NO LOCATION ON_HAND ON_ORDER COMMITTED E Miami 10000 5000 8000 4 Miami 8500 o 200 2 Miami 9300 700 5000 6 Miami 11000 o 3000 SELECT FROM WIDGETS_NY PART_NO LOCATION ON_HAND ON_ORDER COMMITTED 1 New_York 500 1500 300 2 New York 3000 0 1000 5 New York 2500 2000 2000 3 New York 1800 200 750 4 New York 3200 0 0 6 New York 1800 5000 1500 41 Alternatively to create the two fragments WIDGETS MI and WIDGETS NY and populate them with the appropriate tuples we could have created the WIDGETS table and split it without populating it first Then we could have inserted tuples into the fragments putting the New York tuples in WIDGETS NY and the Miami tuples in WIDGETS MI 3 2 3 Moving Fragments Now suppose we want to move the WIDGETS MI fragment to Miami Mariposa server running in Miami has a hostid of 2 For an explanation of hostid s see the Installation and Setup Guide We use the MOVE FRAGMENT command to move WIDGETS MI to hostid 2 MOVE FRAGM After the fragment has been moved we can still query the WI
35. DGETS class as before SELECT F ENT WIDGE ROM WIDGE S MI Eo 2 S Say the Mariposa User Manual v 1 0 42 PART_NO LOCATION ON_HAND ON_ORDER COMMITTED 1 ew York 500 1500 300 2 ew York 3000 p 0 1000 3 iami 10000 5000 8000 4 lami 8500 O 200 5 ew York 2500 2000 2000 3 ew York 1800 200 750 2 lami 9300 700 5000 4 ew York 3200 0 0 6 ew York 1800 5000 1500 6 iami 11000 L0 3000 If the query SELECT FROM WIDGETS only retrieves the New York tuples the name server in this case Site 1 hasn t received the metadata from Site 2 If you have set up Site 1 as the name server as indicated at the beginning of this section wait until the update interval has passed and issue the query again 3 2 4 Copying a Fragment Now suppose that we want to make a read only copy of the WIDGETS MI fragment so that we can query it without the latency of network traffic We can make a read only copy by issuing the COPY FRAGMENT command COPY FRAGMENT READONLY WIDGETS MI FROM 2 UPDATE EVERY 3600 This command causes a read only copy of WIDGETS MI to be brought from Site 2 to Site 1 the home site Furthermore updates from Site 2 will be brought over and applied every hour 3600 seconds This means that the copy of WIDGETS MI at Site
36. E c SELECT oid proname FROM pg proc WHERE prname int4 abs emp oia proname 17328 int4 abs cmp Mariposa User Manual v 1 0 36 Again your OID number will probably be different and you should substitute the value you see for the value below Recalling that the B tree instance s OID is 403 and that of int4 abs opsis 17314 you can add the new instance as follows INSERT INTO pg amproc amid amopclaid amproc amprocnum VALUES 403 0id btree oid 17314 0id pg opclass tuple 17328 0id new pg proc oid ES VLNA 2 9 THE POSTGRES RULE SYSTEM Production rule systems are conceptually simple but there are many subtle points involved in actually using them Consequently this guide does explain the actual syntax and operation of the POSTGRES rule system here Instead you should read STON90b to understand some of these points and the theoretical foundations of the POSTGRES rule system before trying to use rules The discussion in this section is intended to provide an overview of the POSTGRES rule system and point you to helpful references and examples The query rewrite rule system modifies queries to take rules into consideration and then passes the modified query to the query optimizer for execution It is very powerful and can be used for many things such as query language procedures views and versions The power of this rule system
37. EFUSETOBID cost delay stale acc e expr costl cost2 expr Sdelayl delay2 expr stalel gt stale2 stalel stale2 Rpr Sacel lt accZ2 Saccl a6 c2 return list Sresponse cost delay stale acc Mariposa User Manual v 1 0 MERGEJOIN Input left sub tree right sub tree Output bid Updates nTuples and nPages guesses one match for each outer tuple proc MERGEJOIN nodeNum leftTree rightTree junk global BID REFUSETOBID global nTuples global nPages global rtable global hostid set perTupleCharge 001 set perTupleDelay 000400 set leftSubBid CostBasedBid leftTree set leftTuples nTuples set leftPages nPages set rightSubBid CostBasedBid S rightTree set rightTuples nTuples set rightPages nPages Fill in arbitrary values if nothing is known about the results of the join s children if SleftTuples 0 set leftTuples 10000 if SrightTuples 0 set rightTuples 10000 if SleftPages 0 set leftPages 100 if SrightPages 0 set rightPages 100 Each outer and inner tuple is touched once set delay expr leftTuples SrightTuples perTupleDelay set cost expr leftTuples rightTuples perTupleCharge Wild guess one match for each outer tuple set nTuples leftTuples set bid CombineBids leftSubBid rightSubBid set bid CombineBids Sbid list BID cost delay 0 0 0 0 return bid Mariposa User Manua
38. Likewise the nodes in Plan B represent more work on average than those in Plan C If the work is divided between multiple servers Plan C can achieve a higher degree of parallelism than Plan B and similarly for Plan B and Plan A Mariposa User Manual v 1 0 49 JOIN a JOIN SS TEMP1 SS TEMP2 MERGE SORT MERGE SORT SORT SORT SS TEMP1 SS TEMP2 SS TEMP3 SS TEMP4 MERGE MERGE SORT SORT SORT SORT SORT SS EMP1 SS EMP2 SS EMP3 SS DEPT1 SS DEPT2 SS EMP1 SS EMP2 SS EMP3 SS DEPT1 SS DEPT2 Fragmented Plan A Fragmented Plan B MERGE JOIN JOIN JOIN JOIN SS TEMP1 SS TEMP2 SS TEMP3 SS TEMP4 SS TEMP5 SS TEMP6 SS TEMP7 SS TEMP8 SORT SORT SORT SORT SORT SORT SORT SORT SS EMP1 SS DEPT1 SS EMP1 SS DEPT2 SS EMP3 SS DEPT1 SS EMP3 SS DEPT2 JOIN JOIN SS TEMP9 SS TEMP10 SS TEMP11 SS TEMP12 SORT SORT SORT SORT SS EMP2 SS DEPT1 SS EMP2 SS DEPT2 Fragmented Plan C Figure 4 The fragmenter can produce different plans from one unfragmented plan by inserting merge nodes at different places in the plan tree The placement of merge nodes in a plan is controlled by the SQL extension SET FRAGMENTATION lt fragmentation factor gt where lt fragmentation factor is an integer between O and 100 inclusive As the fragmenter works its way up the plan tree there are various points at which it may insert a merge node At each one the fragmenter generates a random number between 0
39. MARIPOSA DISTRIBUTED DATABASE MANAGEMENT SYSTEM USER S MANUAL Mariposa User Manual v 1 0 2 Mariposa is copyrighted by the Regents of the University of California Permission to use copy modify and distribute this software and its documentation for educational research and non profit purposes and without fee is hereby granted provided that both the copyright notice this permission notice and the following two paragraphs appear in supporting documentation Permission to use copy modify and distribute is granted provided the name of the University of California not be used in advertising or publicity pertaining to distribution of the software without specific written prior permission Permission to incorporate this software into commercial products can be obtained from the Campus Software Office 1150 Shattuck Ave University of California Berkeley California 94720 The University of California makes no representations about the suitability of this software for any purpose It is provided without express or implied warranty IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT INDIRECT SPECIAL INCIDENTAL OR CONSEQUENTIAL DAMAGES INCLUDING LOST PROFITS ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES INCLUDING BUT NOT LIMITED T
40. O THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE THE SOFTWARE PROVIDED THEREUNDER IS ON AN AS IS BASIS AND THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE SUPPORT UPDATES ENHANCEMENTS OR MODIFICATIONS Mariposa User Manual v 1 0 3 TABLE OF CONTENTS 1 INTRODUCTION tacet dents tested sect natal etra bios tousse te caco e eio d De pocos cus ba cio tese data needed ann ds dec 5 TW HAT TISINIARIPOSA ilhid rei i e EU nn En ee ne reges 5 1 2 USING EHIS MANUAE tente ete ette te eth ete tee ete te vete de et o E e Lee ae pente tene 6 1 3 OVERVIEW OF THE ARCHITECTURE cesset eene hen a nnne seen ner nan enn seen tenes nana re nennen reete 6 2 POSTGRES eee A E OAOE EIEE ISE ss etos se tesa etes s seen ss toss etas etse stessa EIET N 8 2 1 THE QUERY LANGUAGE POSTGRES SQL eene enne nennen nnne 9 2 1 1 Creating a New Class ia 10 2 1 2 Populating a Class with Instances eene eene nnne enne enne nnne nene enne 10 2 1 3 Querying a Class se iaa e eee Ere orae nee ege RR Ree eR REA EHE Re Ro Hee netii eben 11 2 1 4 Redirecting SELECT Queries iei Meuse ramener dee eed teet eee aeter 12 2 1 5 Joins Between Classes sise 13 2 LO Updates sitio et e oe t tle t e D e i e Te 13 21 7 Deletions MERE ARAN 14 2 1 8 Using Aggregate FUnctions sise 14 2 2 ADVANCED POSTGRES SQL FEATURES iii 15 BRL Inh ertian ee us eite ec er ere eee dee aee deste e d e ERR ee 15 2 2 2
41. S is that of a class A class is a named collection of object instances Each instance has the same collection of named attributes of which each attribute is a specific type Furthermore each instance has a permanent object identifier OID that is unique throughout the installation Because SQL syntax refers to tables the terms table and class are used interchangeably throughout this manual Similarly a row is an instance and columns are attributes You can create a new class by specifying the class name as well as all attribute names and their types CREATE TABLE WIDGETS PART NO int4 LOCATION charl6 warehouse Miami or New York ON HAND int4 quantity on hand ON ORDER int4 quantity on order COMMITTED int4 quantity sold but not shipped Note that keywords are case insensitive whereas identifiers are case sensitive Therefore CREATE TABLE could have been typed create table or Create Table but char16 could not have been typed any other way POSTGRES SQL supports the standard SQL types with the exceptions noted in Table 2 POSTGRES is unique in that it can be customized with an arbitrary number of user defined data types Consequently type names are not keywords For example you could define a type called CHARI6 distinct from charl6 and define attributes of type CHARIO although this would be confusing to say the least As described so far the
42. STGRES exactly as described in the installation instructions you may have to perform some additional steps before starting the postmaster process Even if you were not the person who installed POSTGRES you should understand the installation instructions The installation instructions explain some important issues with respect to where POSTGRES places some important files proper settings for environment variables etc that may vary from one version of POSTGRES to another You must start the postmaster process with the user id that owns the installed database files In most cases if you have followed the installation instructions this will be the user postgres If you do not start the postmaster with the right user id the backend servers that are started by the postmaster will not be able to read the data Make sure that usr local postgres95 bin is in your shell command path because the postmaster will use your PATH to locate POSTGRES commands Remember to set the environment variable PGDATA to the directory where the POSTGRES databases are installed This variable is more fully explained in the POSTGRES installation instructions If you do start the postmaster using non standard options such as a different TCP port number remember to tell all users so that they can set their PGPORT environment variable correctly Mariposa User Manual v 1 0 59 4 1 2 Shutting Down the Postmaster If you need to halt the postmaster proce
43. TION 1 1 What is Mariposa The Mariposa distributed database management system is an ongoing research project at the University of California at Berkeley Mariposa addresses fundamental problems in the standard approach to distributed data management We believe that the underlying assumptions traditionally made while designing distributed data managers do not apply to todays wide area network WAN environments To date distributed database management systems have been designed for local area networks LAN s with few servers operating within one administrative domain such as one company or one department within a company Furthermore these systems assume uniformity of all processors and network connections within the system Data movement in these systems is a very heavyweight operation and is performed manually by a database administrator The explosive growth of distributed computing illustrated by the World Wide Web dictates an entirely different set of assumptions Mariposa allows DBMS s which are far apart and under different administrative domains to work together to process queries Furthermore we have introduced an economic paradigm in which processing sites buy and sell data and query processing services Not only does this aproach reflect the emerging reality of a commercialized Internet it has also allowed us to address many of the problems inherent in designing a wide area distributed DBMS Mariposa has been designed with the fo
44. Time Travel eia esee teretes eee dee eee Pee Ent esee Pana e Ea debe eta dete e negare ka Eia ed 16 2 2 3 Non Atomic Values Arrays oae e EEE E EE E EE nano na ran nere nen en A E a 16 2 3 POSTGRES EXTENSIBIEITY SNR RSROHDICOSHURBERMEEBRERROR aa 18 2 3 1 The POSTGRES Type Systemi etel ette t eftt det 18 2 3 2 About the POSTGRES System Catalogs eee 18 ZA EXTENDING SQL FUNCTIONS t eese tete e eere nno entente ce ae ee ennemie 21 2 4 1 Query Language SQL Functions sise 21 2 4 1 1 SQL Functions on Base Types iii e P E PUR EO EE e e ne ee ec epus 21 2 4 1 2 SQL Functions on Composite Types Giese Jostens 08h soe suck suse dusk eie ARENA eee tude date nd oet nice gn 21 2 4 2 Programming Language Functions sisi 23 2 4 2 1 Programming Language Functions on Base Types 23 2 4 2 2 Programming Language Functions on Composite Types 25 DAD Be Caveats E 26 2 5 EXTENDING SOL TYPES Leti err E EARS TREE RAN ECERADS em enter en nine RES r ERAS tu 27 2 5 1 Functions Needed for a User Defined Type siennes 27 2 6 BXTENDINCESQE OPERATORS 5 nhe mer a Dr RE ever m EN Enid 29 2 7 EXTENDING SOL AGGREGATED seerne seeni A EEn ERE EN inicia iii 29 2 8 INTERFACING EXTENSIONS TO INDICES nre araa ra ria a A A EEE AE 31 2 9 THE POSTGRES RULE SYSTEM re tenter ides Ye EX NIS EREE EKERI EASE KE PN PE REN BEES ESS 36 3 MARIPOSA vr 37 3 1 MARIPOSA MODULES 333305 idi aped e ti tele rear tele p re eei
45. accuracy HEH HE T FE FE E HE HH EE FE FE FE HE TE HE FE FE FE HE FE HE HE FE FE HE HE TE EE EE E FE EE HE E FE EE d T ST T E E E E E E Global variables set BID 1 set REFUSETOBID 0 DR E NR ee a SNC E US ca ein EE AA TA ee oe ee LoadAverage Utility routine Input void Output 5 30 and 60 second load averages O ae EE A oe SR SES RES eS E proc LoadAverage set result exec uptime set len llength Sresult set result lrange result expr Slen 3 expr Slen 1 regsub all result res2 return Sres2 Mariposa User Manual v 1 0 68 CombineBids Input two bids bidl and bid2 Output bid that results from combining bidl and bid2 response BID if both bidl and bid2 responses are BID REFUSETOBID otherwise cost bidl cost bid2 cost delay bidl delay bid2 delay staleness MAX bidl staleness bid2 staleness accuracy MIN bidl accuracy bid2 accuracy proc CombineBids bidl bid2 global BID REFUSETOBID set set set set set set set set set set set set set set set respon respon costi cost2 delay delay2 stale stale2 accl acc2 sel lindex bid1 0 se2 lindex bid2 0 lindex bid1 1 lindex bid2 1 lindex bidl 2 lindex bid2 2 lindex bidl 3 lindex bid2 3 lindex bidl 4 lindex bid2 4 response expr responsel amp amp response2 BID SR
46. ager in San Francisco submits query using frontend application Mariposa User Manual v 1 0 e In the short protocol the query broker uses information collected from the name server to decide which sites will process the query It does not contact the processing sites After the query broker has specified the processing sites the backend s coordinator The coordinator notifies the remote sites to begin processing collects the results and returns the answer to the client program module takes over Query SELECT FROM WIDGETS CLEAR DEA 2 Frontend passes query to Mariposa process 3 San Francisco site contacts name server SELECT FROM wocers PARSER 4 Fragmented query plan is passed to query broker OPTIMIZER FRAGMENTER San Francisco 7 Query Broker selects processing sites and passes complete plan to coordinator MERGE SCANWIDGETS D SCAN WIDGETS2 QUERY BROKER MERGE in San Francisco SCAN WIDGETS1 SCAN WIDGETS2 in New York in Miami COORDINATOR Figure 1 Mariposa Architecture Example 6 Bidders at send back bids processing sites 5 Query broker sends requests for bids to processing sites in New York and Miami low much H
47. and produces a plan that reflects the underlying fragmentation of the tables In this section we describe how different fragmented plans can be produced from one single site plan and how to control the fragmentation process 3 6 1 1 Fragmented Query Plans After the fragmenter accepts a query plan from the single site optimizer it descends to the leaves of the plan tree which represent scans over base relations For example the query SELECT from EMP DEPT where EMP deptno DEPT no may be converted into the plan tree shown below in Figure 3 by the single site optimizer before fragmentation The base relations are unindexed so a sequential scan SS is used The data from each sequential scan is sorted by department number into a temporary relation which is scanned before performing the join Mariposa User Manual v 1 0 48 JOIN SS TEMP 1 SS TEMP2 SORT SORT SS EMP SS DEPT Figure 3 Unfragmented Query Plan As explained in Section 3 2 2 base relations in Mariposa can be partitioned horizontally into data fragments Each data fragment contains some fraction of the tuples in the base relation Together the data fragments represent the entire base relation Each of the scans in the plan tree is divided into one or more scans one for each data fragment In our example the EMP relation is fragmented into EMP1 EMP2 and EMP3 Similarly the DEPT relation is fragmented into DEPT1 and DEPT2 Th
48. and 100 If the number is greater than the fragmentation factor the fragmenter inserts a merge node If it is less than the fragmentation factor it does not Setting the fragmentation factor to 0 minimum parallelization guarantees that the fragmenter will produce a plan like Fragmented Plan A Setting it to 100 maximum parallelization guarantees a plan like Plan C Setting p to a value between 0 and 100 will result in the fragmenter producing a plan somewhere in between Plan A and Plan C such as Plan B 3 6 2 The Query Broker The Mariposa query broker is responsible for determining the sites at which different pieces of a query will be processed The query broker attempts to solve a user s query as far under the user s bid curve as possible First it divides the query plan up into plan chunks Then it contacts processing sites using either the short or long protocol If the SS TEMP 5 Mariposa User Manual v 1 0 50 short protocol is used the query broker selects the processing site for each plan chunk without contacting the sites first If the long protocol is used the query broker solicits bids from several processing sites for each plan chunk then selects the group of bids that will solve the query and be as far under the bid curve as possible In this section first we explain bid curves Then we will discuss plan chunks Finally we discuss the short and long protocols 3 6 2 1 Bid Curves A bid curve is a line in two dim
49. ata base At some point you may find that you wish to move one or more databases to another location e g to a filesystem with more free space If you wish to move all of your databases to the new location you can simply 1 2 Kill the postmaster Copy the entire data directory to the new location Making sure that the new files are owned by user postgres 9 cp rp usr local postgres95 data new place data Reset your PGDATA environment variable as described earlier in this manual and in the installation instructions E using csh or tcsh setenv PGDATA new place data oe using sh ksh or bash PGDATA new place data export PGDATA oe Restart the postmaster postmaster amp After you run some queries and are sure that the newly moved database works you can remove the old data directory rm rf usr local postgres95 data To install a single database in an alternate directory while leaving all other databases in place do the following 1 Create the database if it doesn t already exist using the createdb command In the following steps assume the database is named foo Kill the postmaster Copy the directory usr local postgres95 data base foo and its contents to its ultimate destination It should still be owned by the postgres user 9 cp rp usr local postgres95 data base foo new place foo Remove the directory usr local postgres95 data base foo
50. cost lindex bid 1 set cost expr cost 1 1a set bid lreplace bid 1 1 cost return bid
51. dding procedures yet just understand that there must be a set of these procedures for int2 int4 oid and every other data type on which a B tree can operate Sometimes strategies aren t enough information for the system to figure out how to use an index Some access methods require other support routines in order to work For example the B tree access method must be able to compare two keys and determine whether one is greater than equal to or less than the other Similarly the R tree access method must be able to compute intersections unions and sizes of rectangles These operations do not correspond to user qualifications in SQL queries they are administrative routines used internally by the access methods In order to manage diverse support routines consistently across all POSTGRES access methods pg am includes an attribute called amsupport This attribute records the number of support routines used by an access method For B trees this number is one the routine to take two keys and return 1 0 or 1 depending on whether the first key is less than equal to or greater than the second The amstrategies entry in pg am is just the number of strategies defined for the access method in question The procedures for less than less equal and so on don t appear in pg am Similarly amsupport is just the number of support routines required by the access method The actual routines are listed elsewhere The next class of interest is p
52. e How long New York 8 Coordinator notifies processing sites to perform work Miami 9 Each processing site passes query fragment to local single site DBMS This release of Mariposa is based on the POSTGRES extended relational database management system We have included a description of the version of POSTGRES distributed with Mariposa in the next section Mariposa User Manual v 1 0 2 POSTGRES The single site database engine distributed with Mariposa is POSTGRES The version of POSTGRES distributed with Mariposa is a pre alpha release of POSTGRES95 Not all of the features of POSTGRES95 are implemented in this version We are planning on releasing a version of Mariposa with support for POSTGRES95 in the future The commands and keywords listed in Table 1 are not supported by the version of POSTGRES released with Mariposa words if they exist We have listed equivalent commands and key Command or Key Word Description Equivalent ASC DESC Ascending Descending key USING lt for ASC words in ORDER BY clause USING gt for DESC CAST Used to typecast constants or i operator parameters COMMIT ROLLBACK Transaction commit rollback none CREATE DATABASE Create a new database CREATEDB DROP DATABASE Destroy a database DESTROYDB DELIMITERS Denotes delimiters between fields none in COPY state
53. e fragmenter starts by dividing the sequential scan of EMP into sequential scans of EMP1 EMP2 and EMP3 and similarly for DEPT Each of these sequential scans produces a tuple stream Tuple streams in Mariposa are merged together into a single stream by inserting a merge node above them in the plan tree The merge node takes multiple tuple streams as input and produces one stream as output The fragmenter can produce different plans from one unfragmented plan by inserting merge nodes at different places in the plan tree In our example any of the fragmented plans shown below in Figure 4 could be produced by the fragmenter In Fragmented Plan A the fragmenter has placed merge nodes directly above the fragmented sequential scans In Fragmented Plan B the sequential scans are first sorted then merged together Since the join in our example requires the input streams to be sorted the merge nodes must maintain the sorted order and so are merge sort nodes in this plan In Fragmented Plan C each pair of data fragments is scanned sorted and joined together and the resulting streams are merged together The placement of merge nodes affects the number of nodes in the fragmented plan and thereby affects the potential for parallel execution of the plan In Figure 4 Plan A has fewer nodes than Plan B which has fewer than Plan C Each of the nodes in Plan A represents more work than each of the nodes in Plan B with the exception of the sequential scans
54. e to get simple answers Before executing any of the queries below be sure to Mariposa User Manual v 1 0 64 execute the POSTGRES vacuum command The queries will run much more quickly that way Also note that these queries are also listed in usr local postgres95 tutorial syscat sql You can use cut and paste or the Ni command instead of doing a lot of typing This query prints the names of all database administrators and the name of their database s SE EC FR WHERE ORDER T usename datname pg_user pg_database usesysid int2in int4out datdba BY usename datname This query lists all user defined classes in the database SELECT relname FROM pg_class WHERE relkind r not indices and relname pg not catalogs and relname Inv not large objects ORDER BY relname This query lists all simple indices i e those that are not defined over a function of several attributes SEL EC a a FRO tt T bc relname AS class_name ic relname AS index_name name pg_class bc base class pg class ic index class pg index i pg at WHERE and and and and ORD tribute a att in base i indrelid bc oid i indexrelid ic oid i indkey 0 a attnum L l a SE attrelid bc oid i indproc 0 oid no functional indices ER BY class name index name attname This query prints a
55. ee comparison functions work They are named analogously to bt char16cmp substitute the appropriate type in place of char16 3 2 2 Splitting a Class into Fragments Now that we have created the WIDGETS class with the proper partitioning mode the class can be populated just as in Section 2 1 2 with the INSERT statement INSERT INTO WIDGETS VALUES 1 New York 500 1500 300 Or with the COPY command COPY WIDGETS FROM src tutorial widgets txt Now we can split the table into two fragments using the SPLIT FRAGMENT statement SPLIT FRAGMENT WIDGETS INTO WIDGETS MI WIDGETS NY AT Miami The SPLIT FRAGMENT statement creates two new relations which together make up the original relation Using key based partitioning all the tuples in which LOCATION lt Miami go in WIDGETS MI All the tuples in which LOCATION gt Miami go in WIDGETS NY This effectively splits the tuples so that the Miami tuples go in WIDGETS MI and the New York tuples go in WIDGETS NY The original relation can be queried like before SELECT FROM WIDGETS Mariposa User Manual v 1 0 PART_NO LOCATION ON_HAND ON_ORDER COMMITTED 1 ew York 500 1500 300 2 ew York 3000 p 0 1000 3 iami 10000 5000 8000 4 iami 85
56. emote sites to begin processing collecting the results and returning the answer to the client program Each Mariposa server site contains a bidder module which is part of the site manager process The bidder explained in Section Error Reference source not found responds to requests for bids from the query broker When a bidder receives a request to bid on part of a query it may either refuse to bid or return a bid to the query broker The bid includes the price to perform the work and a time bound within which the work must be completed If a bidder bids then it must process the query if it is chosen by the query broker to do so Winning bids must sooner or later be processed To execute these queries the site manager allocates an idle backend to it The number of backends controls the multiprocessing level at each site and may be adjusted as conditions warrant The local backend sends the results of the subquery to the site executing the next part of the query or back to the coordinator process Each Mariposa server site also includes a data broker The data broker was not mentioned in the example in Section 1 3 The data broker is called after each query is run whether the query originated at a remote site or locally Based on data access patterns and space considerations it engages in buying and selling fragments with data brokers at other Mariposa sites See Section 3 5 The behavior of the bidder and data broker processes are con
57. ensions cost and delay Cost can be in any unit and delay is in seconds In this discussion we will use dollars as the cost unit By defining the bid curve a Mariposa user specifies how much money he or she will pay to receive an answer within a given amount of time A user defines a bid curve using the SQL extension SET BIDCURVE ET BIDCURVE costl delayl cost2 delay2 un costl delayl and cost2 delay2 are two points which define the bid curve For example if the user were willing to pay 100 for an answer within five seconds and nothing for an answer after one minute he or she would use the command SET BIDCURVE 100 5 0 60 Which would define the curve shown below 100 cost 60 delay If the user wanted to specify that he or she was willing to pay a maximum of twenty dollars no matter how long the query took to be processed the command would be SET BIDCURVE 20 0 20 60 And would result in a bid curve like the one below 100 cost delay Mariposa User Manual v 1 0 51 3 6 2 2 Plan Chunks After the query broker accepts the fragmented plan from the fragmenter the first thing it does is to divide the fragmented plan into a set of non overlapping subplans called plan chunks Each plan chunk is sent out whole to potential processing sites Dividing a plan into many small chunks increases the potential for parallel and pipelined execution of the plan while dividing it into a f
58. es nTuples Mariposa User Manual v 1 0 if SleftTuples 0 set leftTuples 10000 set nTuples 10000 set cost expr leftTuples perTupleCharge set delay expr leftTuples SperTupleDelay set bid SBID cost delay 0 0 0 0 set bid CombineBids leftSubBid bid return bid UNKNOWN T Don t bid on plans that contain nodes we can t identify proc UNKNOWN nodeNum leftTree rightTree junk global BID REFUSETOBID return list SREFUSETOBID 0 0 0 0 CostBasedBid Input query plan Output bid Main procedure Looks at token representing the node type and calls the appropriate bidding routine proc CostBasedBid plan global rtable global hostid global contract global nTuples global nPages global BID REFUSETOBID if plan j set nodeType lindex Splan 0 set bid nodeType lindex plan 1 lindex plan 2 lindex Splan 3 lindex plan 4 eise set bid list BID 0 0 0 0 return bid Mariposa User Manual v 1 0 GetQueryBid Input query plan Output bid Main procedure Calls CostBasedBid and multiplies result by current load average proc GetQueryBid plan global rtable global hostid global contract global nTuples global nPages global BID REFUSETOBID set bid CostBasedBid plan set las exec uptime set len llength las set result lrange 1as expr Slen 3 expr Slen 1 regsub all las las set la lindex 1as 2 set
59. essed set perPageDelay 002200 Scan on a temporary relation the result of a sort join etc Just use the values of nTuples and nPages generated so far if scanIndex 1 set nTuples 10000 set nPages 100 set cost expr nTuples SperTupleCharge nPages SperPageCharge set delay expr nTuples perTupleDelay nPages SperPageDelay set bid CombineBids SBID cost delay 0 0 0 0 CostBasedBid leftTree return bid else Scan on a base relation set nTuples and nPages from rtable information Only bid if the fragment is stored at this site set rte lindex rtable scanIndex set frags lindex rte 3 set fInfo lindex frags fraglndex Determine if one of the storage sites is this one set storageSites lindex fInfo 4 set local false foreach site storageSites set storageHost lindex site 2 if storageHost Shostid set local true break Mariposa User Manual v 1 0 72 set nTuples lindex fInfo 3 set nPages lindex fInfo 2 If sequential scan is over a fragment that we own bid on it Otherwise subcontract out the sequential scan to another site if 10cal set response BID set cost expr nTuples SperTupleCharge nPages SperPageCharge set delay expr nTuples perTupleDelay nPages SperPageD elay eise set subPlan SEQSCAN nodeNum scanIndex fragIndex set subBid subcontract subPlan contract set response lindex subBid 0 set
60. ew large chunks decreases potential parallelism and pipelining Continuing with the example from Section 3 6 1 if the fragmenter produced Fragmented Plan B from Figure 4 Figure 5 shows three of the possible groups of plan chunks the query broker could produce In Plan B 1 each plan chunk consists of exactly one node in the plan tree Therefore each node in Plan B 1 could be processed by a different server In Plan B 2 the plan chunks are larger the sequential scan nodes are grouped with the sort nodes and the sequential scans on temporary relations The join node is grouped with the merge sort nodes In Plan B 3 the entire query is grouped into one large plan chunk Therefore the entire query will be sent to potential processing sites The size of plan chunks produced by the query broker can be set by the command set chunksize lt chunk factor gt where lt chunk factor gt is an integer between O and 100 inclusive As the query broker moves from the leaves of the fragmented plan tree towards the root it generates a random number between 0 and 100 at each step If the number generated is greater than chunk factor the plan is cut off at that point and a plan chunk is created If it is less than or equal the process continues If chunksize is set to 100 coarsest the user is guaranteed to get a plan like B 3 in Figure 5 If granularity is set to O finest the user will get a plan like B 1 If it is set in between the user will get a
61. expr mergeTuples SperTupleDelay set bid CombineBids Sbid list BID cost delay 0 0 0 0 return bid XIN Input nodeNum leftTree Output bid Charges a fixed price per tuple proc XIN nodeNum leftTree junk junk2 32 global BID REFUSETOBID global nTuples set perTupleCharge 001 set perTupleDelay 000400 set leftSubBid CostBasedBid leftTree set leftTuples nTuples if SleftTuples 0 Mariposa User Manual v 1 0 set leftTuples 10000 set nTuples 10000 set cost expr leftTuples perTupleCharge set delay expr leftTuples SperTupleDelay set bid SBID cost delay 0 0 0 0 set bid CombineBids leftSubBid bid return bid Input nodeNum left subtree Output bid proc AGG nodeNum leftTree junk junk2 global BID REFUSETOBID global nTuples set perTupleCharge 001 set perTupleDelay 000400 set leftSubBid CostBasedBid leftTree set leftTuples nTuples if SleftTuples 0 set leftTuples 10000 set nTuples 10000 set cost expr leftTuples perTupleCharge set delay expr leftTuples SperTupleDelay set bid SBID cost delay 0 0 0 0 set bid CombineBids SleftSubBid bid return bid GROUPBY Input nodeNum left subtree Output bid proc GROUPBY nodeNum leftTree junk junk2 0 global BID REFUSETOBID global nTuples set perTupleCharge 001 set perTupleDelay 000400 set leftSubBid CostBasedBid leftTree set leftTupl
62. extensible type system inheritance functions and production rules These extensions were in the original POSTGRES query language POSTQUEL This manual provides an introduction to POSTGRES SQL There are numerous books on SQL such as MELT93 or DATE93 consult them for a more detailed analysis of SQL Moreover many of the features of POSTGRES SQL are not part of the ANSI standard The following examples assume that you have installed Mariposa on at least one site and that you have created a database You must also have the site manager running at the site where you are issuing queries See the Installation and Setup Manual The examples in this manual can be found in src tutorial Refer to the README file in that directory for detailed instructions To start the tutorial enter these statements cd src tutorial mpsql database name Mariposa User Manual v 1 0 10 The following message will appear Welcome to the Mariposa interactive sql monitor type M for help on slash commands type Mq to quit type g or terminate with semicolon to execute query You are currently connected to the database lt database name gt lt database name gt gt Mi basics sql The Ni command reads in queries from the specified files The s option starts single step mode which pauses before sending a query to the backend Queries in this section are in the file basics sql 2 1 1 Creating a New Class One of the fundamental concepts in POSTGRE
63. ey generally correspond to what are often known as abstract data types POSTGRES can only operate on these types through methods provided by the user Furthermore POSTGRES understands the behavior of such types only to the extent that the user describes them Composite types are created whenever a user creates a class WIDGETS is an example of a composite type POSTGRES stores all instances of these types in a file Information about the attributes of composite types are stored in one of the POSTGRES system catalogs pg attribute and can by queried like any other table POSTGRES base types are further divided into built in types and user defined types Built in types like int4 are those that are compiled into the system and distributed along with the source code User defined types as the names suggests are defined by the user These are described in detail in Section 2 5 2 3 2 About the POSTGRES System Catalogs All system catalogs have names that begin with pg The following classes contain information that may be useful to the end user There are other system catalogs but there should rarely be a reason to query them directly Mariposa User Manual v 1 0 19 catalog name description pg_database databases pg_class classes pg_attribute class attributes pg_index secondary indices pg_proc procedures both C and SQL pg type types both base and complex pg ope
64. form NODETYPE NODENUM LEFTTREE RIGHTTREE where NODETYPE is a string representing the operation NODENUM is a unique identifier for the node and LEFTTREE and RIGHTTREE are the left and right subplans respectively LEFTTREE and RIGHTTREE may be empty If RIGHTTREE is empty LEFTTREE is also empty and the node is a leaf node If there is a single subplan it will be in LEFTTREE There is one exception to the above format Merge nodes which were first discussed in Section may have more than two children The format for merge nodes is NODETYPE NODENUM CHILDi CHILD2 CHILD where NODETYPE is MERGE The different node types and their explanations are listed in Table 10 recognizes if the node type isn t one Lt Mariposa User Manual v 1 0 55 Node Name Explanation Node Format MERGEJOIN Merge Join ERGEJOIN NODENUM LEFT TREE RIGHT TREE NESTEDLOOP Nested Loop Join NESTEDLOOP NODENUM LEFT TREE RIGHT TREE SEQSCAN Sequential Scan If over a base relation SEQSCAN NODENU RTABLE INDEX FRAG INDEX LEFT TREE If over a temporary
65. g opclass This class exists only to associate a name with an OID In pg amop every B tree operator class has a set of procedures one through five above Some existing opclasses are int2 ops int4 ops and oid ops You need to add an instance with your opclass name for example complex abs ops topg opclass The OID of this instance is a foreign key in other classes 6 Strictly speaking this routine can return a negative number or a non zero positive number Mariposa User Manual v 1 0 33 INSERT INTO pg opclass opcname VALUES complex_abs_ops SELECT oid opcname FROM pg opclass WHERE opcname complex abs ops oia opcname 17314 complex_abs_ops Note that the OID for your pg_opclass instance will be different You should substitute your value for 17314 wherever it appears in this discussion So now you have an access method and an operator class But you still need a set of operators the procedure for defining operators was discussed earlier in this manual For the complex_abs_ops operator class on B trees the required operators are e absolute value less than absolute value e less than or equal absolute value e equal absolute value e greater than or equal absolute value e greater than Suppose the code that implements the functions defined is stored in the file usr local mariposa src tutorial complex c Part of the code look like this note that you
66. gument For example take the function double salary that computes what your salary would be if it were doubled CREATE FUNCTION double salary EMP RETURNS int4 AS SELECT 1 salary 2 AS salary LANGUAGE sql SELECT name double salary EMP AS dream FROM EMP WHERE EMP dept toy name dream Sam 2400 Notice the use of the syntax 1 salary Usually you can use the notation attribute class and class attribute interchangeably this is the same as SELECT EMP name AS youngster FROM EMP WHERE EMP age lt 30 SELECT name EMP AS youngster FROM EMP WHERE age EMP lt 30 youngster sam However this is not always the case Function notation is important when you want to create a function that returns a single instance of a complex type You do this by assembling the entire instance within the function attribute by attribute This is an example of a function that returns a single EMP instance CREATE FUNCTION new_emp RETURNS EMP AS SELECT None text AS name 1000 AS salary 25 AS age NV none V char16 AS dept LANGUAGE sql In this case you have specified each of the attributes with a constant value but any computation or expression could have been substituted for these constants
67. if you have an attribute name which is of the type char16 the GetAttributeByName call would look like char str str char GetAttributeByName t name amp isnull The following query lets POSTGRES know about the c_overpaid function CREATE FUNCTION c_overpaid EMP int4 RETURNS bool AS usr local postgres95 tutorial obj funcs so LANGUAGE ve fs While there are ways to construct new instances or modify existing instances from within a C function these are far too complex to discuss in this manual 2 4 2 3 Caveats This section discusses the more difficult task of writing programming language functions Be warned this section of the manual will not make you a programmer You must have a good understanding of C including the use of pointers and the malloc memory manager before trying to write C functions for use with POSTGRES While it may be possible to load functions written in languages other than C into POSTGRES this is often difficult when it is possible at all because other languages such as FORTRAN and Pascal often do not follow the same calling convention as C That is other languages do not pass argument and return values between functions in the same way For this reason this discussion assumes that your programming language functions are written in C The basic rules for building C functions are as follows e Most of the header include files for POSTGRES should already be installed in
68. ify unique instances of other catalogs e There are many attributes and relationships that have obvious meanings but there are many that do not for example those that have to do with access methods The relationships between pg am pg amop pg amproc pg operator and pg opclass are particularly hard to understand are in This manual uses the words procedure and function more or less interchangably Mariposa User Manual v 1 0 20 depth in the section on interfacing types and operators to indices following the discussion of basic extensions that is a indicates an alternate primary key L_ non oid primary key unique identifier that may be used to identify an object non key Figure 2 The major POSTGRES system catalogs
69. ime staleness Reserved for future use For now it is sufficient to return the value 0 0 accuracy Reserved for future use For now it is sufficient to return the value 0 0 Table 8 The bidder s response To change the bidding policy simply redefine the procedure GetQueryBid To force the bidder to reload bidder tcl issue the Tcl command ReInitBidder from the site manager s TC1DMT prompt We have provided some global variables available to GetQueryBid which may be useful in formulating the bid These are described in Table 9 Mariposa User Manual v 1 0 54 Variable Name Value hostid An integer identifying the ID of the machine the bidder is running on contract The unique ID assigned to this contract by the Site Manager plan A string representing the plan tree rtable A Tel list which contains information about the relations and fragments accessed in the plan Table 9 These global variables are made available to GetQueryBid GetQueryBid may define other global variables and store data in them These globals will hold their values across future calls to GetQueryBid and other procedures in this TcIDMT interpreter 3 6 3 2 The plan and rtable global variables The plan and rtable variables require some explanation The variable plan is a string that represents the plan tree for which the bidder is being asked to formulate a bid It is a recursive list of the
70. in this section can be found in funcs sql and C code funcs c 2 4 1 Query Language SQL Functions Query language functions can be input by a POSTGRES user from the command line and stored in the database They require no programming experience aside from SQL 2 4 1 1 SQL Functions on Base Types The simplest possible SQL function has no arguments and simply returns a base type such as int4 CREATE FUNCTION one RETURNS int4 AS SELECT 1 as RESULT LANGUAGE sql SELECT one AS answer answer L1 Notice that the function definition included a target list with the name RESULT but the target list of the query that invoked the function overrode the function s target list Therefore the result is labelled answer instead of one It s almost as easy to define SQL functions that take base types as arguments In the example below notice how arguments within the function are referred to as 1 and S ME CREATE FUNCTION add em int4 int4 RETURNS int4 AS SELECT 1 2 LANGUAGE sql SELECT add_em 1 2 AS answer answer L3 2 4 1 2 SQL Functions on Composite Types When specifying functions with arguments of composite types such as EMP you must not only state which argument you want as you did above with 1 and 2 Mariposa User Manual v 1 0 22 but also the attributes of that ar
71. inly be different In this example the operators you are interested in are those with OIDS 17321 through 17325 The values you actually get will probably be different and you should substitute them for the values below Look at the operator names and pick out the ones you just added Now you re ready to update pg_amop with our new operator class The operators should be ordered from less than through greater than in pg amop Add the required instances INSERT INTO pg amop amopid amopclaid amopopr mopstrategy mopselect amopnpages ELECT am oid opcl oid c opoid 3 btreesel regproc btreenpage regproc 409090 FROM pg am am pg opclass opcl complex ops tmp c WHERE amname btree and opcname complex abs ops and c oprname Note the order less than is 1 less than or equal is 2 equal is 3 greater than or equal is 4 and greater than is 5 The last step is registration of the support routine previously described in our discussion of pg am The OID of this support routine is stored in the pg amproc class keyed by the access method oid and the operator class oid First you need to register the function in POSTGRES recall that you put the C code that implements this routine in the bottom of the file where you implemented the operator routines CREATE FUNCTION int4 abs cmp int4 int4 RETURNS int4 AS usr local postgres95 tutorial obj complex so LANGUAG
72. ions about moving data around One variable is called host id and identifies the local hostid This is necessary for the data broker to examine whether data is stored locally or remotely The other Tcl variable made available to the data broker is called ragments Itis a Tcl list which describes all of the data fragments accessed in the query that just ran There is one entry in the list for each class accessed in the query Each entry is of the form classid logicalid storeid pages tuples loci loco locn The items are explained in Table 7 The list will contain information about all fragments of all classes accessed in the query whether the query originated locally or not This gives data brokers running at all sites access to information about which fragments are being used most often Mariposa User Manual v 1 0 46 Item Description classid The OID Object Identifier of the fragment s class Shared by all copies of all fragments of a class logicalid The OID shared by all copies of the fragment storeid The OID of the fragment at the site where it was accessed for the current query Unique to this copy of the fragment pages The size of the fragment in disk pages tuples The size of the fragment in tuples loc loc loc The hostid s of the storage locations of all copies of the fragment locl is the hostid of the location where the fragment was accessed for the current query
73. is a single site optimizer It produces a plan as if all the data resided at a single site The single site plan is then passed to the backend s fragmenter module The fragmenter produces a fragmented query plan which reflects the fragmentation of the tables referenced in the query and is parallelizable to a greater or lesser degree The fragmenter is described in Section 3 3 The parser optimizer and fragmenter use information from a name server module running at the same or a different site The fragmenter passes the fragmented plan to the backend s query broker The query broker is explained in detail in Section Error Reference source not found A Mariposa user allocates a budget to each query The goal of the query broker is to select sites to process the query within the allotted budget The query broker decides which Mariposa site will process each node in the query plan by following one of two protocols e In the long protocol the query broker contacts the site manager s bidder module at each potential processing site The broker waits for responses from the bidders before selecting the best ones Mariposa User Manual v 1 0 38 e In the short protocol the query broker uses information collected from the name server to select the processing sites thereby avoiding the cost of contacting many remote sites After the query broker has specified the processing sites the backend s coordinator module takes over notifying the r
74. is discussed in ONG90 as well as STON90b Mariposa User Manual v 1 0 37 3 MARIPOSA This section introduces the Mariposa architecture in more detail and extends the examples from Section 2 1 to include distribution We show how to split a Mariposa class into fragments and how to move data manually from one site to another We show how Mariposa processes queries over remote data using information from the name server 3 1 Mariposa Modules Mariposa consists of the following cooperating processes 1 A single site manager daemon which supervises the backends 2 One or more backend database server processes 3 A client frontend process Users who are familiar with POSTGRES will recognize similarities between the site manager and backend processes and the postmaster and postgres processes on which they were based Referring back to Section 1 3 we briefly summarize the example illustrated in Figure 1 The client program issues queries to the Mariposa system Queries are expressed in a version of Standard Query Language See Section 2 1 for a description of SQL used with POSTGRES the single site database supplied with Mariposa The query is passed into an available backend by the site manager Inside the backend the query is passed through a parser and then through an optimizer which creates a query plan The query plan describes what operations will be performed to process the query and in what order The optimizer used in Mariposa
75. ixed length types of any size may be passed by reference For example here is a sample implementation of the POSTGRES char16 type 16 byte structure passed by reference typedef struct char data 16 char16 Only pointers to such types can be used when passing them in and out of POSTGRES functions Finally all variable length types must also be passed by reference All variable length types must begin with a length field of exactly 4 bytes and all data to be stored within that type must be located in the memory immediately following that length field The length field is the total length of the structure 1 e it includes the size of the length field itself You can define the text type as follows typedef struct int4 length char data 1 j text Obviously the data field is not long enough to hold all possible strings it s impossible to declare such a structure in C When manipulating variable length types be careful to allocate the correct amount of memory and initialize the length field For example if you want to store 40 bytes in a text structure you might use a code fragment like this include postgres h include utils palloc h char buffer 40 our source data text destination text palloc VARHDRSZ 40 destination length VARHDRSZ 40 memmove destination data buffer 40 Here are some examples of real functions Suppose funcs c look like finclude lt string h g
76. k you have another problem if it fails the system catalogs themselves were almost certainly corrupted and you will have to go back to your backups The postmaster does not usually crash it doesn t do very much except start servers but it does happen on occasion In addition there are a few cases where it encounters problems during the reinitialization of shared resources Specifically there are race conditions where the operating system lets the postmaster free shared resources but then will not permit it to reallocate the same amount of shared resources even when there is no contention You will typically have to run the ipcclean command if system errors cause the postmaster to crash If this happens you may find using the UNIX ipcs 1 command that the postgres user has shared memory and or semaphores allocated even though no postmaster process is running In this case you should run ipcclean as the postgres user in order to deallocate these resources Be warned that all such resources owned by the postgres user will be deallocated If you have multiple postmaster processes running on the same machine you should kill all of them before running ipcclean Mariposa User Manual v 1 0 62 otherwise they will crash on their own when their shared resources are suddenly deallocated 4 2 2 Moving Database Directories By default all POSTGRES databases are stored in separate subdirectories under usr local postgres95 d
77. l v 1 0 NESTEDLOOP Input left sub tree right sub tree Output bid Updates nTuples and nPages guesses one match for each outer tuple proc NESTEDLOOP nodeNum leftTree rightTree junk global BID REFUSETOBID global nTuples global nPages global rtable global hostid set perTupleCharge 001 set perTupleDelay 000400 set leftSubBid CostBasedBid leftT ree set leftTuples nTuples set leftPages nPages set rightSubBid CostBasedBid rightTree set rightTuples nTuples set rightPages nPages Each inner tuple is touched once per outer tuple set delay expr leftTuples SrightTuples perTupleDelay set cost expr leftTuples SrightTuples perTupleCharge Wild guess one match for each outer tuple set nTuples leftTuples set bid CombineBids leftSubBid rightSubBid set bid CombineBids Sbid list BID cost delay 0 0 0 0 return bid 70 Mariposa User Manual v 1 0 71 SEQSCAN Input scanIndex fragIndex left sub tree Output bid Updates nTuples and nPages based on information in range table proc SEQSCAN nodeNum scanIndex fragIndex leftTree global BID REFUSETOBID global contract global nTuples global nPages global rtable global hostid no extra charge per tuple set perTupleCharge 0 5 cents per page set perPageCharge 05 delay in seconds per tuple retrieved not including disk 1 0 set perTupleDelay 000600 delay in seconds per disk page acc
78. lect the delivery into New York of the widgets with PART_NO that were on order UPDATE WIDGETS The semantics of such a join are that the qualification is a truth expression defined for the Cartesian product of the classes indicated in the query For those instances in the Cartesian product for which the qualification is true POSTGRES computes and returns the values specified in the target list POSTGRES SQL does not assign any meaning to duplicate values in such expressions This means that POSTGRES sometimes recomputes the same target list several times this frequently happens when Boolean expressions are connected with an or To remove such duplicates you must use the SELECT DISTINCT statement Mariposa User Manual v 1 0 14 SET ON HAND ON HAND ON ORDER ON ORDER 0 WHERE PART NO 1 and LOCATION New York SELECT FROM WIDGETS WHERE PART NO 1 and LOCATION New York PART_NO LOCATION ON_HAND ON_ORDER COMMITTED i New York 2000 o 300 2 1 7 Deletions Deletions are performed using the delete command DELETE FROM WIDGETS WHERE ON_HAND 0 and ON_ORDER 0 and COMMITTED 0 All widgets with zero quantity on hand on order and sold are deleted In this example there are no such records so this DELETE statement has no effect Be wary of queries such as DELETE FROM WIDGETS
79. llowing principles in mind e Scalability to a large number of cooperating sites In a WAN environment there may be a large number of sites Our goal is to scale to 10 000 servers e Local autonomy Each site must have control over its resources This includes which objects to store and which queries to run Query and data allocation cannot be done by a central authoritarian query optimizer e Data mobility It should be easy and efficient to change the home of an object Preferably the object should remain available during movement e No global synchronization Updates and schema changes should not force a site to synchronize with all other sites Otherwise many common operations will have exceptionally poor response time e Easily configurable policies It should be easy for a local database administrator to change the behavior of a Mariposa site A Mariposa system should respond gracefully to changes in user activity and data access patterns to maintain low response time and high system throughput Mariposa User Manual v 1 0 6 1 2 Using This Manual This manual is divided into two main parts Section 2 contains a description of POSTGRES the single site database management system distributed as part of Mariposa Readers who are familiar with POSTGRES may want to skim these sections or skip over them entirely Section 3 describes the Mariposa system itself This manual assumes that you have already installed Mariposa successful
80. ly on all the sites in your system For information on how to download and install Mariposa see the Installation and Setup Manual 1 3 Overview of the Architecture In Mariposa all distributed DBMS issues query optimization data movement name service etc are reformulated in microeconomic terms Implementation of the economic paradigm involves a number of entities and mechanisms In this section we describe the architecture and process structure of Mariposa We begin with an example pictured in Figure 1 A company that sells widgets has offices in San Francisco Chicago New York and Miami The company s database includes a table called WIDGETS which contains pricing and inventory information on all the company s widgets The widgets are warehoused in New York and Miami so the company keeps half the WIDGETS table in New York and the other half in Miami In Mariposa splitting a table is called fragmentation and the pieces that make up a table are called fragments In the example the WIDGETS table is fragmented into WIDGETS1 and WIDGETS2 If the purchasing manager in the San Francisco office wanted to retrieve all the records from the WIDGETS table she would enter a query into a frontend application In SQL Standard Query Language she would enter SELECT FROM WIDGETS The site where a query is entered San Francisco in this case is the home site The purchasing manager s query is sent from the frontend application to the Mariposa pr
81. ment GRANT REVOKE Used for access control none PRIVILEGES PUBLIC EXPLAIN Explain optimizer choice of none query plan LIKE LIKE operator operator Table 1 Unsupported Commands and Key Words In addition to commands and key words the version of POSTGRES distributed with Mariposa has different built in types than POSTGRES95 The POSTGRES95 types and their equivalents are listed in Table 2 Mariposa User Manual v 1 0 9 POSTGRES95 Type Mariposa POSTGRES Type int integer smallint int2 int4 real float float4 float char length varchar length char length char16 date time abstime Table 2 POSTGRES 95 Types and their Equivalents The POSTGRES95 built in aggregates avg sum min and max have type specific equivalents in the POSTGRES distributed with Mariposa These are listed in Table 3 The built in aggregate count is the same in both versions POSTGRES95 Aggregate POSTGRES Type Specific Equivalents avg int4ave int2ave float4ave float8ave sum int4sum int2sum float4sum float8sum min int2min int4min float4min float8min max int2max int max float4max float8max Table 3 POSTGRES95 Aggregates and their Equivalents 2 1 THE QUERY LANGUAGE POSTGRES SQL This section provides an overview of how to use POSTGRES SQL to perform simple operations POSTGRES SQL is a variant of SQL 3 It has many extensions such as an
82. mi 9300 700 5000 4 ew York 3200 0 0 _6 ew York 1800 5000 1500 6 lami 11000 p 0 3000 You may specify any aribitrary expressions in the target list For example to list the number of widgets on order plus the number on hand you could type SELECT PART NO LOCATION ON ORDER ON HAND AS TOTAL OTY FROM WIDGETS PART NO LOCATION TOTAL QTY m ew York 2000 2 ew York 3000 3 iami 15000 4 iami 8500 3 ew York 4500 3 ew York 2000 2 iami 10000 4 ew York 3200 6 ew York 6800 6 iami 11000 Arbitrary Boolean operators e g and or and not are allowed in the qualification of any query For example S ELE E T FROM WIDGETS WHERE location Miami and ON HAND ON ORDER COMMITTED lt 8000 PART_NO LOCATION on_HaND ON_ORDER COMMITTED 3 Miami 10000 5000 8000 2 Miami 9300 700 5000 6 Miami 11000 o 3000 To specify the results of a select to be returned in a sorted order use ORDER BY Mariposa User Manual v 1 0 12 S ELE C T FROM WIDGETS ORDER BY LOCATION PART_NO LOCATION ON_HAND ON_ORDER COMMITTED 6 iami 11000 p oO 3000 2 iami 9300 700 5000 4 iami 8500 0 200
83. must also be reloaded by re executing the various CREATE commands You should give any new release a trial period in particular do not delete the old database until you are satisfied that there are no compatibility problems with the new software For example you do not want to discover that a bug in a type s input conversion from ASCII and output conversion to ASCII routines prevents you from reloading your data after you have destroyed your old databases This should be standard procedure when updating any software package but some people try to economize on disk space without applying enough foresight 4 3 Database Security Most sites that use POSTGRES are educational or research institutions and are generally not greatly concerned about security in their POSTGRES installations If desired you can install POSTGRES with additional security features such as the MIT Kerberos network authentication system Naturally such features come with additional administrative overhead that must be dealt with 4 3 1 Kerberos POSTGRES can be configured to use the MIT Kerberos network authentication system This prevents outside users from connecting to your databases over the network without the correct authentication information 4 4 Querying the System Catalogs From time to time you may want to find out what extensions have been added to a given database The queries listed below are canned queries that you can run on any databas
84. nitializing structures in C INSERT INTO SAL_EMP VALUES Bill 10000 10000 10000 10000 meeting lunch INSERT INTO SAL EMP VALUES Carol 20000 25000 25000 25000 talk consult meeting By default POSTGRES uses the one based numbering convention for arrays that is an array of n elements starts with array 1 and ends with array n The following query accesses a single element of an array at a time and retrieves the names of the employees whose pay changed in the second quarter SELECT name FROM SAL EMP WHERE SAL EMP pay by quarter 1 lt gt SAL EMP pay by quarter 2 name Carol The following query retrieves the third quarter pay of all employees SELECT SAL EMP pay by quarter 3 FROM SAL EMP You can also access arbitrary slices of an array subarrays The following query retrieves the first item on Bill s schedule for the first two days of the week pay by quarter 10000 25000 SELECT SAL _EMP schedule 1 2 1 1 FROM SAL_ WHERE SAL_EMP name Bill D Er schedule meeting Mariposa User Manual v 1 0 18 2 3 POSTGRES Extensibility This section discusses how to extend the POSTGRES SQL query language by adding functions types operators
85. ocols ertet er ttti A ne e Edad 53 3 6 3 The Bidder EE 53 36 31 Bidding iaa lect teen bea easton adde ma pa case edu des a e NE beet E 53 3 6 3 2 The plan and rt ble global variables onio LE RAR ee LR ete aa 54 3 6 3 3 The s bcontract Command isnie niri Me A Anne nthe nes maiden 56 3 6 3 4 Sample Bidder Script seen 56 4 ADMINISTERING POSTGRES AND MARIPOSA eeeeeeeee seen netten nettes tens stets sets sets se tnnue 58 Aol BREQUENT TASKS set ertet eroe eter dem mn er eee etes eee Pee tee odes et e ede EE 58 4 1 1 Starting the Site Manager icasi kenikir a Eia oaio T na tenes et tei PT oaia Tiek 58 4 1 2 Shutting Down the Postmaster een 59 4 1 3 Adding and Removing Users eee en nene nene enne entren 59 4 14 Periodic Upkeep aii cots toc de et et e e e the tat 59 LINE TI HE 59 4 2 INEREQUENT TASKS ertet teca sext E cute ares iet eee toe tet eere ee lese Ute E tees Ute xe aeuo een sent aene aet aee tue 60 4 2 1 Cleaning Up After Crashes inner neret nene nene nnne enne neret 60 4 2 2 Moving Database Directories ire en neret nne nnne enne nnne 62 4 2 3 Updating Databases tisser iini beet do Pee pedet doe iaa iek 63 4 3 DATABASE SECURITY ot ses e teet et eter t ire te etia ete tee aa 63 4 3 1 Ker Der OS EP E st ipui EESTE 63 4 4 QUERYING THE SYSTEM CATALOGS ccceesssseeecceeeeeesssneeeeeceecceccessssnaceeeeeeeesessaeeeeeeeeesesnaeeeesereeeeeaees 63 Mariposa User Manual v 1 0 5 1 INTRODUC
86. ogram running on the server in San Francisco The query is passed through a parser which checks for syntactic correctness and performs type checking an optimizer which produces a query plan that describes the order in which different steps in the plan will be executed and a fragmenter which changes the plan produced by the optimizer to reflect the data fragmentation The final result produced by the fragmenter is the fragmented query plan In order to do their work the parser optimizer and fragmenter need information about data types fragment location etc This information is maintained by a Mariposa name server In the example the name server is in the Chicago office The fragmented query plan describes the operations that will be performed in order to execute the query and the order in which they will be carried out In the example the purchasing manager s query SELECT FROM WIDGETS is represented by a query plan which scans the two WIDGETS fragments WIDGETS1 and WIDGETS2 and merges the result The fragmented query plan is passed to the query broker whose job it is to decide where each piece of the fragmented query plan will be executed The query broker uses one of two protocols e In the long protocol the query broker contacts the bidder module at each potential processing site The broker waits for responses from the bidders before selecting the best ones The long protocol is illustrated in Figure 1 1 Purchasing Man
87. ow to do it for your particular operating system 2 5 EXTENDING SOL TYPES As previously mentioned there are two kinds of types in POSTGRES base types defined in a programming language and composite types instances Examples in this section up to interfacing indices can be found in complex sql and complex c Composite examples are in funcs sql 2 5 1 Functions Needed for a User Defined Type A user defined type must always have input and output functions These functions determine how the type appears in strings for input by the user and output to the user and how the type is organized in memory The input function takes a null delimited character string as its input and returns the internal in memory representation of the type The output function takes the internal representation of the type and returns a null delimited character string Suppose you want to define a complex type which represents complex numbers Naturally you can represent a complex in memory as the following C structure typedef struct Complex double x double y Complex and a string of the form x y as the external string representation These functions are usually not hard to write especially the output function However there are a number of points to remember e When defining your external string representation remember that you must eventually write a complete and robust parser for that representation as your input function Complex
88. pg_amop pg_attribute amopid attrelid amopclaid attnum r pg_index amopopr atttypid indrelid amopselect indkey 8 amopnpages indproc 4 pg_class indpred oid r indexrelid relam f pg_proc p oid 3 4 pg type proname oid prorettype typrelid proargtypes 8 pg language typinput prolang oid typoutput typreceive Pl eto pg_operator typsend oid e pg_opclass oprname gt oid oprleft pg_am oprright oid le oprresult amgettuple oprcom L aminsert oprnegate H pg_amproc oprisortop Pa r amid amdelete E amgetattr oprrsortop F4 amopclaid ambeginscan oprcode amprocnum amrescan oprrest po ue amproc ioi amendscan oprjoin m ammarkpos amrestrpos ambuild KEY DEPENDENT INDEPENDENT foreign key refers to primary key Mariposa User Manual v 1 0 21 2 4 EXTENDING SOL FUNCTIONS An important part of defining a new type is the definition of functions that describe the type s behavior While it is possible to define a new function without defining a new type the reverse is not true POSTGRES SQL provides two types of functions query language functions functions written in SQL and programming language functions functions written in a compiled programming language such as C Both query language functions and programming language functions can take any type of variable as arguments and return any type This includes base types composite types or a combination of both Examples
89. plan similar to B 2 Mariposa User Manual v 1 0 52 JOIN MERGE SORT exce sor JOIN AAA MERGE SORT MERGE SORT SS TEMP1 SORT SS EMP1 MERGE SORT MERGE SORT SS TEMP1 SS TEMP2 TEMP3 TEMP 4 TEMP 5 SORT SORT SORT SORT SORT SS EMP1 SS EMP2 SS EMP3 SS DEPT1 SS DEPT2 Figure 5 Plan Chunks 3 6 2 3 Bid Protocols The query broker may follow either the short or long bidding protocol to determine the site s at which a query will be executed The bid protocol used by the query broker is set with the SET bidproto command SET bidproto short long 3 6 2 3 1 The Short Protocol In the short protocol the query broker sends each plan chunk to a single potential processing site In this case the query broker attempts to select the site most likely to have won the bidding process had the long protocol been used The query broker determines this based on advertising information and statistics it maintains about previous queries Once the query broker has determined which sites it will contact it returns the query plan back to the coordinator indicating which site is to be contacted to process each plan chunk The processing site may respond in one of two ways either by processing the subquery represented by the plan chunk or by refusing to do so In the current Mariposa implementation processing sites always agree to perform work requested by the que
90. r produce a stack trace to report the bug to someone else you can delete this file which is probably around 10MB 2 When one backend crashes in an uncontrolled way 1 e without calling its built in cleanup routines the postmaster will detect this situation kill all running servers and reinitialize the state shared among all backends e g the shared buffer pool and locks If your server crashed you will get the no response message shown above If your server was killed because someone else s server crashed you will see the following message I have been signalled by the postmaster Some backend process has died unexpectedly and possibly corrupted shared memory The current transaction was aborted and I am going to exit Please resend th last query The postgres backend Sometimes shared state is not completely cleaned up Frontend applications may see errors of the form WARN cannot write block 34 of myclass mydb blind In this case you should kill the postmaster and restart it When the system crashes while updating the system catalogs e g when you are creating a class defining an index retrieving into a class etc the B tree indices defined on the catalogs are sometimes corrupted The general and non unique symptom is that all queries stop working If you have tried all of the above steps and nothing else seems to work try using the reindexdb command If reindexdb succeeds but things still don t wor
91. r provides the information stored in the remote database catalogs to its clients by replicating the remote catalogs This replication is achieved by using the copy mechanism described in Section 3 3 A Mariposa name server is a regular Mariposa site which keeps read only copies of a subset of the system catalogs of other sites These copies are maintained by the update streams sent from the source sites to the name server as a consequence information obtained from a name server will always be out of date by a certain amount just like copies of regular data The set of sites whose system tables are on a name server is not fixed and does not have to include every existing site The DBA of a site autonomously determines if that site should also provide name service and which remote sites catalogs it should replicate One difference between catalog data and regular user data replication is that the name server site does not keep the catalog tables from each of the remote sites separated in its own local tables Instead the data of all the remote sites tables is merged into a single set of name server catalog tables In the current version of Mariposa only the data from the catalog tables pg class pg fragment and pg attribute are replicated they are stored in the name server catalog tables pg nsvcclass pg nsvcfrag and pg nsvcattr 3 4 1 Setting Up Name Service Every Mariposa site can be set up to be a Mariposa name server because ever
92. rator operators pg aggregate aggregates and aggregate functions Pg am access methods pg amop access method operators pg_amproc access method support functions pg_opclass access method operator classes The POSTGRES Reference Manual gives a more detailed explanation of these catalogs and their attributes However Figure 3 shows the major entities and their relationships in the system catalogs Attributes that do not refer to other entities are not shown unless they are part of a primary key This diagram becomes clear when you examine the catalogs contents and see how they relate to each other The main points are e Several of the following sections present various join queries on the system catalogs that display information needed to extend the system This diagram should make these join queries which are often three or four way joins more understandable because the diagram shows that the attributes used in the queries form foreign keys in other classes e Many different features i e classes attributes functions types access methods etc are tightly integrated in this schema A simple create command may modify many of these catalogs e Types and procedures are central to the schema Nearly every catalog contains some reference to instances in one or both of these classes For example POSTGRES frequently uses type signatures e g of functions and operators to ident
93. rocedures corresponding to the node types there Mariposa User Manual v 1 0 57 are two utility procedures CombineBids adds up two bids and LoadAverage returns the 5 30 and 60 second load averages The procedures which calculate a bid for each node type are similar in structure and function Each one first passes its children nodes to GetQueryBid and gets a bid back Then the delay and cost are calculated on a per tuple and per page basis In addition to calculating the cost and delay each procedure also updates the values of the global variables nTuples and nPages nTuples is an estimate of the number of tuples processed by the query node for which the procedure was called nPages is the same thing for the number of pages processed These two variables are internal to the example bidder they are not part of the required bidder interface like delay and cost The SEOSCAN procedure is more complicated than the others and we discuss it in more detail Like the other procedures it calculates cost and delay on a per tuple and per page basis However in addition to the arguments nodeNum and leftTree SEQSCAN takes two additional arguments scanIndex and fragIndex scanIndex indicates the element in the range table corresponding to the relation being scanned fragIndex indicates the element in the ragInfo list of the range table entry for the fragment being scanned They are used by SEOSCAN to access the correct entries in the global
94. rticular access method expects in this case a signed integer The final routine in the file is the support routine mentioned when you discussed the amsupport attribute of the pg_am class You will use this later on For now ignore it CREATE FUNCTION complex abs eq complex complex RETURNS bool AS usr local mariposa tutorial obj complex so LANGUAGE c Now define the operators that use them As noted the operator names must be unique among all operators that take two int4 operands In order to see if the operator names listed below are taken you can do a query on pg operator this query uses the regular expression operator to find three character operator names that end in the character amp S ELE E T FROM pg operator WHERE oprname amp text to see if your name is taken for the types you want The important things here are the procedure which are the C functions defined above and the restriction and join selectivity functions You should use just the ones used below note that there are different functions for the less than equal and greater than cases These must be supplied or the access method will crash when it tries to use the operator You should copy the names for restrict and join but use the procedure names you defined in the last step CREATE OPERATOR leftarg complex rightarg complex procedure complex_abs_eq
95. ry broker Mariposa User Manual v 1 0 53 3 6 2 3 2 The Long Protocol In the long protocol the query broker sends each plan chunk to a set of bidder sites which are potential processing sites Each bidder site responds with a bid which specifies the cost and delay required to process the subquery The query broker selects the best bid for each plan chunk and notifies the losing sites It then sends the query plan back to the coordinator as in the short protocol indicating the processing site for each plan chunk 3 6 3 The Bidder The Mariposa bidder module accepts requests from query brokers to bid on work Its job is to determine the amount that the site will charge to process the given query plan chunk and the expected processing time delay 3 6 3 1 Bidding The bidder s behavior is controlled entirely by a Tcl script much like the data broker When the site manager process starts up it looks for a file called bidder tcl in the directory PGDATA files bidder tcl contains the procedure GetQueryBid which should take no arguments and return a Tcl list of five elements response price delay staleness and accuracy as explained in Table 8 Element Description response Oor 1 1 Bid 0 Refuse to Bid price The price in dollars that this site will charge to process the query delay The time in seconds for the site to process the query from the time it starts processing Delay does not include network t
96. s that are located at an altitude over 500 feet the query is SELECT c name c altitude FROM cities c WHERE c altitude gt 500 2 Le the inheritance hierarchy is a directed acyclic graph Mariposa User Manual v 1 0 16 which returns name altitude Las Vegas 2174 Mariposa 1953 Madison 845 Here the after cities indicates that the query should be run over cities and all classes below cities in the inheritance hierarchy Many of the commands discussed so far select update and delete support this notation Other commands such as the alter command do as well 2 2 2 Time Travel POSTGRES supports time travel This feature enables you to run historical queries For example to find the current population of Mariposa city SELECT FROM cities WHERE name Mariposa name population altitude Mariposa 1320 1953 POSTGRES automatically finds the version of Mariposa s record valid at the current time You can also specify a time range For example to retrieve the past and present populations of Mariposa query SELECT name population FROM cities epoch now WHERE name Mariposa Here epoch indicates the beginning of the system clock If all of the examples have been executed thus far then the above query returns name population
97. ss you can use the UNIX ki11 1 command Some people habitually use the 9 or KILL option this should never be necessary and the POSTGRES group does not recommend that you do this because the postmaster will be unable to free its various shared resources its child processes will be unable to exit gracefully etc 4 1 3 Adding and Removing Users The createuser and destroyuser commands enable and disable access to POSTGRES by specific users on the host system 4 1 4 Periodic Upkeep 4 1 5 Tuning The vacuum command should be run on each database periodically This command processes deleted instances and more importantly updates the system statistics concerning the size of each class If these statistics are permitted to become out of date and inaccurate the POSTGRES query optimizer may make extremely poor decisions with respect to query evaluation strategies Therefore you should run vacuum every night or so perhaps in a script that is executed by the UNIX cron 1 or at 1 commands Perform frequent backups That is you should either back up your database directories using the POSTGRES copy command and or the UNIX dump 1 or tar 1 commands You may think Why am I backing up my database What about crash recovery One side effect of the POSTGRES no overwrite storage manager is that it is also a no log storage manager That is the database log stores only abort commit data and this is not enough information
98. ss for the B tree access method that sorts integers in ascending absolute value order The pg am class contains one instance for every user defined access method Support for the heap access method is built into POSTGRES but every other access method is described here The schema is described in Table 4 amname name of the access method amowner object id of the owner s instance in pg user amkind not used at present but set to o as a place holder amstrategies number of strategies for this access method see below amsupport number of support routines for this access method see below amgettuple aminsert procedure identifiers for interface routines to the access method For example regproc IDs for opening closing and getting instances from the access method appear here Table 4 pg am schema The object ID of the instance in pg am is used as a foreign key in lots of other classes You don t need to add a new instance to this class all you re interested in is the object ID of the access method instance you want to extend SELECT oid FROM pg am WHERE amname btree oia 403 The amstrategies attribute standardizes comparisons across data types For example B trees impose a strict ordering on keys lesser to greater Because POSTGRES allows the user to define operators POSTGRES cannot look at the name of Mariposa User Manual v 1
99. streams for read write replicas one from parent to child and one from child to parent Read only replicas only receive update streams from their parents 3 3 1 Creating a Copy To create a copy of a fragment in the Mariposa system use the COPY FRAGMENT command at the site requesting a copy the child site The site that owns the copy will be referred to as the parent site The COPY FRAGMENT command causes a request to be sent to the parent site to send a copy When the request has been processed the child site owns a fragment whose contents are the same as the parent as of the time of transfer A copy contract is set up at each site which will cause update streams to be sent back and forth The name of a copy is generated automatically by Mariposa The hexadecimal value of the fragment storage id of the copy which uniquely identifies it is appended to the first eight characters of the name of the parent fragment For example if a copy of the fragment WIDGET_MI were made the name of the copy would be something like WIDGET MOA3FBC23 The syntax for the COPY FRAGMENT command is COPY FRAGMENT READONLY fragment name FROM hostid UPDATE EVERY period fragment name is the name of the fragment at the parent site The parent site s hostid is hostid period is the amount of time in seconds that passes between update stream sendoffs If the READONLY option is specified then a read only copy is made 3 3
100. t include postgres h for charl6 etc include utils palloc h for palloc int add_one int arg return arg 1 charl6 concat16 charl6 argl charl6 arg2 char16 new_cl6 charl16 palloc sizeof charl6 memset void new_cl6 0 sizeof charl6 void strncpy new cl16 argl 16 return charl6 strncat new_c16 arg2 16 Mariposa User Manual v 1 0 25 text copytext text t VARSIZE is the total size of the struct in bytes EJ text new t text palloc VARSIZE t memset new t 0 VARSIZE t VARSIZE new t VARSIZE t VARDATA is a pointer to the data region of the struct E memcpy void VARDATA new_t destination void VARDATA t source VARSIZE t VARHDRSZ how many bytes return new_t On OSF 1 you type CREATE FUNCTION add one int4 RETURNS int4 AS usr local postgres95 tutorial ob3 funcs so LANGUAGE Ne s CREATE FUNCTION concat16 char16 char16 RETURNS charl6 AS usr local postgres95 tutorial ob3 funcs so LANGUAGE Se s CREATE FUNCTION copytext text RETURNS text AS usr local postgres95 tutorial ob3 funcs so LANGUAGE ar 2 On other systems you might have to make the filename end in s1 to indicate that it s a shared library 2 4 2 2 Programming Language Functions on Composite Types Composite types do not have a fi
101. t Scans using indices are much faster than sequential scans of the entire class e Run the vacuum command frequently This command updates the statistics that the query optimizer uses to make intelligent decisions if the statistics are inaccurate the system will make inordinately stupid decisions with respect to the way it joins and scans classes e When specifying query qualfications 1 e the where part of the query try to ensure that a clause involving a constant can be turned into one of the form range variable operator constant e g EMP salary 5000 The POSTGRES query optimizer will only use an index with a constant qualification of this form It doesn t hurt to write the clause as 5000 EMP salary if the operator in this case has a commutator operator defined so that POSTGRES can rewrite the query into the desired form However if such an operator does not exist POSTGRES will never consider the use of an index e When joining several classes together in one query try to write the join clauses in a chained form e g where A a B b and B b C c and Notice that relatively few clauses refer to a given class and attribute the clauses form a linear sequence connecting the attributes like links in a chain This is preferable to a query written in a star form such as where A a B b and A a C c and Here many clauses refer to the same class and attribute in this case A a When presented
102. tID This cancels the copy contract with the source site and removes its meta data from the name server s catalog tables Similarly to cancel a SUBSCRIBE NAMESERVICE command use UNSUBSCRIBE NAMSERVICE 3 4 2 Specifying A Primary Name Server Every site needs to have access to a name server in order to process queries on remote tables The primary name server of a site is specified with the command set nameserver hostID This site will cause Mariposa to contact the name server indicated by hostID for all name service information Ifa site is itself a name server it could simply supply its own host ID Note that even if a site is a name server it may use a remote site for its own name service 3 5 THE MARIPOSA DATA BROKER The Mariposa data broker performs data placement moving and copying fragments in response to access patterns The data broker is a Tcl script which means that it is easily modified by Mariposa users When the site manager process is started it looks in the directory PGDATA files for a file called databroker tcl This file must contain a procedure called DataBroker written in Tcl which accepts no arguments and has no return value The DataBroker procedure is called each time a query is finished running The query may have originated locally or it may be part of a remote query being run on behalf of another site The site manager defines a few global Tcl variables that the data broker can use to make decis
103. to recover the database if the storage medium disk or the database files are corrupted In other words if a disk block goes bad or POSTGRES happens to corrupt a database file you cannot recover that file This can be disastrous if the file is one of the shared catalogs such as pg_database Once your users start to load a significant amount of data you will typically run into performance problems POSTGRES is not the fastest DBMS in the world but many of the worst problems encountered by users are due to their lack of experience with any DBMS Some general tips include e Define indices over attributes that are commonly used for qualifications For example if you often execute queries of the form SELECT from EMP where salary lt 5000 then a B tree index on the salary attribute will probably be useful If scans involving equality are more common as in SELECT from EMP where salary 5000 7 This may mean different things depending on the archive mode with which each class has been created However the current implementation of the vacuum command does not perform any compaction orclustering of data Therefore the UNIX files that store each POSTGRES class never shrink and the space reclaimed by vacuum is never actually reused Mariposa User Manual v 1 0 60 then you should consider defining a hash index on salary You can define both though it will use more disk space and may slow down updates a bi
104. trolled through use of the Tcl scripting language Using Tcl it is straightforward to change policy decisions one simply modifies the rules and scripts by which these modules are implemented Note We have not included an explanation of Tcl Readers unfamiliar with Tcl can refer to one of the books on the subject such as An Introduction to the Tcl and Tk Toolkit by John Ousterhout Addison Wesley 1994 or to the comp lang tcl USENET newsgroup 3 2 A Distributed Example The source for all of the examples in this section can be found in src tutorial dist sql This section assumes that you have installed Mariposa on at least two sites For instructions on setting up Mariposa refer to the Installation and Setup Guide In this section e We assume that you have a site manager process running at the two sites referred to as numbers 1 and 2 in the examples e We assume that Site 1 is a name server and has subscribed to the metadata for Site 2 using the SUBSCRIBE METADATA command See Section 3 4 and the Installation and Setup Guide e In the example data is moved from one site to another and then queried immediately so the update interval for the SUBSCRIBE METADATA Mariposa User Manual v 1 0 39 statement should be relatively short for example 60 seconds See Section 3 4 1 for an explanation of SUBSCRIBE METADATA e Site l is the home site where we are issuing queries Site 2 is a remote site in this case Miami 3 2 1 Creating
105. unction gt lt attribute gt Table 6 PARTITION Clause Syntax The syntax for key based and hash based partitioning requires some explanation The lt attribute gt is one of the columns in the table lt function gt is a function that takes two arguments and returns 1 0 or 1 depending on whether the first argument is less than equal to or greater than the second In general the comparison function is an access method comparison function as described in Section 2 8 All the tuples in which attribute is less than or equal to the split value go in the first fragment All tuples in which attribute 1s greater than the split value go in the second fragment Mariposa User Manual v 1 0 40 The widgets in the example in Section 2 1 1 were kept in two warehouses Miami and New York We will keep the records for widgets in the location where they are stored so we will split using key based partitioning on the LOCATION attribute We create the WIDGETS class with the following SQL statement CREATE TABLE WIDGETS PART NO int4 LOCATION charl6 warehouse Miami or New York ON HAND int4 quantity on hand ON ORDER int4 quantity on order COMMITTED int4 quantity sold but not shipped PARTITION ON LOCATION USING btcharl6cmp The function btchar16cmp takes two arguments of type char16 and returns 1 0 or 1 as described above In general for key and hash based partitioning the b tr
106. will only show the equality operator for the rest of the examples The other four operators are very similar Refer to complex c or complex sql for the details define Mag c c x c x c y c y bool complex abs eq Complex a Complex b double amag Mag a bmag Mag b return amag bmag There are a couple of important things that are happening below First note that operators for less than less than or equal equal greater than or equal and greater than for int 4 are being defined All of these operators are already defined for int 4 under the names lt lt gt and gt The new operators behave differently of course In order to guarantee that POSTGRES uses these new operators rather than the old ones they need to be named differently from the old ones This is a key point you can overload operators in POSTGRES but only if the operator isn t already defined for the argument types That is if you have lt defined for int4 int 4 you can t define it again POSTGRES does not check this when you define your operator so be careful To avoid this problem odd names will be used for the operators If you get this wrong the access methods are likely to crash when you try to do scans Mariposa User Manual v 1 0 34 The other important point is that all the operator functions return Boolean values The access methods rely on this fact On the other hand the support function returns whatever the pa
107. xed layout like C structures Instances of a composite type may contain null fields Also composite types that are part of an inheritance hierarchy may have different fields than other members of the same inheritance hierarchy Therefore POSTGRES provides a procedural interface for accessing fields of composite types from C As POSTGRES processes a set of instances each instance will be passed into your function as an opaque structure of type TUPLE Suppose you want to write a function to answer the query SELECT name c_overpaid EMP 1500 AS overpaid FROM EMP WHERE name Bill or name Sam In the query above you can define c_overpaidas include postgres h for charl6 etc include libpq fe h for TUPLE bool c_overpaid TUPLE t the current instance of EMP int4 limit bool isnull false int4 salary Mariposa User Manual v 1 0 26 salary int4 GetAttributeByName t salary amp isnull if isnull return false return salary gt limit GetAttributeByName is the POSTGRES system function that returns attributes out of the current instance It has three arguments the argument of type TUPLE passed into the function the name of the desired attribute and a return parameter that describes whether the attribute is null GetAttributeByName will align data properly so you can cast its return value to the desired type For example
108. y site has the basic infrastructure needed to provide name service the replication protocol and the name server catalog tables In order to fill the name server tables the database administrator has to establish copy contracts with those sites whose system catalogs it wants to replicate We call these sites the source sites The command SUBSCRIBE METADATA hostID update interval sets up a read only copy contract with the source site indicated by hostID The data of the relevant system catalogs is sent from the source to the name server site and is merged into the name server catalog tables The source site periodically sends an update stream to the name server site which is also applied to the name server catalog tables A variation of the SUBSCRIBE METADATA command is the SUBSCRIBE NAMESERVICE command SUBSCRIBE NAMESERVICE hostID update interval The SUBSCRIBE METADATA command allows a name server to acquire the data from the source site s name server catalog tables This reduces the overhead of running name service because a name server can reuse the data acquired by other name servers It would be possible to create a system with one of the name servers having a contract with every existing client site and all the other name servers simply replicating that name server s data Mariposa User Manual v 1 0 45 To stop serving the meta data from a particular site use the UNSUBSCRIBE METADATA command UNSUBSCRIBE METADATA hos

Download Pdf Manuals

image

Related Search

Related Contents

RayBio Rat Cytokine Antibody Array  DNP3 communication - Directory listing of  PRÁCTICA 2 Los alumnos deberán entregar los ejercicios 5, 6 y 7  Snapper SPE361 Lawn Mower User Manual  manual - BikeInn  2-Wire Video Intercom System  DE LAMINAS DELGADAS  

Copyright © All rights reserved.
Failed to retrieve file