Home

ON GROUPING IN RELATIONAL ALGEBRA 1. Introduction The

image

Contents

1. D 7 1 gt C Lee A 20 C O Neil A 12 C Brown A 2 2 gt C Clark A 25 For this environment the boolean expression 5 lt D A D lt 20 evaluates to true This means that we proceed to evaluate the expression following the question mark Had it evaluated to false the result of the whole evaluation for this particular environment would be the empty relation with schema D V As it is we obtain a relation consisting of one tuple which is the concatenation of D 7 the value of and V 9 the value of V SUM 4 1 SUM 4 2 In total we obtain the following D V T 9 8 11 13 12 For each district we can find the number of customers that Jones have dealt with but Miller has not as follows group Jones Miller by D do V COUNT4 rc Q 1 re 2 Of course the minus here is relation difference New possibilities arise when nesting is allowed Consider division usually de fined as r r2 t t x re Cri We can write group r1 r2 by Rz do group 1 by R R2 do x r2 Cri Here grouping is with respect to more than one attribute name Rz and R R2 should of course really be written out as a sequence of attribute names The first grouping provides us with 1 s consisting of tuples from r with schema R Ro The second group runs through these tuples one by one and includes them in the result if they
2. pass the test The second grouping is basically a forall construction since we group on the entire schema of the one argument relation 3 Notation We use standard relational algebra and calculus as defined in the original paper or in textbooks In this section we specify which variants we are using We fix a domain D of all constants that can appear in relations and expres sions and an infinite set A of attribute names In this paper there is no reason to distinguish between different types so schemas are simply finite subsets of A We fix the letters we use as names for various entities e 4 1 42 ranges over D and expressions of that type e A Aj Ao B C range over A e X Y range over finite subsets of A o t ti t2 U 1 2 range over tuples r r1 T2 1 2 range over relation names and expressions e R Ri R2 E E1 E2 range over schemas of relations and expressions e C1 C2 range over Boolean expressions conditionals A tuple is a total function from a schema into D We write t A for t s value on the attribute name A Constant tuples are listed using square brackets so A1 a1 Ag ax is the tuple such that t A ai 1 lt i lt k The empty tuple the function defined on the empty schema is written A relation is a schema R and a finite set of tuples r The schema of every tuple t r must be R The notation R r is used to indicate that relati
3. International Journal of Foundations of Computer Science World Scientific Publishing Company ON GROUPING IN RELATIONAL ALGEBRA KIM S LARSEN Department of Mathematics and Computer Science University of Southern Denmark main campus Odense University Campusvej 55 DK 5230 Odense M Denmark Received received date Revised revised date Communicated by Editor s name ABSTRACT The concept of grouping in relational algebra is well known from its connection to aggregation In this paper we generalize the grouping notion by defining a simultane ous grouping of more than one relation and we discuss the application of operations on grouping elements other than just arithmetic aggregation Finally we show that this grouping mechanism can be added to relational algebra without increasing its computa tional power 1 Introduction The concept of grouping in relational algebra is well known from its connection to aggregation and grouping constructs such as group_by have been defined in order to incorporate the ideas into relational languages However there is no reason why the use of grouping should be limited to a simple addition counting or maximization of a collection of domain values It might be natural to perform relational computations on groups of a relation before or even without applying aggregation We extend the grouping mechanism to more than one relational argument and discuss grouping as a general mechanism for po
4. e wp ri U TB r2 b1 b2 b3 ba We shall write r and r2 as a combination of the B tuples Clearly r can be written as lb x la1 U b2 x laa U lbs x las U b4 x and rg can be written as b1 x U fb2 x lc1 d c2 d2 U bs x le3 d3 U ba x lea da We refer to this as the grouping of r and r2 by B We define a class of environments from a grouping As usual an environment is simply an association of values with identifiers As an example consider the environment Niba b Q 1 gt la2 2 e1 d c2 d2 In this example we have three identifiers 1 and 2 each of which has an associated value This environment consists of the tuple b2 together with the relations consisting of the tuples from r and r2 which contain b2 except that in 1 and Q 2 the b2 parts of the tuples have been removed Similarly we have Miba gt ba gt 2 c4 d4 We can evaluate expressions in different environments As an example 1 x 2 evaluated in np this is denoted 1 x 2 7p is a2 c1 d1 a2 c2 da Similarly 1 x 2 np Because of the way the four environments nje Nibs gt Nbs aNd nNp are defined 1 x 2 7n will have the same schema no matter which of the four environments we use for 7 Thus the union of all four partial results is well defined The expression g
5. e translated into formulas with exactly one free variable When a composite expression is to be translated we assume that the first expression is translated into a formula where the only free variable is z and the second expression is translated into a formula where the only free variable is x2 For the only free variable in the result we use the variable x When translating a formula recursively this means that after a subexpression has been translated the free variable x must be renamed to z other occurrences of x should first be renamed to some new variable exp T exp Domain expressions a a A ty A Tuple expressions 1 z A a u A T a ty Relation expressions Ti LET Z 0 false u Tu Az z c e T c Adz Tle A z e1 U e2 dai T e1 Az z1 V Ave T e2 Ax 22 e1 2 dry Tle Ax z1 Aar T e2 Ax 22 e1 X 2 Jz Jz T e1 AT e2 A E1 z1 A E2 z2 TAoal 1 Sti The Ac a Az Aba oop er dey Tle Art 2 A2 A02 B Ty e1 dai T e1 Az zx Y ba B e1 3x1 T e1 Ax x E1 A A z B z1 A Conditional expressions ay Saz T a1 lt T a2 e C eg Voi T e1 gt Jrz T e2 A z1 T2 C1 mx C2 T cr A T c2 C1 V C2 T c1 V T ca nci aT c1 Figure 2 Translation into relational calculus Example 1 Consider the
6. ent m by n tf t 1 gt ri rt n a re Note that is always bound to a tuple value with schema X and i is a relation with schema R X 5 Computational Power Is Preserved We show how a grouping expression can be translated into relational calculus Since relational algebra and calculus are equivalent 7 this shows that grouping expressions can also be translated into relational algebra Since the grouping ex pressions contain a complete version of relational algebra grouping expressions are equivalent to relational algebra An expression group 11 by A1 A do e is translated into x Jtg VGt ti ET At X ty A T e where the translation of e T e is defined in Fig 2 In that table we use t4 for values corresponding to and t for tuples belonging to the relation r Additionally we use X for the set Aj Ax i e the schema of ty and With regards to scope of variables we assume that in each step of a recursive translation parenthesis are used around the entire formula in the right column of the table The scope of universal and existential quantification extends as far to the right as it is meaningful i e to the first unmatched right parenthesis Finally we use the following convention in order to make the table readable The translation is compositional so the translation of a composite expression is given in terms of the translations of the components Expressions ar
7. following expression group 71 72 by R N Rz do Q 1 x 2 In relational algebra this would be expressed as T R UR2 Rin Re r1 M r2 In both cases the attributes on which to group or project would be given explicitly though Let X be the set of attributes Ri N R Changing variable names whenever necessary the translation defined above re sults in x ste Viti ti ri A ti X t A ADs 309 abr ti E r1 At X Aa ti R X A ate te re A tg t2 X NA T2 t2 R2 X A z R Ra T1 A z R2 Ri z2 We state the following result Theorem 1 For any sequence of relations r1 Tn any set of attribute names X Cf Ri and any relation expression e group r1 fn by X do e x Aty Gt ti Er A til X t A T e Proof Note that ty U Tx r if and only if V dti ti ri A ti X ty is true Thus the result can be established by showing that for all environments ne where t LU mx ri we have that e ny x ty t AT e This equality can be shown by an induction argument by going through the translations defined in Fig 2 The proof is simple but lengthy and has been omitted 6 Nested Groupings In this section we briefly consider the necessary and desired additions when nested groupings are allowed and we give the extensions to the semantics and to the translation into relational calculus First the BNF grammar fr
8. gt lt lt atom gt lt rel gt C lt rel gt lt cond gt A lt cond gt lt cond gt V lt cond gt lt cond gt Figure 1 Grammar for the extended language lt atom gt lt tup gt lt rel gt and lt cond gt stand for atomic expression tuple expression relational expression and conditional expression respectively An atomic expression can be a constant or a tuple selection A tuple expression can be the empty tuple a constant tuple or a grouping tuple A relational expression can be a relation name the empty relation with schema Z a singleton relation containing only the one specified tuple a gate expression which evaluates to its relational argument if the condition evaluates to true group ing relations or a standard relational operator applied to relational expressions There is the additional syntactical restriction that must appear in the scope in the do part of a grouping expression and i must appear in the scope of a grouping expression with at least 7 arguments A conditional expression can be a comparison between domain values or rela tional values or it can be a Boolean formula over conditional expression In order to focus on what is essential we have left out unnecessary constructions For instance all the usual six comparison operators can be expressed using lt and the Boolean connectives Similarly for containment of relations C Also tuple concatenation has been used in the examp
9. les but since the ex pression in the grouping construction must be relational tuples can be converted into singleton relations sooner and Cartesian product can be used instead For instance R SUM 1 SUM 4 2 from an earlier example can be written x R SUM 4 Q 1 SUM 4 Q 2 Finally in the examples an attribute name A has denoted the value of the current grouping tuple on A In the grammar we require that the explicit form A is used instead For now until we allowed nested applications of grouping we discuss the lan guage where we have only one grouping construct with a relational expression defined by the BNF grammar in the do part group 11 1 by Aj Az do lt rel gt We require that every attribute A in the by part appears in every schema Rj In other words A1 Ak C1 Ri We now formally define the meaning of such an expression We do this in terms of the standard relational operators but notice that this does not define a standard relational algebra expression since among other things we use a data dependent number of unions Compare with the first example in Section 2 Definition 1 Let F group r1 fn by Ai A do e where X Aj An Cf Ri The semantics of F is denoted F and defined by IFI U len tEbF where br Tx ri and for each t by and i 1 n re Trax or ri and finally for each t by we define the environm
10. om Fig 2 is extended with the following lt rel gt group lt rel gt lt rel gt by A A do lt rel gt 1 Q2 Eea Wee lt tup gt So now groupings can appear anywhere a relational expression is expected When a grouping is used in the do part of another grouping two sets of group ing tuples and grouping relations are active at a time The intention is that and the z s refer to the nearest grouping and and the 7 s refer to the grouping one level further out So we must require that if for instance i is used then the grouping one level out has at least i arguments In general the number of s specifies the level relatively to the current As an example consider the query group 11 72 by Y do group Q 2 r3 by Z do 2 x 1 where Y and Z are some sets of attribute names Here the first 2 comes from r2 the second 2 comes from r3 and 1 comes from r To be precise about the meaning of a nested query we extend the definition of the semantics of grouping First if 7 is an environment then we let 7 denote the environment where all names have been equipped with an extra e g if 7 was defined on the set of names Q 1 1 2 then 7 is defined on 1 1 2 Assuming that 7 and 7 are defined on disjoint sets of names we use 7 n y to denote the combination of the two environments where 77 is defined
11. on r has schema R As usual we sometimes overload notation and let r refer to the relation R r We need a notation for empty relations with different schemas if Z is a set of attribute names a schema then we let Z denote the relation with schema Z but no tuples Thus is the empty relation with the empty schema There are exactly two different relations with the empty schema namely and the latter is the neutral element with respect to Cartesian product and natural join If e is a relational expression then we always let the corresponding capital letter E denote the schema of the relation the value of which the expression e denotes If t is a tuple t denotes the corresponding singleton relation We use standard symbols for the relational operators U x a m and 6 for union difference Cartesian product selection projection and renaming respec tively Select conditions consist of a single equality or inequality between two at tribute names or an attribute name and a constant If t A1 ai Ax ax and Ai Ax C R then we use g r as short for 04 a 0Ay a2 0A a This could equivalently be written t X r We also use the derived operator natural join X Finally it will be convenient for us to allow projection on an empty set of attribute names The obvious generalization is that 7g r is 0 if r is empty and O otherwise For relational calculus we use a tu
12. on the union of the two sets of names and the value on a given name is the value from either 7 or n When extending Definition 1 the crucial change is that grouping can now ap pear in the expression e Thus we must be able to evaluate a grouping con struction in an environment Using the notation just defined the semantics of group 71 7n by X do e 7 should be Uses fel 7 m When defining the translation of nested queries into relational calculus we must introduce a variable ty for each nested grouping This should be a new variable name each time since we want it to be in the scope of the ty s defined at levels further out Translation of s preceded by a number of s is then simply a matter of using the correct ty Grouping relations are handled similarly 7 Aggregation and Computation on Domain Values We extend the BNF grammar from Fig 1 with the following lt atom gt AGGR4 lt re1l gt lt atom gt op lt atom gt Thus an atomic expression can now also be an aggregation of a given attribute over a given relation Here AGGR could be any of the usual aggregate functions i e SUM COUNT MAXIMUM etc Furthermore an atomic expression could be some operation applied to other atomic expressions e g one of the usual arithmetic operations addition subtraction multiplication or division Since we have seen how we can translate any relational expression into relational calculus we can also express
13. ple based variant The atomic formulas are t r where r is a relation as well as t A0t B and t Ada where A and B are attribute names a is a constant and is one of lt gt lt and gt A formula is an atomic formula or one of ap pA gq pV q 3t p or Vt p where p and q are formulas If X is a subset of the schema of some tuple t then t X denotes the tuple t restricted to X i e its schema is X and for every A X t X A t A Asa special case t is the empty tuple A tuple relational query is an expression of the form t p where p is a formula and t is the only free variable in p We use the convention that t is defined on any attribute mentioned in connection with in the query e g if t A and t X appears in p then is defined on A UX 4 Relational Algebra with Grouping To make the presentation clearer we initially omit nested groupings and com putations on domain values including aggregation In the last sections we discuss how these restrictions are removed We define the extension of relational algebra which is specified in Fig 1 using BNF notation lt atom gt a A lt tup gt A lt atom gt lt rel gt r Z lt tup gt lt cond gt lt rel gt 1 2 lt rel gt U lt rel gt lt rel gt lt rel gt lt rel gt x lt rel gt aac lt rel gt 74 4 lt rel gt 6448 lt rel gt lt cond gt lt atom
14. rmal Form Re 10 11 lations In Proceedings of the Ist ACM SIGACT SIGMOD SIGART Symposium on Principles of Database Systems 124 138 ACM Press 1982 Anthony Klug Equivalence of Relational Algebra and Relational Calculus Query Languages Having Aggregate Functions Journal of the ACM 29 3 1982 699 717 Kim S Larsen RASMUS User s Manual MD 60 Computer Science Department Aarhus University 1992 Kim S Larsen Michael I Schwartzbach Erik M Schmidt A New Formalism for Relational Algebra Information Processing Letters 41 3 1992 163 168 Akifumi Makinouchi A Consideration on Normal Form of Not Necessarily Normalized Relations in the Relational Data Model In Proceedings of the Third International Conference on Very Large Data Bases 447 453 IEEE Computer So ciety Press 1977 Jan Paredaens Dirk van Gucht Converting Nested Algebra Expressions into Flat Algebra Expressions ACM Transactions on Database Systems 17 1 1992 65 93 J D Ullman Principles of Database and Knowledge Base Systems Vol 1 Com puter Science Press 1988 11
15. roup 11 172 by B do Q 1 x 2 denotes exactly this value a2 c1 di a2 c2 d2 ag c3 d3 In fact by this ex ample we have given an informal semantics of the grouping operator In stan dard relational algebra the relation just computed could have been expressed by wx 11 X r2 where X R R2 U R2 Ri We move on to an example using aggregation We use the notation SUM4 r where r is a relation which has an integer attribute A This expression evaluates to the sum of the integers in column A of r Consider the relation Sales with schema SalesPerson District Customer Amount which contains information about sales by different salespersons i e which sales person which district to whom and the total value of the sale In the following we use the initial letters S D C and A Also for brevity let Jones denote TDCA Tg Jones Sales and let Miller denote tpc4 7g_Miller Sales We use the following example data Jones Miller DIC A D C A 2 Smith 17 7 Clark 25 7 Lee 20 8 Morrison 9 7 O Neil 12 8 Kent 9 7 Brown 2 13 Hansen 12 8 Chang 7 Now for each of the districts numbered from 5 through 20 we want to find the difference of their sales This can be expressed by group Jones Miller by D do 5 lt D A D lt 20 V SUMa4 Q 1 SUM4 2 A typical environment looks like this n
16. sing queries A query language 8 which offers this more general grouping mechanism has been used for a number of years for teaching and minor administrational tasks at some Danish universities In this paper we discuss the essence of such a query language focusing on the grouping mechanism and the extra possibilities it offers as an addition to relational algebra Since we extend relational algebra we also show that the computational power is unchanged It is of great interest to extend relational algebra in the direction of adding more computational power but this is a separate issue it should not be a side effect of the decisions concerning the issues under consideration here We show that the computational power is unchanged by giving a translation of our grouping mechanism into relational calculus Since the nesting operator 5 also involves grouping a translation for unary grouping can be derived from 1 2 Examples In this section we give some examples All concepts and notation appearing in this section will be defined formally in the following sections First we discuss the concept of grouping of relations Let two relations r and rg with schemas R and R be as listed below Fa T ATB BTC D ay by b2 cr di az be bz c2 d2 ag b3 bz c3 d3 ba ca d4 First we note that Ri R2 B and that the tuple values in the B columns ar
17. the result of an aggregation assuming that relational calculus is extended with some appropriate construction for this We could use a construction such as Jtsum p where t is the only free variable in p The value of this expression is the sum of the t A s for all t which fulfill p The translation does not in principle become any harder However since rela tion expressions can now appear inside tuple expressions via domain expressions we cannot just use a compositional translation as in Fig 2 Instead we have to de fine a separate analysis and translation of tuples This is simple but cumbersome and has been omitted Acknowledgments The author would like to thank an anonymous referee for suggesting that the translation of the extended relational algebra be made into relational calculus in stead of into relational algebra References 1 E F Codd A Relational Model of Data for Large Shared Data Bases Commu nications of the ACM 13 6 1970 377 387 10 E F Codd Relational Completeness of Data Base Sublanguages In Data Base Systems Randall Rustin ed 65 98 Prentice Hall 1972 Peter M D Gray The GROUP_BY Operation in Relational Algebra In Databases S M Deen P Hammersley eds 84 98 Pentech Press Limited 1981 Peter M D Gray Logic Algebra and Databases Ellis Horwood Limited 1984 5 G Jaeschke H J Schek Remarks on the Algebra on Non First No

Download Pdf Manuals

image

Related Search

Related Contents

CT-414WR 取扱説明書ダウンロード  0570-088988  USER MANUAL  Instalação e configuração do scanner  Fellowes 320, 320C, 320-2HS, 380, 380C Paper Shredder User Manual  Istruzione d`uso  DTT+SATELLITE HD RECEIVER RC5320HD  DNET66-Z 取扱説明書  MSI Megabook CX600 CX600-048  SILICONE OIL (M1203143 - 2006-06  

Copyright © All rights reserved.
Failed to retrieve file