PERFORMANCE OF A PARTITION-BASED ALGORITHM FOR VALID-TIME EQUIJOIN

by

ZAIHUA JI

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science
Department of Computer Science and Engineering
College of Engineering
University of South Florida

December, 1997

Major Professor: Michael D. Soo, Ph.D.
Graduate School
University of South Florida
Tampa, Florida

CERTIFICATE OF APPROVAL

Master's Thesis

This is to certify that the Master's Thesis of
ZAIHUA JI
with a major in Computer Science has been approved by the Examining Committee on October 25, 1997 as satisfactory for the thesis requirement for the Master of Science in Computer Science degree.

Examining Committee:
Major Professor: Michael D. Soo, Ph.D.
Member: N. Ranganathan, Ph.D.
Member: J. Christensen, Ph.D.
ACKNOWLEDGMENTS

I am deeply grateful to my major professor, Dr. Michael Soo, for his valuable guidance in my research and in organizing and editing this thesis. I also appreciate his readiness and patience in offering suggestions and examples. I cannot imagine how I could have finished this thesis without his help and encouragement. I thank Dr. Ranganathan and Dr. Christensen for reviewing my thesis and for serving on my thesis defense committee. I also thank one of my best friends, Yantain Lu, for his effort in reading through my whole thesis and for his valuable suggestions. Last but not least, my family, my gorgeous wife Janyong and lovely son Hongzhao, provided me the most precious support; without their love and comfort it would have been impossible for me to accomplish this thesis.
TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1 INTRODUCTION
1.1 Temporal Databases
1.2 Query Processing
1.3 Conventional Join Operation
1.4 Valid-Time Join Operation
1.5 Summary
CHAPTER 2 VALID-TIME EQUIJOIN
2.1 Definition
2.2 Evaluation
2.2.1 Nested-Loop Algorithm
2.2.2 Sort-Merge Algorithm
2.2.3 Partition-Based Algorithm
2.3 Summary
CHAPTER 3 PREVIOUS TIMESTAMP PARTITIONING ALGORITHMS
3.1 Previous work by Leung and Muntz
3.1.1 Partitioning the Input Relations
3.1.2 Joining the Partitions
3.2 Previous work by Soo, Snodgrass and Jensen
3.2.1 Determining the Partition Intervals
3.2.2 Partitioning the Input Relations
3.2.3 Joining the Partitions
3.3 Previous work by Lu, Ooi and Tan
3.3.1 Partitioning the Input Relations
3.3.2 Joining the Partitions
3.4 Summary
CHAPTER 4 ALGORITHM MODIFICATION
4.1 Determining the Partition Intervals
4.1.1 Correlating Start Times and Durations
4.1.2 Reducing Sampling Cost
4.1.3 Constructing the Partition Intervals
4.2 Partitioning the Input Relations
4.3 Joining the Partitions
4.4 Summary
CHAPTER 5 PERFORMANCE
5.1 Parameters
5.2 Experiments
5.2.1 Uniform Distribution of Vs and 10 Chronon Duration
5.2.2 Uniform Distribution of Vs and 500 Chronon Duration
5.2.3 Uniform Distribution of Vs and Memory Size of 4MB
5.2.4 Normal Distribution of Vs and 10 Chronon Duration
5.2.5 Normal Distribution of Vs and Memory Size of 4MB
5.3 Summary
CHAPTER 6 CONCLUSIONS AND FUTURE WORK
LIST OF REFERENCES
LIST OF TABLES

Table 3.1 Comparison of Previous Work in Timestamp Partition Join
Table 5.1 Common Experiment Parameters
Table 5.2 Generic I/O and Memory Operation Costs
LIST OF FIGURES

Figure 1.1 An example of conventional relation
Figure 1.2 An example of temporal relation
Figure 1.3 An example of conventional join
Figure 1.4 An example of valid-time join
Figure 2.1 Schematic block nested-loop equijoin
Figure 2.2 Schematic sort-merge equijoin
Figure 2.3 Schematic partition-based equijoin
Figure 3.1 Range-partitioning along time
Figure 3.2 Allocation of memory space
Figure 3.3 Two dimensional partitioning
Figure 4.1 Negative correlation between start time and duration
Figure 4.2 Independent distributions of start time and duration
Figure 4.3 Positive correlation between start time and duration
Figure 4.4 Real and sampled distributions of valid start-time
Figure 4.5 Distribution of the sample errors
Figure 4.6 Sample sizes with fixed main memory and variable error space
Figure 4.7 Sample sizes with variable main memory
Figure 4.8 Dynamically allocated memory buffers
Figure 5.1 I/O cost for uniform Vs and Dv = 10 chronons
Figure 5.2 In-memory cost for uniform Vs and Dv = 10 chronons
Figure 5.3 I/O cost for uniform Vs and Dv = 500 chronons
Figure 5.4 In-memory cost for uniform Vs and Dv = 500 chronons
Figure 5.5 I/O cost for uniform Vs and M = 4MB
Figure 5.6 In-memory cost for uniform Vs and M = 4MB
Figure 5.7 I/O cost for normally distributed Vs and Dv = 10 chronons
Figure 5.8 In-memory cost for normally distributed Vs and Dv = 10 chronons
Figure 5.9 I/O cost for normally distributed Vs and M = 4MB
Figure 5.10 In-memory cost for normally distributed Vs and M = 4MB
PERFORMANCE OF A PARTITION-BASED ALGORITHM FOR VALID-TIME EQUIJOIN

by

ZAIHUA JI

An Abstract
Of a thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science
Department of Computer Science and Engineering
University of South Florida
December 1997
Major Professor: Michael D. Soo, Ph.D.
A database records information about some aspect of the real world. Temporal databases generalize conventional databases by recording the evolution of objects over time. Joins are fundamental operations that recombine information that has been separately stored in a database. Join operations occur frequently and are potentially expensive to execute. Temporal join operators, which typically include inequality predicates on time, are not supported efficiently by conventional algorithms, which are optimized for equality predicates. In this thesis we consider techniques for evaluating join operations over temporal databases. Our approach is to consider a single class of algorithms, so-called partition-based algorithms, and adapt them to exploit the time dimension inherent in temporal databases. An existing algorithm with a sampling and caching strategy is described in this thesis. Based on this implementation, a modified join algorithm that partitions on time is developed, which gives better performance when the available memory space is small compared to the sizes of the join operands.

Abstract Approved:
Major Professor: Michael D. Soo, Ph.D.
Assistant Professor, Department of Computer Science and Engineering
Date Approved:
CHAPTER 1 INTRODUCTION

A database is a collection of related data, which are recorded facts about some aspect of the real world. We use the term miniworld to refer to the portion of the real world being modeled. For example, a university database may include faculty information such as name, social security number, rank, and department. A package of general-purpose software is used to facilitate the processes of defining, constructing, and manipulating the database. This software package is referred to as a database management system (DBMS). Manipulating a database includes such functions as querying the database to retrieve specific information and updating the database to reflect changes in the miniworld. The combination of a database and its corresponding DBMS is called a database system.

If information in a database is limited to facts at a particular point in time, the database is referred to as a conventional database. An instance, or state, of such a database is its current content, which may be different from the current status of the miniworld. The relational data model [Cod70] is the dominant commercial data model. In a relational database, data is organized into mathematical relations [EN94], each of which is a set of identically structured elements termed tuples. Attributes represent the properties of the component fields that make up the tuple structure, termed the schema, of each relation. Components of tuples draw their values
from domains that correspond to the individual attributes. Typically, relations are represented as tables whose rows correspond to tuples and whose columns correspond to attributes. For example, a relation in a conventional university database is shown in Figure 1.1. Name and DeptName are attributes that form the relation schema; each tuple holds the name of a professor and the name of the department he or she belongs to. The domains of Name and DeptName are character strings of faculty names and university departments, respectively. This relation records, for each professor, his or her name and the current department in which he or she is employed.

Name   DeptName
Bill   Computer
Tom    Mathematics

Figure 1.1: An example of conventional relation

Database technology is widely applied in almost all areas of modern society, including business data processing, library information retrieval systems, multimedia applications with images and sound, computer-aided design and manufacturing, real-time process control, and scientific computation [Gra93]. Efficient query processing plays a critical role in managing large databases, whose sizes range from several megabytes to many terabytes, as is typical for database applications at present and in the near future [SSU91, Doz92].

We started this chapter by introducing the definitions and semantics of conventional databases. In Section 1.1 we generalize the notion of a database to incorporate time, resulting in a temporal database. Query processing, retrieving information from a relational database, is discussed in Section 1.2. One of the most important query operations, the relational join, is presented in Section 1.3. The join operation in temporal databases, together with the motivation for the work in this thesis, is introduced in Section 1.4. Finally, a short summary of this introduction and an outline of the remainder of this thesis are given in Section 1.5.

1.1 Temporal Databases

Time is an essential aspect of the constantly evolving real world. A deficiency of the conventional relational model is its lack of integrated support for time-varying information. For example, in the relation shown in Figure 1.1, the first tuple tells us that Bill's current department is "Computer," but there is no information about what Bill's department was in the past, much less about what it is predicted to be in the future. Temporal databases alleviate this problem by retaining "outdated" and/or future information in addition to current information. Consider the temporal relation shown in Figure 1.2. By associating time periods with tuples, the temporal relation retains past information, e.g., that Bill was in the Mathematics Department from May 1987 to May 1990, as well as current information, e.g., that Bill has been in the Computer Department since June 1990.

Name   DeptName      T
Bill   Mathematics   5/87   5/90
Bill   Computer      6/90   now
Tom    Mathematics   8/93   now

Figure 1.2: An example of temporal relation
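To make the relationship between Figures 1.1 and 1.2 concrete, the following minimal Python sketch stores the temporal relation of Figure 1.2 with closed valid-time periods and extracts the conventional relation that is valid at a given time. Month granularity, the `NOW` sentinel standing in for "now", and the names `FACULTY` and `snapshot` are illustrative assumptions, not part of the thesis.

```python
from typing import List, Set, Tuple

Month = Tuple[int, int]   # (year, month); an assumed chronon granularity
NOW: Month = (9999, 12)   # hypothetical sentinel standing in for "now"

# Figure 1.2 as (Name, DeptName, start, end) with closed valid-time periods
FACULTY: List[Tuple[str, str, Month, Month]] = [
    ("Bill", "Mathematics", (1987, 5), (1990, 5)),
    ("Bill", "Computer", (1990, 6), NOW),
    ("Tom", "Mathematics", (1993, 8), NOW),
]

def snapshot(relation: List[Tuple[str, str, Month, Month]], t: Month) -> Set[Tuple[str, str]]:
    """Return the conventional (Name, DeptName) relation valid at time t."""
    # Python tuples compare lexicographically, so (year, month) pairs order correctly.
    return {(name, dept) for name, dept, start, end in relation if start <= t <= end}

print(snapshot(FACULTY, (1988, 1)))    # Bill's department in January 1988
print(snapshot(FACULTY, (1997, 12)))   # the current state, as in Figure 1.1
```

Evaluated at a recent time, the sketch yields exactly the two tuples of Figure 1.1, which illustrates how a conventional relation corresponds to the current state of a temporal one.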
A temporal database is a database that records the evolution of objects over time [JCG+92]. Typically, such temporal database management systems (TDBMSs) support the storage and retrieval of past, current, and/or future information. A temporal database is semantically a generalization of a conventional database; a conventional database can be regarded as a temporal database restricted to the current state of the miniworld.

Time supported in temporal database systems is multidimensional [SA86]. Valid time is the time when facts were true in the modeled reality. Transaction time is the time when facts were current in the database. These two time dimensions are orthogonal and together induce four types of databases. Snapshot databases support neither valid time nor transaction time. A conventional database can be considered a particular snapshot of a corresponding temporal database [SA86]. Transaction-time databases support transaction time, valid-time databases support valid time, and bitemporal databases support both valid and transaction time. The relation shown in Figure 1.2 is actually a valid-time relation, because the attribute T represents the valid period of the associated tuples. Most current applications demand valid-time support [JSS95]; therefore, we concentrate on valid-time databases in the rest of this thesis.

1.2 Query Processing

A query is a request from a user to retrieve information in data storage. An example of a query for the relation shown in Figure 1.1 might be "What department is Bill in?" Query processing is the set of steps by which a database management system executes a user-specified query. Typically, a user poses a query in a high-level declarative query language such as SQL [MS93]. The query processor of the DBMS
transforms this query into a procedural equivalent, usually expressed as an algebraic statement, which is then optimized and executed. The relational algebra provides operators such as UNION, INTERSECTION, DIFFERENCE, CARTESIAN PRODUCT, SELECT, PROJECT, and JOIN [EN94]. An individual query usually incorporates many of these operators. The relational join operator is one of the most important algebraic operators due to its prevalence and complexity. The goal of this thesis is to describe efficient algorithms to implement the valid-time join operator, which generalizes the relational join operator. Before progressing, we first describe the conventional join operator and then generalize this operator to support valid time.

1.3 Conventional Join Operation

Normalization of relations demands that database information be stored in separate relations [Cod72]; hence it is common for a query to require information from multiple relations. The join operator, denoted by ⋈, combines tuples from two input relations into tuples of an output relation according to matching conditions on their common attributes, which are referred to as join conditions and join attributes, respectively. For example, consider Figure 1.3, which shows two input relations, DEPT, recording the department in which each professor is employed, and LOCATION, showing the building in which each department is located, together with the join result between DEPT and LOCATION. The join result shows, for each professor, the building in which his or her office is located and the department he or she belongs to. For example, the first tuple in the resulting relation DEPT ⋈ LOCATION is produced by combining the first tuple in DEPT and the first tuple in LOCATION, since they agree on the value of their common attribute DeptName. This fundamental binary
matching function of combining data from two relations is frequently and widely used in database query processing.

DEPT
Name   DeptName
Bill   Computer
Tom    Mathematics

LOCATION
DeptName      Location
Computer      Bld2
Mathematics   Bld1

DEPT ⋈ LOCATION
Name   DeptName      DeptName      Location
Bill   Computer      Computer      Bld2
Tom    Mathematics   Mathematics   Bld1

Figure 1.3: An example of conventional join

Besides being prevalent, join operations can be very expensive since, in the worst case, they require O(nm) comparisons, where n and m are the cardinalities of the input relations. Efficient evaluation of join operations is critical for large databases, because poor implementations may lead to such quadratic evaluation cost. The techniques used to implement conventional join operators fall into three basic categories: nested-loop, sort-merge, and partition-based join algorithms [ME92, Gra93]. Nested-loop is the simplest implementation, using exhaustive comparison to find all pairs of matching tuples. The quadratic cost of this method makes it unsuitable for joining large input relations. The sort-merge algorithm is typically less expensive because it reduces the number of needed comparisons by first sorting the input relations on their join attributes; the sorted relations are then sequentially scanned to find matching tuples. The evaluation cost of this algorithm depends on the particular algorithms used for sorting and
merging. In general, the overall execution time of this algorithm is dominated by the sorting time, which is usually O(n log n + m log m). When main memory is smaller than the input relations, the cost of the sorting phase is dominated by disk paging costs.

Like sort-merge algorithms, partition-based algorithms also have two distinct phases. In the first phase, the input relations are partitioned into small groups, termed buckets, using a common hash function. The partitioning is performed so that tuples in a given bucket of one relation need only be compared to the tuples in the corresponding bucket derived from the other relation. In the second phase, tuples in corresponding bucket pairs are compared to find the matching tuples. Hence the number of comparisons is greatly reduced. Efficiency requires that the generated buckets fit in main memory during the joining phase; otherwise recursive partitioning is required, resulting in poor performance. The performance of both the sort-merge and partition-based algorithms is sensitive to the size of main memory. Neither the input relation sizes nor the memory size determines the choice between the sort-merge and partition-based algorithms [Gra93]. The partition-based algorithm, however, outperforms the sort-merge algorithm if one input relation is much smaller than the other, since in partition-based algorithms the level of recursive partitioning depends only on the smaller relation, while sort-merge algorithms must sort both input relations.

1.4 Valid-Time Join Operation

A valid-time join is a generalization of the conventional join with additional matching conditions on the timestamp attributes. For example, consider Figure 1.4, which
shows a valid-time version of the input relations DEPT and LOCATION and the relation that results from their valid-time join. The join result shows, for each professor, the building in which his or her office is located and the department he or she belongs to during a certain period of time. For example, the first tuple in the result relation DEPT ⋈V LOCATION is produced by combining the first tuple in DEPT and the second tuple in LOCATION, since they agree on the value of their common attribute, DeptName, and their valid-time intervals overlap. In addition to the equality condition on the join attribute DeptName, a conjunction of inequality conditions is evaluated to determine whether the timestamps of the input tuples overlap.

DEPT
Name   DeptName      ValidTime
Bill   Mathematics   5    9
Bill   Computer      10   now
Tom    Mathematics   5    now

LOCATION
DeptName      Location   ValidTime
Computer      Bld2       5    now
Mathematics   Bld1       0    19
Mathematics   Bld3       20   now

DEPT ⋈V LOCATION
Name   DeptName      DeptName      Location   ValidTime
Bill   Mathematics   Mathematics   Bld1       5    9
Bill   Computer      Computer      Bld2       10   now
Tom    Mathematics   Mathematics   Bld1       5    19
Tom    Mathematics   Mathematics   Bld3       20   now

Figure 1.4: An example of valid-time join

The valid-time join operation is necessarily more difficult than a conventional join for two reasons. First, additional effort is required to determine the overlap of
timestamps. In current commercial DBMSs, join operators are implemented to execute equality conditions efficiently, rather than inequality conditions. Second, valid-time databases are usually much larger than their conventional counterparts because they retain historical information. New techniques are therefore required to evaluate valid-time join operators. Algorithms developed for valid-time joins have concentrated on refining existing conventional algorithms to include timestamp inequality conditions in join evaluation [SSJ94]. For valid-time joins, partition-based algorithms may outperform other implementations for the same reason as described for conventional joins. In addition, relative to the sort-merge algorithm, the partition-based strategy for valid-time joins has been shown to be favorable when long-lived tuples are present [SSJ94]. Therefore we consider a partition-based strategy for valid-time join evaluation in this thesis.

The additional valid-time attributes provide alternative partitioning variables for partition-based algorithms. In this thesis we consider a specific partitioning strategy, termed timestamp partitioning [Soo96], which uses the timestamp attributes as partitioning variables instead of the explicit join attributes (a strategy termed explicit partitioning). This approach is beneficial if the relations cannot be efficiently partitioned using their explicit attributes.

Partitioning on timestamps adds an interesting complication to partition-based algorithms because our timestamps are intervals, i.e., range data, rather than discrete values. A tuple's time interval can overlap multiple partitions; such tuples, termed long-lived tuples [SSJ94], must be present in each partition they overlap when the join of that partition is computed. A straightforward solution to this problem is to simply replicate tuples across all overlapping
partitions [LM91]. Soo et al. proposed a different solution that guarantees that all long-lived tuples are present in each partition they overlap during joining, while avoiding replication of these tuples in secondary storage and also reducing the complexity of update operations [SSJ94]. This algorithm, however, does not evaluate joins effectively when the available memory space is small relative to the input relation sizes. In this thesis, we describe a modified implementation that addresses this problem. Experiments show the improved performance of this algorithm when the input relations are large and the memory space is relatively small.

1.5 Summary

A temporal database is a database that records the evolution of objects over time. The valid-time join is one of the most important operators in valid-time databases due to its prevalence and complexity. In this thesis, we consider algorithms for evaluating join operations over valid-time databases. We concentrate on a timestamp partitioning strategy for the valid-time join, because we believe that a proper implementation of this strategy may outperform other algorithms. A general discussion of the definitions and different implementations of the valid-time equijoin, a specific join operator, is given in Chapter 2. Previous work on timestamp partitioning equijoins is introduced in Chapter 3. Improvements to the partition-based algorithm for the valid-time equijoin are described in Chapter 4. Experiments testing the performance of the modification are described in Chapter 5. The conclusions of this thesis and suggestions for future work on this topic are given in Chapter 6.
CHAPTER 2 VALID-TIME EQUIJOIN

In this chapter, we describe the valid-time equijoin, and discuss and compare basic strategies for its implementation. The valid-time equijoin plays the same role in valid-time databases as the conventional equijoin does in conventional databases. Similarly, the valid-time equijoin is expensive to evaluate for the same reasons as the conventional equijoin. We begin by formally defining the equijoin operator in Section 2.1. Different implementation strategies, including nested-loop, sort-merge, and partition-based algorithms, are discussed and compared in Section 2.2. A short summary of this chapter is given in Section 2.3.

2.1 Definition

To define the valid-time equijoin, we first define the conventional equijoin and then generalize this operator for valid-time databases. Let $R$ and $S$ be conventional relation schemas

$$R = (A_1, \ldots, A_n, B_1, \ldots, B_k) \qquad S = (C_1, \ldots, C_k, D_1, \ldots, D_m)$$

where $B_i$ and $C_i$, $1 \le i \le k$, are the join attributes, and $A_i$, $1 \le i \le n$, and $D_i$, $1 \le i \le m$, are additional, non-join attributes. In the following we will use
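$A$, $B$, $C$, and $D$ as shorthand for $\{A_1, \ldots, A_n\}$, $\{B_1, \ldots, B_k\}$, $\{C_1, \ldots, C_k\}$, and $\{D_1, \ldots, D_m\}$, respectively. Let $r$ and $s$ be instances of $R$ and $S$, respectively. The conventional equijoin of $r$ and $s$, $r \bowtie_{r.B=s.C} s$, is defined as follows.

$$r \bowtie_{r.B=s.C} s = \{\, z^{(n+2k+m)} \mid \exists x \in r\ \exists y \in s\ (z[A] = x[A] \wedge z[B] = x[B] \wedge z[C] = y[C] \wedge z[D] = y[D] \wedge x[B] = y[C]) \,\}$$

The valid-time equijoin also relies on two functions over time points, first and last, which return the earlier and the later of their two arguments, respectively.

$$\mathit{first}(t_1, t_2) = \begin{cases} t_1 & \text{if } t_1 \le t_2 \\ t_2 & \text{otherwise} \end{cases} \qquad \mathit{last}(t_1, t_2) = \begin{cases} t_1 & \text{if } t_1 \ge t_2 \\ t_2 & \text{otherwise} \end{cases}$$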
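Read literally, the set-builder definition of the conventional equijoin is a comprehension over all pairs of tuples from $r$ and $s$. The minimal Python sketch below follows it directly, representing tuples as Python tuples and relations as sets; the positions of the join attributes $B$ and $C$ are passed as lists, and `b_idx` and `c_idx` are hypothetical parameter names. Applied to the relations of Figure 1.3, it reproduces that join result.

```python
from typing import Sequence, Set, Tuple

Row = Tuple[str, ...]

def equijoin(r: Set[Row], s: Set[Row],
             b_idx: Sequence[int], c_idx: Sequence[int]) -> Set[Row]:
    """Conventional equijoin r ⋈_{r.B=s.C} s, read directly from its set definition.

    b_idx and c_idx (hypothetical names) give the positions of the join
    attributes B in r and C in s.
    """
    return {
        x + y  # z is x concatenated with y, so it has n + 2k + m attributes
        for x in r
        for y in s
        if tuple(x[i] for i in b_idx) == tuple(y[j] for j in c_idx)  # x[B] = y[C]
    }

# Figure 1.3: DEPT = (Name, DeptName), LOCATION = (DeptName, Location)
DEPT = {("Bill", "Computer"), ("Tom", "Mathematics")}
LOCATION = {("Computer", "Bld2"), ("Mathematics", "Bld1")}
print(sorted(equijoin(DEPT, LOCATION, b_idx=[1], c_idx=[0])))
```

Because every pair of tuples is examined, this direct reading is also exactly the exhaustive evaluation strategy that the algorithms in Section 2.2 try to improve on.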
The function overlap computes the maximum interval contained in both of its argument intervals $U$ and $V$. If the computed interval is invalid, then overlap returns a distinguished undefined value $\bot$.

$$\mathit{overlap}(U, V) = \begin{cases} [\mathit{last}(U_s, V_s), \mathit{first}(U_e, V_e)] & \text{if } \mathit{last}(U_s, V_s) \le \mathit{first}(U_e, V_e) \\ \bot & \text{otherwise} \end{cases}$$

Let $R$ and $S$ be valid-time relation schemas

$$R = (A_1, \ldots, A_n, B_1, \ldots, B_k, V_s, V_e) \qquad S = (C_1, \ldots, C_k, D_1, \ldots, D_m, V_s, V_e)$$

where $A$, $B$, $C$, and $D$ are explicit attributes, semantically the same as the ones defined in the previous conventional schemas, and $V_s$ and $V_e$ are valid-time attributes that represent a valid-time period that starts at $V_s$ and ends at $V_e$. We will use $V$ as shorthand for the period $[V_s, V_e]$. Let $r$ and $s$ be instances of $R$ and $S$, respectively. The valid-time equijoin of $r$ and $s$, $r \bowtie^V_{r.B=s.C} s$, is defined as follows.

$$r \bowtie^V_{r.B=s.C} s = \{\, z^{(n+2k+m+2)} \mid \exists x \in r\ \exists y \in s\ (z[A] = x[A] \wedge z[B] = x[B] \wedge z[C] = y[C] \wedge z[D] = y[D] \wedge x[B] = y[C] \wedge z[V] = \mathit{overlap}(x[V], y[V]) \wedge z[V] \ne \bot) \,\}$$

Two tuples $x \in r$ and $y \in s$ produce an output tuple if they agree on their join attributes and their valid-time intervals $x[V]$ and $y[V]$ overlap. The output tuple $z$ contains the concatenation of the explicit attribute values of $x$ and $y$, and its timestamp is the overlap of the timestamps of $x$ and $y$, that is, $\mathit{overlap}(x[V], y[V])$. Figure 1.4 on page 8 is an example of a valid-time equijoin. The schemas of the two input relations are DEPT = (Name, DeptName, V) and LOCATION = (DeptName, Location, V). The result schema is (Name, DeptName, DeptName, Location, V). The first tuple in the result is produced by concatenating the first tuple in DEPT with the second tuple in LOCATION, since they agree on a common value,
"Mathematics," in their DeptName attribute and their valid-time periods overlap, that is, $\mathit{overlap}([5, 9], [0, 19]) = [5, 9]$.

2.2 Evaluation

As in conventional equijoin evaluation, techniques for the valid-time equijoin are classified into three categories: nested-loop, sort-merge, and partition-based algorithms. In the following sections we describe and compare these algorithms by first introducing their conventional implementations and then adapting them to support valid time. A complete discussion of different valid-time equijoin implementations can be found in Soo's dissertation [Soo96].

To analyze and compare these implementation strategies effectively, we first state some common assumptions. In the following sections, we consider only I/O evaluation costs, since the cost of transferring data between secondary storage and main memory is usually much higher than the cost of in-memory processing. A read or write of a disk page is used as the unit of I/O cost. The sizes of input relations $r$ and $s$ are represented by $|r|$ and $|s|$, respectively, in units of disk (or memory) pages. In addition, the cost of writing the output relation is ignored because it is essentially the same for all algorithms.

2.2.1 Nested-Loop Algorithm

Nested-loop is the simplest implementation of the equijoin operator in conventional database systems. Input relations $r$ and $s$ are termed the outer and inner relations because they are controlled by the outer and inner loops of the algorithm, respectively. For each outer tuple, the inner relation is entirely scanned to find all matching tuples. The exhaustive nature of this algorithm prohibits its use for evaluating equijoins of large input relations.
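The tuple-at-a-time nested loop described above extends naturally to the valid-time equijoin, because the overlap of the timestamps can be checked in the same pass as the equality predicate. The minimal Python sketch below assumes closed integer intervals, uses Python's `None` in place of the undefined value ⊥, and represents "now" with a hypothetical large sentinel `NOW`; the function names are illustrative. Applied to the relations of Figure 1.4, it produces the four result tuples shown there.

```python
from typing import List, Optional, Tuple

NOW = 2**31 - 1              # hypothetical integer stand-in for "now"
Interval = Tuple[int, int]   # closed valid-time interval [Vs, Ve]

def overlap(u: Interval, v: Interval) -> Optional[Interval]:
    """Largest interval contained in both u and v, or None (playing the role of ⊥)."""
    start = max(u[0], v[0])   # last(Us, Vs)
    end = min(u[1], v[1])     # first(Ue, Ve)
    return (start, end) if start <= end else None

def valid_time_nested_loop_join(r, s):
    """Join r = (Name, DeptName, V) with s = (DeptName, Location, V) on DeptName."""
    result: List[tuple] = []
    for name, dept_r, v_r in r:          # outer loop: each tuple of r
        for dept_s, loc, v_s in s:        # inner loop: full scan of s per outer tuple
            if dept_r != dept_s:          # equality predicate on the explicit join attribute
                continue
            v = overlap(v_r, v_s)         # timestamp predicate checked in the same pass
            if v is not None:
                result.append((name, dept_r, dept_s, loc, v))
    return result

# Input relations of Figure 1.4
DEPT = [("Bill", "Mathematics", (5, 9)),
        ("Bill", "Computer", (10, NOW)),
        ("Tom", "Mathematics", (5, NOW))]
LOCATION = [("Computer", "Bld2", (5, NOW)),
            ("Mathematics", "Bld1", (0, 19)),
            ("Mathematics", "Bld3", (20, NOW))]

for row in valid_time_nested_loop_join(DEPT, LOCATION):
    print(row)
```

Every outer tuple is compared with every inner tuple, which is the exhaustive behavior that makes this method unsuitable for large relations; the pair Bill/Mathematics [5, 9] and Bld3 [20, now], for instance, is examined even though its intervals do not overlap.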
Figure 2.1: Schematic block nested-loop equijoin

In practice, a block nested-loop method [Kim80] is usually used to improve the performance of this algorithm, as illustrated in Figure 2.1. For a K-page memory space, K-1 pages are assigned to the outer relation and one page is reserved for the inner relation. The outer relation is read into main memory K-1 consecutive pages at a time. For each such group of outer pages, the inner relation is sequentially scanned page by page, and the tuples of the outer group are compared with the tuples of the current inner page. Consequently, to complete the join operation, we need to load the outer relation into main memory once and the inner relation $\lceil |r|/(K-1) \rceil$ times. Therefore, the I/O cost of the block nested-loop algorithm is estimated as follows.

$$\mathit{cost}_{NL} = |r| + \left\lceil \frac{|r|}{K-1} \right\rceil \cdot |s|$$

This implementation significantly reduces the I/O evaluation cost, since the inner relation is scanned once for each outer group rather than once for each outer tuple. For outer relations occupying fewer than K-1 pages, the evaluation cost is linear in the number
of pages occupied by the input relations. This algorithm works well for small outer relations, but its evaluation cost grows quadratically when the input relations become larger than the memory space.

Using a nested-loop algorithm to evaluate the valid-time equijoin is as simple as its conventional counterpart. Essentially, overlap of the timestamp attributes can be checked at the same time as the equality predicate on the explicit join attributes is evaluated [SG89]. This direct implementation of the valid-time equijoin does not require much modification of the conventional algorithm, nor does it increase the execution complexity. However, for large valid-time relations the nested-loop algorithm is still not a good choice due to its exhaustive nature.

2.2.2 Sort-Merge Algorithm

The conventional sort-merge algorithm comprises two distinct phases. The input relations are first sorted on their join attributes, and the two sorted relations are then sequentially scanned to find matching tuples, as illustrated in Figure 2.2. In effect, the sorting operation preprocesses the input relations to reduce the number of unsuccessful comparisons when the input relations are scanned. External sorting [Knu73] is required if an input relation is larger than the available main memory. External sorting first divides an input relation into a set of groups, termed runs, where each run fits in the memory space. The initial runs are sorted separately, and then the sorted runs are recursively merged to produce the sorted input relation. Quicksort and replacement selection [Knu73] may be used to generate the initial set of sorted runs [Gra93, Tsa96]. For the sort-merge algorithm, the sorting phase normally dominates the evaluation cost. For external sorting using a K-page memory space, the number
of runs is $W = \lceil |r|/K \rceil$. If the size of each input buffer is $C$, the maximal number of runs that can be merged at one time in memory, termed the fan-in, is $F = \lfloor K/C \rfloor - 1$, leaving one buffer for the output run. Therefore, the number of merge levels is logarithmic in the number of runs, namely $\lceil \log_F W \rceil$, which represents how many times relation $r$ needs to be loaded into main memory [Gra93]. A similar sorting cost is obtained for the other input relation, $s$. Therefore, the I/O cost of the equijoin of $r$ and $s$ can be estimated as follows. This algorithm is typically much more efficient than the nested-loop algorithm.

Figure 2.2: Schematic sort-merge equijoin (Phase 1: sorting both input relations into temporary runs and sorted relations; Phase 2: merging the two sorted relations in main memory)
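The cost reasoning in this section reduces to a few lines of integer arithmetic. The sketch below is a minimal illustration with hypothetical function names: it counts page reads for the block nested-loop method (outer relation read once, inner relation read once per group of K-1 outer pages) and computes the number of merge levels of an external sort from W = ⌈|r|/K⌉ initial runs and a fan-in of F = ⌊K/C⌋ - 1. It assumes all sizes are integers measured in pages and, as in the text, ignores the cost of writing the output.

```python
import math

def block_nested_loop_io(r_pages: int, s_pages: int, k: int) -> int:
    """Pages read by block nested-loop with a K-page memory (output writes ignored)."""
    if k < 2:
        raise ValueError("K must leave one page for the outer group and one for the inner page")
    outer_groups = math.ceil(r_pages / (k - 1))   # groups of K-1 outer pages
    return r_pages + outer_groups * s_pages        # outer read once, inner once per group

def external_sort_merge_levels(r_pages: int, k: int, c: int) -> int:
    """Merge levels ceil(log_F W) when sorting a relation of |r| pages."""
    runs = math.ceil(r_pages / k)   # W: initial sorted runs, each fitting in K pages
    fan_in = k // c - 1             # F: floor(K/C) buffers minus one output buffer
    if fan_in < 2:
        raise ValueError("memory too small to merge at least two runs at a time")
    levels = 0
    while runs > 1:                 # each level merges up to F runs into one
        runs = math.ceil(runs / fan_in)
        levels += 1
    return levels

print(block_nested_loop_io(1000, 1000, 101))       # 11000
print(external_sort_merge_levels(1000, 101, 10))   # 2 (W = 10, F = 9)
```

With |r| = |s| = 1000 pages and K = 101, the block nested-loop method reads 1000 + 10 · 1000 = 11,000 pages, while sorting one 1000-page relation with input buffers of C = 10 pages needs two merge levels. Counting levels with repeated integer division avoids floating-point rounding errors in ⌈log_F W⌉ when W is an exact power of F.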
Complications arise, however, if the join attributes are not key attributes. In this case, multiple tuples in r and s may have identical join attribute values. Hence, a given tuple in r may join with multiple tuples in s, requiring repeated scans of those tuples in the inner relation. In the worst case, the sort-merge algorithm degenerates into exhaustive comparison. Similar to the nested-loop algorithm, we can evaluate the valid-time equijoin by modifying the conventional sort-merge algorithm to check the overlap of the time intervals simultaneously when evaluating the equality predicate on the explicit join attributes [SG89, Tsa96]. The explicit join attributes are still used as sorting variables. In addition, we can use the timestamps as a secondary sorting variable. The technique of sorting primarily on explicit join attributes and secondarily on timestamp was employed by Segev and Gunadhi in their implementation of the entity-join [SG89]. This technique may reduce the number of comparisons performed during the merging phase. Alternatively, the timestamp attributes can be used primarily, or even solely, to order the input relations [Soo96]. However, ordering on timestamps may also degenerate into exhaustive comparison during the merging phase if each tuple in r overlaps many tuples in s.

2.2.3 Partition-Based Algorithm

Like the sort-merge algorithm, the partition-based algorithm attempts to reduce the number of comparisons needed to find matching tuples [Bra94, DKO+84, ZG90]. As before, we use the terms "outer" and "inner" to differentiate the roles played by the input relations. Partition-based equijoin evaluation consists of two distinct phases, as illustrated in Figure 2.3. In the first phase, the input relations r and s are each partitioned into n buckets.
Typically, a common hash function is used as the partitioning agent. The partitioning is performed so that tuples contained in a given bucket produced from one input relation need only be compared to the tuples contained in the corresponding bucket derived from the other input relation. In the second phase, the equijoin is evaluated by comparing tuples in corresponding buckets of the input relations to find the matching tuples.

Figure 2.3: Schematic partition-based equijoin (Phase 1: partitioning each input relation into buckets; Phase 2: joining corresponding outer and inner buckets, $r_1 \bowtie s_1, r_2 \bowtie s_2, \ldots, r_n \bowtie s_n$, into the output relation).

In the partition-based algorithm, each outer bucket is intended to maximize, but not overflow, the memory space. If any of the generated buckets overflows its allocated memory buffer, it is recursively partitioned until all generated buckets fit in the available memory. If the partitioning succeeds without recursive partitioning, the I/O cost of this algorithm is estimated as follows.

$$C_{part} = O(|s| + |r|)$$
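The two phases can be sketched in memory as follows. This is a minimal illustration under stated assumptions:

- Python's built-in `hash` stands in for the common partitioning hash function.
- A dictionary index on each outer bucket performs the bucket-to-bucket join.
- Disk I/O and recursive partitioning of overflowing buckets are omitted.
- The name `partition_join` is hypothetical.

```python
from collections import defaultdict


def partition_join(outer, inner, key_outer, key_inner, n_buckets=8):
    """In-memory sketch of a two-phase partition-based (hash) equijoin."""
    # Phase 1: partition both relations with the same hash function, so that
    # matching tuples always land in corresponding buckets.
    r_buckets = [[] for _ in range(n_buckets)]
    s_buckets = [[] for _ in range(n_buckets)]
    for t in outer:
        r_buckets[hash(key_outer(t)) % n_buckets].append(t)
    for u in inner:
        s_buckets[hash(key_inner(u)) % n_buckets].append(u)

    # Phase 2: compare only tuples in corresponding buckets r_i and s_i.
    result = []
    for r_i, s_i in zip(r_buckets, s_buckets):
        index = defaultdict(list)  # the outer bucket is assumed to fit in memory
        for t in r_i:
            index[key_outer(t)].append(t)
        for u in s_i:
            for t in index.get(key_inner(u), []):
                result.append((t, u))
    return result


emp = [(1, "ann"), (2, "bob"), (2, "cy")]
dept = [(2, "sales"), (3, "hr")]
print(partition_join(emp, dept, lambda t: t[0], lambda u: u[0]))
# [((2, 'bob'), (2, 'sales')), ((2, 'cy'), (2, 'sales'))]
```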
The efficiency of this algorithm, however, may be dramatically reduced if many generated outer buckets are larger than the memory space. As with the sort-merge based strategy, the performance of the partition-based algorithms is also sensitive to the memory size: the smaller the memory space, the more levels of recursive partitioning may be required. In the worst case, the partition-based algorithm also degenerates into exhaustive comparison when sufficiently small partitions cannot be constructed.

Essentially, neither the input relation sizes nor the available memory size is the dominant factor in choosing between the sort-merge based and partition-based algorithms. The partition-based join does, however, outperform the sort-merge join if one of the input relations is smaller than the other. This is because the level of recursive partitioning depends only on the smaller relation in the partition-based algorithms, while sorting is required for both input relations in the sort-merge algorithms [Gra93]. In addition, comparing $C_{part}$ with $C_{sort}$ shows that the partition-based algorithms may potentially outperform the sort-merge algorithms. This holds if we can partition the outer relation into buckets that fit in the main memory space without recursive partitioning, as Soo et al. did in their algorithm by using a sampling method [SSJ94]. We describe algorithms to accomplish this in Section 4.1.

Similar to the nested-loop and sort-merge algorithms, we can also modify the conventional partition-based algorithm to evaluate the valid-time equijoin. We do so by testing the overlap of timestamps simultaneously when evaluating the equality predicate on the explicit join attributes. This is a direct strategy for adapting the existing partition-based algorithms for the conventional equijoin to implement the valid-time equijoin in valid-time databases.
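The direct strategy just described can be sketched by adding an overlap test to the bucket join. This is a hedged illustration, not the thesis's implementation, and it rests on several assumptions:

- Timestamps are half-open integer intervals [vs, ve).
- Buckets are formed on the explicit join attribute with Python's `hash`.
- Each result carries the intersection of the two valid-time intervals. This is the usual valid-time equijoin semantics; Section 2.1 remains the authority.
- The names `VTuple`, `overlaps` and `vt_partition_join` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class VTuple(NamedTuple):
    key: object            # explicit join attribute
    vs: int                # valid start time (inclusive), assumed integer chronon
    ve: int                # valid end time (exclusive)
    payload: object = None


def overlaps(a: VTuple, b: VTuple) -> bool:
    """Half-open intervals overlap iff each one starts before the other ends."""
    return a.vs < b.ve and b.vs < a.ve


def vt_partition_join(outer, inner, n_buckets=8):
    """Partition on the explicit join attribute; test overlap during the bucket join."""
    r_buckets = [[] for _ in range(n_buckets)]
    s_buckets = [[] for _ in range(n_buckets)]
    for t in outer:
        r_buckets[hash(t.key) % n_buckets].append(t)
    for u in inner:
        s_buckets[hash(u.key) % n_buckets].append(u)

    result = []
    for r_i, s_i in zip(r_buckets, s_buckets):
        index = defaultdict(list)
        for t in r_i:
            index[t.key].append(t)
        for u in s_i:
            for t in index.get(u.key, []):
                if overlaps(t, u):  # timestamp check alongside the equality test
                    # assumed result timestamp: intersection of the two intervals
                    result.append((t.key, t.payload, u.payload,
                                   max(t.vs, u.vs), min(t.ve, u.ve)))
    return result


r = [VTuple("k", 0, 10, "a"), VTuple("k", 20, 30, "b")]
s = [VTuple("k", 5, 25, "x"), VTuple("j", 0, 100, "y")]
print(sorted(vt_partition_join(r, s)))
# [('k', 'a', 'x', 5, 10), ('k', 'b', 'x', 20, 25)]
```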
The valid-time attributes provide alternative partitioning variables to the explicit join attributes. Timestamp partitioning is beneficial when partitioning directly on the explicit join attributes leads to an expensive evaluation cost. This is the case when sufficiently small partitions cannot be constructed due to low diversity of the explicit join attribute values [Soo96].

For the partition-based algorithms, partitioning on timestamps is normally more complicated than partitioning on the explicit attributes, because the timestamps are intervals, i.e., range data, rather than discrete values. A range-partitioning strategy is normally employed for partitioning on timestamps. In it, a set of consecutive, non-overlapping intervals, termed partition intervals, is determined along the lifespan of the input relations [LM91]. Partitioning on timestamps may incur extra evaluation cost because there may exist tuples with valid-time intervals overlapping multiple partitions; these tuples are termed long-lived tuples [SSJ94].

In this thesis, we consider the partition-based strategy for evaluating the valid-time equijoin. We believe that a proper implementation of this strategy may outperform other algorithms for the same reason we discussed for the conventional partition-based strategy. In addition, the partition-based join has been shown to outperform the sort-merge join when a significant number of long-lived tuples is present relative to the input relation cardinalities [SSJ94].

2.3 Summary

In valid-time databases, the valid-time equijoin plays the same important role as the conventional equijoin does in conventional databases. The existing conventional implementations can be modified to handle the valid-time equijoin by simultaneously evaluating overlap of the valid-time intervals when the equality condition on the conventional join attributes is evaluated. These implementations have evaluation costs similar to those of their conventional counterparts. Alternatively, valid-time attributes can be used as sorting and partitioning variables in the sort-merge based and partition-based algorithms, respectively; these approaches have been investigated in previous work [Soo96]. In the rest of this thesis, we concentrate on one particular class of these implementations, the timestamp partitioning strategy, to evaluate the valid-time equijoin. We discuss some existing timestamp partitioning implementations in Chapter 3.
CHAPTER 3 PREVIOUS TIMESTAMP PARTITIONING ALGORITHMS

The semantics, definition, and different implementation strategies for the valid-time equijoin were described in the previous chapters. In this chapter, we discuss and compare some existing implementations of the timestamp partitioning strategy. We begin by introducing Leung and Muntz' implementation [LM91] in Section 3.1. Soo et al. described another timestamp partitioning strategy [SSJ94]; we discuss their work in Section 3.2. Section 3.3 discusses a two-dimensional timestamp range-partitioning algorithm for computing the valid-time join, developed by Lu et al. [LOT94]. Finally, we summarize this chapter in Section 3.4.

3.1 Previous work by Leung and Muntz

Leung and Muntz developed a multiprocessor timestamp partitioning algorithm for the valid-time equijoin. In their algorithm, the relation lifespan is assumed to be [0, now). Here 0 is the initial time of the database considered, and now is a special marker that represents the current time. The "[" and ")" indicate that the lifespan includes 0 and excludes now. With now as the upper bound of the lifespan, no future information is supported in their algorithm. As the database evolves, they assume that all newly inserted tuples must be current tuples and start in the last partition; therefore, no historical tuples can be inserted into the database. We refer to this as the limited-insertion assumption. In the following sections, we discuss their algorithm in two phases: partitioning the input relations and joining the generated partitions.

3.1.1 Partitioning the Input Relations

Leung and Muntz, in their algorithm [LM91], introduced a timestamp partitioning strategy, as illustrated in Figure 3.1.

Figure 3.1: Range partitioning along time intervals (partitions $P_i$ on the time axis, with $t_{n+1} = now$).

The relation lifespan is divided into n partition intervals. As the relation lifespan is assumed to be [0, now), by convention $t_1 = 0$ and $t_{n+1} = now$. For the partition $P_i$, $1 \le i \le n$, the partition interval is $[t_i, t_{i+1})$, with $t_i$ and $t_{i+1}$ being its lower and upper partition boundaries, respectively. A tuple of the input relation is stored in the bucket corresponding to partition $P_i$ if this tuple starts during the interval $[t_i, t_{i+1})$. The current time now is the upper partition boundary of the last partition $P_n$. As time progresses, now is always the current time. Current tuples with starting time greater than $t_n$ are inserted into $P_n$ until its size surpasses the predefined partition
size. An update of the last partition interval is then needed. The direct way to do this is to split the last partition into two partitions, $[t_n, t_{n+1})$ and $[t_{n+1}, now)$. Leung and Muntz also introduce some other strategies for updating the partition intervals [LM91].

Consider two input relations r and s that are partitioned using the same set of partition intervals. There may exist tuples in both r and s with valid-time intervals overlapping multiple partitions. Two long-lived tuples $x \in r$ and $y \in s$ may match in their explicit join attributes and overlap in their timestamps, yet be stored in different partitions. Therefore, computing the valid-time equijoin requires replicating each long-lived tuple into all partitions it overlaps. This replication requires additional secondary storage space and results in extra I/O evaluation cost for the duplicated tuples. For simplicity, Leung and Muntz homogeneously partition the input relations with the same set of partition intervals. As a temporal database evolves, however, it is hard to maintain the same partition boundaries for all relations while keeping bucket sizes within the main memory space.

3.1.2 Joining the Partitions

The replication of long-lived tuples in the partitioning process guarantees that each partition can be evaluated independently on a single processor. This simplifies the joining phase of the algorithm. The replication of long-lived tuples, however, incurs additional I/O evaluation cost, and may lead to poor performance if many long-lived tuples need to be replicated. This algorithm has four distinct shortcomings:
- the lack of support for future time;
- the limited-insertion assumption;
- the replication of long-lived tuples;
- the difficulty of maintaining homogeneous partition boundaries.
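The partitioning and replication just described can be sketched as follows. This is a minimal illustration under stated assumptions:

- `bounds` holds the lower boundaries $t_1, \ldots, t_n$, so the last partition runs to now.
- Tuples are (key, vs, ve) with half-open valid-time intervals.
- The function names are hypothetical.
- Partition splitting and the join phase are omitted.

```python
import bisect


def start_partition(vs, bounds):
    """Index i with bounds[i] <= vs < bounds[i + 1]; the last partition ends at now."""
    return bisect.bisect_right(bounds, vs) - 1


def overlapped_partitions(vs, ve, bounds):
    """All partition indices that the half-open interval [vs, ve) overlaps."""
    first = start_partition(vs, bounds)
    last = bisect.bisect_left(bounds, ve) - 1  # last lower boundary strictly below ve
    return list(range(first, last + 1))


def partition_with_replication(tuples, bounds):
    """Store a copy of every (key, vs, ve) tuple in each partition it overlaps."""
    buckets = [[] for _ in bounds]
    for t in tuples:
        _, vs, ve = t
        for i in overlapped_partitions(vs, ve, bounds):
            buckets[i].append(t)  # long-lived tuples are replicated here
    return buckets


bounds = [0, 10, 20, 30]  # t_1 = 0, ..., t_n = 30; the last partition is [30, now)
rel = [("a", 5, 25), ("b", 12, 15)]
print(partition_with_replication(rel, bounds))
# [[('a', 5, 25)], [('a', 5, 25), ('b', 12, 15)], [('a', 5, 25)], []]
```

In the usage example, the two input tuples occupy four bucket slots. This is the secondary-storage and I/O overhead of replication that the text describes.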
3.2 Previous work by Soo, Snodgrass and Jensen

To address the shortcomings of Leung and Muntz' algorithm [LM91], Soo et al. described another timestamp partitioning algorithm for the valid-time equijoin [SSJ94]. This is a uniprocessor algorithm, rather than a multiprocessor algorithm as in Leung and Muntz' work. Semantically, they removed the limitations on both the upper time boundary of the relation lifespan and the insertion of new tuples. Therefore, this algorithm may be applied to a general valid-time database that supports updates of past and future, as well as current, tuples. Operationally, they dynamically migrate, rather than replicate, long-lived tuples between partitions. This algorithm stores the migrated tuples in a temporary memory buffer, termed the tuple cache, thereby retaining them in main memory and reducing the I/O evaluation cost.

Determining the partition intervals is a critical step for properly partitioning the input into buckets that maximize, but do not overflow, the available memory space. This algorithm separates the partitioning phase into two subphases: determining the partition intervals and partitioning the input relations. The following sections discuss these two subphases, as well as the last phase, which joins the generated partitions.

3.2.1 Determining the Partition Intervals

A simple strategy to construct the partition intervals is to sort the input relations on their timestamp attributes and then choose the partition boundaries in a sequential scan. While this exact method yields an optimal solution, it is prohibitively expensive to execute due to the cost of sorting.
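The sort-and-scan idea can be sketched as follows. It rests on several assumptions:

- Each tuple is assigned to a bucket by its valid start time.
- Bucket size is counted in tuples.
- Tuples with equal start times stay in the same bucket, so a long run of ties can exceed the capacity.
- The name `exact_boundaries` is hypothetical.

The original method may account for bucket size differently, for example by including long-lived tuples.

```python
def exact_boundaries(start_times, bucket_capacity):
    """Hypothetical sort-then-scan choice of partition boundaries.

    Returns the lower boundary of each partition interval, assuming tuples are
    bucketed by start time and bucket size is counted in tuples.
    """
    ordered = sorted(start_times)  # the sort the text identifies as the expensive step
    bounds, count, prev = [], 0, None
    for vs in ordered:  # single sequential scan
        # open a new interval once the current bucket is full, but never split ties
        if prev is None or (count >= bucket_capacity and vs > prev):
            bounds.append(vs)
            count = 0
        count += 1
        prev = vs
    return bounds


print(exact_boundaries([7, 1, 3, 9, 5, 2], bucket_capacity=2))  # [1, 3, 7]
```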
Instead, this algorithm determines a set of partition intervals that, with high probability, are close to those that would have been chosen with the exact method. To do this, they randomly sample tuples from the outer relation. Based on these samples, they choose a set of partition boundaries from which the partition intervals are constructed. The same partition intervals are used to partition both the outer and inner relations. Sampling is potentially expensive. Therefore, the algorithm estimates the total evaluation cost for different sample sizes and chooses the sample size with the smallest estimated evaluation cost. The estimated evaluation cost consists of three components, that is,

$$C_{total} = C_{sample} + C_{part} + C_{join}$$

where
- $C_{sample}$ is the cost of drawing random samples from the outer relation r;
- $C_{part}$ is the cost of partitioning both the outer and inner relations, r and s, into n buckets;
- $C_{join}$ is the cost of joining the generated partitions.

Usually, $C_{part}$ is linear in the sizes of the input relations if recursive partitioning is avoided [ME92, Gra93]. Since this cost is essentially fixed for given input relations, it is ignored during the evaluation analysis.

The available memory space, M, in units of pages, is allocated as shown in Figure 3.2 [SSJ94]. The memory buffers $M_r$, $M_s$, $M_c$, and $M_{out}$ are assigned, respectively, to:
- the outer bucket $r_i$;
- the inner bucket $s_i$;
- the tuple cache $C_i$, $1 \le i \le n$;
- the output relation result.

One page of memory space is reserved for $M_s$, and a fixed number of pages are reserved for $M_c$ and $M_{out}$. Efficiency requires that the outer bucket and tuple cache fit in their allocated memory buffers, allowing them to be retained in memory without being flushed to disk. Since sampling provides only an approximate partitioning, a portion of the available memory has to be reserved to accommodate errors that could possibly result
This sampling cost dominates the evaluation cost of determining the partition intervals, since each sampled tuple requires a random I/O. A shortcoming of this formulation concerns the size of $M_e$. It must be large enough to accommodate the maximal accumulated errors for the sequence of outer buckets starting from the first partition, instead of the intended maximal estimated error for a single outer bucket. We explain this in more detail in Section 4.1. There we will see that this formulation may result in drawing more samples than necessary if the input relations are large and a limited amount of main memory is available.

The evaluation cost, $C_{join}$, to join the derived partitions is the sum of three components:
- the random I/O cost, $IO_{ran}$, of reading the first page in each bucket;
- the sequential cost, $IO_{seq}$, of reading each of the remaining pages;
- any I/O cost associated with possible overflow of the tuple caches.

$IO_{seq}$ is usually a small fraction of $IO_{ran}$, since random I/O requires mechanical movement of the disk read head. The total joining cost, $C_{join}$, is estimated as

$$C_{join} = 2n\left[IO_{ran} + (P_r - 1)\,IO_{seq}\right] + C_{cache}$$

where the factor 2 represents the I/O cost for both input relations, n is the number of partitions, and $P_r$ is the estimated size of the outer buckets, which is the same for each partition. The I/O cost for the long-lived tuples is estimated as

$$C_{cache} = \begin{cases} 2n\left[IO_{ran} + (C_s - M_c)\,IO_{seq}\right] & \text{if } C_s > M_c \\ 0 & \text{if } C_s \le M_c \end{cases}$$

where $C_s$ is the estimated size of the tuple cache. This estimate assumes that the outer and inner relations are similar both in size and in how their tuples are distributed over valid time.

The most direct way to reduce $C_{join}$ is to increase $M_r$. A larger $M_r$ results in fewer partitions of larger size. A given tuple is therefore less likely to overlap multiple
partitions. Consequently, fewer random reads are needed and the likelihood of tuple cache overflow is reduced. However, for a fixed M-page memory space, a larger $M_r$ results in a smaller $M_e$, which requires more randomly drawn samples and leads to a higher $C_{sample}$. The optimal solution is to obtain a sample size that minimizes $C_{sample} + C_{join}$.

After the sample set has been drawn, this algorithm counts chronons, the essential time points [DS93, SDS95], to derive the partition intervals. The chronons covered by each sampled tuple are collected, sorted, and divided into n groups with the same count of chronons in each group. The chronons at the group boundaries are chosen as the partition boundaries, and the partition intervals are constructed.

A shortcoming of this chronon-counting technique is that it ignores possible correlation between the distribution of the sampled tuples' starting times and that of their durations when determining the partition intervals. Consider a stock trading example in which the stocks issued by a company are recorded in a relation with the schema (StockID, Holder, ValidTime). The stockholders may keep the stocks for a long period of time (long duration) without trading if the stocks' market value increases steadily. However, heavy trading (short duration) may occur when the stock market is unsteady. During this unsteady period, more tuples with shorter valid-time intervals may be created, while the density of chronons along the relation lifespan does not change. Consequently, chronon counting cannot provide appropriate information to construct the partition intervals in this example.
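The chronon-counting step and the $C_{join}$ estimate above can be sketched together. This is a hedged illustration with several assumptions:

- Sampled tuples are (vs, ve) pairs of integer chronons with half-open intervals.
- Every covered chronon is materialized, which is fine for a sketch but wasteful for long intervals.
- Coinciding group boundaries are merged.
- The function names are hypothetical.

```python
import math


def chronon_boundaries(samples, n):
    """Chronon counting (sketch): split covered chronons into n equal-count groups.

    samples: (vs, ve) pairs, half-open [vs, ve) over integer chronons (an assumption).
    Returns the chronon opening each group; coinciding boundaries are merged.
    """
    chronons = sorted(t for vs, ve in samples for t in range(vs, ve))
    if not chronons:
        return []
    per_group = math.ceil(len(chronons) / n)  # same count of chronons per group
    return sorted({chronons[k] for k in range(0, len(chronons), per_group)})


def join_cost(n, p_r, io_ran, io_seq, c_s, m_c):
    """C_join = 2n[IO_ran + (P_r - 1) IO_seq] + C_cache, with C_cache as in the text."""
    c_cache = 2 * n * (io_ran + (c_s - m_c) * io_seq) if c_s > m_c else 0
    return 2 * n * (io_ran + (p_r - 1) * io_seq) + c_cache


print(chronon_boundaries([(0, 4), (2, 6)], n=2))                   # [0, 3]
print(join_cost(n=10, p_r=5, io_ran=10, io_seq=1, c_s=6, m_c=4))   # 520
```

In the first call, chronons 2 and 3 are each covered twice, so each group holds four chronon occurrences. Only the counts of covered chronons drive the boundaries, which is why tuple durations play no separate role.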
3.2.2 Partitioning the Input Relations

With the constructed set of partition intervals, the input relations are partitioned such that each tuple is physically stored in the last partition it overlaps [SSJ94]. Joining the derived partitions then proceeds in order from the last partition
to the first. Tuple caching ensures that long-lived tuples are dynamically migrated during the joining phase to all partitions they overlap, and does so without introducing unnecessary replication in secondary storage. GRACE partitioning is assumed in the partitioning procedure [ME92], which evaluates the join operation in two phases, i.e., partitioning and joining. Since the partition intervals are determined from the sample information, the outer buckets usually fit in their allocated memory buffers. Recursive partitioning is therefore normally not needed, and the evaluation cost of the partitioning procedure is linear in the sizes of the input relations. The long-lived tuples, however, are left for the joining phase to handle, as described in the next section.

3.2.3 Joining the Partitions

During the joining phase of the algorithm, the main memory space is allocated as described in Figure 3.2. The procedure starts from the last partition interval. For each partition $P_i$, $1 \le i \le n$, the corresponding outer bucket $r_i$ is constructed dynamically. It consists of the tuples in the previous outer bucket that overlap the current partition interval, plus the physical partition $r_i$ read in from disk. Similarly, the current tuple cache consists of the tuples from either the previous inner bucket or the tuple cache that overlap the current partition interval. The tuple cache may be flushed to disk if it overflows its allocated memory buffer. The join of the current partition is done by joining the outer bucket with both the inner bucket and the tuple cache. No additional I/O cost is incurred if the tuple cache is memory resident. The same procedure is then repeated for each remaining partition, down to the first. The output relation is produced by concatenating the join results from all n partitions.
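The joining procedure just described can be sketched in memory as follows. This is a minimal illustration under stated assumptions:

- Tuples are (key, vs, ve) with half-open intervals.
- `bounds` lists partition boundaries covering every tuple.
- Memory never overflows, so the tuple cache is never flushed.
- The function names are hypothetical.

Because the sketch follows the described procedure literally, it can report the same pair more than once. That is the duplicate-match shortcoming the text raises next.

```python
import bisect


def last_partition(vs, ve, bounds):
    """Index of the last partition [bounds[i], bounds[i + 1]) overlapped by [vs, ve)."""
    return min(bisect.bisect_left(bounds, ve) - 1, len(bounds) - 2)


def overlaps(vs, ve, lo, hi):
    """Half-open intervals [vs, ve) and [lo, hi) overlap."""
    return vs < hi and lo < ve


def cached_partition_join(outer, inner, bounds):
    """Join (key, vs, ve) tuples partition by partition, from the last to the first."""
    n = len(bounds) - 1
    r_parts = [[] for _ in range(n)]
    s_parts = [[] for _ in range(n)]
    for t in outer:  # each tuple is stored once, in the last partition it overlaps
        r_parts[last_partition(t[1], t[2], bounds)].append(t)
    for u in inner:
        s_parts[last_partition(u[1], u[2], bounds)].append(u)

    result = []
    prev_outer, prev_inner, cache = [], [], []
    for i in range(n - 1, -1, -1):
        lo, hi = bounds[i], bounds[i + 1]
        # outer bucket: retained outer tuples that reach back into P_i, plus r_i
        outer_i = [t for t in prev_outer if overlaps(t[1], t[2], lo, hi)] + r_parts[i]
        # tuple cache: tuples from the previous inner bucket or cache that reach P_i
        cache = [u for u in prev_inner + cache if overlaps(u[1], u[2], lo, hi)]
        inner_i = s_parts[i]
        for t in outer_i:
            for u in inner_i + cache:
                if t[0] == u[0] and overlaps(t[1], t[2], u[1], u[2]):
                    result.append((t, u))
        prev_outer, prev_inner = outer_i, inner_i
    return result


print(cached_partition_join([("a", 0, 10)], [("a", 5, 15)], [0, 4, 8, 12, 16]))
# [(('a', 0, 10), ('a', 5, 15)), (('a', 0, 10), ('a', 5, 15))]
```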
Another shortcoming of this implementation is that there may be duplicate matches between the long-lived tuples in the outer and inner relations. Essentially, the long-lived tuples retained from the previous outer bucket may already have been joined with those in the current tuple cache, which were retained from the previous inner bucket. In addition, a fixed tuple cache buffer size may also lead to poor performance when many tuple caches overflow their allocated buffer.

3.3 Previous work by Lu, Ooi and Tan

Lu et al. proposed another algorithm that maps valid-time intervals to points in a two-dimensional space [LOT94]. They clustered the tuples based on a spatial mapping, which we describe below. Like Leung and Muntz' algorithm, this algorithm assumes that time is bounded by now. Hence, future time is not supported. As we did for Leung and Muntz' algorithm, we organize the discussion into the following two sections.

3.3.1 Partitioning the Input Relations

Lu et al. mapped the timestamp attribute $[V_s, V_e]$ of each tuple to a discrete point (x, y) in a two-dimensional space. The x and y axes represent the starting time and the duration of $[V_s, V_e]$, respectively, i.e., $x = V_s$, the valid start-time, and $y = V_e - V_s$, the duration of the given valid-time interval. Since each tuple is mapped into a single point in the two-dimensional plane, duplication is avoided for long-lived tuples. As illustrated in Figure 3.3, the two-dimensional space is bounded by x = 0, y = 0, and x + y = now, where now is the current time. This space is partitioned into subspaces with time boundaries $x = T_i$ and $x + y = T_j$, where $0 \le i, j \le n$, $T_0 = 0$, and $T_n = now$.

Figure 3.3: Two-dimensional partitioning (x axis: valid start-time, $x = V_s$; y axis: duration, $y = V_e - V_s$; boundaries up to $T_{n-1}$ and $T_n$ marked).

Here x + y is actually the valid end-time $V_e$. Each partition is bounded as $T_i \le x < T_{i+1}$ and $T_j \le x + y < T_{j+1}$, $0 \le i \le j < n$. There are a total of $n(n+1)/2$ partitions. For simplicity, they assume a constant increment in both the valid start-time and valid end-time for all partitions, that is, $T_{i+1} - T_i = T_i - T_{i-1}$. Hence, the partition intervals are independent of the distributions of both the valid start-time and the valid-time duration. Therefore, the bucket sizes may vary from one partition to another. To avoid partitioning cost, Lu et al. suggested clustering the tuples with this mapping when they are inserted into the database. Tuples logically belonging to the same partition are then physically stored together.
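The two-dimensional mapping can be sketched as follows. This is a hedged illustration under stated assumptions:

- T holds $T_0, \ldots, T_n$ with the constant spacing Lu et al. assume.
- Valid-time intervals are half-open.
- A point whose end time equals $T_n$ (now) is assigned to the last row.
- The names `lu_partition` and `all_partitions` are hypothetical.

```python
def lu_partition(vs, ve, T):
    """Map [vs, ve) to (x, y) = (vs, ve - vs) and return its partition (i, j).

    T = [T_0, ..., T_n] with constant spacing. Partition (i, j) holds points with
    T_i <= x < T_{i+1} and T_j <= x + y < T_{j+1}; an end time equal to T_n (= now)
    is clamped into row n - 1 (an assumption of this sketch).
    """
    x, y = vs, ve - vs
    width = T[1] - T[0]  # constant increment between boundaries
    n = len(T) - 1
    i = min(int((x - T[0]) // width), n - 1)
    j = min(int((x + y - T[0]) // width), n - 1)  # x + y is the valid end time V_e
    return i, j


def all_partitions(n):
    """Every (i, j) with 0 <= i <= j < n: n(n + 1) / 2 partitions in total."""
    return [(i, j) for j in range(n) for i in range(j + 1)]


T = [0, 10, 20, 30, 40]
print(lu_partition(5, 35, T), len(all_partitions(4)))  # (0, 3) 10
```

Since $V_s \le V_e$, every tuple lands in a partition with $i \le j$, and each tuple is stored exactly once however long its interval is.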
3.3.2 Joining the Partitions

Tuples with long duration are stored in a single partition, but they may still overlap tuples in other partitions; therefore, each outer bucket may need to join with multiple inner buckets. Hence, inner buckets may be read into main memory repeatedly, once for each outer bucket they must join, resulting in additional I/O. Furthermore, the outer buckets are not guaranteed to be of similar size, so some buckets may overflow memory. If this happens, the effectiveness of their implementation may be greatly reduced.

3.4 Summary

In this chapter, we discussed previous work on timestamp partitioning equijoin algorithms. The advantages and disadvantages of these implementations are summarized in Table 3.1.

Table 3.1: Comparison of Previous Work in Timestamp Partition Join

| Property | Leung & Muntz | Soo et al. | Lu et al. |
| Historical tuple insertion | No | Yes | Yes |
| Support of future time | No | Yes | No |
| Partitioning strategy | Implicit | Explicit | Explicit |
| Tuple replication | Yes | No | No |
| Homogeneous partition boundaries | Probably | Yes | Yes |

Leung and Muntz proposed an implementation of the timestamp partitioning valid-time equijoin. However, several factors may decrease the effectiveness of their algorithm:
- the lack of support for future time;
- the limited-insertion assumption;
- the replication of long-lived tuples;
- the difficulty of maintaining homogeneous partition boundaries.

Lu et al. mapped tuples into points in a two-dimensional space to avoid replication of long-lived tuples. Their algorithm, however, does not support future time and may need multiple matches between outer and inner buckets. Its efficiency may also be reduced if the sizes of the outer buckets are nonuniform.

We believe that the algorithm developed by Soo et al. [SSJ94] provides a better basis for implementing the partition-based valid-time equijoin. Their algorithm supports future time, determines the partition intervals dynamically, and avoids replication of long-lived tuples. However, some shortcomings still exist in sampling, caching, and buffering, as we mentioned in Section 3.2. We describe modifications to this algorithm in the next chapter.
CHAPTER 4 ALGORITHM MODIFICATION

The previous chapter discussed existing timestamp partitioning algorithms for computing the valid-time equijoin. In this chapter, we introduce some improvements to the original algorithm developed by Soo et al. [SSJ94].

To use the sample information more effectively, we consider possible correlation between the starting times of the sampled tuples and their corresponding durations. In addition, a new formulation is derived to calculate the number of samples to draw. Compared with the formulation used in the original algorithm, this new formulation significantly reduces the sample size when input relations are large and relatively little memory is available.

We also extend the caching policy to retain outer long-lived tuples in a separate tuple cache, termed the outer cache. This new caching policy eliminates duplicate matches between long-lived tuples in the outer and inner relations. When joining the generated partitions, the new algorithm employs a dynamic memory allocation strategy to better retain the long-lived tuples in memory and so reduce the I/O evaluation cost.

Like the original algorithm, the modified algorithm has three phases: determining the partition intervals, partitioning the input relations, and joining the generated partitions. In the following, we describe only the required modifications to the original algorithm. The modifications to determine the partition intervals are discussed in
Section 4.1; a short discussion of the changes needed to partition the input relations is given in Section 4.2; and improvements on joining the constructed partitions are discussed in Section 4.3. Lastly, a short summary of this chapter is given in Section 4.4.

4.1 Determining the Partition Intervals

As discussed in the previous chapter, determining the partition intervals is critical to the effectiveness of the timestamp partitioning algorithms. In the following sections, we describe modifications to determine the partition intervals.

4.1.1 Correlating Start Times and Durations

The chronon-counting method in the original algorithm does not exploit possible correlations between the starting times of the sampled tuples and their corresponding durations. For the sampled tuples, both their valid start-times and their durations may be distributed nonuniformly along the relation lifespan. The correlation between these distributions may be negative, zero (independent), or positive. The following examples illustrate why the correlation must be considered.

A negatively correlated example is shown in Figure 4.1. We choose two intervals, $p_1$ and $p_2$, with the same duration, i.e., $e_1 - s_1 = e_2 - s_2 = 10$ chronons. Here $s_i$ and $e_i$, i = 1 or 2, are the lower and upper time boundaries of the intervals. Both intervals have the same chronon count, 25, yet their tuples differ:
- $p_1$ is overlapped by six tuples, five of which start during the interval; the sixth is a long-lived tuple that also overlaps the previous interval.
- $p_2$ is overlapped by four tuples, two of which start during the interval; the other two are long-lived tuples.

This example is similar to the stock-trading example discussed in Section 3.2.1. With this negative correlation, the chronon-counting method does not provide the correct sample information to determine the partition intervals.
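Summarizing a candidate interval the way the examples do makes the problem concrete. The helper below is a hedged sketch and not part of the thesis's algorithm. It assumes sampled tuples are (vs, ve) pairs with half-open intervals. It counts tuples starting inside [s, e), long-lived tuples that start earlier but still reach into it, and chronons covered inside it. The name `interval_stats` and the sample data are hypothetical. The data are invented to show the pattern and are not the tuples of Figure 4.1.

```python
def interval_stats(tuples, s, e):
    """Summarize sampled (vs, ve) tuples against a candidate interval [s, e).

    Returns (starting, long_lived, chronons): tuples that start inside the
    interval, tuples that start earlier but still overlap it, and the number
    of chronons the sampled tuples cover inside it.
    """
    starting = sum(1 for vs, ve in tuples if s <= vs < e)
    long_lived = sum(1 for vs, ve in tuples if vs < s < ve)
    chronons = sum(max(0, min(ve, e) - max(vs, s)) for vs, ve in tuples)
    return starting, long_lived, chronons


# Hypothetical sample: many short tuples early, few long tuples later.
sample = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10), (5, 15), (10, 20)]
print(interval_stats(sample, 0, 10))   # (6, 0, 15)
print(interval_stats(sample, 10, 20))  # (1, 1, 15)
```

Both intervals cover 15 chronons, so chronon counting treats them as equal. However, the first interval holds six starting tuples and the second only one starting tuple plus one long-lived tuple.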
Figure 4.1: Negative correlation between start time and duration (intervals $p_1$ between $s_1$ and $e_1$, and $p_2$ between $s_2$ and $e_2$).

Figure 4.2: Independent distributions of start time and duration (same interval layout as Figure 4.1).

Another example, in Figure 4.2, shows zero correlation (independence), using the same layout for the two intervals, $p_1$ and $p_2$, as in the first example. The two intervals contain similar tuple information, i.e., three starting tuples, one long-lived tuple, and 23 chronons. This is a typical case in which the chronon-counting method may provide correct information for determining the partition intervals.

The third example, in Figure 4.3, shows a positive correlation. As before, the same layout is chosen for $p_1$ and $p_2$. In this example:
- $p_1$ contains two starting tuples, one long-lived tuple, and 8 chronons;
- $p_2$ contains three starting tuples, two long-lived tuples, and 32 chronons.

If partition intervals are constructed by counting chronons, we have to divide $p_2$ into four partition intervals to match $p_1$. This leads to more partition intervals than necessary.

Figure 4.3: Positive correlation between start time and duration (same interval layout as Figure 4.1).

In the modified algorithm, we exploit any possible correlation between the starting times of the sampled tuples and their associated durations when constructing partition intervals. To do this, we construct an initial set of partition intervals according only to the starting times of the sampled tuples. The partition intervals are then adjusted by considering the timestamp durations of these tuples. Details of this procedure are provided in Section 4.1.3.

4.1.2 Reducing Sampling Cost

In the original algorithm, the error space $M_e$ accommodates the maximum estimated errors accumulated over the sequence of outer buckets starting from the first partition. What is needed is only the maximum estimated error for a single bucket. This may require drawing more samples than necessary if input relations are large and the memory space is relatively small. A new formulation to compute the number of samples to draw is derived in the following.
During partitioning of an input relation $r$, its lifespan $T$ is divided into $n$ non-overlapping partition intervals, and then all tuples in $r$ are clustered on their valid start-time $V_s$ into buckets associated with these $n$ partition intervals. In Figure 4.4(a), the real, unknown density function $f(t)$ of $V_s$ is compared with the corresponding sampled density function $s(t)$ computed from the randomly drawn samples. $t_{i-1}$ and $t_i$ are the time boundaries of partition interval $p_i$, where $1 \le i \le n$. Each rectangle in the figure represents a bucket estimated from the sample information, and its area is proportional to the estimated bucket size; the wider a rectangle, the longer the partition interval and the lower its sampled density, and vice versa. The sampled density function of $V_s$ can be expressed as

$$s(t) = \frac{N}{n\,\Delta t} \quad \text{for } t_{i-1} \le t < t_i \qquad (4.1)$$

where $N$ is the cardinality of $r$, $\Delta t = t_i - t_{i-1}$, and $i = 1, 2, \ldots, n$. For each bucket, the estimated error in its size, $\Delta p$, measured in number of tuples, is

$$\Delta p = \bigl(f(t) - s(t)\bigr)\,\Delta t \qquad (4.2)$$

Integrating $f(t)$ and $s(t)$, we obtain their accumulated distribution functions

$$F(t) = \int_0^t f(\tau)\,d\tau \quad \text{and} \quad S(t) = \int_0^t s(\tau)\,d\tau \qquad (4.3)$$

respectively, as illustrated in Figure 4.4(b). Let $T_m$ be the maximum positive difference between $F(t)$ and $S(t)$, attained at some time $t_m$. At 99% confidence, the one-sided Kolmogorov test statistic [Con80] establishes a relationship between $T_m$ and the number of samples, $m$:

$$T_m = \sup_t \bigl[F(t) - S(t)\bigr] = F(t_m) - S(t_m) = \frac{1.52\,N}{\sqrt{m}} \qquad (4.4)$$

We choose a 99% confidence level to statistically minimize the chance of buffer overflow.
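Equations (4.1) and (4.4) translate directly into code. The Python sketch below (the function names are illustrative, not taken from the original implementation) evaluates the step-shaped sampled density $s(t) = N/(n\,\Delta t)$ for a list of partition boundaries $t_0, \ldots, t_n$, and the one-sided 99% bound $T_m = 1.52N/\sqrt{m}$.

```python
import bisect
import math


def sampled_density(t, boundaries, N):
    """s(t) = N / (n * dt) on [t_{i-1}, t_i), as in Eq. (4.1).

    boundaries = [t_0, t_1, ..., t_n]. Each of the n buckets is estimated to
    hold N/n tuples, so a narrower interval has a higher sampled density.
    """
    n = len(boundaries) - 1
    i = bisect.bisect_right(boundaries, t)  # boundaries[i-1] <= t < boundaries[i]
    if i == 0 or i > n:
        return 0.0  # t lies outside the lifespan [t_0, t_n)
    dt = boundaries[i] - boundaries[i - 1]
    return N / (n * dt)


def kolmogorov_bound(N, m, coefficient=1.52):
    """T_m = 1.52 N / sqrt(m), Eq. (4.4): one-sided bound at 99% confidence."""
    return coefficient * N / math.sqrt(m)


boundaries = [0, 10, 30, 100]  # hypothetical partition boundaries
print([sampled_density(t, boundaries, 900) for t in (5, 20, 50)])
print(kolmogorov_bound(1_048_576, 5000))
```

Because each rectangle holds $N/n$ tuples, $s(t)$ integrates to $N$ over the lifespan regardless of how wide the individual intervals are.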
Figure 4.4: Real and sampled distributions of valid start-time (panels: (a) $f(t)$ and $s(t)$; (b) $F(t)$ and $S(t)$)

Let $\Delta$ be the difference between the accumulated number of tuples starting before $t_m$ and the corresponding number estimated from the sample set. The maximum positive $\Delta$ under 99% confidence is equal to the Kolmogorov test statistic in equation (4.4):

$$\Delta = T_m = \frac{1.52\,N}{\sqrt{m}} \qquad (4.5)$$

If we reserve a portion of memory space to accommodate this accumulated error, the error space should be

$$M_e = \frac{\Delta \cdot S_t}{K} \qquad (4.6)$$

where $S_t$ is the tuple size and $K$ is the page size, both in bytes, so $M_e$ is in pages. As before, we let $|r|$ be the size of $r$ in pages, so that $|r| = N S_t / K$. Combining (4.5) and (4.6), we obtain a formulation for the minimum number of samples to draw:

$$m = \left(\frac{1.52\,|r|}{M_e}\right)^2 \qquad (4.7)$$

This is the formulation used in the original algorithm [SSJ94]. The coefficient 1.52 is taken from the one-sided Kolmogorov test rather than the 1.63 of the two-sided test, since we are concerned only with overflow of the memory buffer during the join evaluation. In fact, a partition-based algorithm joins one partition at a time; therefore, we need only size $M_e$ for the estimated error $\Delta p$ of a single bucket instead of the accumulated error $\Delta$. Consider the distribution of $\Delta p$ for each bucket, as illustrated in Figure 4.5(a). This distribution can easily be transformed into the one shown in Figure 4.5(b), where every rectangle has the same width while its area remains the same as in (a); therefore, $\Delta p$ is measured by height alone. A normal distribution with standard deviation $\sigma$ fits $\Delta p$, as shown in Figure 4.5(c); this distribution has the density function

$$f(\Delta p) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\Delta p^2/(2\sigma^2)}$$
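To make the original sizing rule concrete, the Python sketch below evaluates the error space of Eq. (4.6), $M_e = \Delta S_t / K$, and the sample count of Eq. (4.7), assuming (4.7) reads $m = (1.52\,|r|/M_e)^2$, which is what substituting (4.5) into (4.6) with $|r| = N S_t/K$ gives. The function names are hypothetical, and the example sizes are borrowed from Table 5.1 in Chapter 5. Rounding $m$ up keeps the resulting error $\Delta$ within the reserved space.

```python
import math


def error_space_pages(delta_tuples, tuple_size, page_size):
    """Eq. (4.6): pages needed to hold delta_tuples mis-estimated tuples."""
    return delta_tuples * tuple_size / page_size


def original_sample_size(relation_pages, error_pages, coefficient=1.52):
    """Eq. (4.7): m = (1.52 |r| / M_e)^2, rounded up to a whole sample count.

    Both sizes are in pages. 1.52 is the one-sided 99% Kolmogorov value;
    1.63 would give the two-sided version.
    """
    return math.ceil((coefficient * relation_pages / error_pages) ** 2)


# Example: a 16 MB relation of 16-byte tuples on 1 KB pages.
TUPLE_SIZE, PAGE_SIZE = 16, 1024
relation_pages = 16 * 1024
N = relation_pages * PAGE_SIZE // TUPLE_SIZE  # cardinality of r

m = original_sample_size(relation_pages, error_pages=300)
delta = 1.52 * N / math.sqrt(m)  # Eq. (4.5) evaluated at the chosen m
print(m, error_space_pages(delta, TUPLE_SIZE, PAGE_SIZE))
```

The sample count grows with the square of $|r|/M_e$, which is why the original formulation becomes expensive when the relation is large relative to the error space.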
Figure 4.5: Distribution of the sample errors (panels: (a) $\Delta p(i)$; (b); (c) $f(\Delta p)$)

To fit every bucket in memory, we should allocate an error space of at least the following size: (4.10)

Combining (4.9) and (4.10), we obtain a standard deviation that is linear in $M_e$: (4.11)

If a relationship can be established between $\sigma$ in (4.11) and $\Delta$ in (4.5), we can derive a new formulation for computing the number of samples to draw. This relationship is established in the following derivation.
A single $\Delta p$ can be either positive or negative. The number of positive values among a given number of $\Delta p$ values follows a binomial distribution [Con80]. The worst case occurs when all $n/2$ positive $\Delta p$ values accumulate before any negative $\Delta p$ appears; this extreme situation is unlikely in practice. By the binomial distribution test, the quantile value

$$Y^{+} = \frac{n}{4} + w\sqrt{\frac{n}{8}} \qquad (4.12)$$

represents the maximum number of positive $\Delta p$ values out of $n/2$ under 99% confidence, where $w$ is the same quantile value as used in (4.9). To simplify the derivation, we let $\overline{\Delta p}$ be the average of all its positive values:

$$\overline{\Delta p} = \frac{1}{\sigma\sqrt{2\pi}} \int_0^{\infty} \Delta p\, e^{-\Delta p^2/(2\sigma^2)}\, d(\Delta p) = \frac{\sigma}{\sqrt{2\pi}}$$
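Both expressions can be checked numerically. The Python sketch below compares the normal approximation $Y^+ = n/4 + w\sqrt{n/8}$ of (4.12) with the exact 99% quantile of a Binomial($n/2$, 1/2) count, and evaluates $\frac{1}{\sigma\sqrt{2\pi}}\int_0^\infty x\,e^{-x^2/(2\sigma^2)}\,dx$ by the midpoint rule. It assumes that $w$ is the one-sided 99% standard normal quantile (about 2.326); the function names are hypothetical.

```python
import math
from statistics import NormalDist

# Assumption: w is the one-sided 99% quantile of the standard normal.
W = NormalDist().inv_cdf(0.99)


def positive_count_quantile(n, w=W):
    """Eq. (4.12): n/4 + w*sqrt(n/8), the normal approximation to the 99%
    quantile of the number of positive errors among n/2 buckets."""
    return n / 4 + w * math.sqrt(n / 8)


def exact_positive_count_quantile(n, q=0.99):
    """Smallest y with P(Y <= y) >= q for Y ~ Binomial(n/2, 1/2)."""
    trials = n // 2
    cumulative = 0.0
    for y in range(trials + 1):
        cumulative += math.comb(trials, y) / 2 ** trials
        if cumulative >= q:
            return y
    return trials


def positive_part_integral(sigma, steps=100_000):
    """Midpoint-rule value of (1/(sigma*sqrt(2*pi))) * int_0^inf x*exp(-x^2/(2*sigma^2)) dx.

    The integrand is negligible beyond 10 sigma, so the range is cut there.
    """
    h = 10 * sigma / steps
    total = sum((k + 0.5) * h * math.exp(-(((k + 0.5) * h) ** 2) / (2 * sigma**2))
                for k in range(steps))
    return total * h / (sigma * math.sqrt(2 * math.pi))


print(positive_count_quantile(400), exact_positive_count_quantile(400))
print(positive_part_integral(2.0), 2.0 / math.sqrt(2 * math.pi))
```

For $n = 400$ buckets the approximation gives roughly 116.5 positive errors out of 200, close to the exact binomial quantile, and the integral matches $\sigma/\sqrt{2\pi}$.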
Figure 4.6: Sample sizes with fixed main memory and variable error space (x-axis: Error Space (KB), 200 to 500; curves: Original Formulation and New Formulation)

neglected for simplicity. The sample sizes computed from both the new and original formulations are plotted against $M_e$ in Figure 4.6. At $M_e$ = 300KB, a typical error space measuring 30% of $M$ in this example, the sample sizes computed from the new and original formulations are 3611 and 7557, respectively. We need to draw only half the number of
Figure 4.7: Sample sizes with variable main memory (logarithmic y-axis; x-axis: Main Memory Size (MB), 0.25 to 16; curves: Original Formulation and New Formulation)

between long-lived tuples in the outer and inner relations, we retain the outer long-lived tuples in an outer cache. We also dynamically adjust the memory allocation between the tuple cache buffers and the outer bucket buffer to retain the long-lived tuples in memory when they are needed. Further explanation of these two modifications is given in Section 4.3. $M$ is now assigned to different buffers, as illustrated in Figure 4.8. $M_r$, $M_{cr}$, $M_s$, $M_{cs}$, and $M_{out}$ are the memory buffers assigned to the outer bucket $r_i$, the outer cache $C_{r_i}$, the inner bucket $s_i$, the inner cache $C_{s_i}$ ($1 \le i \le n$), and the output (result) relation, respectively. A fixed amount of memory is allocated to $M_{out}$, while the remaining memory is dynamically allocated to all other buffers. The size of the error space is as follows.

Figure 4.8: Dynamically allocated memory buffers (main memory divided among $M_r$, $M_{cr}$, $M_s$, $M_{cs}$, and $M_{out}$)

With all these modifications, we use the randomly drawn samples from the outer relation $r$ to determine the partition intervals in a three-step procedure. In the first step, we assume that the cache sizes are zero because there is no information on long-lived tuples before sampling. The sample size $m$ is estimated from an initial buffer allocation by minimizing $C_{total}$ with $C_{cache} = 0$. The $m$ samples are randomly drawn from $r$ and sorted on their $V_s$. The initial partition intervals are then determined by dividing their starting times into $n$ groups and choosing the group boundary times as the partition boundaries.

In the second step, the average size of the outer cache $C_r$ for each partition is estimated from the sample set. The I/O cost for both the outer and inner caches, assuming that $C_s$ has a size similar to that of $C_r$, is

$$C_{cache} = \begin{cases} 4n\bigl(IO_{ran} + (|C_r| - M_{cr})\,IO_{seq}\bigr) & \text{if } |C_r| > M_{cr} \\ 0 & \text{if } |C_r| \le M_{cr} \end{cases}$$

where the factor 4 accounts for reading and writing both the outer and inner caches. To retain the caches in memory, we increase the size of the cache buffers by reducing $M_e$ and $M_r$ until a minimal $C_{total}$ is reached. If the new sample size $m$ is larger than the one obtained in the first step, more samples are randomly drawn from the outer relation, and the newly drawn samples are sorted and merged with the old sample set.
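The first two steps can be sketched in Python as follows. The function names are hypothetical, taking the first start time of each group as a boundary is one reasonable reading of "group boundary times", and the I/O constants in the example are the per-page costs of Table 5.2 (25 msec random, 5 msec sequential).

```python
def initial_boundaries(sample_starts, n):
    """Step 1: sort the sampled start times, split them into n groups of
    (nearly) equal size, and return the n-1 interior partition boundaries.
    Each boundary is taken to be the first start time of a group."""
    starts = sorted(sample_starts)
    m = len(starts)
    return [starts[(k * m) // n] for k in range(1, n)]


def cache_io_cost(n, cache_pages, cache_buffer_pages, io_ran, io_seq):
    """Step 2: C_cache = 4n(IO_ran + (|Cr| - M_cr) IO_seq) if |Cr| > M_cr, else 0.

    The factor 4 covers writing and reading both the outer and the inner
    cache, with the inner cache assumed to be about as large as the outer one.
    """
    if cache_pages > cache_buffer_pages:
        return 4 * n * (io_ran + (cache_pages - cache_buffer_pages) * io_seq)
    return 0.0


# Example: 100 sampled start times and 4 partitions; 40-page caches
# against a 32-page cache buffer over 10 partitions.
print(initial_boundaries(range(0, 1000, 10), 4))
print(cache_io_cost(n=10, cache_pages=40, cache_buffer_pages=32,
                    io_ran=0.025, io_seq=0.005))
```

The cost is zero as long as the cache fits its buffer, which is what drives the trade-off between enlarging $M_{cr}$ and shrinking $M_r$.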
In the third step, the first partition interval is determined without considering long-lived tuples, since all tuples that overlap this partition interval start during it. For each subsequent partition, the corresponding cache size is estimated by counting the number of sampled tuples in the previous partition that overlap the current partition interval. $M_r$ is then determined by subtracting the cache sizes from $M$, and the partition boundaries are estimated according to $M_r$ for each partition. Therefore, the partition intervals are determined by considering both the starting times of the sampled tuples and their corresponding durations.

4.2 Partitioning the Input Relations

The outer and inner relations, $r$ and $s$, are partitioned into $n$ outer buckets $r_i$ and $n$ inner buckets $s_i$, respectively, $1 \le i \le n$. In contrast with the original algorithm, we physically store each tuple in the first partition it overlaps, since we join the generated partitions in order from $p_1$ to $p_n$. Beginning with the second partition, long-lived tuples may exist that form the outer cache $C_{r_i}$ and the inner cache $C_{s_i}$, $2 \le i \le n$. The cache size for each partition is computed by counting the number of tuples in the previous partition that overlap the current partition interval. The cache sizes are required for the dynamic allocation of memory buffers in the joining phase, although the caches themselves and their allocated buffers are only physically created before, and removed after, the equijoin operation for each partition.

4.3 Joining the Partitions

The valid-time equijoin for each partition $p_i$ consists of three sub-equijoin operations: $r_i \bowtie C_{s_i}$, $r_i \bowtie s_i$, and $C_{r_i} \bowtie s_i$. Efficiency requires $|r_i| \le M_r$, $|C_{r_i}| \le M_{cr}$, and $|C_{s_i}| \le M_{cs}$ to avoid buffer thrashing. We allocate the available memory space $M$ as illustrated in Figure 4.8. A fixed number of pages is reserved for $M_{out}$, and the remaining memory space is dynamically allocated to all other buffers to accommodate the buckets and caches, whose sizes may differ from their estimates, while $M_s$ is allocated at least one page. Note that the tuples in the outer cache are not matched with the tuples in the inner cache, because they would already have matched when contained in $M_r$ and $M_s$. When there is not enough memory for $r_i$, $C_{r_i}$, and $C_{s_i}$ to be memory-resident simultaneously, $M_{cs}$ is first flushed to disk and reduced to one page; flushing of $M_{cr}$ may also be needed if there is still not enough memory to retain $r_i$ and $C_{r_i}$. Dynamically allocating the memory space minimizes the I/O evaluation cost by effectively retaining the long-lived tuples in memory. In addition, this strategy guarantees that the available memory space is fully used during the joining phase.

In estimating the total I/O evaluation cost to determine the partition intervals, we assume that both relations have similar sizes and similar distributions over valid time. The strategy of dynamically allocating the memory space may greatly relax this restriction: because $M_e$ is not fully used most of the time during the joining phase, we normally have room to accommodate an unexpectedly large $C_{s_i}$.

4.4 Summary

Modifications to the original algorithm [SSJ94] were described in this chapter. The valid start-time is initially chosen as the partitioning variable, and the valid-time duration is then considered to determine the partition intervals. This modification uses sample information more effectively than the original chronon-counting method.
A new sampling formulation is used to compute a smaller sample size when memory space is scarce compared to the input relation sizes. Outer and inner long-lived tuples are retained in outer and inner caches, respectively, to avoid duplicate matches between them. The dynamic buffer allocation minimizes the I/O evaluation cost by trying to retain the outer bucket and both the outer and inner caches in memory during the joining phase. Experiments designed to test the performance of this modified algorithm are described in the next chapter.
CHAPTER 5 PERFORMANCE

The modifications to the original timestamp partitioning algorithm developed by Soo et al. [SSJ94] were discussed in the previous chapter. In this chapter, we describe experiments that test the performance of these modifications. We begin by describing the experimental conditions and parameters in Section 5.1. We compare the performance of the new algorithm with that of the original algorithm in Section 5.2. Finally, a short summary is given in Section 5.3.

5.1 Parameters

The original and modified algorithms, named TP_C (temporal partitioning with caching) and TP_EC (temporal partitioning with extended caching), respectively, were implemented in the C programming language. The code was developed using the TIMEIT temporal database prototyping package [KS95]. The instances of the input relations were generated via a database-generating function in TIMEIT. To avoid duplicate matches between tuples retained in the outer and inner caches, we also applied the outer cache strategy to the implementation of TP_C. Common parameters used across the experiments are listed in Table 5.1. Two equal-sized 16MB relations were used as input relations, that is, a 1:1 ratio between the input relation sizes, which is the most unfavorable ratio for partition-based algorithms [Gra93].

Table 5.1: Common Experiment Parameters

  Parameter                    Value
  Page size                    1 KB
  Relation size                16 MB
  Tuple size                   16 bytes
  Join attribute size          4 bytes
  Non-join attribute size      4 bytes
  Valid-time attribute size    8 bytes
  Relation lifespan            100,000 chronons
  Memory space                 0.5, 1, 2, 4, 8, 16 MB

In practice, we use identical outer and inner input relations to precisely control the evaluation. We also varied the memory size from 0.5MB to 16MB, giving a ratio of memory space to input relation size from 1:32 to 1:1, a range that spans small to large relative memory settings. In addition, without loss of generality, we use integer domains for all attributes of the input relations. Two kinds of evaluation cost were considered in the experiments: the I/O cost of reading and writing disk files, and the cost of operations within main memory. We differentiated between these costs in each test. Table 5.2 lists the types of operations and their corresponding costs. Notice that Table 5.2 differentiates between random and sequential I/O operations; a cost factor of five is assumed, which approximates the average performance of currently available hard disks. Variation of this ratio may slightly affect the partitioning procedure. Further exploration is necessary of the effect of varying this ratio and the ratio between I/O cost and memory operation cost.
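The derived quantities behind these settings follow from Table 5.1 by simple arithmetic. The Python sketch below, which assumes binary units (1MB = 1024KB), computes the tuples per page, the relation size in pages, the cardinality $N$, and the memory-to-relation ratios quoted above.

```python
KB = 1024
MB = 1024 * KB

PAGE_SIZE = 1 * KB
RELATION_SIZE = 16 * MB
TUPLE_SIZE = 4 + 4 + 8  # join + non-join + valid-time attributes, in bytes
MEMORY_SIZES_MB = [0.5, 1, 2, 4, 8, 16]

tuples_per_page = PAGE_SIZE // TUPLE_SIZE
relation_pages = RELATION_SIZE // PAGE_SIZE
cardinality = RELATION_SIZE // TUPLE_SIZE

print(tuples_per_page, relation_pages, cardinality)
for mem_mb in MEMORY_SIZES_MB:
    ratio = RELATION_SIZE / (mem_mb * MB)
    print(f"M = {mem_mb} MB: memory-to-relation ratio 1:{ratio:g}")
```

The attribute sizes add up to the 16-byte tuple size, giving 64 tuples per page, 16,384 pages, and $N$ = 1,048,576 tuples per relation.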
Table 5.2: Generic I/O and Operation Costs

  Parameter                             Value
  Random I/O of one page                25 msec
  Sequential I/O of one page            5 msec
  Explicit join attribute comparison    1 µsec
  Time comparison                       1 µsec
  Pointer comparison                    1 µsec
  Pointer move                          1 µsec
  Tuple move                            4 µsec
  Explicit attribute move               2 µsec

Three controlling factors were considered in the experiments: the memory size, the valid-time duration, and the valid start-time distribution. The memory size is a hardware environment factor, while the other two concern the states of the input relations. The start-time distribution is also a potential cause of partitioning skew. The combination of these three variables controls the number of long-lived tuples that need to be handled when joining the generated partitions. To simplify the equijoin evaluation and focus on timestamp partitioning, we chose a key join condition on the explicit join attribute throughout the experiments. A comparison between the timestamp partitioning and explicit partitioning strategies can be found in Soo's dissertation [Soo96].

5.2 Experiments

In this section, we compare the performance of TP_EC and TP_C to demonstrate the effectiveness of the new sampling method and the dynamic buffer allocation strategy. The I/O and in-memory costs of TP_EC and TP_C are evaluated and compared in the following five tests.
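As a sanity check on the scale of the I/O figures, the sketch below applies the page costs of Table 5.2 to the 16MB relation of Table 5.1 (16,384 pages of 1KB). A single sequential scan costs 16,384 × 5 msec = 81.92 seconds, while reading the same pages at random costs five times as much. The helper name is hypothetical.

```python
RANDOM_IO_SEC = 0.025      # random I/O of one page (Table 5.2)
SEQUENTIAL_IO_SEC = 0.005  # sequential I/O of one page (Table 5.2)


def io_cost_seconds(random_pages, sequential_pages):
    """Total I/O time for a mix of random and sequential page accesses."""
    return random_pages * RANDOM_IO_SEC + sequential_pages * SEQUENTIAL_IO_SEC


relation_pages = 16 * 1024  # 16 MB relation on 1 KB pages (Table 5.1)
print(io_cost_seconds(0, relation_pages))  # one sequential scan
print(io_cost_seconds(relation_pages, 0))  # the same pages read at random
```

Each extra partition adds random I/Os to otherwise sequential bucket reads and writes, which is why the I/O cost in the tests grows as the memory, and hence the bucket size, shrinks.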
Long-lived tuples may significantly increase the evaluation cost of timestamp partitioning algorithms, since they overlap multiple partitions and thereby require additional operations. However, their effect may be greatly reduced if they are handled effectively. The valid-time duration, $D_v$, is one of the dominant factors that control the number of long-lived tuples in each partition. To emphasize this effect, we chose a uniform distribution for the valid start-time, $V_s$, in the first three tests.

5.2.1 Uniform Distribution of $V_s$ and 10-Chronon Duration

The results of the first test are shown in Figures 5.1 and 5.2 for I/O and in-memory costs, respectively. Notice that the x-axis is logarithmically scaled in these figures. In this test, we set $D_v$ = 10 chronons for the input relation $r$ and let the memory size vary from 0.5 to 16MB. With a uniform distribution on $V_s$ and a constant $D_v$, the average cache size for each partition is about 2KB, only a small fraction of the fixed cache buffer (32KB) of TP_C; therefore, we were not surprised to obtain similar I/O and in-memory performance for both algorithms when the memory size is relatively large. When the memory size is small (0.5MB), the number of partitions is large. At the same sampling cost, the new formulation needs a smaller reserved error space than the original formulation. This reduction in error space leads to larger buckets, and consequently fewer partitions for TP_EC than for TP_C. At $M$ = 0.5MB, the difference in I/O cost between TP_C and TP_EC is about 175 seconds, that is, a 20% overall improvement for TP_EC. As the memory size increases, the improvement shrinks to only 3% (a 20-second difference), which results from the dynamic allocation of the inner bucket buffer.
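The quoted cache size of about 2KB can be reproduced with a back-of-the-envelope estimate. Assuming start times are uniform over the lifespan $L$ of 100,000 chronons, every tuple lasts exactly $D_v$ chronons, and partitions are longer than $D_v$, the tuples carried into a partition are those that start within $D_v$ chronons before its lower boundary, about $N D_v / L$ of them. The Python helper below is a hypothetical illustration of that estimate, not part of TP_EC.

```python
def expected_cache_bytes(cardinality, duration, lifespan, tuple_size):
    """Bytes of tuples that start in the previous partition and still overlap
    the current one, under uniform start times and a fixed duration that is
    shorter than a partition."""
    overlapping_tuples = cardinality * duration / lifespan
    return overlapping_tuples * tuple_size


N = 16 * 1024 * 1024 // 16  # 1,048,576 sixteen-byte tuples (Table 5.1)
for d_v in (10, 500):
    kb = expected_cache_bytes(N, d_v, 100_000, 16) / 1024
    print(f"D_v = {d_v} chronons: about {kb:.1f} KB per partition")
```

For $D_v$ = 10 the estimate is about 1.6KB, consistent with the figure above; for $D_v$ = 500 it is about 82KB, already well beyond TP_C's 32KB cache buffer. Under these assumptions the estimate does not depend on the memory size.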
The difference in in-memory cost reaches its maximum (3 seconds, about 4%) at $M$ = 0.5MB, due to in-memory sorting of larger buckets in TP_EC; this is negligible compared with the I/O improvement.

Figure 5.1: I/O cost for uniform $V_s$ and $D_v$ = 10 chronons (x-axis: Size of Memory Space (MB), 0.5 to 16; curves: tp_c and tp_ec)

Figure 5.2: In-memory cost for uniform $V_s$ and $D_v$ = 10 chronons (x-axis: Size of Memory Space (MB), 0.5 to 16; curves: tp_c and tp_ec)

For a smaller memory space, we need to split the input relations into more partitions, which requires more random I/Os; therefore, the I/O cost increases as the memory size decreases. However, the total I/O cost for TP_EC is only about 10% higher at $M$ = 0.5MB than at $M$ = 16MB, while the corresponding increase is about 50% for TP_C. In contrast to the I/O cost, the in-memory evaluation cost increases as the memory size increases. This increase is mainly due to the higher cost of in-memory sorting on larger outer buckets.

5.2.2 Uniform Distribution of $V_s$ and 500-Chronon Duration

In the second test, we increased $D_v$ to 500 chronons while keeping all other conditions of the first test. The results of this test are shown in Figures 5.3 and 5.4 for the I/O and in-memory costs, respectively. Tuples with long durations have a greater chance of overlapping multiple partitions. For a small memory space, where partitions are necessarily small, these tuples are more likely to overlap many partitions. TP_EC dynamically allocates the available memory space so that it handles long-lived tuples more effectively than TP_C; a 33% improvement (356 seconds) is achieved when the memory size is small (0.5MB). This decrease in I/O cost is the combined saving from the new sampling method and the dynamic buffer allocation. When memory is large (16MB), however, TP_EC shows a mere 3% improvement over TP_C.

Figure 5.3: I/O cost for uniform $V_s$ and $D_v$ = 500 chronons (x-axis: Size of Memory Space (MB), 0.5 to 16; curves: tp_c and tp_ec)
Figure 5.4: In-memory cost for uniform $V_s$ and $D_v$ = 500 chronons (x-axis: Size of Memory Space (MB), 0.5 to 16; curves: tp_c and tp_ec)

The price of TP_EC's improvement in I/O cost is a slight increase in its in-memory cost, shown in Figure 5.4. For a uniform distribution on $V_s$ and a constant $D_v$ for all tuples, the cache size for each partition is approximately constant and independent of the memory size. As $M$ becomes small, less memory is available for the outer bucket in TP_EC, since some memory may be assigned to the cache buffers. As a result, TP_EC has to handle more partitions and incurs more in-memory cost than TP_C does. However, this increase in in-memory cost is offset by TP_EC's decrease in I/O cost. For example, at $M$ = 1MB the difference in in-memory cost is 5 seconds, only about 5% of the corresponding improvement in I/O cost.

5.2.3 Uniform Distribution of $V_s$ and Memory Size of 4MB

In the third test, we fixed the memory size at $M$ = 4MB and let $D_v$ vary from 1 to 30,000 chronons. The I/O results of this test are shown in Figure 5.5, where the valid-time duration is also plotted on a logarithmic scale. We chose a medium memory size of 4MB to reduce the effect of the new sampling formulation and concentrate on the effect of the dynamic memory allocation strategy.

Figure 5.5: I/O cost for uniform $V_s$ and $M$ = 4MB (x-axis: $D_v$ (chronons), 1 to 30K; curves: tp_c and tp_ec)

We chose 30,000 chronons as the upper bound on the timestamp duration since, for this $D_v$, the outer cache size for each partition is about 6MB, well over twice the average outer bucket size of about 2.3MB in this test. The difference in I/O cost between TP_C and TP_EC grows with $D_v$ until $D_v$ reaches 5,000 chronons. At $D_v$ = 5,000 chronons, the cache size is about 860KB, much larger than the fixed cache buffer (32KB) of TP_C. TP_EC adjusts the memory space to retain this large cache in memory by reducing the outer bucket size and increasing the number of partitions, achieving an 8% improvement in I/O cost at this point. When $D_v$ is increased to 10,000 chronons, however, the cache is too large to be retained in memory. TP_EC flushes the cache and reduces the cache buffer to a fixed 32KB; therefore, the I/O evaluation costs of TP_EC and TP_C become similar. As $D_v$ continues to increase, the cache sizes grow correspondingly, and the improvement in I/O cost for TP_EC increases again, though its relative size remains modest: at $D_v$ = 30,000 the improvement is about 4%.
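The flushing behavior mentioned here mirrors the joining-phase policy of Section 4.3: $M_{out}$ is fixed, $M_s$ keeps at least one page, and when the outer bucket and both caches cannot all stay resident, the inner cache buffer is flushed and shrunk to one page first, and the outer cache buffer afterwards. The Python sketch below illustrates that order. The function name, the 32-page $M_{out}$, and the cache sizes in the example are hypothetical; the 32-page size of a flushed outer cache buffer is an assumption taken from the 32KB figure quoted above.

```python
def allocate_buffers(memory, out_pages, bucket_pages, outer_cache_pages,
                     inner_cache_pages, flushed_outer_pages=32):
    """Split `memory` pages among M_r, M_cr, M_cs, M_s and a fixed M_out.

    Returns (allocation dict, list of flushed caches). Raises ValueError if
    even the outer bucket cannot be kept in memory.
    """
    shared = memory - out_pages  # everything except the fixed output buffer
    m_cr, m_cs = outer_cache_pages, inner_cache_pages
    flushed = []

    def fits():
        # The inner bucket buffer M_s must keep at least one page.
        return bucket_pages + m_cr + m_cs + 1 <= shared

    if not fits():
        m_cs = min(m_cs, 1)  # flush the inner cache first, keep one page
        flushed.append("inner cache")
    if not fits():
        # Then flush the outer cache (assumed to keep a 32-page buffer).
        m_cr = min(m_cr, flushed_outer_pages)
        flushed.append("outer cache")
    if not fits():
        raise ValueError("outer bucket does not fit in memory")

    m_s = shared - bucket_pages - m_cr - m_cs  # leftover goes to the inner bucket
    return {"M_r": bucket_pages, "M_cr": m_cr, "M_cs": m_cs,
            "M_s": m_s, "M_out": out_pages}, flushed


# 4 MB of 1 KB pages, a hypothetical 32-page M_out, a 2,355-page (about 2.3 MB)
# outer bucket, and caches of increasing (hypothetical) size.
for cache in (600, 1000, 2000):
    print(cache, allocate_buffers(4096, 32, 2355, cache, cache))
```

Giving the leftover pages to $M_s$ keeps the whole memory in use, matching the claim in Section 4.3 that the available space is fully used during the joining phase.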
obtained when the average cache size is close to the bucket size, that is, when $D_v$ = 100 or 200 chronons. As $D_v$ becomes longer, the average cache size becomes larger than the bucket size, and the I/O cost of TP_EC decreases relative to that of TP_C, since TP_EC handles large caches more effectively than TP_C. A 6% improvement for TP_EC is achieved at $D_v$ = 1,000 chronons.

5.3 Summary

This chapter described experiments that test our modifications to the timestamp partitioning algorithm of Soo et al. [SSJ94]. The implementation, TP_EC, was executed under varying memory allocations and timestamp durations. TP_EC outperforms the original implementation when main memory is scarce relative to the input relation sizes, especially when the cache size is significant. This improvement is achieved without sacrificing the effectiveness of partition-based algorithms in large-memory settings.
CHAPTER 6 CONCLUSIONS AND FUTURE WORK

Equijoins are important in database query processing due to their prevalence and potential cost. Valid-time equijoins play the same role in valid-time databases as their conventional counterparts do in conventional databases: they recombine information stored separately in the database. In addition, the valid-time equijoin is more difficult to evaluate, since it typically requires inequality predicates on timestamp attributes, and such predicates are not efficiently supported by conventional algorithms.

In this thesis, we investigated a specific class of partition-based implementations, the timestamp partitioning strategy, for evaluating the valid-time equijoin. We modified an existing algorithm [SSJ94], which was shown to have advantages over other timestamp partitioning algorithms, to provide better performance when little memory is available compared with the input relation sizes. Our experimental results show the effectiveness of these modifications: the modified algorithm reduces the evaluation cost when input relations are large and memory space is relatively small, and does so without sacrificing the effectiveness of the partition-based algorithm when main memory is relatively large. The contributions of this research are as follows.

Any correlation between the starting times of the sampled tuples and their associated durations is used by the new algorithm to reduce the likelihood that buckets overflow their allocated buffers. The algorithm first constructs an initial set of partition intervals using the starting times of the sampled tuples, and then adjusts these intervals by considering the durations of these tuples.

We derived a new sampling formulation that reduces the number of samples to draw when the input relations are large and only limited buffer space is available. This reduction is achieved without loss of accuracy when determining the partition intervals. Furthermore, this sampling formulation is not limited to the scope of this implementation; it can be used in any partitioning procedure that employs sampling.

We extended the original caching strategy to retain long-lived tuples from the outer relation in a separate tuple cache. This strategy eliminates unnecessary comparisons and matches between outer and inner long-lived tuples, and both the I/O and in-memory costs may be reduced if the cache sizes are large compared with their associated bucket sizes. Because the join result contains no duplicate matches, this new caching strategy guarantees that the output relation is coalesced if both input relations are initially coalesced [BSS96, Soo96].

We dynamically allocate buffer space, whereas the original method used a fixed buffer allocation. This modification reduces I/O cost by retaining the outer bucket and both the outer and inner caches in memory when joining the partitions, until the caches become too large to be retained in memory. This method makes effective use of the available memory space while joining the partitions. In addition, this modification may relax the restriction, assumed throughout our experiments, that the input relations must be similar in size and timestamp distribution.
For future work, we plan to eliminate some of the assumptions and simplifications made in our study. We chose a 99% confidence level in deriving the new sampling formulation to statistically minimize the chance of buffer overflow. A lower confidence level may lead to a higher chance of buffer overflow but a smaller sample size; further study is needed to determine this trade-off. In the description of the modified algorithm, we also assumed that the inner input relation is distributed similarly to the outer relation. Further study is necessary to determine how differences between the input relations affect the dynamic allocation strategy. Variations in the ratio between I/O and memory costs, and in the ratio between random and sequential I/O costs, may affect the evaluation cost of the algorithm. More study of this subject is also necessary to justify the applicability of the timestamp partitioning strategy.

Two of the modifications listed previously were not explicitly tested through experiments. Considering the temporal correlation between the samples' start times and durations is intended to reduce the chance of buffer overflow; further experiments are needed to test the modified algorithm on input relations in which tuples are temporally correlated. The outer caching strategy is assumed to improve the evaluation by avoiding duplicate matches between long-lived tuples in the outer and inner relations; additional experiments are also required to justify this improvement quantitatively.

As a long-term goal, we should study the applicability of these modifications to other partition-based algorithms. In this thesis, we focused on timestamp partitioning for the valid-time equijoin. All the modifications listed above may be applied to partition-based join algorithms for spatial databases. Two of the modifications, the new sampling method and the dynamic memory allocation strategy, may also be applied to conventional partition-based algorithms.
PAGE 77
LIST OF REFERENCES [Bra94] K. Bratbrgseng e n. Hashing methods and relational algebra operations. In Proceedings o f the I n ternational Conference on Very Large Databases, 1994. [BSS96] M. H. Bohlen, R. T. Snodgr ass, and M. D. Soo. Coa l escing in temporal databases. In Proceeding s of the Int ernati onal Conference on Very Large Databases September 1996. [Cod70] E. F. Codd. A relational model of data for larg e share d data banks. Communications o f ACM, 13(6 ) :377387 June 1970 [Cod72 ] E. F. Codd. Further normalization of data base relational model. In Volume 6 of Courant Computer Symposia Series pages 6598. Prentice Hall, Englewood Cliffs N .J. 1972. [Con80 ] W. J. Conover. Practical Nonparametric Stat is tics. John Wiley & Sons Texa s T e ch University, second edition, 1980. [DK0+84] D. J. DeWitt, R. Katz, F Olken L. Shapiro Yl. Stonebraker and D. Wood. Implementation techniques for main memory database systems. In Proceedings of ACM SIGMOD Conference, pages 1 8, New York 1984. [DNS91] D. DeWitt, J. Naughton, and D. Schn e ider. An evaluation of nonequijoin a lgori thms. In Proceedings of the 17th Int e rnational Conference on Very Large Databases pages 443 452, Septemb e r 1991. [Doz92] J. Dozier. Access to data in nasa 's earth observation sys tems. In Pro ceedings of ACM SIGMOD Conference pages 443 452. ACM Septe mb e r 1992. [DS93] C. E. Dyreson and R. T. Snograss. Timestamp semantics and representa tion. Information Systems 18(3), September 1993. [EN94] R. Elmasri and S. B. Navat he. Fundam entals of D atabase Syst e ms. The Benjamin/Cummings Publishing Company In c., U niv e rsit y of Texas at Arling t on second edition, 1994. [Gra93] G. Graefe. Query eva luation te c hniques for larg e databases. ACM Com put ing Surveys, 25(2):73 170 June 1993. 66
[JCG+92] C. S. Jensen, J. Clifford, S. K. Gadia, A. Segev, and R. T. Snodgrass. A glossary of temporal database concepts. ACM SIGMOD Record, 21(3):35-43, September 1992.

[JSS95] C. S. Jensen, R. T. Snodgrass, and M. D. Soo. The TSQL2 data model. In R. T. Snodgrass, editor, The TSQL2 Temporal Query Language, pages 156-240. Kluwer Academic Publishers, 1995.

[Kim80] W. Kim. A new way to compute the product and join of relations. In Proceedings of ACM SIGMOD, pages 179-187, New York, 1980.

[Knu73] D. Knuth. The Art of Computer Programming, Vol. III, Sorting and Searching. Addison-Wesley, Reading, Mass., 1973.

[Kol41] A. Kolmogorov. Confidence limits for an unknown distribution function. Annals of Mathematical Statistics, 12:461-463, 1941.

[KS95] N. Kline and M. D. Soo. Timeit: the time integrated testbed. Version 0.1, available via anonymous ftp at ftp.cs.arizona.edu, 1995.

[LM91] T. Y. Leung and R. R. Muntz. Temporal query processing and optimization in multiprocessor database machines. Technical report, Los Angeles, November 1991.

[LOT94] H. Lu, B. C. Ooi, and K.-L. Tan. On spatially partitioned temporal join. In Proceedings of the Conference on Very Large Databases, pages 546-557, Chile, 1994.

[ME92] P. Mishra and M. H. Eich. Join processing in relational databases. ACM Computing Surveys, 24(1):63-113, March 1992.

[MS93] J. Melton and A. R. Simon. Understanding the New SQL: A Complete Guide. Morgan Kaufmann Publishers, Inc., 1993.

[SA86] R. T. Snodgrass and I. Ahn. Temporal databases. IEEE Computer, 19(9):35-42, September 1986.

[SDS95] M. D. Soo, C. E. Dyreson, and R. T. Snodgrass. Temporal data types. In R. T. Snodgrass, editor, The TSQL2 Temporal Query Language, pages 22-64. Kluwer Academic Publishers, 1995.

[SG89] A. Segev and H. Gunadhi. Event-join optimization in temporal relational databases. In Proceedings of the International Conference on Very Large Databases, Amsterdam, 1989.

[Soo96] M. D. Soo. Constructing a Temporal Database Management System. PhD thesis, The University of Arizona, 1996.
[SSJ94] M. D. Soo, R. T. Snodgrass, and C. S. Jensen. Efficient evaluation of the valid-time natural join. In Proceedings of the International Conference on Data Engineering, pages 282-292, 1994.

[SSU91] A. Silberschatz, M. Stonebraker, and J. Ullman. Database systems: Achievements and opportunities. Communications of the ACM, 34(10):110, October 1991.

[Tsa96] W. Y. Tsang. Sort-merge algorithm for temporal join evaluation. Master's thesis, University of South Florida, Tampa, Florida, 1996.

[ZG90] H. Zeller and J. Gray. An adaptive hash join algorithm for multiuser environments. In Proceedings of the Conference on Very Large Databases, pages 186-197, Brisbane, Australia, 1990.
