USF Libraries
USF Digital Collections

Filecules


Material Information

Title:
Filecules: a new granularity for resource management in grids
Physical Description:
Book
Language:
English
Creator:
Doraimani, Shyamala
Publisher:
University of South Florida
Place of Publication:
Tampa, Fla
Publication Date:

Subjects

Subjects / Keywords:
Caching
Data management
File grouping
Grid
Scientific computing
Workload characterization
Dissertations, Academic -- Computer Science -- Masters -- USF   ( lcsh )
Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Abstract:
Grids provide an infrastructure for seamless, secure access to a globally distributed set of shared computing resources. Grid computing has reached the stage where deployments are run in production mode. In the most active Grid community, the scientific community, jobs are data and compute intensive. Scientific Grid deployments offer the opportunity for revisiting and perhaps updating traditional beliefs related to workload models and hence reevaluating traditional resource management techniques. In this thesis, we study usage patterns from a large-scale scientific Grid collaboration in high-energy physics. We focus mainly on data usage, since data is the major resource for this class of applications. We perform a detailed workload characterization which led us to propose a new data abstraction, filecule, that groups correlated files. We characterize filecules and show that they are an appropriate data granularity for resource management. In scientific applications, job scheduling and data staging are tightly coupled. The only algorithm previously proposed for this class of applications, Greedy Request Value (GRV), uses a function that assigns a relative value to a job. We wrote a cache simulator that uses the same technique of combining cache replacement with job reordering to evaluate and compare quantitatively a set of alternative solutions. These solutions are combinations of Least Recently Used (LRU) and GRV from the cache replacement space with First-Come First-Served (FCFS) and the GRV-specific job reordering from the scheduling space. Using real workload from the DZero Experiment at Fermi National Accelerator Laboratory, we measure and compare performance based on byte hit rate, cache change, job waiting time, job waiting queue length, and scheduling overhead. Based on our experimental investigations, we propose a new technique that combines LRU for cache replacement and job scheduling based on the relative request value. This technique incurs lower data transfer costs than the GRV algorithm and shorter job processing delays than FCFS. We also propose using filecules for data management to further improve the results obtained from the above LRU and GRV combination. We show that filecules can be identified in practical situations and demonstrate how the accuracy of filecule identification influences caching performance.
Thesis:
Thesis (M.S.C.S.)--University of South Florida, 2007.
Bibliography:
Includes bibliographical references.
System Details:
System requirements: World Wide Web browser and PDF reader.
System Details:
Mode of access: World Wide Web.
Statement of Responsibility:
by Shyamala Doraimani.
General Note:
Title from PDF of title page.
General Note:
Document formatted into pages; contains 80 pages.

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001925915
oclc - 191803722
usfldc doi - E14-SFE0001905
usfldc handle - e14.1905
System ID:
SFS0026223:00001




Full Text
Filecules: A New Granularity for Resource Management in Grids

by

Shyamala Doraimani

A thesis submitted in partial fulfillment of the requirements for the degree of
Master of Science in Computer Science
Department of Computer Science and Engineering
College of Engineering
University of South Florida

Major Professor: Adriana Iamnitchi, Ph.D.
Gabriele Garzoglio, Ph.D.
Ken Christensen, Ph.D.

Date of Approval: March 26, 2007

Keywords: Caching, Data Management, File Grouping, Grid, Scientific Computing, Workload Characterization

Copyright 2007, Shyamala Doraimani

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1 INTRODUCTION
CHAPTER 2 WORKLOAD DESCRIPTION
  2.1 Data Management in Grids
  2.2 Data in the DZero Experiment
  2.3 The DZero Workload
  2.4 Number of Jobs Per Day
  2.5 Data Accessed Per Day
  2.6 Summary
CHAPTER 3 RELATED WORK
  3.1 Workload Characterization
  3.2 Data Grouping
  3.3 Stack Depth Analysis
  3.4 Caching in Web
  3.5 Caching in Data Grids
CHAPTER 4 FILECULES AND THEIR CHARACTERISTICS
  4.1 Size Characteristics
    4.1.1 File Size
    4.1.2 Filecule Size
  4.2 Popularity Characteristics
    4.2.1 File Popularity
    4.2.2 Filecule Popularity
  4.3 Lifetime Characteristics
    4.3.1 File Lifetime
    4.3.2 Filecule Lifetime
  4.4 Correlation Between Size, Popularity and Lifetime
  4.5 Impact on Resource Management
  4.6 Summary
CHAPTER 5 CACHING
  5.1 Stack Depth Analysis
  5.2 Cache Replacement and Job Scheduling Algorithms
    5.2.1 Least Recently Used Cache Replacement Algorithm
    5.2.2 First-Come First-Served Job Scheduling Algorithm

    5.2.3 Greedy Request Value Algorithm
    5.2.4 Queue Freezing
  5.3 Metrics
  5.4 Experiment Setup
  5.5 Experimental Results
    5.5.1 Byte Hit Rate
    5.5.2 Percentage of Cache Change
    5.5.3 Job Waiting Time and Queue Length
    5.5.4 Scheduling Overhead
  5.6 Summary
CHAPTER 6 IMPACT OF HISTORY WINDOW ON FILECULE IDENTIFICATION
  6.1 Filecule LRU Using 1-month Window
  6.2 Impact of Window Size in Filecule LRU
  6.3 Summary
CHAPTER 7 CONCLUSION
REFERENCES
APPENDICES
  Appendix A Probability Distributions
    A.1 Log Normal Distribution
    A.2 Log Logistic Distribution
    A.3 Generalized Pareto Distribution
    A.4 Hyper Exponential Distribution
    A.5 Extreme Value Distribution
    A.6 Zipf Distribution

LIST OF TABLES

Table 2.1 Data in Workload from Jan 2003 to Mar 2005
Table 2.2 Characteristics of Traces Analyzed Per Data Tier
Table 2.3 Data Requested Per Day
Table 4.1 Statistics of Size, Popularity and Lifetime
Table 4.2 Coefficients of Correlation for File Properties
Table 4.3 Coefficients of Correlation for Filecule Properties
Table 5.1 Stack Depth Analysis Statistics
Table 5.2 Average Number of Files in Cache
Table 5.3 Summary of Results on Caching and Scheduling Algorithms
Table 6.1 Comparison of 1-month Filecules and Optimal Filecules
Table 6.2 Comparison of Filecules Identified in 2 Consecutive 1-month Windows
Table 6.3 Comparison of Byte Hit Rate of Filecule LRU Using 1-month Window with File LRU and Filecule LRU Using Optimal Filecules
Table 6.4 Comparison of Percentage of Cache Change of Filecule LRU Using 1-month Window with File LRU and Filecule LRU Using Optimal Filecules
Table 6.5 Comparison of Byte Hit Rate of Filecule LRU Using 6-month Window and Filecule LRU Using 1-month Window
Table 6.6 Number of Jobs Per Month

LIST OF FIGURES

Figure 1.1 The System Configuration
Figure 2.1 Number of Jobs Per Day
Figure 2.2 Number of Files Per Job
Figure 2.3 Distribution of the Number of Files Per Job
Figure 2.4 Number of Total Files and Distinct Files Requested Per Day
Figure 2.5 Distribution of Total Number of Files Per Day and Distinct Files Per Day
Figure 2.6 Total Number of Bytes Per Day and Distinct Bytes Per Day
Figure 2.7 Distribution of Total Number of Bytes Per Day and Distinct Bytes Per Day
Figure 4.1 Number of Files Per Filecule
Figure 4.2 File Size Distribution
Figure 4.3 File Size vs. Rank
Figure 4.4 File Size Distribution Per Data Tier
Figure 4.5 Filecule Size Distribution
Figure 4.6 Filecule Sizes in Decreasing Order
Figure 4.7 File Popularity Distribution
Figure 4.8 File Popularity vs. Rank
Figure 4.9 Filecule Popularity Distribution
Figure 4.10 Filecule Popularity vs. Rank
Figure 4.11 File Lifetime Distribution
Figure 4.12 File Lifetime vs. Rank
Figure 4.13 Filecule Lifetime Distribution
Figure 4.14 Filecule Lifetime vs. Rank

Figure 5.1 Stack Depth Analysis of File Requests
Figure 5.2 Average Byte Hit Rate
Figure 5.3 Byte Hit Rate for Cache Size of 1 TB
Figure 5.4 Byte Hit Rate for Cache Size of 5 TB
Figure 5.5 Byte Hit Rate for Cache Size of 10 TB
Figure 5.6 Byte Hit Rate for Cache Size of 25 TB
Figure 5.7 Byte Hit Rate for Cache Size of 50 TB
Figure 5.8 Average Percentage of Cache Change for Different Cache Sizes
Figure 5.9 Percentage of Cache Change for Cache Size of 1 TB
Figure 5.10 Percentage of Cache Change for Cache Size of 5 TB
Figure 5.11 Percentage of Cache Change for Cache Size of 10 TB
Figure 5.12 Percentage of Cache Change for Cache Size of 25 TB
Figure 5.13 Percentage of Cache Change for Cache Size of 50 TB
Figure 5.14 Average Job Waiting Time for Different Cache Sizes
Figure 5.15 Job Waiting Time for Cache Size of 1 TB
Figure 5.16 Job Waiting Time for Cache Size of 5 TB
Figure 5.17 Job Waiting Time for Cache Size of 10 TB
Figure 5.18 Job Waiting Time for Cache Size of 25 TB
Figure 5.19 Average Queue Lengths for Different Cache Sizes
Figure 5.20 Queue Length for Cache Size of 1 TB
Figure 5.21 Queue Length for Cache Size of 5 TB
Figure 5.22 Queue Length for Cache Size of 10 TB
Figure 5.23 Queue Length for Cache Size of 25 TB
Figure 5.24 Average Scheduling Overhead for Different Cache Sizes
Figure 5.25 Scheduling Overhead for Cache Size of 1 TB
Figure 5.26 Scheduling Overhead for Cache Size of 5 TB
Figure 5.27 Scheduling Overhead for Cache Size of 10 TB
Figure 5.28 Scheduling Overhead for Cache Size of 25 TB

Figure 5.29 Scheduling Overhead for Cache Size of 50 TB
Figure 6.1 Difference in Byte Hit Rate Between Filecule LRU with 1-month Window and File LRU
Figure 6.2 Difference in Byte Hit Rate Between Filecule LRU with Optimal Filecules and Filecule LRU with 1-month Window
Figure 6.3 Difference in Percentage of Cache Change Between Filecule LRU with 1-month Window and File LRU
Figure 6.4 Difference in Byte Hit Rate Between Filecule LRU with 6-month Window and 1-month Window
Figure 6.5 Difference in Percentage of Cache Change Between Filecule LRU with 6-month Window and 1-month Window

FILECULES: A NEW GRANULARITY FOR RESOURCE MANAGEMENT IN GRIDS

Shyamala Doraimani

ABSTRACT

Grids provide an infrastructure for seamless, secure access to a globally distributed set of shared computing resources. Grid computing has reached the stage where deployments are run in production mode. In the most active Grid community, the scientific community, jobs are data and compute intensive. Scientific Grid deployments offer the opportunity for revisiting and perhaps updating traditional beliefs related to workload models and hence reevaluating traditional resource management techniques. In this thesis, we study usage patterns from a large-scale scientific Grid collaboration in high-energy physics. We focus mainly on data usage, since data is the major resource for this class of applications. We perform a detailed workload characterization which led us to propose a new data abstraction, filecule, that groups correlated files. We characterize filecules and show that they are an appropriate data granularity for resource management.

In scientific applications, job scheduling and data staging are tightly coupled. The only algorithm previously proposed for this class of applications, Greedy Request Value (GRV), uses a function that assigns a relative value to a job. We wrote a cache simulator that uses the same technique of combining cache replacement with job reordering to evaluate and compare quantitatively a set of alternative solutions. These solutions are combinations of Least Recently Used (LRU) and GRV from the cache replacement space with First-Come First-Served (FCFS) and the GRV-specific job reordering from the scheduling space. Using real workload from the DZero Experiment at Fermi National Accelerator Laboratory, we measure and compare performance based on byte hit rate, cache change, job waiting time, job waiting queue length, and scheduling overhead.

Based on our experimental investigations, we propose a new technique that combines LRU for cache replacement and job scheduling based on the relative request value. This technique incurs lower data transfer costs than the GRV algorithm and shorter job processing delays than FCFS. We also propose using filecules for data management to further improve the results obtained from the above LRU and GRV combination. We show that filecules can be identified in practical situations and demonstrate how the accuracy of filecule identification influences caching performance.

CHAPTER 1 INTRODUCTION

Sustained effort is ongoing to support various scientific communities and their large-scale data-sharing and data-analysis needs through a distributed, transparent infrastructure. This effort is part of a research area known as Grid computing, an area whose primary objective is to provide an infrastructure for seamless and secure access to a globally distributed set of shared software and hardware resources. This infrastructure necessarily includes components for file location and management as well as for computation and data transfer scheduling.

There is little information available on the specific usage patterns that emerge in these data-intensive, scientific communities. This research analyzes usage patterns in a typical Grid community, followed by an experimental investigation of how these patterns can be exploited for data management. We analyze the characteristics of a production-mode data-intensive high-energy physics collaboration, the DZero Experiment [2], hosted at Fermi National Accelerator Laboratory (FermiLab). Figure 1.1 shows 3 sites that participate in the DZero Experiment. Each site has storage and computing resources. The storage resources are managed by Storage Resource Managers (SRM), which take care of transferring data from storage at other sites to execute a job. The SRM has a disk cache where the data is stored. In this thesis, we discuss techniques that reduce the amount of data transferred between sites and also increase the throughput at each site.

In the DZero Experiment, users submit requests to analyze data to obtain physics measurements. Hundreds of jobs are submitted each day by scientists from all over the world. These jobs perform read-only operations on the data. Our analysis shows that the usage of data in the DZero Experiment exhibits good temporal locality and that scientific data usage translates into requests for groups of correlated files.

Figure 1.1 The System Configuration

This fact suggests a new granularity for data management, filecules [fil'-eh-kyul'], which can be exploited to design solutions that significantly outperform the traditional solutions based on single-file granularity. We define a filecule as an aggregate of one or more files in a definite arrangement held together by special forces related to their usage. We study the properties most relevant to data management using filecules, namely size, popularity and lifetime. We compare these characteristics with the corresponding characteristics of data from traditional systems (such as file systems and the Internet). We also study the correlation between these properties in order to see if the size or lifetime of filecules can be used to make decisions about data management. For example, if there is a strong correlation between size and popularity, cache replacement algorithms that evict files based on file sizes can be used in these systems.

The nature of jobs in Grids is such that all the data requested by a job needs to be located on disk before the job can run. Thus job scheduling and loading data on disk are tightly coupled. We explore the space of algorithmic design for caching algorithms along two intertwined dimensions: cache replacement and reordering the jobs in the waiting queue.

We compare techniques that are a combination of one cache replacement and one job reordering algorithm. These techniques are combinations of Least Recently Used (LRU) and Greedy Request Value (GRV) [42, 44, 43] from the cache replacement space with First-Come First-Served (FCFS) and the Greedy Request Value-specific job reordering from the scheduling space. We use GRV for our comparison because it is the only cache replacement policy that has been proposed for jobs in Grids that use multiple files. The GRV algorithm uses a job scheduling algorithm that changes the order in which jobs are processed based on the contents of the cache.

The algorithms are compared based on the following metrics: byte hit rate, percentage of cache change per job, job waiting time, queue length and number of computations to schedule a job. Byte hit rate per job is a measure of the effectiveness of the cache replacement policy. Percentage of cache change per job shows how much data is transferred in and out of the cache in order to process a job. This is particularly relevant in a Grid-like scenario where data transfer is time consuming due to wide-area transfers and the size of the data (multi-GB or more); delay in data transfer causes delay in job execution. If the prefetched data is utilized effectively, the percentage of cache change per job will be low. Job waiting time and queue length show the effectiveness of the job scheduling algorithm. The number of computations to schedule a job is a measure of scheduling overhead.
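To make the first two metrics concrete, the following is a minimal sketch of how byte hit rate and percentage of cache change per job can be computed by replaying a job trace against an LRU cache. It is an illustration only, not the simulator used in this thesis; the job representation (a list of (file name, size) pairs), the function name, and the normalization of the cache-change metric by the cache size are assumptions.

```python
from collections import OrderedDict

def replay_lru(jobs, cache_size):
    """Replay jobs (each a list of (file_name, size_in_bytes) pairs) against a
    simple LRU cache and report byte hit rate and per-job percentage of cache
    change. Illustrative sketch only; the cache-change metric is normalized by
    the cache size here, which is one plausible reading of the metric."""
    cache = OrderedDict()              # file_name -> size, in LRU order
    used = 0                           # bytes currently cached
    hit_bytes = total_bytes = 0
    cache_change = []                  # per-job percentage of cache change

    for job in jobs:
        moved = 0                      # bytes brought into the cache for this job
        for name, size in job:
            total_bytes += size
            if name in cache:          # hit: refresh the LRU position
                cache.move_to_end(name)
                hit_bytes += size
                continue
            # Miss: evict least recently used files until the new file fits.
            # (Files larger than the whole cache are not handled specially.)
            while used + size > cache_size and cache:
                _, evicted_size = cache.popitem(last=False)
                used -= evicted_size
            cache[name] = size
            used += size
            moved += size
        cache_change.append(100.0 * moved / cache_size)

    byte_hit_rate = 100.0 * hit_bytes / total_bytes if total_bytes else 0.0
    return byte_hit_rate, cache_change
```

A filecule- or GRV-based policy would differ only in which entries are admitted and evicted; the metric bookkeeping stays the same.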

Our results show that Filecule LRU provides better byte hit rates than GRV. Also, based on our results, we identify a new combination of LRU cache replacement with the GRV job scheduling algorithm that takes advantage of the temporal locality in the workload and the contents of the cache. The GRV job scheduling policy improves the throughput of the system by scheduling jobs that utilize the contents of the cache. When the combination of the LRU cache replacement algorithm with job scheduling using GRV is used with filecules for data prefetching, it may provide significant improvements in hit rates while reducing the amount of data that needs to be replaced in the cache.

In order to understand the impact of the history of jobs on identifying filecules that improve the performance of the system, we study the variations in byte hit rate and volume of cache change for different lengths of history. We see that with increasing history lengths, the filecules are closer to those identified with the entire workload, but the time taken to identify these filecules is long. With small history lengths, the filecules identified are large in size. Finally, we discuss the impact of this new granularity on other resource management services.

To summarize, this thesis:

1. Provides a quantitative evaluation of user behavior in terms of data usage patterns and compares it with traditional models from the literature (Chapter 4). Our analysis shows that while some traditional models work for the DZero workload (such as the file popularity observed in web proxy [11] and web client [15] workloads), others are inappropriate (such as the file sizes observed in the Windows file system [25], P2P [52], and the Internet [9, 24]). This study provides important information about user requests in the past, which can be utilized to predict future data requests and hence reduce delays in processing them.

2. Characterizes the properties of filecules (Chapter 4).

3. Discusses the impact of the pattern we discovered from our trace analysis on resource management (Chapter 4).

4. Proposes and evaluates experimentally a new combination of cache replacement policy and job scheduling (Chapter 5).

5. Analyzes the impact of the size of the history window used for filecule identification (Chapter 6). These results show that a relatively short history is sufficient for identifying file groups that benefit cache performance. We also observe that a sliding window adapts better to changing usage patterns.

CHAPTER 2 WORKLOAD DESCRIPTION

In Grid terminology [26], the DZero Experiment [2] is a virtual organization consisting of hundreds of physicists in 70+ institutions from 18 countries. It provides a worldwide system of shareable computing and storage resources that can be utilized to solve the common problem of extracting physics results from several petabytes of measured and simulated data. The workload analyzed in this thesis is from this production-mode data-intensive high-energy physics collaboration, the DZero Experiment. In this chapter, we provide details about the workload and the intuition that led us to propose a new granularity for data management, filecules.

2.1 Data Management in Grids

Storage Resource Managers (SRM) [51] are middleware components that provide space allocation and data management on the Grid. The Grid uses heterogeneous storage resources. SRMs reserve and schedule storage resources by providing standardized, uniform access to these heterogeneous storage resources. SRMs have disk caches which can range from a few hundred gigabytes to tens of terabytes. For example, disk caches vary from 1 TB to 5 TB in DZero, are up to 150 TB [27] in CDF [1], and are about 70 TB [27] in DESY [3]. The disk caches store data that are requested by clients and thus mask failures due to link failures. Among many other responsibilities, SRMs administer two policies: job scheduling and data caching.

2.2 Data in the DZero Experiment

Modern high-energy physics experiments, such as DZero, typically acquire more than 1 TB of data per day and move up to ten times as much. To give an example, during the past year more than half a petabyte of data was stored at Fermi National Accelerator Laboratory.

Aside from the stream of data from the detector, various other computing activities contribute to the 1 TB of derived and simulated data stored per day. In this system, data files are read-only and the typical jobs analyze and produce new, processed data files.

Three main activities take place within the DZero Experiment: data filtering (called data reconstruction in the DZero terminology), the production of simulated events, and data analysis. This third activity mainly consists of the selection and the statistical study of particles with certain characteristics, with the goal of achieving physics measurements. The first two activities are indispensable for the third one. During data reconstruction, the binary format of every event from the detector is transformed into a format that more easily maps to abstract physics concepts, such as particle tracks, charge, spin, and others. The original format is instead very closely dependent on the hardware layout of the detector, in order to guarantee the performance of the data acquisition system, and is not suitable for data analysis. On the other hand, the production of simulated events, also called Monte Carlo production, is necessary for understanding and isolating the detector characteristics related to hardware, such as the particle detection efficiency, or to physics phenomena, such as signal to background discrimination.

Tracing system utilization is possible via a software layer (SAM [38], [55]) that provides centralized file-based data management. The SAM system offers four main services: first, it provides reliable data storage, either directly from the detector or from data processing facilities around the world. Second, it enables data distribution to and from all of the collaborating institutions. Third, it thoroughly catalogs data for content, provenance, status, location, processing history, user-defined datasets, and so on. And finally, it manages the distributed resources to optimize their usage and enforce the policies of the experiment.

SAM categorizes typical high energy physics computation activities in application families (reconstruction, analysis, etc.). Applications belonging to a family are identified by a name and a version. This categorization is convenient for bookkeeping as well as for resource optimization. Due to the data intensive nature of the high energy physics domain, applications almost always process data. Such data is organized in tiers, defined according to the format of the physics events.

Table 2.1 Data in Workload from Jan 2003 to Mar 2005

  Number of jobs        234,069
  Number of users       561
  Number of data tiers  32
  Number of files       1,134,086

Relevant data tiers, some of which are discussed in this work, are the raw, reconstructed, thumbnail, and root-tuple tiers. The raw tier identifies data coming directly from the detector. The reconstructed and thumbnail tiers identify the output of the reconstruction applications, in two different formats. The root-tuple tier identifies typically highly processed events in root format [19], which are generally input to analysis applications. For the data handling middleware, an application running on a dataset defines a job. Jobs are initiated by a user on behalf of a physics group and typically trigger data movement.

2.3 The DZero Workload

The studies presented in this research utilize data from the SAM data processing history database between January 2003 and March 2005. Two types of traces have been selected for our studies: file traces and application traces. File traces show what files have been requested with every job run during the period under study. These traces are used to study the presence of filecules in the DZero computing activity. Application traces list summary information for the jobs. The information includes metadata for the application (application name, version, and family) and for the dataset processed (data tier). The application traces also contain general data, such as the user name and group that initiated the job, and the location (node name) and start/stop time of the job.

Table 2.1 shows the quantitative details of the workload data. Among the 234,069 jobs submitted during the period of January 2003 to March 2005, 113,454 jobs used data from the reconstructed, root-tuple and thumbnail data tiers. Out of these jobs, only 113,062 jobs have information about the files requested by the job. These jobs were used for the disk cache simulations. Table 2.2 shows details about the workload used for the simulations. The total number of file requests generated by these jobs is 11,568,086.

Table 2.2 Characteristics of Traces Analyzed Per Data Tier

  Data tier      Users  Jobs    Files    # of file   Input/Job (GB)    Time/Job (hours)
                                         requests    Avg.   Std.dev.   Avg.    Std.dev.
  Reconstructed  304    17,552  507,796  1,770,176   34     285        11.08   38.52
  Root-tuple     51     1,226   59,923   468,176     85     115        14.19   28.93
  Thumbnail      440    94,284  428,508  9,329,734   50     319        8.08    28.83

Figure 2.1 Number of Jobs Per Day

2.4 Number of Jobs Per Day

Figure 2.1 shows the number of jobs that were submitted each day. The figure and Table 2.3 show that hundreds of jobs are submitted on any given day. Figures 2.2 and 2.3 show that the jobs use multiple files. The distribution shown in Figure 2.3 is heavy tailed, but it is also important to note that the average number of files per job is 102 and the median value is 12. In order for a job to run, its group of files needs to be present all at the same time on the local disk.

Table 2.3 Data Requested Per Day

  Category                                    Mean    Median
  Number of jobs per day                      137     107
  Number of files per job                     102     12
  Number of file requests per day             15,199  10,853
  Number of distinct files requested per day  9,534   7,318
  Data accessed per day (TB)                  6.86    5.08
  Distinct data accessed per day (TB)         4.54    3.45

Figure 2.2 Number of Files Per Job

Figure 2.3 Distribution of the Number of Files Per Job

Figure 2.4 Number of Total Files and Distinct Files Requested Per Day

Figure 2.5 Distribution of Total Number of Files Per Day and Distinct Files Per Day

2.5 Data Accessed Per Day

Figures 2.4 and 2.5 show that the number of distinct files accessed per day is smaller than the total number of files accessed on a day. This indicates that some of the files are requested by more than one job. Figures 2.6 and 2.7 show similar results, with the number of distinct bytes accessed being smaller than the total number of bytes. Table 2.3 shows that, on average, 37.3% of the files requested on a day are accessed twice, which amounts to about 33.8% of repeated bytes. Another important factor is that jobs can run for more than a single day. The plots here show the usage based on the start date of the job. If the plot is extended to include two days and so on, the overlap of the number of bytes repeated during that period will be more pronounced.

2.6 Summary

The data presented in this chapter shows that jobs request multiple files and this leads to tens of terabytes of data accessed per day. There is a significant percentage of data that is repeated each day (see Table 2.3). This led us to the intuition that there are relationships between files.

Figure 2.6 Total Number of Bytes Per Day and Distinct Bytes Per Day

Figure 2.7 Distribution of Total Number of Bytes Per Day and Distinct Bytes Per Day

In Section 5.1, we show that the usage patterns exhibit good temporal locality. This existence of temporal locality in the usage patterns, combined with grouping data based on the relationships between files, led us to experiment with methods that exploit these behaviors to reduce the volume of data transferred between nodes in the Grid and to reduce the delay to schedule jobs for execution.
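The per-day reuse discussed in this chapter (total vs. distinct files and bytes, Figures 2.4-2.7 and Table 2.3) can be reproduced from the file traces with a simple aggregation. The sketch below is illustrative only and assumes hypothetical (day, file name, size) records rather than the actual SAM trace schema.

```python
from collections import defaultdict

def daily_reuse(requests):
    """Aggregate an iterable of (day, file_name, size_in_bytes) request
    records into per-day totals, distinct counts, and the share of repeated
    bytes. The record format is hypothetical."""
    per_day = defaultdict(lambda: {"total_files": 0, "total_bytes": 0, "seen": {}})
    for day, name, size in requests:
        d = per_day[day]
        d["total_files"] += 1
        d["total_bytes"] += size
        d["seen"][name] = size                 # distinct files seen that day

    summary = {}
    for day, d in per_day.items():
        distinct_bytes = sum(d["seen"].values())
        summary[day] = {
            "total_files": d["total_files"],
            "distinct_files": len(d["seen"]),
            "total_bytes": d["total_bytes"],
            "distinct_bytes": distinct_bytes,
            "repeated_bytes_pct": 100.0 * (d["total_bytes"] - distinct_bytes)
                                  / d["total_bytes"],
        }
    return summary
```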

CHAPTER 3 RELATED WORK

The nature of the jobs submitted in a system influences system design. Before building a system, it is hard to characterize the nature of the jobs. Hence, assumptions are made about the job characteristics by deriving similarities with other systems. Once the system is deployed and running, it provides the opportunity to test the assumptions made earlier and make the necessary changes to improve system performance. Solutions that took advantage of observed patterns include file location mechanisms that exploit the relation between stored files [23], information dissemination techniques [34] that exploit overlapping user interests in data [35], and search algorithms [7] adapted to particular overlay topologies [49]. Studying usage patterns also enables creating models of the systems that can be used for analysis without disturbing the actual system.

Section 3.1 details previous work on analyzing file request data from web servers, proxy servers and browsers, Windows and Unix file systems, and P2P file sharing systems. Comparing the characteristics of the DZero workload with those of traditional file systems can identify models in traditional systems which can be used to analyze the DZero workload. We compare the size, popularity and lifetime characteristics of data in DZero with those of web requests, Windows and Unix file systems, and the more recent file-sharing systems.

Identification of filecules involves grouping files based on relationships between files. Section 3.2 discusses other methods used to identify relationships between files and how our grouping methodology differs from prior work.

Section 3.3 discusses prior work that describes stack depth analysis and how it is used to test the temporal locality of a workload. We use the same methodology to test temporal locality in the DZero workload. Based on the results from the stack depth analysis, we choose the LRU cache replacement algorithm for our simulation. Sections 3.4 and 3.5 describe prior work on various cache replacement algorithms.

3.1 Workload Characterization

This section provides details about prior work on workload characterization. Previous work characterizing workloads from the Internet, file systems, P2P file sharing systems and data grids is described.

Web Requests

Web requests have been studied at various end points: web servers, proxy servers and browsers. Barford et al. [15] studied the web-client traces from two workstations in Boston University's Computer Science department. File sizes were found to follow a lognormal distribution with Pareto tails. A majority of the requests were targeted at a small set of files.

Web server workloads have been studied extensively in [9], [6] and [14]. In [9], Almeida et al. analyzed the logs from the NCSA Web Server at the National Center for Supercomputing Applications, the SDSC Web Server at the San Diego Supercomputer Center, the EPA Web Server at Research Triangle Park, NC, and the web server at the Computer Science department at Boston University. The file popularity was found to follow a Zipf-like distribution. In [6], Acharya et al. studied the web server logs from Lulea University of Technology, Sweden. The file sizes were found to be concentrated close to the mean file size. In [14], six different web server data sets were analyzed to identify characteristics common to all the workloads: three academic sites, two scientific research organizations, and one ISP were used in the study. There was a small number of very small and very large files. Only 10% of the files were larger than 100 KB. The distribution of file size had a Pareto tail.

In [24], Cunha et al. studied the traces from browser logs of the Boston University CS department. The code of the Mosaic browser was modified to log requests from users. The logs amounted to a period of two months. File sizes were observed to follow a Pareto distribution.

Most files are small in size. File popularity followed a Zipf distribution. File size and file popularity were found to be inversely correlated.

In [11], Arlitt et al. studied the traces from a web proxy within an ISP. The traces amounted to a period of 5 months of activity. The file-size distribution was found to be heavy-tailed: 90% of the files contributed only 51% of the total size, and 40% of the total size was due to a few large files. File popularity was also heavy-tailed. 37% of the files received 78% of the requests, while 63% of the files were requested only once. Small files were requested more often than large files. The popularity follows a Zipf-like distribution. File lifetimes are long for a few files. An active set of one day's file requests was observed over a period of 5 months. About half the files became unpopular on the next day. Further changes to the active set were more gradual (10% reduction each month). About 20% of the day's files were used actively even after 5 months.

In [18], Breslau et al. analyze web proxy cache traces from different sources. They show that the page request distribution follows a Zipf-like distribution. They also show that there is weak correlation between page size and popularity.

File Systems

In [25], Douceur and Bolosky studied the characteristics of Windows file systems at the Microsoft Corporation. The mean file sizes in these file systems varied from 64 KB to 128 KB, but the median file size was just 4 KB. This indicates that there were a lot of small files. The file size distribution follows a log-normal; the high mean is influenced by the existence of a few large files. The file lifetimes were observed to follow a hyper-exponential distribution. Similar observations were made by Vogels in [56].

In [37], Gordoni analyzes the file sizes from different Unix systems. He identified that the systems either have a lot of small files or a few big files. Hence he suggests using different strategies when handling small and big files. In [54], Tanenbaum et al. study the file size distribution on Unix systems. They analyzed the file sizes on the Unix machines at the Computer Science department of the Vrije Universiteit during 2005 and showed that the median file sizes have doubled since 1984 [39]. The largest file (2 GB) is about 4,000 times bigger than the largest file in 1984.

Data Grid

In [33], Iamnitchi and Ripeanu studied the characteristics of the DZero data-intensive physics project. They observed that the file size distributions and file popularity distributions did not follow traditional models. The file popularity did not follow a Zipf-like distribution. The reason for this seems to be the nature of the physics events which are recorded: all the events seem to be equally popular. The file size distribution did not follow a heavy-tailed distribution. The file sizes varied from a few KB to 1.9 GB, and the file size distribution had two different peaks, at 20 MB and 255 MB.

P2P Systems

In [52], the file popularity in Gnutella was studied. It was found that the very popular files were equally popular. The popularity distribution of the rest of the files followed a Zipf-like distribution. Recent studies of peer-to-peer file-sharing applications such as Gnutella, Kazaa and Napster confirm that different file size distributions emerge with different content types (predominantly multimedia in this case) [50].

Overall, most of the file size distributions observed are log-normal with a heavy tail. A large number of files are small. File popularity in web requests follows a Zipf-like distribution. There are a lot of files that are less popular. About 20% of the files have lifetimes as long as 5 months.

3.2 Data Grouping

In [10], Amer et al. create groups of files that are accessed together based on file access patterns and use these groups to prefetch files. They use an aggregating cache that maintains a successor list of files for each file that is accessed. The sequence in which files are requested is used to identify the successor of a file (the next file requested). For each new file access, these lists are traversed until a unique path is identified. Once this unique path is identified, the rest of the files in this path are prefetched. They observed a 20 to 1,200% improvement in cache hit rates. Filecules are different from the groups identified in [10] in that filecules are disjoint sets of files and grouping is not based on the order in which the files are accessed.

In [53], Tait and Duchamp analyze the use of file working sets for improving cache performance using prefetching. Their algorithm builds distinct working trees based on file access sequences and patterns. For every job, they track the file access sequence and compare it with the existing working trees. Prefetching is delayed until the sequence matches only one working tree. When a unique working tree is identified, the remaining files of that working tree are prefetched. Their experiments with file access traces from a SunOS machine prove that LRU with prefetching outperforms conventional LRU.

In [30], Gkantsidis et al. analyzed grouping of files in Windows Update, a software update service. They experimented with clustering of files using the cosine correlation between pairs of files. The cosine correlation determines the probability of two files being requested together; the cosine correlation of a pair of files is 1 if and only if both files are always requested together. The threshold correlation used for grouping was 0.9. They observed that 98% of the files formed 26 non-overlapping groups, and the 5 largest groups accounted for 97% of the total software update requests. They also tried clustering the update patches but observed that they did not cluster as much as the files. Filecules are defined analogously but independently of this work [30], except that we do not group files that have correlation coefficients smaller than 1.

Ganger and Kaashoek [28] use explicit grouping, in which files that are used one after the other are placed in adjacent locations on the disk and accessed as a whole group. Griffioen and Appleton [31] consider two files related (and thus part of the same group) if they are opened within a specified number of file open operations of each other.

3.3 Stack Depth Analysis

In [12], Arlitt and Jin study the workload of the 1998 World Cup web site. They use stack depth analysis to show the temporal locality in the workload. They describe that if the average or median stack depth is relatively small compared to the maximum stack depth in the workload (the total number of files), then there is good temporal locality, and vice versa. The 90th percentile of the stack depth accessed was about 4% of the maximum stack depth, which shows that their workload has good temporal locality.
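As an illustration of the stack depth analysis described above, the following sketch computes the LRU stack depth of every reference in a file request stream. The function is a hypothetical helper, not code from [12] or from this thesis.

```python
def lru_stack_depths(references):
    """Compute the LRU stack depth of each reference in a stream of file
    names: the depth is the position of the file in an LRU-ordered stack
    (1 = most recently used); first-time references produce no depth.
    Illustrative helper; quadratic in the worst case."""
    stack = []                  # most recently used file at index 0
    depths = []
    for name in references:
        if name in stack:
            depths.append(stack.index(name) + 1)
            stack.remove(name)
        stack.insert(0, name)   # push (or move) the file to the top of the stack
    return depths
```

If the median of these depths is a small fraction of the total number of distinct files, the workload exhibits good temporal locality, which is the criterion used in [12].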

In [6], Acharya, Smith and Parnes characterize videos accessed on the web. They plot the percentage of the stack depth accessed under the Least Recently Used replacement algorithm. The plot shows that most of the stack depths accessed are small (70% of the stack depths accessed are less than 10), and hence there is good temporal locality in the video files accessed. We use this stack depth analysis in our studies to verify that the DZero workload traces exhibit good temporal locality (Section 5.1) and hence that algorithms that take advantage of temporal locality should be utilized to process data in such systems.

3.4 Caching in Web

In [46] and [48], Pitkow and Recker propose a caching algorithm that adapts to document hit rates and user access patterns. Based on models from psychological research on human memory, they identified that recency rates in the document access history can be used to predict future document access. They used web workloads from the Georgia Institute of Technology. They calculated the probability of access of a file based on the recency of use in the access history and the frequency of access within a given history window. Cache replacement was performed based on this calculated probability, and it was shown that the probability of a cache miss using this prediction was less than 0.1. This shows that a prediction mechanism using recency and frequency of data access, with recency weighted more heavily, predicts future document access most of the time (the probability of a correct prediction is 0.9).

In [8], Aggarwal and Yu propose a modified version of LRU that evicts the group of documents with the least dynamic frequency. The dynamic frequency of a group of documents is the sum of the inverses of the differences between the current time and the last access time of each document, and hence accounts for recency of use. The group of files with the least dynamic frequency is evicted from the cache. They show that this algorithm prevents caching of rarely used objects and also reduces fragmentation.
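Read literally, the dynamic frequency rule of Aggarwal and Yu can be sketched as follows; the argument names are illustrative and the time unit is assumed to be whatever the trace uses.

```python
def dynamic_frequency(group, last_access, now):
    """Dynamic frequency of a group of cached documents as described above:
    the sum over documents of the inverse of the time since their last
    access (assumes now is strictly later than every last access time)."""
    return sum(1.0 / (now - last_access[doc]) for doc in group)

def eviction_candidate(groups, last_access, now):
    # The group with the least dynamic frequency is the one evicted.
    return min(groups, key=lambda g: dynamic_frequency(g, last_access, now))
```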

In Akamai Content Delivery Networks [36], web content delivery is accelerated by prefetching data. When a client requests a web page, the request is sent to an edge server, a caching server located closer to the user. The edge server sends the request to the appropriate web server for content. Once the edge server receives the content, it is parsed to identify embedded objects. The edge server sends the content to the browser and simultaneously sends requests to prefetch the embedded objects. The content is resolved by the browser, which then sends requests to the edge server to load the embedded objects. Since the edge server has prefetched the embedded objects, the loading of this data is accelerated. In this case, the entire data that is prefetched is utilized by the client, and the data that is prefetched depends on the contents of the client request, i.e., the objects embedded in the client request. In the case of scientific data, such explicit relationships do not exist between files; we predict these relationships based on usage data from the past.

3.5 Caching in Data Grids

In [41], Otoo et al. propose a new disk cache replacement policy for SRM [51] in data grids, the Least Cost Beneficial (LCB-K) replacement policy, based on at most K backward references. They use a workload from JasMINE (Jefferson Lab Asynchronous Storage Manager) [4] covering a period of 6 months. According to their LCB policy, a utility value is calculated for each file that is not currently in use by a running job. Files are evicted from the cache in non-decreasing order of their utility values. Their algorithm is compared with LFU, LRU, LRU-K [40], Greedy Dual Size [22] and MITK (a variant of LRU-K). They measure the average cost per reference as the total cost in time units divided by the total number of references to the file. LCB-K is shown to provide the lowest average cost per reference. In [45], LCB-K is shown to perform better than the other caching algorithms on workloads from the National Energy Research Scientific Computing facility.

In [42], Otoo, Rotem and Romosan propose an optimal file-bundle caching replacement algorithm, Greedy Relative Value (GRV), that reduces the volume of data transfer and increases the throughput. This algorithm determines the optimal set of files that needs to be loaded into the space available in the cache such that the throughput of the system can be improved. The relative value of a request is a function of the popularity of the request and the adjusted size of the files requested. The adjusted size of a file is the size of the file divided by the popularity of the file.

Files are loaded into the cache based on the relative values of the requests. The GRV algorithm is explained in detail in Section 5.2.3. The performance of this algorithm is compared with the Landlord algorithm proposed in [22] and [57].

In [44], Otoo, Rotem and Seshadri use the same relative value presented above to also change the order in which the jobs in the waiting queue are processed. The job with the highest relative value is processed first. This ensures that the jobs that can utilize the current set of files in the cache are processed immediately. They compare their algorithm with the First-Come First-Served job scheduling policy. The results show that the average response time and average queue length for GRV are smaller than for the LRU cache replacement policy with FCFS scheduling. In [43], Otoo et al. compare GRV with Greedy-Dual Size.

Since the DZero traces exhibit good temporal locality, and also because GRV provides the capability of running jobs that utilize the current state of the cache, we propose a combination of the Least Recently Used cache replacement algorithm with job scheduling based on the relative value used in the GRV algorithm (Section 5.2.3).

Filecules are different from the file-bundles used in [42], [44] and [43]. File-bundles consist of the entire set of files used by a job, also known as a "collection". Hence, a file-bundle might consist of one or more filecules.
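The adjusted size and relative request value described in this section can be sketched as follows. The exact GRV formula is the one given in [42] and restated in Section 5.2.3; the relative value below (request popularity divided by the total adjusted size of its files) is only one plausible reading of the description above, and all names are illustrative.

```python
def adjusted_size(size, popularity):
    """Adjusted size as described above: file size divided by file popularity."""
    return size / popularity

def relative_request_value(files, size, popularity, request_popularity):
    """One plausible reading of a GRV-style relative request value: it grows
    with the popularity of the request and shrinks with the total adjusted
    size of the files the request needs. Illustration only; see Section 5.2.3
    and [42] for the exact definition."""
    total_adjusted = sum(adjusted_size(size[f], popularity[f]) for f in files)
    return request_popularity / total_adjusted if total_adjusted else 0.0
```

Under this reading, requests for small or widely shared files rank higher than requests for large, rarely used files.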

CHAPTER 4 FILECULES AND THEIR CHARACTERISTICS

A typical job in DZero requests multiple files (Section 2.4). Analysis of these requests reveals that the files requested are often correlated. Such a group of correlated files is a filecule. In this chapter, we formally define a filecule and we characterize filecules in terms of size, popularity and lifetime. These properties will be compared with those of data in traditional file systems and the Internet. We observe that the properties of filecules follow different distributions than those observed in traditional file systems.

Inspired by the definition of a molecule, we define a filecule as an aggregate of one or more files in a definite arrangement held together by special forces related to their usage. We thus consider a filecule as the smallest unit of data that still retains its usage properties. We allow one-file filecules as the equivalent of a monatomic molecule (i.e., a single atom as found in noble gases) in order to maintain a single unit of data (instead of multiple-file filecules and single files). Formally, a set of files F1, ..., Fn forms a filecule G if and only if ∀ Fi, Fj ∈ G and ∀ G' such that Fi ∈ G', then Fj ∈ G'. Properties that result directly from this definition are:

1. Any two filecules are disjoint.

2. A filecule has at least one file.

3. The number of requests for a file is identical to the number of requests for the filecule that includes that file. Thus, the popularity distribution of files and filecules is the same.
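Reading G' in the definition above as the set of files requested together by an individual job, filecules can be computed from a trace by grouping files that are requested by exactly the same set of jobs. The sketch below illustrates this; the input format is hypothetical and this is not the code used for the thesis experiments.

```python
from collections import defaultdict

def identify_filecules(job_file_requests):
    """Group files into filecules: files requested by exactly the same set of
    jobs end up in the same group. Input is an iterable of (job_id, file_name)
    pairs; illustrative sketch only."""
    jobs_per_file = defaultdict(set)
    for job_id, file_name in job_file_requests:
        jobs_per_file[file_name].add(job_id)

    groups = defaultdict(list)              # job-set signature -> file names
    for file_name, job_set in jobs_per_file.items():
        groups[frozenset(job_set)].append(file_name)

    return list(groups.values())            # disjoint groups of one or more files
```

The resulting groups are disjoint, contain at least one file, and every file is requested exactly as often as its filecule, matching the three properties listed above.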

Table 4.1 shows the statistics of size, popularity and lifetime for files and filecules.

Table 4.1 Statistics of Size, Popularity and Lifetime

  Property             Minimum    Maximum    Mean       Median     Standard Deviation
  File size            234 bytes  1.98 GB    0.3859 GB  0.3773 GB  0.3230 GB
  Filecule size        23 KB      16,051 GB  3.9859 GB  0.9419 GB  54.5137 GB
  File popularity      1          996        12         3          25
  Filecule popularity  1          996        41         30         50
  File lifetime        15 secs    27 months  4 months   1 month    5 months
  Filecule lifetime    15 secs    27 months  8 months   7 months   5 months

Figure 4.1 Number of Files Per Filecule

Figure 4.1 shows the distribution of the number of files per filecule. The figure shows that there are a few large groups of files (5% of filecules are above 15 TB) and many small groups of files (56% are one-file filecules).

4.1 Size Characteristics

Analysis of the file and filecule size characteristics of the DZero Experiment provides an understanding of the typical data set size used by physicists. We intend to see how the size distribution changes due to the grouping of data into filecules. This characterization is useful for modeling data size when considering filecules for data management.

Figure 4.2 File Size Distribution

4.1.1 File Size

Figures 4.2 and 4.3 show the file size distribution and the file size vs. rank plot. The file with the largest size has rank 1. The smallest file is 234 bytes and the largest is 2.1 GB. The mean and median file sizes are 0.4 GB. Since the mean and median values are the same, the files are equally distributed around the mean (the number of files larger than the mean is approximately equal to the number of files smaller than the mean).

It can be noticed in Figure 4.2 that there are many files (69%) that are smaller than 110 MB. The next popular file sizes are 440 MB to 550 MB and 1 GB to 1.1 GB. Further analysis of these popular file sizes shows that most of the files fall into the categories of files smaller than 10 MB, between 450 MB and 470 MB, and between 1.03 GB and 1.04 GB. The distribution follows a pattern similar to that observed in [33], except that the peak file sizes are different. In [33], there were fewer files with the file sizes that represent peaks in our study. This could be due to the shorter traces in [33] (6 months in [33] compared to 2+ years in our study). The multiple peaks observed in the file size distribution are due to the different peaks observed in the different data tiers, as shown in Figure 4.4. The files in the different data tiers are generated as a result of various reconstruction applications (Section 2.2), which can be attributed to the different locations of the peaks in Figure 4.4.

PAGE 34

Figure 4.3 File Size vs. Rank (file size in GB vs. rank, log-log scale)

Figure 4.4 File Size Distribution Per Data Tier (density vs. file size in bytes for the roottuple, thumbnail, and reconstructed data tiers)
Figure 4.5 Filecule Size Distribution (density vs. filecule size in log GB; logistic fit to the log-transformed data with k = 0.01024, sigma = 0.41525, mu = 0.09121)

Figure 4.6 Filecule Sizes in Decreasing Order (filecule size in GB vs. rank, log-log scale)
4.1.2 Filecule Size

Figures 4.5 and 4.6 show the filecule size distribution and the filecule sizes ordered in decreasing order of size. The largest filecule is 15.7 TB and the smallest filecule is 23.5 KB. All the files (2,264 files) with sizes smaller than 23.5 KB have been grouped into filecules. The largest filecule size is about 8,000 times the largest file size; this filecule has 18,326 files. The mean filecule size is 4.2 GB and the median filecule size is 1.1 GB. This positive skewness is due to the influence of a few large filecules (5% have a size larger than 15 TB). The log-logistic distribution with the parameters mentioned in Figure 4.5 best fits the filecule size distribution. This contradicts the log-normal size distribution of data observed in [25] and [17]. The difference between the log-normal and log-logistic distributions is that the log-logistic has a fatter tail (a larger number of large files). The curve of a log-logistic distribution increases geometrically at small values, flattens in the middle, and decreases slowly at high values.

4.2 Popularity Characteristics

The popularity of a file or filecule is measured as the number of times the file or filecule has been requested. Popularity distributions show patterns in data usage. We intend to see if usage patterns change when grouping data into filecules. This characteristic is particularly relevant for predicting caching performance when using filecules as the data abstraction.

4.2.1 File Popularity

Figures 4.7 and 4.8 show the file popularity distribution and file popularity in decreasing order. The most popular file was requested by 996 jobs (34 unique users). About 30% of the files have been used by only one job (file popularity = 1). The rank for the median file popularity (3 jobs) is 700,000, i.e., about 30% of the total number of files (997,227) are requested by only one or two jobs. 6.5% of the total number of files (the 65,536 most popular files) account for 45% (5,247,549) of the total requests (11,568,086). This is similar to the observation in [15], where a small set of files accounts for the majority of the requests. Only 4 files are highly popular, with a file popularity of 996.
Figure 4.7 File Popularity Distribution (density vs. log file popularity)

Figure 4.8 File Popularity vs. Rank (popularity vs. rank, log-log scale)
Figure 4.9 Filecule Popularity Distribution (density vs. filecule popularity; generalized Pareto fit with k = 0.08004, sigma = 38.628, theta = 1.3625)

The popularity distribution is heavy tailed (Figure 4.8), similar to the observations in [33] and [11]. Also, the distribution does not follow the Zipf's law observed in [52], [9], [24] and [11].

4.2.2 Filecule Popularity

Figures 4.9 and 4.10 show the filecule popularity distribution and the filecule popularity vs. rank plot. There is only one filecule with the maximum observed popularity. There are a total of 3,918,553 filecule requests for 96,454 unique filecules. 49.7% (47,910) of the filecules account for 86% (3,381,638) of the requests, and 6.5% (6,270) of the filecules account for 33.3% (1,126,255) of the requests. The generalized Pareto distribution best fits the filecule popularity data; the parameters are given in Figure 4.9. Figure 4.10 clearly shows that the popularity distribution does not follow the Zipf's law observed in [9], [24] and [11]. The filecule popularity is not as heavy tailed as the file popularity. This is because files with low popularity group better into filecules than the very popular files. This is similar to the discussion in [10], which mentions that there might be a few popular files that are used along with many different file sets. Hence, when trying to identify disjoint groups of files, these files remain single rather than forming groups.
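For reference, the following is a sketch, in LaTeX notation, of the generalized Pareto density in a common three-parameter form (shape k, scale sigma, location theta); the exact parameterization used by the fitting tool is an assumption and may differ from the one behind the values reported in Figure 4.9 (k = 0.08004, sigma = 38.628, theta = 1.3625).

f(x \mid k, \sigma, \theta) = \frac{1}{\sigma}\left(1 + k\,\frac{x - \theta}{\sigma}\right)^{-1 - 1/k}, \qquad x \ge \theta,\; k > 0 .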
Figure 4.10 Filecule Popularity vs. Rank (popularity vs. rank, log-log scale)

4.3 Lifetime Characteristics

We define the lifetime of a file or filecule as the time difference between the start time of the first job that accessed the file and the end time of the last job that accessed it in our trace. This is the active period of the data.

4.3.1 File Lifetime

Figures 4.11 and 4.12 show the file lifetime distribution and the file lifetimes in decreasing order of lifetime. 40% (396,341) of the files have a lifetime shorter than one week. The median file lifetime is 712 hours (approximately 1 month). About 35% of the files have a lifetime greater than or equal to 5 months. This is similar to the observation in a web proxy workload [11], where 20% of the files were active after 5 months. 294,355 (30%) files have a lifetime of less than or equal to a day (24 hours). This number is smaller than the one mentioned in [11], which reports 50% of the files inactive on the next day. This indicates that the files in DZero have longer lifetimes than those observed on the Internet, which can influence the effectiveness of caching. On average, 4.54 TB of distinct data is requested each day; 30% of this data becomes useless on the next day and 70% of the data is still active. In order for a cache to be effective, it needs to retain around 70% of the data from the previous day.
Figure 4.11 File Lifetime Distribution (density vs. file lifetime in hours)

Figure 4.12 File Lifetime vs. Rank (file lifetime in hours vs. rank)
Figure 4.13 Filecule Lifetime Distribution (density vs. filecule lifetime in hours; extreme value fit with k = 0.1539, sigma = 3,649.1, mu = 4,612)

This shows that any cache size smaller than 3.5 TB (70% of 5 TB) will not be enough to take advantage of the temporal locality in the data.

4.3.2 Filecule Lifetime

Figures 4.13 and 4.14 show the filecule lifetime distribution and the filecule lifetimes in decreasing order of lifetime. The best distribution fit for the data is the extreme value distribution with shape parameter k = 0.1539, scale parameter sigma = 3,649.1, and location parameter mu = 4,612. This is different from the hyperexponential distribution observed in the Windows file system study of [25]. More than 70% of the filecules are active after 5 months. This 70% of filecules corresponds to the 35% of files noted in Section 4.3.1. This also shows that the more popular files have not grouped well with other files, which is similar to the observation in Section 4.2.2. 5,175 (5%) filecules become inactive after a day. This 5% of the filecules accounts for the 30% of files that become inactive after a day. This suggests that files with short active periods have a better tendency to group than files with long active periods, similar to the observation in Section 4.2.2, which can be due to a file being accessed along with different file sets by different jobs.
Figure 4.14 Filecule Lifetime vs. Rank (filecule lifetime in hours vs. rank)

Table 4.2 Coefficients of Correlation for File Properties

File size & File popularity        0.1235
File size & File lifetime         -0.0172
File popularity & File lifetime    0.3888

4.4 Correlation Between Size, Popularity and Lifetime

The coefficients of correlation between file attributes are given in Table 4.2, and those for filecules are shown in Table 4.3. There is no strong correlation between these characteristics, which indicates that using data size or lifetime to make decisions about data popularity is incorrect. For example, evicting data from the cache based on size or on the time elapsed since first access will not improve the performance of the cache.

Table 4.3 Coefficients of Correlation for Filecule Properties

Filecule size & Filecule popularity      -0.0390
Filecule size & Filecule lifetime        -0.0602
Filecule popularity & Filecule lifetime   0.3989
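The following is a minimal sketch of how such coefficients could be computed from per-file property arrays using Pearson's product-moment correlation; extracting the per-file (or per-filecule) size, popularity, and lifetime values from the trace is assumed, and the example values are hypothetical.

// Minimal sketch: Pearson correlation between two per-file property arrays
// (e.g., size and popularity), as one plausible way to obtain values such as
// those reported in Tables 4.2 and 4.3.
public class Correlation {

    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= n;
        meanY /= n;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Hypothetical per-file values: sizes in GB and popularity counts.
        double[] size = {0.4, 1.2, 0.1, 0.9};
        double[] popularity = {3, 1, 12, 2};
        System.out.printf("r = %.4f%n", pearson(size, popularity));
    }
}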
4.5 Impact on Resource Management

Filecules can be used as a new granularity for resource management in scientific grids. The nature of jobs in such communities, i.e., jobs requesting multiple files, requires consideration of the correlation between files. This correlation is used to group files into filecules. Grouping related files into filecules reduces the number of objects that need to be managed and naturally preserves locality of use. In Chapter 5, we show that using filecules to prefetch data into the cache improves the byte hit rate. But filecules can also be applied to data replication, job scheduling, resource selection, and data staging.

By using filecules for data replication, related data can be stored together at the same location. This ensures faster data search and retrieval. Moreover, it can guide job scheduling in selecting a computational resource close to where the data needed by the job is stored. The degree of correlation between filecules can be utilized for data staging: a correlation threshold can be used to determine how far apart two filecules should be stored, and if the threshold is met, the filecules are stored in nearby locations. Instead of identifying filecules in one central location, they can be identified locally at multiple storage locations. This enables the system to identify filecules that reflect the usage patterns local to that storage, which can improve the quality of resource management.

4.6 Summary

The filecule size distribution follows a log-logistic distribution, indicating that the decrease in the number of large files is not as steep as observed in the log-normal distributions identified in Windows file systems [25] and web client traces [15]. The filecule popularity distribution is a generalized Pareto distribution. The file and filecule lifetime distributions indicate that the data in DZero have longer lifetimes than data observed in web proxy workloads [11]. The correlation coefficients show that data prediction or eviction decisions in storage cannot be made based on data size and data lifetime.
CHAPTER 5
CACHING

Storage Resource Managers (SRM) [51] and Storage Resource Brokers (SRB) [47] provide caching and data storage services for data grids. Data that needs to be processed by a job is located on a Mass Storage System (MSS) that can be local or remote. An SRM has a large-capacity disk cache, and this cache space is utilized to store data that is read from the MSS. SRMs facilitate high data availability by staging data and by masking any failures in data transfer. This chapter discusses caching algorithms for such SRMs.

Data can be loaded into the SRM disk cache on demand or can be pre-staged based on a usage prediction method. Data is loaded into the cache on demand when a job requests data that is not available in the cache. Prediction methods are used to predict what data might be required by a job in the future. Prediction is typically done based on the history of data usage [42, 44, 43, 10].

In this chapter, using stack depth analysis, we show that the DZero workload has good temporal locality. We compare the Least Recently Used (LRU) cache replacement algorithm and the Greedy Request Value (GRV) cache replacement algorithm. We show that using filecules for data prefetching in the cache provides better prediction of data usage and hence a better byte hit rate. We identify the drawbacks of using a First-Come First-Served (FCFS) job scheduling algorithm and compare FCFS with the job scheduling algorithm used in GRV. We show that a combination of caching using filecules and job scheduling using the GRV algorithm provides good byte hit rates and short job waiting times.
Figure 5.1 Stack Depth Analysis of File Requests

Table 5.1 Stack Depth Analysis Statistics

Measure                    Value
Maximum                    946,600
1st percentile             85
10th percentile            960
50th percentile (median)   12,260
90th percentile            90,444
Standard deviation         79,300

5.1 Stack Depth Analysis

Stack depth analysis [12, 6] is a method that measures the temporal locality of a workload. A stack is used to represent a cache. When a job requests a file, the most recent location of the file in the stack is identified. The depth of this location from the top of the stack is the stack depth for that file access. The stack depth is recorded for every access in the workload. There are approximately 11.5 million file accesses in the workload. From Figure 5.1, it can be seen that all stack depths are less than 1 million. This is less than 10% of the total number of file accesses. Table 5.1 shows the statistics of the plot.
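The following is a minimal sketch of this measurement, assuming the workload is available as a flat list of file accesses in the order the jobs issue them; it is an illustration, not the analysis code used for the thesis.

import java.util.*;

// Minimal sketch of stack depth (LRU stack distance) analysis: for each access,
// report how deep the file currently sits in an LRU-ordered stack, then move it
// to the top. Depth -1 marks a first access.
public class StackDepth {

    static List<Integer> stackDepths(List<String> accesses) {
        List<Integer> depths = new ArrayList<>();
        LinkedList<String> stack = new LinkedList<>();  // index 0 = most recently used
        for (String file : accesses) {
            int depth = stack.indexOf(file);            // linear scan; fine for a sketch
            depths.add(depth);
            if (depth >= 0) {
                stack.remove(depth);
            }
            stack.addFirst(file);
        }
        return depths;
    }

    public static void main(String[] args) {
        List<String> trace = List.of("a", "b", "a", "c", "b", "a");
        System.out.println(stackDepths(trace));  // [-1, -1, 1, -1, 2, 2]
    }
}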
Table 5.2 Average Number of Files in Cache

Cache size (TB)   # of files   % of access depths greater than column 2
50                132,830       6.15
25                 66,415      13.84
10                 26,566      31.83
 5                 13,283      48.40
 1                  2,656      76.80

The strength of the temporal locality determines the stack depths accessed: the stronger the temporal locality, the smaller the stack depths. As seen in Figure 5.1, there is a thick band close to the x axis, which indicates that there are a large number of small stack depths. The plot becomes sparse as only a small number of large stack depths are accessed.

Table 5.2 shows the number of files that can be accommodated in each cache size and the percentage of stack depth accesses that are greater than the number of files that can be accommodated. The DZero workload contains 996,227 files whose sizes add up to approximately 375 TB. We obtained the fraction of the number of bytes that can be accommodated in each cache size (cache size / 375 TB), and used this fraction to calculate the number of files that can be accommodated in the cache. The number of files that can be accommodated, for all cache sizes except 1 TB, is less than the median stack depth accessed. The average number of unique files accessed per month is 100,201. Figure 5.1 shows that most of the stack depths are less than 1 x 10^5 and the 90th percentile is 90,444. This indicates that most of the files requested have been accessed previously within a window of one month.

5.2 Cache Replacement and Job Scheduling Algorithms

Any storage has limited capacity. The amount of data that needs to be stored in a system increases with time, and thus the system reaches a point where there is no more space left to add new data. Under such circumstances, the system needs to make decisions about what data needs to be retained and what can be evicted from storage. These decisions are made based on different parameters like the age of the data, the size of the
data, etc. SRMs also have a limited disk cache. When the disk cache is full, SRMs evict data to make space for new data. SRMs also schedule jobs to improve the throughput of the system. This section describes the cache replacement and job scheduling algorithms used in our experiments. The following combinations of cache replacement and job scheduling algorithms are used:

1. Least Recently Used cache replacement using files, with First-Come First-Served job scheduling and an infinite queue length (File LRU)
2. Least Recently Used cache replacement using filecules, with First-Come First-Served job scheduling and an infinite queue length (Filecule LRU)
3. Greedy Request Value with an infinite queue length (GRV)
4. Greedy Request Value with a threshold queue length of 1,000 (GRV-1000)
5. Least Recently Used cache replacement using files, with scheduling based on the Greedy Request Value and an infinite queue length (LRU-Bundle)
6. Least Recently Used cache replacement using files, with scheduling based on the Greedy Request Value and a threshold queue length of 1,000 (LRU-Bundle-1000)

5.2.1 Least Recently Used Cache Replacement Algorithm

Least Recently Used (LRU) [5, 13, 14] is the cache replacement algorithm currently used at the Fermi National Accelerator Laboratory. LRU is a temporal-locality-based algorithm: when the storage is full and a new data object needs to be loaded, the replacement algorithm chooses the least recently used data in the storage to be evicted, to make space for the new data object. A minimal byte-capacity LRU sketch is shown below.
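The following is a minimal sketch of an LRU cache managed by byte capacity rather than object count, assuming file names and sizes are known; it illustrates the policy only and is not the Fermilab or SRM implementation. Filecule LRU applies the same policy but loads and evicts whole filecules instead of individual files.

import java.util.*;

// Minimal sketch of a byte-capacity LRU disk cache: files are tracked in access
// order and the least recently used files are evicted until the new file fits.
public class LruByteCache {
    private final long capacityBytes;
    private long usedBytes = 0;
    // Iteration order = least recently used first (access-order LinkedHashMap).
    private final LinkedHashMap<String, Long> files = new LinkedHashMap<>(16, 0.75f, true);

    LruByteCache(long capacityBytes) { this.capacityBytes = capacityBytes; }

    /** Returns true on a cache hit; on a miss, loads the file, evicting LRU files as needed. */
    boolean access(String name, long sizeBytes) {
        if (files.get(name) != null) {          // get() refreshes recency
            return true;
        }
        // Evict least recently used files until the new file fits.
        Iterator<Map.Entry<String, Long>> lru = files.entrySet().iterator();
        while (usedBytes + sizeBytes > capacityBytes && lru.hasNext()) {
            usedBytes -= lru.next().getValue();
            lru.remove();
        }
        files.put(name, sizeBytes);
        usedBytes += sizeBytes;
        return false;
    }

    public static void main(String[] args) {
        LruByteCache cache = new LruByteCache(1_000_000_000L);    // 1 GB cache
        System.out.println(cache.access("a.root", 400_000_000L)); // false: miss, loaded
        System.out.println(cache.access("b.root", 500_000_000L)); // false: miss, loaded
        System.out.println(cache.access("a.root", 400_000_000L)); // true: hit
        System.out.println(cache.access("c.root", 300_000_000L)); // false: evicts b.root (LRU)
    }
}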
5.2.2 First-Come First-Served Job Scheduling Algorithm

The First-Come First-Served (FCFS) job scheduling algorithm uses the principle of a queue to schedule jobs: the job that is submitted to the system first is scheduled to run before any job that is submitted later. There is no bias in the order in which jobs are run. If space is not available in the cache to load the files for the job at the head of the queue, all jobs wait in the queue until that first job starts to run.

5.2.3 Greedy Request Value Algorithm

Otoo et al. [42, 44, 43] proposed the Greedy Request Value (GRV) algorithm, which combines cache replacement with job scheduling. The goal of this algorithm is to reduce the amount of data transferred into the cache for a job to run and to increase the throughput of the system by utilizing the existing contents of the cache. A collection is defined as the set of files requested by a job. Each requested file is assigned a relative value based on its size and popularity (Equation 5.1). Each job is assigned a relative value based on the popularity of the collection of files requested and the relative values of the files (Equation 5.2). Jobs are scheduled according to this value: the job with the largest value is scheduled first.

    v_{f_i} = s(f_i) / n(f_i)                               (5.1)

where v_{f_i} is the relative value of a file f_i, s(f_i) is the size of file f_i, and n(f_i) is the popularity of file f_i.

    V_r = n(r) * sum_{i=1}^{N} v_{f_i}                      (5.2)

where V_r is the relative value of request r and n(r) is the popularity of the request r. A minimal sketch of these value computations is given below.
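The following is a minimal sketch of Equations 5.1 and 5.2 as reconstructed above; the per-file sizes and popularities and the request popularity are assumed to be available from the request history, and all names are illustrative rather than taken from the GRV papers.

import java.util.*;

// Minimal sketch of the GRV relative values (Equations 5.1 and 5.2).
public class GrvValue {

    // Equation 5.1: v_{f_i} = s(f_i) / n(f_i)
    static double fileValue(long sizeBytes, int popularity) {
        return (double) sizeBytes / popularity;
    }

    // Equation 5.2: V_r = n(r) * sum_i v_{f_i}
    static double requestValue(int requestPopularity,
                               Map<String, Long> fileSizes,
                               Map<String, Integer> filePopularity,
                               Collection<String> requestedFiles) {
        double sum = 0;
        for (String f : requestedFiles) {
            sum += fileValue(fileSizes.get(f), filePopularity.get(f));
        }
        return requestPopularity * sum;
    }

    public static void main(String[] args) {
        // Hypothetical sizes (bytes) and popularities for a two-file request seen twice.
        Map<String, Long> sizes = Map.of("a.root", 400_000_000L, "b.root", 1_000_000_000L);
        Map<String, Integer> pops = Map.of("a.root", 12, "b.root", 3);
        double v = requestValue(2, sizes, pops, List.of("a.root", "b.root"));
        System.out.printf("request value = %.1f%n", v);
    }
}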
When a new job is submitted to the system, it enters the waiting queue. The relative value of this job is calculated based on Equations 5.1 and 5.2, and the job with the largest relative value is scheduled as the next job to run. When a job is scheduled to run, the space required for the job is estimated. The space required is the difference between the total size of the files in the job's collection and the size of the files for that request that are already available in the cache. The algorithm then calculates the space available in the cache, which is the sum of the free space in the cache and the size of the files that are not used by any currently running job. If the space available is greater than or equal to the space required, the job can be processed.

When a job can be processed, the space required for the request is reserved and the remaining space (the difference between the space available and the space required) is used for prefetching data. Data is prefetched based on the relative values of the requests in the history: the files of the request with the largest relative value in the history are loaded into the available space.

5.2.4 Queue Freezing

Queue freezing is a method in which the order in which jobs will be scheduled is decided once, and that order is followed until all the jobs in the frozen queue have started to run. Any new job that arrives at the queue must wait until all the jobs in the frozen queue have started to run. This method is used to avoid thrashing of jobs due to job scheduling algorithms. Since FCFS is an unbiased algorithm and the order in which jobs are processed is never changed, there is no queue freezing for FCFS. In GRV and LRU-Bundle, certain jobs can suffer long delays because of very small request relative values. In order to avoid these long delays, queue freezing is implemented for GRV and LRU-Bundle. We chose 1,000 jobs as the threshold queue length for queue freezing because it is the average number of jobs submitted per week. Queue freezing also takes advantage of the temporal locality characteristic of the workload.

For GRV and LRU-Bundle, when the queue length reaches this threshold, the relative values of all the jobs in the queue are calculated and the jobs are sorted in decreasing order of their relative values. The queue is frozen with these jobs in their scheduled order. The relative value of any incoming new job is not computed until all the jobs in the frozen section of the queue are processed. This also reduces the computational overhead of scheduling. Once all the jobs in the frozen section of the job waiting queue are processed, the relative values of all the jobs in the remaining waiting queue are computed. If the queue length is again greater than or equal to the threshold, the first 1,000 jobs are sorted in decreasing order of their relative values and the queue is frozen. If the queue length is less than the threshold, the jobs are processed without freezing the queue. A sketch of this freezing logic is shown below.
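The following is a minimal sketch of the queue-freezing mechanism described above, under the assumption that each job can be assigned a relative value (Equation 5.2); the job type and the value function are placeholders, and the real schedulers also have to check cache space before dispatching a job.

import java.util.*;
import java.util.function.ToDoubleFunction;

// Minimal sketch of queue freezing for value-based scheduling (GRV, LRU-Bundle):
// once the waiting queue reaches the threshold, the queued jobs are sorted by
// decreasing relative value and dispatched in that fixed order; newly arriving
// jobs are not re-valued until the frozen section has drained.
public class FreezingScheduler<J> {
    private final int threshold;                        // e.g., 1,000 jobs
    private final ToDoubleFunction<J> relativeValue;    // Equation 5.2
    private final List<J> waiting = new ArrayList<>();
    private final Deque<J> frozen = new ArrayDeque<>();

    FreezingScheduler(int threshold, ToDoubleFunction<J> relativeValue) {
        this.threshold = threshold;
        this.relativeValue = relativeValue;
    }

    void submit(J job) { waiting.add(job); }

    /** Returns the next job to schedule, or null if none are waiting. */
    J nextJob() {
        if (!frozen.isEmpty()) {
            return frozen.poll();                       // frozen order is fixed
        }
        if (waiting.isEmpty()) {
            return null;
        }
        if (waiting.size() >= threshold) {
            // Freeze: sort the current queue once, by decreasing relative value.
            waiting.sort(Comparator.comparingDouble(relativeValue).reversed());
            frozen.addAll(waiting.subList(0, threshold));
            waiting.subList(0, threshold).clear();
            return frozen.poll();
        }
        // Below the threshold: pick the currently most valuable job.
        J best = Collections.max(waiting, Comparator.comparingDouble(relativeValue));
        waiting.remove(best);
        return best;
    }
}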
5.3 Metrics

The goal of our experiments is to identify a combination of cache replacement and job scheduling algorithms that will utilize the temporal locality in the workload, reduce the volume of data transferred into the cache, and avoid job thrashing with a small computational overhead. We identified metrics that quantify these goals for the algorithms mentioned in Section 5.2. The traditional metric used in evaluating cache replacement algorithms [42] that use predictive prefetching is the byte miss ratio. Queue wait time, measured as the number of iterations a job remains in the queue [43], average response time, and average queue length [44] are the metrics used to compare the performance of job scheduling algorithms. Response time in [43] is measured as the time difference between the arrival time of the job and the time when its file requests are completed.

Byte Hit Rate

Byte hit rate is the most commonly measured performance metric for cache replacement algorithms. Byte hit rates indicate the percentage utilization of the contents of the cache:

    ByteHitRate = 100 * ByteHit / (ByteHit + ByteMiss)      (5.3)

For our experiments, we measure the byte hit rate per job. Since data is prefetched into the system, we intend to quantify the utility of this prefetching for the job that is run. This metric does not quantify how much data is moved into the cache for the job to be processed.

Percentage of Cache Change

The percentage of cache change is a measure of the amount of data loaded into the cache in order to run a job. It is measured as the percentage difference in the bytes in the cache before and after the cache is loaded with the files required by a job. This measure is an
indicator of the volume of data transfer that occurs in order to process a job. The formula used for the calculation is:

    V_{C_{i-1},i} = 100 * (S(C) - (M_{i-1,i} + min(F(C_{i-1}), F(C_i)))) / S(C)      (5.4)

where C_i is the content of the cache after loading the files necessary for the i-th job, V_{C_{i-1},i} is the percentage of cache change due to the processing of the i-th job, S(C) is the size of the cache, F(C_{i-1}) is the free space in the cache before the files for the i-th job are loaded, F(C_i) is the free space in the cache after the files for the i-th job are loaded, and M_{i-1,i} is the size of the files in C_{i-1} ∩ C_i.

Queue Length

The queue length indicates how many jobs are in the waiting queue. This metric shows the efficiency of the scheduling algorithm in terms of utilizing the available cache contents and the free space to schedule new jobs.

Job Waiting Time

Job waiting time indicates how long a job was retained in the waiting queue before it was scheduled for processing. This measure shows whether some of the jobs experienced very long delays due to the scheduling algorithm.

Scheduling Overhead

Scheduling algorithms perform computations based on various attributes of the jobs in the waiting queue and make scheduling decisions based on the results of those computations. These computations are overhead to the system. The computational overhead can be represented as a function of the number of computations performed to make a decision. This value is an indicator of how many CPU cycles are used to schedule the next job. For FCFS job scheduling, the scheduling overhead is always zero. For GRV and LRU-Bundle, the scheduling overhead depends on the number of jobs in the waiting queue.
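The following is a minimal sketch of how the two per-job metrics above (Equations 5.3 and 5.4) could be tallied; all quantities are in bytes and the variable names are illustrative, not the simulator's own.

// Minimal sketch of the per-job metrics from Equations 5.3 and 5.4.
public class CacheMetrics {

    // Equation 5.3: percentage of a job's requested bytes served from the cache.
    static double byteHitRate(long byteHit, long byteMiss) {
        return 100.0 * byteHit / (byteHit + byteMiss);
    }

    // Equation 5.4: percentage of the cache contents that changed while loading
    // the files for the i-th job. 'common' is the size of the files present both
    // before and after (M_{i-1,i}); 'freeBefore'/'freeAfter' are F(C_{i-1}), F(C_i).
    static double cacheChange(long cacheSize, long common, long freeBefore, long freeAfter) {
        long unchanged = common + Math.min(freeBefore, freeAfter);
        return 100.0 * (cacheSize - unchanged) / cacheSize;
    }

    public static void main(String[] args) {
        // A hypothetical job that found 0.8 TB of its 1 TB request in a 5 TB cache:
        System.out.printf("byte hit rate = %.1f%%%n",
                byteHitRate(800_000_000_000L, 200_000_000_000L));        // 80.0%
        // 4.5 TB of the previous contents survived; free space shrank from 0.5 TB to 0.3 TB:
        System.out.printf("cache change = %.1f%%%n",
                cacheChange(5_000_000_000_000L, 4_500_000_000_000L,
                            500_000_000_000L, 300_000_000_000L));        // 4.0%
    }
}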
5.4 Experiment Setup

Our experiments consisted of simulating a disk cache using real workloads from the DZero Experiment at Fermi National Accelerator Laboratory. We implemented the algorithms listed in Section 5.2 and compared them using the metrics listed in Section 5.3. The GRV implementation is based on the OptCacheSelect and OptFileBundle algorithms described in [43]. LRU-Bundle uses the least recently used cache replacement algorithm. GRV and LRU-Bundle schedule jobs based on the relative request value of the job calculated as given in Equation 5.2.

The workload obtained from the DZero Experiment resides in a MySql database. The workload consists of two sets of information: the start time and end time of each job, and the list of files requested by each job. The workload contains all jobs submitted from January 2003 to March 2005. The simulator is a Java program of about 3,500 lines of code which connects to the MySql database using JDBC to obtain the details of each job. The input values to the simulator are the size of the cache, the cache replacement algorithm, and the job scheduling algorithm. Our simulations were run on the entire workload. A job run, and in turn file caching, can be triggered by two events: a job arrival or a job completion. When the total size of a job, i.e., the sum of the sizes of the files requested by the job, is bigger than the size of the cache, the job is ignored. The metrics listed in Section 5.3 are reported after each job is scheduled to run. Simulations were run for 6 different caching algorithms with 5 different cache sizes, leading to 30 different runs. For calculating the optimal set of files to be loaded into the cache for the GRV algorithm, the history of jobs from the previous week was used. The average run time for each simulation is around 6 hours on a Pentium II with 2 GB of memory running the Linux operating system. A minimal sketch of this event-driven simulation loop is shown below.
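The following is a minimal sketch of the event-driven structure described above, in which job arrivals and job completions are the two events that trigger scheduling and cache loading; the Job, Cache, and Scheduler types are assumed placeholders rather than the simulator's actual classes.

import java.util.*;

// Minimal sketch of the event-driven simulation loop: arrivals and completions
// are the two events that trigger scheduling and cache loading.
public class CacheSimulator {

    record Event(long time, boolean isArrival, Job job) {}
    record Job(int id, long totalBytes, long startTime, long endTime) {}

    interface Cache { void loadFilesFor(Job job); long capacity(); }
    interface Scheduler { void enqueue(Job job); Job pickRunnable(Cache cache); }

    static void simulate(List<Job> trace, Cache cache, Scheduler scheduler) {
        PriorityQueue<Event> events = new PriorityQueue<>(Comparator.comparingLong(Event::time));
        for (Job job : trace) {
            events.add(new Event(job.startTime(), true, job));   // arrival event
        }
        while (!events.isEmpty()) {
            Event e = events.poll();
            if (e.isArrival()) {
                if (e.job().totalBytes() > cache.capacity()) {
                    continue;                                    // job larger than the cache: ignored
                }
                scheduler.enqueue(e.job());
            }
            // Both arrivals and completions change space demands, so try to start a job.
            Job next = scheduler.pickRunnable(cache);
            if (next != null) {
                cache.loadFilesFor(next);                        // metrics would be reported here
                events.add(new Event(e.time() + (next.endTime() - next.startTime()), false, next));
            }
        }
    }
}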
5.5 Experimental Results

Our experiments aim to identify the impact of these algorithms over a range of cache sizes (1 TB to 50 TB). Our goal is to identify a set of algorithms for the particle physics science grid that will improve the throughput of the system with the least amount of data movement. We intend to see which algorithms are suitable for small and large cache sizes.

Figure 5.2 Average Byte Hit Rate (byte hit rate in % vs. cache size in TB; File LRU, Filecule LRU, GRV, GRV-1000, LRU-Bundle, LRU-Bundle-1000)

The queue length for GRV does not exceed the threshold value of 1,000 for cache sizes of 5 TB and above; hence, the GRV and GRV-1000 algorithms are the same for cache sizes of 5 TB and above. Similarly, the queue lengths for the LRU-Bundle and LRU-Bundle-1000 algorithms do not exceed the threshold value for cache sizes of 10 TB and above, and hence LRU-Bundle and LRU-Bundle-1000 are the same for these cache sizes. At 50 TB, all jobs are scheduled to run as soon as they are submitted to the system; hence, File LRU and LRU-Bundle run the same for the 50 TB cache size.

5.5.1 Byte Hit Rate

Figure 5.2 shows the variation of the byte hit rate for all the algorithms with increasing cache size. Filecule LRU has the highest byte hit rate for all cache sizes. This is due to the effective prefetching of filecules. This is an upper bound for prefetching using filecules because the filecule definitions used are optimal: we used the entire workload to identify these filecules. In Chapter 6, we discuss the effect of using a window of history to identify filecules.
Figure 5.3 Byte Hit Rate for Cache Size of 1 TB (byte hit rate in % vs. job rank)

For cache sizes up to 10 TB (Figures 5.3, 5.4 and 5.5), File LRU has a lower byte hit rate than GRV and LRU-Bundle. GRV and LRU-Bundle take advantage of the contents of the cache by changing the order in which jobs are processed: they schedule the jobs that can utilize the contents of the cache rather than scheduling on a FCFS basis. For the larger cache sizes of 25 TB and 50 TB (Figures 5.6 and 5.7), File LRU has a higher byte hit rate than GRV and shows performance similar to LRU-Bundle. This is because, for these cache sizes, the number of files that can be accommodated in the cache is high enough that the files required by the jobs are found in the cache. For 50 TB and 25 TB, only 6.15% and 13.84% of the stack depths are higher than the average number of files that can be stored in the cache (see Table 5.2).

The difference in the byte hit rates of GRV and LRU-Bundle is less than 5% for all cache sizes. Though GRV involves prefetching of data and LRU-Bundle does not, they provide similar byte hit rates. We also see that LRU-Bundle has a higher byte hit rate than GRV for cache sizes of 25 TB and 50 TB by taking advantage of the temporal locality.

From Figure 5.7, it is found that the byte hit rate of GRV is worse than that of all the other algorithms. This is because the GRV algorithm clears more space than what is actually required by the job. The GRV algorithm looks at all the files that are not currently in use by any job, and determines whether or not to retain each file.
Figure 5.4 Byte Hit Rate for Cache Size of 5 TB. GRV = GRV-1000

Figure 5.5 Byte Hit Rate for Cache Size of 10 TB. GRV = GRV-1000 and LRU-Bundle = LRU-Bundle-1000
Figure 5.6 Byte Hit Rate for Cache Size of 25 TB. GRV = GRV-1000 and LRU-Bundle = LRU-Bundle-1000

Figure 5.7 Byte Hit Rate for Cache Size of 50 TB. GRV = GRV-1000 and File LRU = LRU-Bundle = LRU-Bundle-1000
Figure 5.8 Average Percentage of Cache Change for Different Cache Sizes

This decision is based on the relative value of the request (Equation 5.2) that added the file to the cache. Thus, the eviction of a file from the cache does not depend on recency of use. The effect of eliminating more files than required, and of eliminating files without taking into account recency of access, is the cause of the small byte hit rate.

5.5.2 Percentage of Cache Change

File LRU and Filecule LRU have the same amount of cache change at all cache sizes because we use optimal filecules: there is no incorrect prediction, i.e., all the data that is prefetched is utilized by the job. Figure 5.8 shows the average percentage of cache change for increasing cache sizes. The average percentage of cache change is best for the LRU-Bundle algorithm for all cache sizes. GRV has large cache changes for all cache sizes except at 1 TB, where File LRU and Filecule LRU have higher cache changes. Queue freezing increases the percentage of cache change at 1 TB (Figure 5.9) because there is no change to the job scheduling order once the queue is frozen; any job that arrives while the queue is frozen cannot take advantage of the contents of the cache.
Figure 5.9 Percentage of Cache Change for Cache Size of 1 TB. File LRU = Filecule LRU

Figure 5.10 Percentage of Cache Change for Cache Size of 5 TB. File LRU = Filecule LRU and GRV = GRV-1000
Figure 5.11 Percentage of Cache Change for Cache Size of 10 TB. File LRU = Filecule LRU, GRV = GRV-1000 and LRU-Bundle = LRU-Bundle-1000

At 10 TB (Figure 5.11), some of the changes to the cache made by the LRU-Bundle algorithm are larger than some of those made by File LRU and Filecule LRU. This can be an effect of job thrashing. Some jobs that need a lot of changes to the cache are delayed by the LRU-Bundle algorithm. If they had been scheduled with FCFS scheduling, they could have utilized some of the contents of the cache; since they were rescheduled to run at a later time, the amount of cache change needed at that time can be higher.

Figures 5.11, 5.12 and 5.13 show that the difference between the largest cache changes for GRV and the largest cache changes for LRU-Bundle increases with increasing cache size. As the cache size increases, the number of files that are not used by any job but are still retained in the cache increases. The GRV algorithm is capable of replacing this large available space even though it is not required.

5.5.3 Job Waiting Time and Queue Length

When using the FCFS scheduling algorithm, jobs can be delayed only due to a lack of free space in the cache to load the files for the job. In the case of the GRV algorithm, jobs can be delayed either due to a lack of free cache space or due to the reordering of jobs in the queue.
Figure 5.12 Percentage of Cache Change for Cache Size of 25 TB. File LRU = Filecule LRU, GRV = GRV-1000 and LRU-Bundle = LRU-Bundle-1000

Figure 5.13 Percentage of Cache Change for Cache Size of 50 TB. File LRU = Filecule LRU = LRU-Bundle = LRU-Bundle-1000 and GRV = GRV-1000
Figure 5.14 Average Job Waiting Time for Different Cache Sizes

Figure 5.14 shows the average job waiting time for the various algorithms for increasing cache sizes. The job waiting times for File LRU and Filecule LRU are equal because the jobs are processed in the same order (FCFS). File LRU and Filecule LRU have the worst average waiting times because many jobs are made to wait due to the delay in scheduling the job at the head of the queue (FCFS). The figure also shows that queue freezing increases the average job waiting time. This is because, once the queue is frozen, any new job entering the queue is made to wait until all the jobs in the frozen section of the queue are scheduled to run.

Figures 5.16, 5.17 and 5.18 show the effect of job thrashing. It can be seen that when using the GRV job scheduling algorithm, a few jobs suffer longer delays than those observed when using FCFS. This is because some jobs are delayed until all the other jobs in the queue are scheduled. But the figures also show that many jobs suffer longer waiting times when using FCFS. This is because many jobs are made to wait while the job ahead of them in the queue is delayed due to a lack of space in the cache.
Figure 5.15 Job Waiting Time for Cache Size of 1 TB. File LRU = Filecule LRU

Figure 5.16 Job Waiting Time for Cache Size of 5 TB. File LRU = Filecule LRU and GRV = GRV-1000
Figure 5.17 Job Waiting Time for Cache Size of 10 TB. File LRU = Filecule LRU, GRV = GRV-1000 and LRU-Bundle = LRU-Bundle-1000

Figure 5.18 Job Waiting Time for Cache Size of 25 TB. File LRU = Filecule LRU, GRV = GRV-1000 and LRU-Bundle = LRU-Bundle-1000
Figure 5.19 Average Queue Lengths for Different Cache Sizes

Figures 5.15 and 5.16 show that queue freezing avoids job thrashing, but also that an increased number of jobs have high waiting times compared to the runs with no queue freezing. The average queue length shown in Figure 5.19 follows a pattern similar to the average job waiting time in Figure 5.14. Figures 5.20 and 5.21 show that the queue gets longer with queue freezing. It is also interesting to see that at 10 TB, more jobs (68,415 jobs) are processed as soon as they enter the queue for File LRU and Filecule LRU than for GRV (61,771 jobs) or LRU-Bundle (50,289 jobs). But the average is affected by the long queues observed for the rest of the jobs with File LRU and Filecule LRU.

5.5.4 Scheduling Overhead

The GRV and LRU-Bundle algorithms schedule jobs from the waiting queue based on the relative value of the request (Equation 5.2). Computing the relative values of the requests in the waiting queue is an overhead to the system. FCFS job scheduling does not incur any scheduling overhead; hence the scheduling overhead for File LRU and Filecule LRU is 0.
Figure 5.20 Queue Length for Cache Size of 1 TB. File LRU = Filecule LRU

Figure 5.21 Queue Length for Cache Size of 5 TB. File LRU = Filecule LRU and GRV = GRV-1000
Figure 5.22 Queue Length for Cache Size of 10 TB. File LRU = Filecule LRU, GRV = GRV-1000 and LRU-Bundle = LRU-Bundle-1000

Figure 5.23 Queue Length for Cache Size of 25 TB. File LRU = Filecule LRU, GRV = GRV-1000 and LRU-Bundle = LRU-Bundle-1000
Figure 5.24 Average Scheduling Overhead for Different Cache Sizes

Figure 5.24 shows the average number of computations performed to schedule a job. It shows that which algorithm has the smaller computational overhead varies with the cache size. For example, LRU-Bundle has more average computations than GRV for the 1 TB and 10 TB cache sizes, while the reverse is true for the 5 TB cache size. Figure 5.24 also shows that queue freezing reduces the computational overhead, because once the queue is frozen no more computation is done until all the jobs in the frozen part of the queue are scheduled to run. Figure 5.25 shows the same effect. We expected the largest computational overhead for the algorithms with queue freezing to be greater than the largest value without queue freezing, because once the queue is unfrozen all the requests in the remaining queue are updated with their relative request values. Although a lot of computations are done as soon as the queue is unfrozen, this is not as large as the largest overhead for the algorithms with no queue freezing.

Figure 5.28 shows that the overheads for GRV and LRU-Bundle are almost equal, in spite of the queue length of LRU-Bundle being longer than that of GRV (Figure 5.23). At 50 TB (Figure 5.29), the computational overhead is equal for all the algorithms because all the jobs are scheduled to run as soon as they are submitted.
Figure 5.25 Scheduling Overhead for Cache Size of 1 TB. File LRU = Filecule LRU = 0

Figure 5.26 Scheduling Overhead for Cache Size of 5 TB. File LRU = Filecule LRU = 0 and GRV = GRV-1000
Figure 5.27 Scheduling Overhead for Cache Size of 10 TB. File LRU = Filecule LRU = 0, GRV = GRV-1000 and LRU-Bundle = LRU-Bundle-1000

Figure 5.28 Scheduling Overhead for Cache Size of 25 TB. File LRU = Filecule LRU = 0, GRV = GRV-1000 and LRU-Bundle = LRU-Bundle-1000
Figure 5.29 Scheduling Overhead for Cache Size of 50 TB. File LRU = Filecule LRU = 0 and GRV = GRV-1000 = LRU-Bundle = LRU-Bundle-1000

The computational overhead can be further reduced for all the above algorithms by not computing the relative value of an incoming job when there is no other job in the queue and there is enough space to process the job immediately.

5.6 Summary

Table 5.3 lists the various metrics and the algorithm that performs best for each metric. It shows that LRU is a good cache replacement algorithm for the scientific workload used in these experiments. This is in agreement with the temporal locality of the workload. The FCFS job scheduling algorithm is not suitable because many jobs are delayed by the jobs at the head of the queue. LRU-Bundle's performance, measured with all the metrics listed in Section 5.3, takes advantage of the temporal locality in the workload and also provides short job waiting times. Predicting data usage using filecule definitions provides the best byte hit rates.

The average scheduling overhead of LRU-Bundle is smaller than that of GRV. Apart from the computational overhead, GRV also has the additional overhead of calculating the optimal set of files to prefetch into the cache.
Table 5.3 Summary of Results on Caching and Scheduling Algorithms

Metric                        Algorithm with the best performance
Byte hit rate                 Filecule LRU
Percentage of cache change    LRU-Bundle
Job waiting time              GRV
Queue length                  GRV
Scheduling overhead           File LRU and Filecule LRU

The number of computations needed to identify the optimal set of files to be loaded into the cache depends on the amount of information stored about jobs in the history. This overhead does not exist for LRU-Bundle. The experimental results presented in this chapter suggest that combining the LRU-Bundle cache replacement algorithm with prefetching based on filecules may provide even better performance than the algorithms mentioned in Section 5.2.
CHAPTER 6
IMPACT OF HISTORY WINDOW ON FILECULE IDENTIFICATION

Data prefetching methods [10, 46, 48, 42, 44, 43] use the recency and frequency of data usage from past requests to predict what data will be requested in the future. The performance of the prefetching depends on the amount of request history used: requests from a period far back in the past should have a low impact on the predictions, and requests from the near past should have a high impact. Maintaining all the requests from the past is practically impossible due to storage restrictions, and the long processing time of a long history of requests may add significant overhead to the system. This chapter discusses the effect of the history window that is used to identify filecules.

The optimal filecules were formed using the file request information from January 2003 to March 2005. This is the grouping that should be used during this 27-month period to achieve the best byte hit rates (results using optimal filecules are shown in Section 5.5.1). An appropriate history window that predicts filecules close to the optimal filecules needs to be identified. The relationships between files can change over a period of time. Two files that have high correlation (always requested together by any job) during one period of time need not maintain their correlation during a later period. The reverse can also be true: two unrelated files might have stronger relationships in the future. A good history window should be capable of grouping files that are popular in the present and also be able to identify these transitioning relationships between files.

We experiment by using a 1-month window. We compare the filecules identified after 1 month with the optimal set of filecules. Table 6.1 shows how many of the identified filecules match the optimal filecules. The mismatch is due to the filecules identified during
the 1-month window being larger than the optimal filecules. The percentage of bytes that do not form optimal filecules is high: the information available during one month is not enough to identify the optimal filecules. Some of the optimal filecules are not represented in the 1-month window because the files in those optimal filecules were not requested during this 1-month window (not included in Table 6.1).

Table 6.1 Comparison of 1-month Filecules and Optimal Filecules. Filecules formed with data used during January 2003 are compared with the optimal filecules identified with data used from January 2003 to March 2005

Category    # of filecules    % of filecules    % of bytes
Match       2,111             71.1              25
Mismatch    858               28.9              75

Table 6.2 Comparison of Filecules Identified in 2 Consecutive 1-month Windows. Filecules formed with data used during January 2003 are compared with the filecules formed with data used during February 2003

Category           # of filecules    % of filecules    % of bytes
Matching           2,162             72.8              13.67
Mismatch           583               19.6              60.68
Not represented    224               7.5               25.65

Table 6.2 compares the filecules formed during two consecutive 1-month windows. The percentage of bytes that do not match is high (60.68%). This shows that the relationships between files have changed within one month. It also shows that a considerable amount of bytes (25.65%) are not requested during the second month. The data from Tables 6.1 and 6.2 clearly illustrate that the window used to identify filecules should identify the transitioning relationships (decrease the mismatch between two consecutive windows) and the correct relationships (increase the match between one window and the optimal filecules).

6.1 Filecule LRU Using a 1-month Window

In order to simulate the effect of using a 1-month window, we ran the cache simulation for the first month (January 2003). Before adding jobs for the next month (February 2003) to the waiting queue, we identify filecules using the history of jobs from January 2003. Similar filecule identification is performed at the end of each month, before scheduling jobs from the next month. A minimal sketch of this monthly re-identification, and of comparing two sets of filecules, is given below.
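The following is a minimal sketch of window-based filecule identification, reusing the grouping rule sketched in Chapter 4; it assumes each job carries a month label and its set of requested files, and the exact-match comparison mirrors the match/mismatch counts of Tables 6.1 and 6.2. All names are illustrative.

import java.time.YearMonth;
import java.util.*;

// Minimal sketch of window-based filecule identification: jobs are bucketed by
// calendar month, filecules are re-identified from each month's requests, and two
// filecule sets can be compared for exact (file-for-file) matches.
public class WindowedFilecules {

    record Job(int id, YearMonth month, Set<String> files) {}

    // Group files requested by exactly the same jobs within one window (cf. Chapter 4).
    static Set<Set<String>> filecules(List<Job> window) {
        Map<String, SortedSet<Integer>> fileToJobs = new HashMap<>();
        for (Job job : window) {
            for (String f : job.files()) {
                fileToJobs.computeIfAbsent(f, k -> new TreeSet<>()).add(job.id());
            }
        }
        Map<SortedSet<Integer>, Set<String>> groups = new HashMap<>();
        for (Map.Entry<String, SortedSet<Integer>> e : fileToJobs.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new TreeSet<>()).add(e.getKey());
        }
        return new HashSet<>(groups.values());
    }

    // Count how many filecules of 'candidate' also appear, file-for-file, in 'reference'.
    static long matches(Set<Set<String>> candidate, Set<Set<String>> reference) {
        return candidate.stream().filter(reference::contains).count();
    }

    // Re-identify filecules at the end of each month, as described above.
    static Map<YearMonth, Set<Set<String>>> monthlyFilecules(List<Job> trace) {
        Map<YearMonth, List<Job>> byMonth = new TreeMap<>();
        for (Job job : trace) {
            byMonth.computeIfAbsent(job.month(), m -> new ArrayList<>()).add(job);
        }
        Map<YearMonth, Set<Set<String>>> result = new LinkedHashMap<>();
        byMonth.forEach((month, jobs) -> result.put(month, filecules(jobs)));
        return result;
    }
}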
This section compares the byte hit rate and the percentage of cache change per job obtained using File LRU, Filecule LRU using optimal filecules, and Filecule LRU using a 1-month window for filecule identification. The comparison shows that the performance of Filecule LRU using a 1-month window is better than that of File LRU and worse than that of Filecule LRU using optimal filecules.

Table 6.3 Comparison of Byte Hit Rate of Filecule LRU Using a 1-month Window with File LRU and Filecule LRU Using Optimal Filecules

Category                               File LRU    Filecule LRU with optimal filecules
% of jobs with equal byte hit rate     96.52       83.55
% of jobs with better byte hit rate    3.30        2.55
% of jobs with worse byte hit rate     0.17        14.89

Figure 6.1 shows the difference in byte hit rate between Filecule LRU with a 1-month window and File LRU. Most of the data points are found in the upper part of the graph, showing that the byte hit rate per job for Filecule LRU with a 1-month window is generally better than that of File LRU. This indicates that using a window of history to identify filecules improves the byte hit rate. Figure 6.2 shows the difference in byte hit rate between Filecule LRU with optimal filecules and Filecule LRU with a 1-month window. Filecule LRU with optimal filecules has a better byte hit rate per job than Filecule LRU with a 1-month window. Table 6.3 shows the percentage of jobs using Filecule LRU with a 1-month window that have equal, worse, and better byte hit rates compared to File LRU and Filecule LRU.

There are certain sections of the plot in Figure 6.2 where the 1-month filecules have better hit rates than the optimal filecules. For those periods, the filecules identified using the 1-month window define the best relationships. This indicates that there are some trade-offs when using a long history to identify filecules, where we use information about transitioning relationships: certain jobs can benefit from relationships identified during shorter windows.

The percentage of cache change per job when using File LRU and when using Filecule LRU with optimal filecules is the same. Figure 6.3 shows the difference in the percentage of cache change between Filecule LRU using a 1-month window and File LRU.
Figure 6.1 Difference in Byte Hit Rate Between Filecule LRU with 1-month Window and File LRU. File LRU has higher byte hit rates for 66 jobs. Filecule LRU with 1-month window has higher byte hit rates for 1,274 jobs. Equal byte hit rates are observed for 37,225 jobs. Overall, Filecule LRU with 1-month window has better byte hit rates than File LRU

Figure 6.2 Difference in Byte Hit Rate Between Filecule LRU with Optimal Filecules and Filecule LRU with 1-month Window. Filecule LRU with optimal filecules has higher byte hit rates for 5,357 jobs. Filecule LRU with 1-month window has higher byte hit rates for 985 jobs. Equal byte hit rates are observed for 32,223 jobs. Overall, Filecule LRU with optimal filecules has better byte hit rates than Filecule LRU with 1-month window
Table 6.4 Comparison of Percentage of Cache Change of Filecule LRU Using a 1-month Window with File LRU and Filecule LRU Using Optimal Filecules. The percentage of cache change for File LRU is equal to the percentage of cache change for Filecule LRU with optimal filecules

Category                             File LRU
% of jobs with equal cache change    86.57
% of jobs with better cache change   8.04
% of jobs with worse cache change    6.47

Figure 6.3 Difference in Percentage of Cache Change Between Filecule LRU with 1-month Window and File LRU. The percentage of cache change for File LRU is equal to the percentage of cache change for Filecule LRU with optimal filecules

The difference in the percentage of cache change is not substantial. Table 6.4 shows the percentage of jobs using Filecule LRU with a 1-month window that have equal, worse, and better percentages of cache change compared to File LRU and Filecule LRU with optimal filecules. The number of jobs using Filecule LRU with a 1-month window that cause larger changes to the cache (2,495 jobs) and those that cause smaller changes (2,684 jobs) compared to File LRU are almost equal. Figure 6.3 also shows that the maximum difference in the percentage of cache change is small (< 2%).
Figure 6.4 Difference in Byte Hit Rate Between Filecule LRU with 6-month Window and 1-month Window (difference in byte hit rate vs. date, July 2003 to January 2004). 22,499 jobs have equal byte hit rates. Filecule LRU with 6-month window has a lower byte hit rate for 1,005 jobs. Filecule LRU with 1-month window has a higher byte hit rate for 888 jobs

These results (presented in Figures 6.1, 6.2 and 6.3) show that a smaller window size for filecule identification in Filecule LRU still leads to better performance than File LRU. In the next set of experiments, we examine the impact of increasing window sizes.

6.2 Impact of Window Size in Filecule LRU

Small history windows form fewer filecules but with more files. This is due to the small number of jobs that influence the identification of filecules. As the number of jobs used for identifying filecules increases, the filecule definitions become more accurate. When only a small number of jobs is used for filecule identification, it provides limited information about the correlation between files. Hence, files that are not correlated may end up being part of a filecule, and when this occurs, the efficiency of the prediction decreases. When using disjoint windows for identifying filecules, filecule information is lost from one window to the next. We ran experiments with different window lengths: 1-month and 6-month windows. We compared the difference in byte hit rates and percentage of cache change for each of these windows.
T able 6.5 Comparison of Byte Hit Rate of Filecule LR U Using 6-mon th Windo w and Filecule LR U Using 1-mon th Windo w Category Filecule LR U with 1-mon th Windo w % of jobs with equal cac he c hange 92.24 % of jobs with b etter cac he c hange 4.12 % of jobs with w orse cac he c hange 3.95 Figure 6.4 sho ws the dierence b et w een b yte hit rate obtained using Filecule LR U using 6-mon th windo w and Filecule LR U using 1-mon th windo w. Most of the jobs ha v e the same b yte hit rates for b oth windo ws. This sho ws that some lecules iden tied using 1-mon th windo w are similar to those iden tied using 6-mon th windo w. T able 6.5 sho ws the p ercen tage of jobs that ha v e equal, w orse and b etter b yte hit rates b et w een Filecule LR U with 6-mon th windo w and Filecule LR U with 1-mon th windo w. Filecule LR U with 6-mon th windo w has b etter b yte hit rates than those with 1-mon th windo w up to No v em b er 2003. During the mon th of No v em b er 2003, the lecules iden tied with 1-mon th windo w predict data usage b etter than the lecules iden tied with 6-mon th windo w. No v em b er 2003 is the reason for almost equal p ercen tages of jobs with b etter cac he c hange and jobs with w orse cac he c hange (T able 6.5). During all the regions except No v em b er 2003 and Decem b er 2003, Filecule LR U with 6-mon th windo w has higher b yte hit rate than that of 1-mon th windo w. When using 1-mon th windo w, a lot of information ab out lecules formed during the rst 5 mon ths is lost. There is no prediction ab out those les. But there is also a signican t amoun t of jobs that ha v e smaller b yte hit rates with 6-mon th windo w than b yte hit rates with 1-mon th windo w. There are almost t wice the n um b er of jobs that arriv ed during that 1-mon th windo w (No v em b er 2003) whic h sho ws negativ e dierence. T able 6.6 sho ws the n um b er of jobs p er mon th for the 6 mon ths w e ha v e compared. This result suggests that w e need to study if the windo w used to iden tify lecules should b e based on the n um b er of jobs pro cessed since last lecule iden tication. F or example, iden tify lecules ev ery 1,000 jobs or so. Figure 6.5 sho ws the dierence in p ercen tage of cac he c hange p er job b et w een Filecule LR U with 6-mon th windo w and 1-mon th windo w. It also sho ws dieren t b eha vior during No v em b er 2003 compared to the rest of the mon ths. Throughout the en tire p erio d of 69


Table 6.6 Number of Jobs Per Month

Month       Number of Jobs
Jul 2003    3,762
Aug 2003    4,109
Sep 2003    4,365
Oct 2003    3,552
Nov 2003    5,286
Dec 2003    3,318

Figure 6.5 Difference in Percentage of Cache Change Between Filecule LRU with 6-month Window and 1-month Window (difference in percentage of cache change versus date, July 2003 to January 2004)

Throughout the entire 6 months of comparison, there is no significant difference in the percentage of cache change between the two window lengths; the largest difference is 2.6%. This need not be an effect of the algorithm itself: it can result from differences in the order in which the system chooses to add files to the cache, which affects the recency of access and, in turn, the order in which files are evicted from the cache. It also shows that unwanted files are not prefetched, irrespective of the window length.
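For reference, the two per-job metrics compared in Figures 6.4 and 6.5 can be computed with simple bookkeeping once the simulator reports, for each job, how many requested bytes were already resident and how many bytes were staged in. The sketch below paraphrases their use in this chapter; in particular, the percentage of cache change is taken here to mean the fraction of the cache capacity replaced while serving a job, which is an assumption of this sketch rather than a definition from the simulator.

    def byte_hit_rate(bytes_requested, bytes_found_in_cache):
        # Fraction of the job's requested bytes already resident in the cache.
        return bytes_found_in_cache / bytes_requested if bytes_requested else 1.0

    def cache_change_percent(bytes_staged_in, cache_capacity_bytes):
        # Percentage of the cache contents replaced while serving one job.
        return 100.0 * bytes_staged_in / cache_capacity_bytes

    # Hypothetical job: 10 GB requested, 7 GB already cached, 3 GB staged into
    # a 1 TB cache -> byte hit rate 0.7, cache change 0.3%.
    print(byte_hit_rate(10e9, 7e9))             # 0.7
    print(cache_change_percent(3e9, 1e12))      # 0.3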


6.3 Summary

Based on Figures 6.4 and 6.5, we find that selecting an appropriate window for identifying filecules, that is, a length of history that groups files that are popular in the present and also captures transitioning relationships between files, is important for cache performance (measured in byte hit rate per job). Our study further suggests that performance is influenced by temporal locality, so sliding windows may lead to more adaptive grouping of files into filecules. Similarly, instead of fixing the window size to a time interval, a window size dictated by job interarrival time could improve caching performance. Another parameter that may dictate the window size is the transition rate of file popularity [29].
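The sliding-window direction mentioned in this summary could look roughly like the following sketch: instead of disjoint windows, keep only the most recent jobs (bounded here by elapsed time, though a bound tied to job interarrival times or to the popularity transition rate of [29] would fit the same structure) and regroup files from that window. The identify_filecules helper is the one sketched earlier; this is an illustration, not an evaluated design.

    from collections import deque

    def sliding_window_filecules(trace, window_seconds=30 * 24 * 3600):
        """Yield a filecule grouping after each job, over a sliding time window.

        trace: iterable of (timestamp_seconds, files_requested) in arrival order.
        """
        window = deque()                      # (timestamp, files) pairs
        for timestamp, files in trace:
            window.append((timestamp, files))
            while window and timestamp - window[0][0] > window_seconds:
                window.popleft()              # drop jobs older than the window
            yield identify_filecules([f for _, f in window])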


CHAPTER 7
CONCLUSION

We analyzed traces from a large high-energy physics collaboration, focusing on aspects related to data usage. We propose a new abstraction for data management, namely filecules, and show that it is more efficient than the traditional one-file data granularity for data caching. Using stack depth analysis, we showed that this scientific workload exhibits good temporal locality (Section 5.1); hence, algorithms that exploit recency of data usage should be used in such systems.

We proposed and evaluated a new combination of caching and job scheduling, LRU-Bundle (Chapter 5). We compared the performance of LRU-Bundle with the LRU and GRV cache replacement algorithms by simulating disk cache events using real traces from the DZero Experiment. Our experiments show that LRU-Bundle provides better byte hit rates than File LRU (4%-106%) and GRV (4%-8%), and significantly shorter job waiting times (3%-103% compared to File LRU). The LRU-Bundle algorithm also transfers 30% to 56% less data from remote storage to the SRM disk cache than GRV. In the CMS grid [20], the EU DataGrid [16], and GridPP [21], the bandwidth between nodes varies from 10 Mb/s to 10 Gb/s, with most links at or below 1 Gb/s. 30% of 1 TB amounts to 307 GB; transferring 307 GB takes roughly 2.8 days over a 10 Mb/s link and about 41 minutes over a 1 Gb/s link. When the LRU-Bundle algorithm is used, the data transfer time is therefore reduced by amounts ranging from tens of minutes to a few days, depending on link bandwidth.

We also studied the effect of the history window used to identify filecules and its impact on caching. We observed that small window sizes identify large filecules, while large window sizes create smaller filecules. We show that choosing an appropriate window is essential for cache performance (measured in byte hit rate per job). More analysis needs to be performed using overlapping windows and window lengths chosen based on usage patterns (such as the number of jobs submitted, the percentage of cache misses, etc.).
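As a sanity check on the transfer-time figures above, the conversion from data volume and link bandwidth to transfer time is a one-line calculation; the sketch below assumes 1 GB = 10^9 bytes and ignores protocol overhead.

    def transfer_time_seconds(gigabytes, link_mbps):
        # Time to move the given volume over a link of link_mbps megabits/s.
        bits = gigabytes * 1e9 * 8
        return bits / (link_mbps * 1e6)

    for mbps in (10, 1000):                                # 10 Mb/s and 1 Gb/s
        hours = transfer_time_seconds(307, mbps) / 3600    # 307 GB = 30% of ~1 TB
        print(f"{mbps:>5} Mb/s: {hours:.1f} hours")
    # 10 Mb/s -> about 68 hours (roughly 2.8 days); 1 Gb/s -> about 0.7 hours.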


This research leads to a new set of questions left for future work. What is the effect of identifying filecules that overlap, instead of using disjoint filecules? From the discussion in Section 4.2.2, we see that files with high popularity do not group into large filecules. This may be because some popular files are used along with different sets of files. For example, a popular file F may be used along with two different sets of files, {F_A, ..., F_Z} and {F_a, ..., F_z}, in two different jobs. This leads to identifying three filecules: a mono-file filecule containing F, a filecule with the set {F_A, ..., F_Z}, and a filecule with {F_a, ..., F_z}. If we allowed overlap between filecules, then instead of these three filecules, two could be identified: one consisting of {F, F_A, ..., F_Z} and another consisting of {F, F_a, ..., F_z}. In other words, files with correlation coefficients less than 1 could be grouped into filecules. This would be analogous to the file groups identified in [30] using cosine correlation.

Another direction for future work is to apply filecules to data replication and placement, as discussed in Chapter 4. In that case, we would need to weigh the benefits of using filecules against the trade-offs in replication costs. Finally, we would like to verify the generality of the patterns we identified on other scientific workloads. Recent efforts have led to the creation of the Grid Workloads Archive [32], which may make other relevant traces available in the near future.
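One way to prototype the overlapping grouping proposed above is to relax the all-or-nothing co-occurrence requirement: connect files whose pairwise co-occurrence similarity exceeds a threshold below 1, using cosine similarity over job-occurrence vectors (in the spirit of, though not identical to, the groups of [30]). The sketch below is illustrative only; among other simplifications, each file simply seeds a group made of itself and its strongly correlated neighbors, so groups may overlap and some may subsume others.

    from collections import defaultdict
    from itertools import combinations
    from math import sqrt

    def overlapping_groups(jobs, threshold=0.7):
        """Group each file with the files it is strongly correlated with.

        jobs: iterable of sets of file names. Groups may overlap: a popular
        file F can appear both in the group seeded by F_A and in the group
        seeded by F_a.
        """
        occurrences = defaultdict(set)       # file -> set of job indices
        for job_id, files in enumerate(jobs):
            for f in files:
                occurrences[f].add(job_id)

        # Connect pairs of files whose cosine co-occurrence similarity is high.
        neighbors = defaultdict(set)
        for a, b in combinations(occurrences, 2):
            shared = len(occurrences[a] & occurrences[b])
            if shared:
                sim = shared / sqrt(len(occurrences[a]) * len(occurrences[b]))
                if sim >= threshold:
                    neighbors[a].add(b)
                    neighbors[b].add(a)

        # Each file seeds a possibly overlapping group; duplicates are removed.
        return {frozenset({f} | neighbors[f]) for f in occurrences}

    jobs = [{"F", "FA", "FB"}, {"F", "Fa", "Fb"}]
    print(overlapping_groups(jobs))
    # Includes {'F', 'FA', 'FB'} and {'F', 'Fa', 'Fb'}, the two overlapping
    # groups from the example above, plus the larger group seeded by F itself.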


REFERENCES

[1] The Collider Detector at Fermi National Accelerator Laboratory, http://www-cdf.fnal.gov/physics/public/public.html.
[2] The DZero Experiment, http://www-d0.fnal.gov.
[3] Deutsches Elektronen-Synchrotron, http://www.desy.de.
[4] Thomas Jefferson National Accelerator Facility, http://www.jlab.org.
[5] Marc Abrams, Charles R. Standridge, Ghaleb Abdulla, Stephen Williams, and Edward A. Fox. Caching proxies: Limitations and potentials. Technical report, Blacksburg, VA, USA, 1995.
[6] S. Acharya, B. Smith, and P. Parnes. Characterizing user access to videos on the world wide web. In Proceedings of Multimedia Computing and Networking, 2000.
[7] Lada Adamic, Bernardo Huberman, Rajan Lukose, and Amit Puniyani. Search in power law networks. Physical Review E, 64:46135-46143, 2001.
[8] Charu C. Aggarwal and Philip S. Yu. On disk caching of web objects in proxy servers. In CIKM '97: Proceedings of the Sixth International Conference on Information and Knowledge Management, pages 238-245, New York, NY, USA, 1997. ACM Press.
[9] Virgilio Almeida, Azer Bestavros, Mark Crovella, and Adriana de Oliveira. Characterizing reference locality in the WWW. Technical report, Boston, MA, USA, 1996.
[10] Ahmed Amer, Darrell D. E. Long, and Randal C. Burns. Group-based management of distributed file caches. In ICDCS '02: Proceedings of the 22nd International Conference on Distributed Computing Systems, page 525, Washington, DC, USA, 2002. IEEE Computer Society.
[11] Martin Arlitt, Rich Friedrich, and Tai Jin. Workload characterization of a web proxy in a cable modem environment. SIGMETRICS Perform. Eval. Rev., 27(2):25-36, 1999.
[12] Martin Arlitt and Tai Jin. A workload characterization study of the 1998 World Cup web site. Technical report, 1999.
[13] Martin F. Arlitt, Rich Friedrich, and Tai Jin. Performance evaluation of web proxy cache replacement policies. In TOOLS '98: Proceedings of the 10th International Conference on Computer Performance Evaluation: Modelling Techniques and Tools, pages 193-206, London, UK, 1998. Springer-Verlag.
[14] Martin F. Arlitt and Carey L. Williamson. Internet web servers: workload characterization and performance implications. IEEE/ACM Trans. Netw., 5(5):631-645, 1997.


[15] Paul Barford, Azer Bestavros, Adam Bradley, and Mark Crovella. Changes in web client access patterns: Characteristics and caching implications. Technical report, Boston, MA, USA, 1998.
[16] William H. Bell, David G. Cameron, and A. Paul Millar. OptorSim: A grid simulator for studying dynamic data replication strategies. 17(4):403-416.
[17] A. Bestavros. Demand-based document dissemination to reduce traffic and balance load in distributed information systems. In SPDP '95: Proceedings of the 7th IEEE Symposium on Parallel and Distributed Processing, page 338, Washington, DC, USA, 1995. IEEE Computer Society.
[18] Lee Breslau, Pei Cao, Li Fan, Graham Phillips, and Scott Shenker. Web caching and Zipf-like distributions: Evidence and implications. In INFOCOM (1), pages 126-134, 1999.
[19] R. Brun and F. Rademakers. An object oriented data analysis framework, 1996.
[20] D. G. Cameron, R. Carvajal-Schiaffino, A. P. Millar, C. Nicholson, K. Stockinger, and F. Zini. Evaluating scheduling and replica optimisation strategies in OptorSim. In Fourth International Workshop on Grid Computing, 2003, pages 52-59.
[21] D. G. Cameron, R. Carvajal-Schiaffino, A. P. Millar, C. Nicholson, K. Stockinger, and F. Zini. UK grid simulation benchmarks with OptorSim. In UK e-Science All Hands Conference, Nottingham, 2003.
[22] Pei Cao and Sandy Irani. Cost-aware WWW proxy caching algorithms. In Proceedings of the 1997 Usenix Symposium on Internet Technologies and Systems (USITS-97), Monterey, CA, 1997.
[23] Edith Cohen, Amos Fiat, and Haim Kaplan. Associative search in peer to peer networks: Harnessing latent semantics. In Infocom, San Francisco, CA, 2003.
[24] Carlos Cunha, Azer Bestavros, and Mark Crovella. Characteristics of WWW client-based traces. Technical report, Boston, MA, USA, 1995.
[25] John R. Douceur and William J. Bolosky. A large-scale study of file-system contents. In SIGMETRICS '99: Proceedings of the 1999 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pages 59-70, New York, NY, USA, 1999. ACM Press.
[26] Ian Foster, Carl Kesselman, and Steven Tuecke. The anatomy of the Grid: Enabling scalable virtual organizations. Lecture Notes in Computer Science, 2150:1-4, 2001.
[27] P. Fuhrmann. dCache: the commodity cache. In Twelfth NASA Goddard and Twenty First IEEE Conference on Mass Storage Systems and Technologies, 2004.
[28] Gregory R. Ganger and M. Frans Kaashoek. Embedded inodes and explicit grouping: Exploiting disk bandwidth for small files. In USENIX Annual Technical Conference, pages 1-17, 1997.
[29] Adam Shaked Gish, Yuval Shavitt, and Tomer Tankel. Geographical statistics and characteristics of P2P query strings. In IPTPS 2007: Proceedings of the 6th International Workshop on Peer-to-Peer Systems, February 2007.


[30] Christos Gkantsidis, Thomas Karagiannis, Pablo Rodriguez, and Milan Vojnovic. Planet scale software updates. Technical report, 2006.
[31] Jim Griffioen and Randy Appleton. Reducing file system latency using a predictive approach. In USENIX Summer, pages 197-207, 1994.
[32] The Grid Workloads Archive, http://gwa.ewi.tudelft.nl/.
[33] A. Iamnitchi and M. Ripeanu. Myth and reality: Usage behavior in a large data-intensive physics project, 2003.
[34] Adriana Iamnitchi and Ian Foster. Interest-aware information dissemination in small-world communities. In High Performance Distributed Computing, 2005.
[35] Adriana Iamnitchi, Matei Ripeanu, and Ian Foster. Small-world file-sharing communities. In Infocom, Hong Kong, China, 2004.
[36] Akamai Technologies Inc. Web application accelerator. Technical report, 2005.
[37] G. Irlam. Unix file size survey 1993, 1993.
[38] L. Loebel-Carpenter, L. Lueking, C. Moore, R. Pordes, J. Trumbo, S. Veseli, I. Terekhov, M. Vranicar, S. White, and V. White. SAM and the particle physics data grid. In Computing in High-Energy and Nuclear Physics, Beijing, China, 2001.
[39] Sape J. Mullender and Andrew S. Tanenbaum. Immediate files. Softw. Pract. Exper., 14(4):365-368, 1984.
[40] Elizabeth J. O'Neil, Patrick E. O'Neil, and Gerhard Weikum. The LRU-K page replacement algorithm for database disk buffering. In SIGMOD '93: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pages 297-306, New York, NY, USA, 1993. ACM Press.
[41] Ekow Otoo, Frank Olken, and Arie Shoshani. Disk cache replacement algorithm for storage resource managers in data grids. In Supercomputing '02: Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, pages 1-15, Los Alamitos, CA, USA, 2002. IEEE Computer Society Press.
[42] Ekow Otoo, Doron Rotem, and Alexandru Romosan. Optimal file-bundle caching algorithms for data-grids. In SC '04: Proceedings of the 2004 ACM/IEEE Conference on Supercomputing, page 6, Washington, DC, USA, 2004. IEEE Computer Society.
[43] Ekow Otoo, Doron Rotem, Alexandru Romosan, and Sridhar Seshadri. File caching in data intensive scientific applications. In Data Management in Grids, Lecture Notes in Computer Science, volume 3836, pages 85-99. Springer Berlin / Heidelberg, 2006.
[44] Ekow Otoo, Doron Rotem, and Sridhar Seshadri. Efficient algorithms for multi-file caching. In Database and Expert Systems Applications, Lecture Notes in Computer Science, pages 707-719. Springer Berlin / Heidelberg, 2004.


[45] Ekow Otoo and Arie Shoshani. Accurate modeling of cache replacement policies in a data grid. In MSS '03: Proceedings of the 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies, page 10, Washington, DC, USA, 2003. IEEE Computer Society.
[46] Jim Pitkow and Mimi Recker. A simple yet robust caching algorithm based on dynamic access patterns. Technical Report GIT-GVU-94-39, 1994.
[47] Arcot Rajasekar, Michael Wan, and Reagan Moore. MySRB & SRB: Components of a data grid. In HPDC '02: Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11), page 301, Washington, DC, USA, 2002. IEEE Computer Society.
[48] Margaret M. Recker and James E. Pitkow. Predicting document access in large multimedia repositories. ACM Trans. Comput.-Hum. Interact., 3(4):352-375, 1996.
[49] Matei Ripeanu, Adriana Iamnitchi, and Ian Foster. Mapping the Gnutella network: Properties of large-scale peer-to-peer systems and implications for system design. Internet Computing, 6(1):50-57, 2002.
[50] Stefan Saroiu, Krishna P. Gummadi, Richard J. Dunn, Steven D. Gribble, and Henry M. Levy. An analysis of internet content delivery systems. SIGOPS Oper. Syst. Rev., 36(SI):315-327, 2002.
[51] A. Shoshani, A. Sim, and J. Gu. Storage resource managers: Middleware components for grid storage, 2002.
[52] K. Sripanidkulchai. The popularity of Gnutella queries and its implications on scalability, 2001.
[53] C. D. Tait and D. Duchamp. Detection and exploitation of file working sets. In Proceedings of the 11th International Conference on Distributed Computing Systems (ICDCS), pages 2-9, Washington, DC, 1991. IEEE Computer Society.
[54] Andrew S. Tanenbaum, Jorrit N. Herder, and Herbert Bos. File size distribution on UNIX systems: then and now. SIGOPS Oper. Syst. Rev., 40(1):100-104, 2006.
[55] I. Terekhov. Meta-computing at D0. In Nuclear Instruments and Methods in Physics Research, Section A, volume 502/2-3, pages 402-406, 2002.
[56] Werner Vogels. File system usage in Windows NT 4.0. In SOSP '99: Proceedings of the Seventeenth ACM Symposium on Operating Systems Principles, pages 93-109, New York, NY, USA, 1999. ACM Press.
[57] Neal E. Young. On-line file caching. In SODA '98: Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 82-86, Philadelphia, PA, USA, 1998. Society for Industrial and Applied Mathematics.


APPENDICES


Appendix A  Probability Distributions

A.1 Log Normal Distribution

A continuous distribution in which the logarithm of a variable has a normal distribution. The probability density function of the log normal distribution is

f(x; \mu, \sigma) = \frac{1}{x \sigma \sqrt{2\pi}} \, e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}    (A.1)

for x > 0, where \mu is the mean and \sigma the standard deviation of the variable's logarithm.

A.2 Log Logistic Distribution

A continuous distribution in which the logarithm of a variable has a logistic distribution. The probability density function of the logistic distribution is

f(x; \mu, s) = \frac{e^{-\frac{x - \mu}{s}}}{s \left(1 + e^{-\frac{x - \mu}{s}}\right)^2}    (A.2)

where \mu is the location parameter, s is the scale parameter, and s > 0.

A.3 Generalized Pareto Distribution

The probability density function of the generalized Pareto distribution is

f(x; k, \sigma, \theta) = \frac{1}{\sigma} \left(1 + k \, \frac{x - \theta}{\sigma}\right)^{-1 - \frac{1}{k}}    (A.3)

where k is the shape parameter (k \neq 0), \sigma is the scale parameter, and \theta is the threshold parameter.
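If one wants to evaluate or fit the three continuous densities above, SciPy offers equivalent parameterizations; the mapping below is a sketch with arbitrary example parameter values and assumes the standard scipy.stats interface.

    import numpy as np
    from scipy import stats

    x = np.linspace(0.1, 10, 5)
    mu, sigma = 1.0, 0.5                      # arbitrary illustration values

    # A.1  log normal: SciPy's shape s = sigma, scale = exp(mu)
    print(stats.lognorm.pdf(x, s=sigma, scale=np.exp(mu)))

    # A.2  logistic density with location mu and scale s, as written in (A.2)
    print(stats.logistic.pdf(x, loc=mu, scale=sigma))

    # A.3  generalized Pareto: SciPy's shape c = k, loc = theta, scale = sigma
    print(stats.genpareto.pdf(x, c=0.3, loc=0.0, scale=sigma))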


Appendix A (Continued)

A.4 Hyperexponential Distribution

The probability density function of a hyperexponentially distributed random variable X is

f_X(x) = \sum_{i=1}^{n} p_i \, f_{Y_i}(x)    (A.4)

where Y_i is an exponentially distributed random variable with rate parameter \lambda_i and p_i is the probability that X takes the form of the exponential distribution with rate \lambda_i.

A.5 Extreme Value Distribution

The probability density function of the extreme value distribution is

f(x; \mu, \sigma) = \frac{1}{\sigma} \exp\left(\frac{x - \mu}{\sigma}\right) \exp\left(-\exp\left(\frac{x - \mu}{\sigma}\right)\right)    (A.5)

where \mu is the location parameter and \sigma is the scale parameter.

A.6 Zipf Distribution

The Zipf distribution follows a power law. It is a discrete distribution with probability mass function

p(x; \alpha, n) = \frac{1 / x^{\alpha}}{\sum_{i=1}^{n} 1 / i^{\alpha}}    (A.6)

where x = 1, 2, \ldots, n, \alpha > 1, and n is a positive integer.
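The remaining three distributions are straightforward to evaluate directly from the formulas above. The sketch below uses plain NumPy rather than any particular library parameterization; the parameter values are arbitrary examples.

    import numpy as np

    def hyperexp_pdf(x, rates, probs):
        # A.4: mixture of exponentials with rates lambda_i and weights p_i.
        x = np.asarray(x, dtype=float)
        return sum(p * lam * np.exp(-lam * x) for lam, p in zip(rates, probs))

    def extreme_value_pdf(x, mu, sigma):
        # A.5: type-1 extreme value density with location mu and scale sigma.
        z = (np.asarray(x, dtype=float) - mu) / sigma
        return np.exp(z - np.exp(z)) / sigma

    def zipf_pmf(x, alpha, n):
        # A.6: Zipf over ranks 1..n, normalized by a generalized harmonic number.
        norm = np.sum(1.0 / np.arange(1, n + 1) ** alpha)
        return (1.0 / np.asarray(x, dtype=float) ** alpha) / norm

    print(hyperexp_pdf([0.5, 1.0], rates=[1.0, 3.0], probs=[0.4, 0.6]))
    print(extreme_value_pdf([0.0, 1.0], mu=0.0, sigma=1.0))
    print(zipf_pmf(np.arange(1, 6), alpha=1.2, n=5))   # sums to 1 over ranks 1..5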