
Performance analysis of a binary-tree-based algorithm for computing Spatial Distance Histograms


Material Information

Title:
Performance analysis of a binary-tree-based algorithm for computing Spatial Distance Histograms
Physical Description:
Book
Language:
English
Creator:
Sharma Luetel, Sadhana
Publisher:
University of South Florida
Place of Publication:
Tampa, Fla
Publication Date:

Subjects

Subjects / Keywords:
Spatial Distance Histogram
Particle Distance Histogram
Quad-tree
Binary tree
Uniformity
Dissertations, Academic -- Computer Science -- Masters -- USF   ( lcsh )
Genre:
non-fiction   ( marcgt )

Notes

Abstract:
ABSTRACT: The environment is composed of small particles. Hence, particle simulation is an important tool in many scientific and engineering research fields for simulating real-life processes of the environment. Because of the enormous amount of data in such simulations, data management, storage and processing are very challenging tasks. The Spatial Distance Histogram (SDH) is one of the most popular queries used in this field. In this thesis, we investigate the performance of an improvement to an existing algorithm for computing SDH. The existing algorithm uses a conceptual data structure called the density map, which is implemented via a quad-tree index. This thesis proposes an algorithm in which density maps are implemented via a binary tree. After carrying out many experiments and analyzing the data, we find that although the binary-tree approach seems more efficient in the earlier stages, it is the same as the quad-tree approach in terms of time complexity. However, it improves computing time by a constant factor for some data inputs. The second part of this thesis is dedicated to an approach that can potentially reduce the computational time to a great extent by taking advantage of regions where data points are uniformly distributed.
Thesis:
Thesis (M.S.C.S.)--University of South Florida, 2009.
Bibliography:
Includes bibliographical references.
System Details:
Mode of access: World Wide Web.
System Details:
System requirements: World Wide Web browser and PDF reader.
Statement of Responsibility:
by Sadhana Sharma Luetel.
General Note:
Title from PDF of title page.
General Note:
Document formatted into pages; contains 34 pages.

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 002068372
oclc - 606852851
usfldc doi - E14-SFE0003136
usfldc handle - e14.3136
System ID:
SFS0027452:00001


Full Text
Advisor: Yicheng Tu, Ph.D.
http://digital.lib.usf.edu/?e14.3136



PAGE 1

Performance Analysis of a Binary-Tree-Based Algorithm for Computing Spatial Distance Histograms
by
Sadhana Sharma Luetel
A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science
Department of Computer Science and Engineering
College of Engineering
University of South Florida
Major Professor: Yicheng Tu, Ph.D.
Rafael Perez, Ph.D.
Rahul Tripathi, Ph.D.
Sagar Pandit, Ph.D.
Date of Approval: October 30, 2009
Keywords: Spatial Distance Histogram, Particle Distance Histogram, Quad-tree, Binary tree, Uniformity
© Copyright 2009, Sadhana Sharma Luetel

PAGE 2

ACKNOWLEDGEMENTS
First and foremost, I would like to express my sincere gratitude to my M.S. advisor, Dr. Yicheng Tu, for providing me the wonderful opportunity to continue my education. I will always respect him for giving me the confidence and support to begin this thesis in my area of interest, database management systems.
I would like to express heartfelt gratitude towards Dr. Rafael Perez, who gave me the opportunity to work as a Graduate Assistant for the College of Engineering.
I would like to extend my appreciation to the committee members for their support and encouragement.
I would like to express my loving thanks to my husband, Mr. Prakash Luetel, for his regular support, encouragement and motivation throughout my thesis.
Lastly, and most importantly, I wish to thank my parents, Mr. Lakshmi Dhar Guragain and Mrs. Sapana Pokharel Guragain, for their unconditional love and support throughout my life. To them, I dedicate this thesis.

PAGE 3

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1 INTRODUCTION
CHAPTER 2 OVERVIEW OF PRIOR WORK
  2.1 Introduction to Quad-tree
  2.2 Spatial Distance Histogram
  2.3 Implementation of Density Maps
  2.4 The DM-SDH Algorithm
  2.5 Time Complexity
  2.6 Discussion and Conclusion of Prior Work
CHAPTER 3 BINARY TREE STRUCTURE
  3.1 Organization of Tree Structure
  3.2 Analysis of the Algorithm
  3.3 Experimental Results
CHAPTER 4 INSPECTING UNIFORM REGIONS HOPING TO IMPROVE THE PERFORMANCE
  4.1 The Goodness of Fit Test
  4.2 R for Statistical Computing
CHAPTER 5 CONCLUSION AND FUTURE ENHANCEMENTS
  5.1 Conclusion
  5.2 Future Enhancements
REFERENCES

PAGE 4

LIST OF TABLES
Table 2.1 Comparison between running time of brute force algorithm and DM-SDH algorithm for resolution p = 6500.0, maximum distance = 40000.0, minimum distance = 0.0, size of small chunk = 50.0 and uniform data distribution
Table 3.1 Notations used throughout the analysis and their definitions
Table 3.2 Percentage of the pairs of cells that can be resolved under different levels of density maps (m) and total number of histogram buckets (L), computed with Mathematica 6.0
Table 3.3 Comparison between running time of previous DM-SDH algorithm implementing quad-tree and new DM-SDH algorithm implementing binary tree for resolution p = 6500.0, maximum distance = 40000.0, minimum distance = 0.0, size of small chunk = 50.0 and uniform data distribution
Table 4.1 Number of uniform regions for 500,000 atoms on different levels

PAGE 5

LIST OF FIGURES
Figure 2.1 Density maps with different resolutions for same data set
Figure 2.2 Minimum and maximum distance between cells A-B and A-C, where the dotted lines represent maximum distances and solid lines represent minimum distances
Figure 2.3 ResolveTwoCells function
Figure 2.4 Tree structure of the density map
Figure 3.1 A general binary tree
Figure 3.2 Partitions of nodes: (a) shows the parent node; (a) is partitioned as (b) in the second level and as (c) in the third level
Figure 3.3 Density maps implemented via a binary tree approach
Figure 3.4 Binary tree to organize the density map
Figure 3.5 BuildBinaryTree algorithm
Figure 3.6 A is the cell with bucket width p, bucket 1 is the region bounded by curves C1 to C8 and bucket 2 is the region bounded by curves D1 to D8 [1]
Figure 3.7 Conceptual tree structure with three density maps, where the hidden line signifies the intermediate density map
Figure 3.8 Time taken by quad-tree-implemented algorithm qPDH and binary-tree-implemented algorithm bPDH vs. number of nodes on the x axis
Figure 3.9 Ratio of the time taken by the previous algorithm and the new algorithm on the y axis vs. number of nodes on the x axis
Figure 4.1 If B is uniform, the children nodes of B are also uniform
Figure 4.2 If B is uniform, the cells in B are also uniform
Figure 4.3 FindingUniformRegions algorithm
Figure 4.4 Chi-Square table

PAGE 6

PERFORMANCE ANALYSIS OF A BINARY-TREE-BASED ALGORITHM FOR COMPUTING SPATIAL DISTANCE HISTOGRAMS
Sadhana Sharma Luetel
ABSTRACT
The environment is composed of small particles. Hence, particle simulation is an important tool in many scientific and engineering research fields for simulating real-life processes of the environment. Because of the enormous amount of data in such simulations, data management, storage and processing are very challenging tasks. The Spatial Distance Histogram (SDH) is one of the most popular queries used in this field. In this thesis, we investigate the performance of an improvement to an existing algorithm for computing SDH. The existing algorithm uses a conceptual data structure called the density map, which is implemented via a quad-tree index. This thesis proposes an algorithm in which density maps are implemented via a binary tree. After carrying out many experiments and analyzing the data, we find that although the binary-tree approach seems more efficient in the earlier stages, it is the same as the quad-tree approach in terms of time complexity. However, it improves computing time by a constant factor for some data inputs. The second part of this thesis is dedicated to an approach that can potentially reduce the computational time to a great extent by taking advantage of regions where data points are uniformly distributed.

PAGE 7

CHAPTER 1
INTRODUCTION
Computer simulation allows scientists to determine the features of a system and to visualize it virtually before the system is actually built, which leads to efficient and effective construction of the system. Almost all scientific fields make use of computer simulation today. Scientific particle simulations are becoming more popular in scientific and engineering fields such as material science, astrophysics, biomedical sciences, chemistry and so on. They demand huge data storage systems and pose great challenges in analyzing, storing and processing the data [4]. Here, we deal with techniques and algorithms that are very important in the analysis of particle simulation data.
A histogram is a data structure maintained by a Database Management System (DBMS) to approximate a data distribution. The distribution is approximated by assigning the data values to sub-ranges of the value domain, called buckets. Histograms can be of many types and are used by the query optimizer in many database systems.
Particle simulation, a subset of computer simulation, treats the basic entities of large systems as "classical entities" that interact with one another via empirical forces. Data generated by particle simulations require huge database systems and query processing capacity because of their large volume. For such huge data sets, we can apply the concept of the Spatial Distance Histogram (SDH), which is considered a fundamental tool in the validation and analysis of such data. SDH is a type of query that maintains the histogram of distances among the pairs of particles within the system. It is a direct estimation of the radial distribution function (RDF), a continuous statistical distribution function that describes how the density of surrounding matter varies as a function of the distance from a particular point [3].

PAGE 8

Chapter 2 gives an overview of previous work on computing the distance histogram. It presents the density map data structure, which is implemented using a quad-tree index, and analyzes the algorithm.
Chapter 3 presents the idea of a binary-tree-based SDH algorithm. It presents the density map data structure implemented using a binary tree, together with the experimental results and observations for different data sets. The chapter also compares the results obtained with the quad-tree and the binary-tree implementations of density maps.
Chapter 4 presents the novel idea of applying a uniformity test to the data in order to reduce the computing time. It describes the Chi-square goodness-of-fit test, which is used to detect uniformity among the nodes, and describes how the test can be implemented in our system.
Chapter 5 presents the conclusion and future enhancements of this thesis.

PAGE 9

CHAPTER 2
OVERVIEW OF PRIOR WORK
The volume of scientific data is usually so large that storing and retrieving it with current DBMSs becomes a challenge. Particle simulation is an example of such scientific data, in which the basic components of large systems are treated as classical entities that interact for a certain duration under postulated empirical forces [1].
In large biomedical simulation systems, molecules are treated as the classical entities; they interact with each other for a certain duration under some force. Similarly, in astronomical simulation systems, particles all over the universe are treated as the classical entities, and they interact with one another for a certain duration under postulated empirical forces.
Huge space is required to store the results of such particle simulations. For example, a molecular simulation of the cell's protein-making structure created by researchers at Los Alamos National Laboratory simulates 2.64 million atoms. Although particle simulation configurations store information about particle types, velocities and coordinates, scientists are mainly interested in the coordinates. SDH keeps a histogram of the distances between all pairs of particles in the simulated system. If SDH is computed by the brute-force method, the algorithm requires \(O(N^{2})\) distance computations for N particles. On the other hand, the complexity can be reduced to \(\Theta(N^{3/2})\) by using a conceptual data structure called the density map in the SDH algorithm, as described in the prior work on which this thesis builds [1].
In [1], the density map is defined as a 2D grid that contains squares of equal size. Every cell in the grid represents a region of the simulated space and stores the number of particles located in that region and the four coordinates of the cell. To process SDH, a series of density maps is built. Each cell in a density map is divided into four disjoint cells in the next density map

PAGE 10

Figure 2.1 Density maps with different resolutions for same data set
as shown in Figure 2.1, because the density maps are organized by connecting all the cells using the point-region (PR) quad-tree approach. Each level of the quad-tree becomes one resolution of the density map.
2.1 Introduction to Quad-tree
As its name indicates, a quad-tree is a tree structure that repeatedly divides space into quadrants. It is an example of a space-partitioning tree. The term quad-tree describes a class of hierarchical data structures whose common property is that they are based on the principle of recursive decomposition of space [2]. Quad-trees are classified according to the data they represent. The major sub-classes of quad-tree are:
- Point quad-tree
- Region quad-tree
- Edge quad-tree
The point quad-tree is very similar to a binary tree, but it represents two-dimensional point data. It is implemented as a multi-dimensional generalization of a binary tree. Each node has four children, labeled NW (North-West), NE (North-East), SW (South-West) and SE (South-East). Every child node stores a point, given by its x and y coordinates, and the value of that point.

PAGE 11

The region quad-tree, also known as a trie, is a branching structure that divides a region into four equal quadrants. Each node in the tree has either exactly four children or no children at all.
The edge quad-tree is used to represent edges or lines rather than points.
Regardless of type, all quad-trees partition the space into cells, and the tree follows the spatial decomposition of the quad-tree.
In the DM-SDH algorithm, we use the PR (Point-Region) quad-tree to organize the density map. The PR quad-tree adapts the region quad-tree to point data. It is very similar to the region quad-tree; the difference is that in a PR quad-tree a leaf node can be either empty or contain data.
2.2 Spatial Distance Histogram
The spatial distance histogram (SDH) is a basic tool for analyzing particle simulation data. It is a direct estimation of a continuous statistical distribution function known as the Radial Distribution Function (RDF) [1]. The RDF gives the probability of finding a particle at distance r from another particle, and it can be viewed as a normalized SDH. Mathematically, the RDF is defined as
\[ g(r) = \frac{N(r)}{4\pi r^{2}\,\delta r\,\rho} \]
where \(N(r)\) is the total number of atoms in the space between \(r\) and \(r + \delta r\) around any particle and \(\rho\) is the average density of all the particles in the system.
The RDF is very important in thermodynamics; using this function, we can compute thermodynamic quantities of the system such as pressure and energy.
SDH techniques are not yet used by commercial database systems. In the SDH problem, we have to calculate the distances between all pairs of the given points and place each distance in a histogram bucket. In this thesis, all histogram buckets have the same width, denoted by p.
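The SDH problem stated above can be illustrated with a minimal brute-force sketch. This is not the thesis code: the function name `brute_force_sdh`, the half-open bucket convention [i·p, (i+1)·p), and the clamping of the largest distance into the last bucket are all assumptions made for illustration.

```python
import math
from itertools import combinations


def brute_force_sdh(points, p, num_buckets):
    """Count every pairwise distance into equal-width buckets of width p."""
    hist = [0] * num_buckets
    for (x1, y1), (x2, y2) in combinations(points, 2):
        dist = math.hypot(x2 - x1, y2 - y1)
        # Assumed convention: bucket i covers [i*p, (i+1)*p); anything
        # beyond the last bucket is clamped into it.
        idx = min(int(dist // p), num_buckets - 1)
        hist[idx] += 1
    return hist


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
hist = brute_force_sdh(points, p=5.0, num_buckets=3)
```

Every one of the N(N-1)/2 pairs is visited once, which is the quadratic cost that the density-map approach tries to avoid.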

PAGE 12

2.3 Implementation of Density Maps
The density map is a conceptual data structure used to calculate point-to-point distances in less time. For two-dimensional data, it is a 2D grid that divides the space into squares or rectangles: with the quad-tree structure the grid divides the space into squares, and with the binary tree the space is divided into rectangles.
Each node of the tree holds (p-count, x1, x2, y1, y2, child, p-list, next), where p-count is the total number of atoms held by the node; x1, x2, y1, y2 define the bounds of the square; child points to the leftmost child of the node (child is -1 for nodes at the leaf level); p-list contains the data stored by the tree; and next chains together the nodes at the same level.
While building the tree, we first make sure that the space represented by every node is a square. Then, at each new level, the space is partitioned in both dimensions to obtain four smaller squares, as depicted in Figure 2.1. The density map shown in Figure 2.1 can be represented by a tree structure as shown in Figure 2.4.
In Figure 2.4, DM3 has the highest resolution because it is the lowest level above the leaf level, so all the nodes of DM3 are connected to the particle data.
2.4 The DM-SDH Algorithm
Resolving two cells is the most important part of this process. Two cells in the same density map are resolvable if the minimum and maximum distances between them fall into the same histogram bucket. To determine whether two cells are resolvable, the minimum and maximum distances between them are calculated as shown in Figure 2.2. If both distances fall into the same histogram bucket, the two cells are resolved into that bucket. Otherwise, they cannot be resolved on the current density map, and control moves to the next level of the tree, i.e., a density map of higher resolution, where the same test is repeated. In this way, using the particle counts of the density-map cells to process multiple point-to-point distances at once significantly improves the performance

PAGE 13

Figure 2.2 Minimum and maximum distance between cells A-B and A-C, where the dotted lines represent maximum distances and solid lines represent minimum distances
Table 2.1 Comparison between running time of brute force algorithm and DM-SDH algorithm for resolution p = 6500.0, maximum distance = 40000.0, minimum distance = 0.0, size of small chunk = 50.0 and uniform data distribution
No. of Atoms    Brute-Force Algorithm    DM-SDH Algorithm
50              0.000089                 0.000135
500             0.008975                 0.00504
5000            0.834                    0.202
50000           82.5                     5.9
100000          339.45                   17.807
over the brute-force approach. The algorithm for resolving two cells is shown in Figure 2.3 [1].
The tree also stores the Minimum Bounding Rectangle (MBR) formed by the data particles contained in each node. The MBR is used to compute the minimum and maximum point-to-point distances. Using the MBR makes more cells resolvable at each level.
While building the tree, a series of density maps is created starting from level zero of the tree, which is a single-node map that covers the whole space and has the lowest resolution of all the density maps. The total number of density-map levels, as shown in [1], is
\[ H = \left\lceil \log_{2^{d}} \frac{N}{\beta} \right\rceil + 1 \]
where \(2^{d}\) is the degree of the tree nodes (4 for 2-dimensional data), N is the total number of atoms and \(\beta\) is the average number of particles in every leaf node. In this algorithm, \(\beta\) is set to be slightly greater than 4.
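The resolvability test described in Section 2.4 can be sketched as follows. This is an illustrative sketch, not the implementation from [1]: the rectangle layout `(x1, x2, y1, y2)` follows the node fields listed in Section 2.3, while the function names and the half-open bucket convention are assumptions.

```python
import math


def min_max_distance(a, b):
    """Smallest and largest distance between any point of a and any point of b.

    Each cell is an axis-aligned rectangle (x1, x2, y1, y2), x1 <= x2, y1 <= y2.
    """
    ax1, ax2, ay1, ay2 = a
    bx1, bx2, by1, by2 = b
    # Gap along each axis; zero when the projections overlap.
    gap_x = max(bx1 - ax2, ax1 - bx2, 0.0)
    gap_y = max(by1 - ay2, ay1 - by2, 0.0)
    # Largest separation along each axis is between the far edges.
    span_x = max(ax2 - bx1, bx2 - ax1)
    span_y = max(ay2 - by1, by2 - ay1)
    return math.hypot(gap_x, gap_y), math.hypot(span_x, span_y)


def resolvable_bucket(a, b, p):
    """Return the bucket index if cells a and b are resolvable, else None."""
    lo, hi = min_max_distance(a, b)
    i_lo, i_hi = int(lo // p), int(hi // p)
    return i_lo if i_lo == i_hi else None


cell_a = (0.0, 1.0, 0.0, 1.0)
cell_b = (10.0, 11.0, 0.0, 1.0)
wide = resolvable_bucket(cell_a, cell_b, 6.0)    # both distances in one bucket
narrow = resolvable_bucket(cell_a, cell_b, 5.0)  # distances straddle a boundary
```

When a pair is resolvable, the product of the two particle counts can be added to that bucket without computing any individual distance. Passing the MBRs of the two cells instead of their full bounds tightens both distances, which is why the MBR makes more cells resolvable.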

PAGE 14

Input: two cells, A and B
if A and B are resolvable then
    add n_A × n_B to the corresponding bucket, where n_A and n_B are the total numbers of particles contained in A and B, respectively
else if A and B are leaf nodes then
    compute all pairwise distances between A and B and add them to the corresponding buckets
else
    for each child A1 of A do
        for each child B1 of B do
            ResolveTwoCells(A1, B1)    /* call the function recursively */
        end
    end
end
Figure 2.3 ResolveTwoCells function
Figure 2.4 Tree structure of the density map
2.5 Time Complexity
An algorithm is analyzed by determining the amount of resources, mainly time and space, that it requires. The time complexity of an algorithm is the number of steps it takes. Here, we calculate the time complexity of the DM-SDH algorithm.
The total time taken by the DM-SDH algorithm is dominated by two major operations:
1. Time taken to check whether two cells are resolvable

PAGE 15

2. Time taken to compute distances for data in cells that are non-resolvable even in the highest-resolution density map.
According to Lemma 1 in [1], the time complexity of the first operation is \(\Theta\left(N^{\frac{2d-1}{d}}\right)\), and the time complexity of the second operation is likewise derived as \(\Theta\left(N^{\frac{2d-1}{d}}\right)\). Hence, the time complexity of the DM-SDH algorithm as a whole is \(\Theta\left(N^{\frac{2d-1}{d}}\right)\).
2.6 Discussion and Conclusion of Prior Work
The DM-SDH algorithm outperforms brute force only if the number of atoms is large. However, since the DM-SDH algorithm is designed for data sets that are not too small, this limitation is not a serious one. The DM-SDH algorithm using the quad-tree approach greatly improves the efficiency of computing the SDH query over the brute-force algorithm.
The experiments and analysis described in [1] show that the time complexity of the DM-SDH algorithm is \(\Theta\left(N^{\frac{2d-1}{d}}\right)\); for \(d = 2\), it is \(\Theta\left(N^{\frac{3}{2}}\right)\), which beats the other available solutions.
Although this algorithm provides a very good solution to the problem of computing spatial distance histograms, the quad-tree is very short and bushy, so it may yield fewer resolvable cells. In the following chapters of this thesis, we discuss other approaches to the DM-SDH algorithm.
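The recursion of Figure 2.3 can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions, not the implementation of [1]: the `Cell` class, the `sdh` driver (which starts at the root rather than at the first density map whose cell diagonal is at most p), the leaf capacity, the half-open cell bounds, and the handling of pairs where only one cell is a leaf are all choices made here for brevity.

```python
import math
import random
from itertools import combinations


def min_max_distance(a, b):
    """Min and max distance between rectangles given as (x1, x2, y1, y2)."""
    ax1, ax2, ay1, ay2 = a
    bx1, bx2, by1, by2 = b
    gap_x = max(bx1 - ax2, ax1 - bx2, 0.0)
    gap_y = max(by1 - ay2, ay1 - by2, 0.0)
    span_x = max(ax2 - bx1, bx2 - ax1)
    span_y = max(ay2 - by1, by2 - ay1)
    return math.hypot(gap_x, gap_y), math.hypot(span_x, span_y)


class Cell:
    """Hypothetical quad-tree node covering [x1, x2) x [y1, y2)."""

    def __init__(self, x1, x2, y1, y2, points, leaf_size=4, depth=0, max_depth=8):
        self.bounds = (x1, x2, y1, y2)
        self.points = points
        self.count = len(points)
        self.children = []
        if self.count > leaf_size and depth < max_depth:
            xm, ym = (x1 + x2) / 2, (y1 + y2) / 2
            for cx1, cx2 in ((x1, xm), (xm, x2)):
                for cy1, cy2 in ((y1, ym), (ym, y2)):
                    inside = [(x, y) for x, y in points
                              if cx1 <= x < cx2 and cy1 <= y < cy2]
                    self.children.append(
                        Cell(cx1, cx2, cy1, cy2, inside, leaf_size, depth + 1, max_depth))


def bucket(dist, p):
    return int(dist // p)


def resolve_two_cells(a, b, p, hist):
    """Add all distances between particles of a and particles of b to hist."""
    if a.count == 0 or b.count == 0:
        return
    lo, hi = min_max_distance(a.bounds, b.bounds)
    if bucket(lo, p) == bucket(hi, p):
        # Resolvable: all n_A * n_B distances fall into one bucket.
        hist[bucket(lo, p)] += a.count * b.count
    elif not a.children and not b.children:
        for x1, y1 in a.points:
            for x2, y2 in b.points:
                hist[bucket(math.hypot(x2 - x1, y2 - y1), p)] += 1
    else:
        # Descend; a leaf paired with an inner node is kept as it is.
        for a1 in a.children or [a]:
            for b1 in b.children or [b]:
                resolve_two_cells(a1, b1, p, hist)


def sdh(cell, p, hist):
    """Count all pairs inside `cell`: pairs within and across its children."""
    if not cell.children:
        for (x1, y1), (x2, y2) in combinations(cell.points, 2):
            hist[bucket(math.hypot(x2 - x1, y2 - y1), p)] += 1
        return
    for i, child in enumerate(cell.children):
        sdh(child, p, hist)
        for other in cell.children[i + 1:]:
            resolve_two_cells(child, other, p, hist)


random.seed(7)
pts = [(random.random(), random.random()) for _ in range(300)]
p, buckets = 0.3, 5  # sqrt(2) / 0.3 < 5, so five buckets cover the unit square
tree_hist = [0] * buckets
sdh(Cell(0.0, 1.0, 0.0, 1.0, pts), p, tree_hist)
brute_hist = [0] * buckets
for (x1, y1), (x2, y2) in combinations(pts, 2):
    brute_hist[bucket(math.hypot(x2 - x1, y2 - y1), p)] += 1
```

The tree-based histogram is meant to agree with the brute-force one exactly; the saving comes from the resolvable branch, where a whole block of pairs is counted with a single multiplication.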

PAGE 16

CHAPTER 3
BINARY TREE STRUCTURE
From this chapter onwards, we describe the approaches, based on a binary tree structure, that we investigated in order to analyze the performance of computing SDH efficiently in scientific databases.
The first approach uses binary-tree-like density maps. Unlike a quad-tree, a binary tree has at most two children for each node, as shown in Figure 3.1. The first node is called the parent node, and the children are called the left node and the right node. In computer science, binary trees are most popularly used as binary search trees. Binary trees can be of many types. Some of the types are:
- Rooted binary tree
- Perfect binary tree
- Complete binary tree
- Full binary tree
- Balanced binary tree
The tree we implement in this thesis is most like a rooted binary tree, the simplest form of binary tree, in which each node has at most two children and there is one root node. However, it is not exactly a rooted binary tree, or any other standard binary tree, because binary trees are not space-partitioning by nature, whereas here we partition the space. This tree structure can also be described as a k-d tree with k = 2, but the definition of a k-d tree with k = 2 is the same as that of a binary tree.

PAGE 17

Figure 3.1 A general binary tree
Figure 3.2 Partitions of nodes: (a) shows the parent node; (a) is partitioned as (b) in the second level and as (c) in the third level
3.1 Organization of Tree Structure
In the quad-tree structure of the previous chapter, every node is strictly square. In the binary tree, by contrast, the space covered by a node is not necessarily square; it can be rectangular as well as square. The nodes are partitioned by dividing one dimension at one level and the other dimension at the next level, as shown in Figure 3.2.
If we traverse from the root, i.e., level zero, the second level of the binary tree has the same nodes as the first level of the quad-tree. It follows that this approach simply adds intermediate levels so as to make the tree less bushy. Every partitioning generates two partitions at the next level.
All odd levels are partitioned horizontally, in the x-direction, and all even levels are partitioned vertically, in the y-direction. For the example of Figure 2.1, the binary tree implementation is shown in Figure 3.3.
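The alternating partitioning can be sketched as below. This is only an illustration of the idea, not the BuildBinaryTree algorithm of Figure 3.5: the function names are hypothetical, and the choice to split the x extent first (when leaving an even level) is an assumption about the orientation described above.

```python
def binary_level(bounds, level):
    """Cells of the binary-tree density map at `level` (root is level 0).

    Cells are (x1, x2, y1, y2). Assumed order: the x extent is halved when
    leaving an even level, the y extent when leaving an odd level.
    """
    cells = [bounds]
    for k in range(level):
        nxt = []
        for x1, x2, y1, y2 in cells:
            if k % 2 == 0:
                xm = (x1 + x2) / 2
                nxt += [(x1, xm, y1, y2), (xm, x2, y1, y2)]
            else:
                ym = (y1 + y2) / 2
                nxt += [(x1, x2, y1, ym), (x1, x2, ym, y2)]
        cells = nxt
    return cells


def quad_level(bounds, level):
    """Cells of the quad-tree density map at `level` (root is level 0)."""
    cells = [bounds]
    for _ in range(level):
        nxt = []
        for x1, x2, y1, y2 in cells:
            xm, ym = (x1 + x2) / 2, (y1 + y2) / 2
            nxt += [(x1, xm, y1, ym), (x1, xm, ym, y2),
                    (xm, x2, y1, ym), (xm, x2, ym, y2)]
        cells = nxt
    return cells


space = (0.0, 4.0, 0.0, 4.0)
halves = binary_level(space, 1)    # two rectangles: the intermediate map
quarters = binary_level(space, 2)  # four squares, same as quad-tree level 1
```

Two binary levels reproduce one quad-tree level, so every even binary level is an existing density map and every odd level is a new intermediate map made of rectangles.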

PAGE 18

Figure 3.3 Density maps implemented via a binary tree approach

PAGE 19

Figure 3.4 Binary tree to organize the density map
The tree structure that organizes the density maps of Figure 3.3 is shown in Figure 3.4. Here, DM0.5 and DM1.5 are the intermediate levels of density maps and DM2 has the highest resolution. The nodes at DM2 connect to the data particles.
The algorithm that builds the binary tree structure shown in Figure 3.2 is given in Figure 3.5.
After the density maps have been established in the binary tree structure by the algorithm in Figure 3.5, they are used to process the SDH query as in the DM-SDH algorithm of the previous chapter (Figure 2.3).
3.2 Analysis of the Algorithm
The coverable region is a theoretical region that consists of all the particles that could have a distance within a given bucket to a given cell [1].
The notations used throughout the analysis and their definitions are shown in Table 3.1.
The non-covering factor plays a very important role in the analysis of this algorithm. The non-covering factor of the m-th level of the density map, denoted \(\alpha(m)\), is defined as the percentage of the pairs of nodes at level m that are not resolvable even on the highest-resolution density map. \(1 - \alpha(m)\) gives the percentage of pairs of nodes that are resolvable in the m-th density map.

PAGE 20

Table 3.1 Notations used throughout the analysis and their definitions
Notation    Definition
N           number of particles (atoms) in the data
p           width of the histogram bucket
L           total number of histogram buckets
i           an index notation for any histogram bucket
δ           side length of a cell (written d in the formulas below)
S           area of a region in 2D space
α(m)        non-covering factor on level DM_m
For the even levels of the density map (m = 2, 4, 6, ...), the results are the same as discussed in [1]. Hence, for even values of m,
\[ \lim_{p \to 0} \frac{\alpha(m+1)}{\alpha(m)} = \frac{1}{2} \]
Here we deal with the formulas for odd values of m. As in [1], S is the area of the coverable region, p is the width of the histogram buckets and \(p = \sqrt{2}\,d\). The area of the coverable region for the n-th bucket at the m-th level of the tree is denoted \(S_{np,m}\).
The recent unpublished work of Chen and Tu [37] gives the following analysis of the coverable regions.
For m = 1, there are two cases, n = 1 and n = 2. The areas of the coverable regions for these two cases are
\[ S_{p,1} = 4\left\{ \frac{1}{2}p^{2}\arctan\frac{\sqrt{p^{2}-\left(\frac{d}{2}\right)^{2}}}{\frac{d}{2}} - \frac{1}{2}\cdot\frac{d}{2}\sqrt{p^{2}-\left(\frac{d}{2}\right)^{2}} \right\} \]
\[ S_{2p,1} = 4\left\{ \frac{1}{2}(2p)^{2}\arctan\frac{\sqrt{(2p)^{2}-\left(\frac{d}{2}\right)^{2}}}{\frac{d}{2}} - \frac{1}{2}\cdot\frac{d}{2}\sqrt{(2p)^{2}-\left(\frac{d}{2}\right)^{2}} \right\} - 4\left\{ \frac{1}{2}p^{2}\arctan\frac{\sqrt{p^{2}-\left(\frac{d}{2}\right)^{2}}}{\frac{d}{2}} - \frac{1}{2}\cdot\frac{d}{2}\sqrt{p^{2}-\left(\frac{d}{2}\right)^{2}} \right\} \]
The general formula for m = 1 is

PAGE 21

\[ S_{np,1} = 4\left\{ \frac{1}{2}(np)^{2}\arctan\frac{\sqrt{(np)^{2}-\left(\frac{d}{2}\right)^{2}}}{\frac{d}{2}} - \frac{1}{2}\cdot\frac{d}{2}\sqrt{(np)^{2}-\left(\frac{d}{2}\right)^{2}} \right\} - 4\left\{ \frac{1}{2}\big((n-1)p\big)^{2}\arctan\frac{\sqrt{\big((n-1)p\big)^{2}-\left(\frac{d}{2}\right)^{2}}}{\frac{d}{2}} - \frac{1}{2}\cdot\frac{d}{2}\sqrt{\big((n-1)p\big)^{2}-\left(\frac{d}{2}\right)^{2}} \right\} \]
For n = 1 and the levels m = 3 and m = 5, the formulas are:
\[ S_{p,3} = 4\cdot\frac{1}{2}p^{2}\cdot\frac{\pi}{2} + 2p\left(d-\frac{2d}{4}\right) \]
\[ S_{p,5} = 4\cdot\frac{1}{2}p^{2}\cdot\frac{\pi}{2} + 2p\left(d-\frac{2d}{8}\right) + 2p\left(d-\frac{2d}{4}\right) + \left(d-\frac{2d}{8}\right)\left(d-\frac{2d}{4}\right) \]
The general formula for n = 1, with the odd level written as 2m+1 (m = 1 for level 3, m = 2 for level 5), is:
\[ S_{p,2m+1} = 4\cdot\frac{1}{2}p^{2}\cdot\frac{\pi}{2} + 2p\left(d-\frac{2d}{2^{m+1}}\right) + 2p\left(d-\frac{2d}{2^{m}}\right) + \left(d-\frac{2d}{2^{m+1}}\right)\left(d-\frac{2d}{2^{m}}\right) \]
Similarly, for n = 2 and the levels m = 3 and m = 5, the formulas are:
\[ S_{2p,3} = 4\cdot\frac{1}{2}(2p)^{2}\cdot\frac{\pi}{2} + 2(2p)\left(d-\frac{2d}{4}\right) - 4\left\{ \frac{1}{2}p^{2}\arctan\frac{\sqrt{p^{2}-\left(\frac{d}{4}\right)^{2}}}{\frac{d}{4}} - \frac{1}{2}\cdot\frac{d}{4}\sqrt{p^{2}-\left(\frac{d}{4}\right)^{2}} \right\} \]
\[
\begin{aligned}
S_{2p,5} ={}& 4\cdot\frac{1}{2}(2p)^{2}\cdot\frac{\pi}{2} + 2(2p)\left(d-\frac{2d}{8}\right) + 2(2p)\left(d-\frac{2d}{4}\right) + \left(d-\frac{2d}{8}\right)\left(d-\frac{2d}{4}\right) \\
& - 4\Bigg[ \frac{1}{2}p^{2}\Bigg( \frac{\pi}{2} - \arctan\frac{\frac{d}{2}-\frac{d}{8}}{\sqrt{p^{2}-\left(\frac{d}{2}-\frac{d}{8}\right)^{2}}} - \arctan\frac{\frac{d}{2}-\frac{d}{4}}{\sqrt{p^{2}-\left(\frac{d}{2}-\frac{d}{4}\right)^{2}}} \Bigg) \\
& \quad - \frac{1}{2}\left( \sqrt{p^{2}-\left(\frac{d}{2}-\frac{d}{8}\right)^{2}} - \left(\frac{d}{2}-\frac{d}{4}\right) \right)\left(\frac{d}{2}-\frac{d}{8}\right) \\
& \quad - \frac{1}{2}\left( \sqrt{p^{2}-\left(\frac{d}{2}-\frac{d}{4}\right)^{2}} - \left(\frac{d}{2}-\frac{d}{8}\right) \right)\left(\frac{d}{2}-\frac{d}{4}\right) \Bigg]
\end{aligned}
\]

PAGE 22

In general form, with the odd level written as 2m+1:
\[
\begin{aligned}
S_{2p,2m+1} ={}& 4\cdot\frac{1}{2}(2p)^{2}\cdot\frac{\pi}{2} + 2(2p)\left(d-\frac{2d}{2^{m+1}}\right) + 2(2p)\left(d-\frac{2d}{2^{m}}\right) + \left(d-\frac{2d}{2^{m+1}}\right)\left(d-\frac{2d}{2^{m}}\right) \\
& - 4\Bigg[ \frac{1}{2}p^{2}\Bigg( \frac{\pi}{2} - \arctan\frac{\frac{d}{2}-\frac{d}{2^{m+1}}}{\sqrt{p^{2}-\left(\frac{d}{2}-\frac{d}{2^{m+1}}\right)^{2}}} - \arctan\frac{\frac{d}{2}-\frac{d}{2^{m}}}{\sqrt{p^{2}-\left(\frac{d}{2}-\frac{d}{2^{m}}\right)^{2}}} \Bigg) \\
& \quad - \frac{1}{2}\left( \sqrt{p^{2}-\left(\frac{d}{2}-\frac{d}{2^{m+1}}\right)^{2}} - \left(\frac{d}{2}-\frac{d}{2^{m}}\right) \right)\left(\frac{d}{2}-\frac{d}{2^{m+1}}\right) \\
& \quad - \frac{1}{2}\left( \sqrt{p^{2}-\left(\frac{d}{2}-\frac{d}{2^{m}}\right)^{2}} - \left(\frac{d}{2}-\frac{d}{2^{m+1}}\right) \right)\left(\frac{d}{2}-\frac{d}{2^{m}}\right) \Bigg]
\end{aligned}
\]
For higher values of m and n, i.e., \(m \ge 2\) and \(n \ge 2\), the formula becomes:
\[
\begin{aligned}
S_{np,2m+1} ={}& 4\cdot\frac{1}{2}(np)^{2}\cdot\frac{\pi}{2} + 2np\left(d-\frac{2d}{2^{m+1}}\right) + 2np\left(d-\frac{2d}{2^{m}}\right) + \left(d-\frac{2d}{2^{m+1}}\right)\left(d-\frac{2d}{2^{m}}\right) \\
& - 4\Bigg[ \frac{1}{2}\big((n-1)p\big)^{2}\Bigg( \frac{\pi}{2} - \arctan\frac{\frac{d}{2}-\frac{d}{2^{m+1}}}{\sqrt{\big((n-1)p\big)^{2}-\left(\frac{d}{2}-\frac{d}{2^{m+1}}\right)^{2}}} - \arctan\frac{\frac{d}{2}-\frac{d}{2^{m}}}{\sqrt{\big((n-1)p\big)^{2}-\left(\frac{d}{2}-\frac{d}{2^{m}}\right)^{2}}} \Bigg) \\
& \quad - \frac{1}{2}\left( \sqrt{\big((n-1)p\big)^{2}-\left(\frac{d}{2}-\frac{d}{2^{m+1}}\right)^{2}} - \left(\frac{d}{2}-\frac{d}{2^{m}}\right) \right)\left(\frac{d}{2}-\frac{d}{2^{m+1}}\right) \\
& \quad - \frac{1}{2}\left( \sqrt{\big((n-1)p\big)^{2}-\left(\frac{d}{2}-\frac{d}{2^{m}}\right)^{2}} - \left(\frac{d}{2}-\frac{d}{2^{m+1}}\right) \right)\left(\frac{d}{2}-\frac{d}{2^{m}}\right) \Bigg]
\end{aligned}
\]
For the first cell, \(p = \sqrt{2}\,d\). We can always use this relation, because the parameters p and d do not depend on the level m.
Substituting the value of p in terms of d in the above equation:

PAGE 23

\[
\begin{aligned}
S_{np,2m+1} = \Bigg[\, & 2\pi n^{2} + 2\sqrt{2}\,n\left(1-\frac{2}{2^{m+1}}\right) + 2\sqrt{2}\,n\left(1-\frac{2}{2^{m}}\right) + \left(1-\frac{2}{2^{m+1}}\right)\left(1-\frac{2}{2^{m}}\right) \\
& - 4\Bigg[ (n-1)^{2}\Bigg( \frac{\pi}{2} - \arctan\frac{\frac{1}{2}-\frac{1}{2^{m+1}}}{\sqrt{2(n-1)^{2}-\left(\frac{1}{2}-\frac{1}{2^{m+1}}\right)^{2}}} - \arctan\frac{\frac{1}{2}-\frac{1}{2^{m}}}{\sqrt{2(n-1)^{2}-\left(\frac{1}{2}-\frac{1}{2^{m}}\right)^{2}}} \Bigg) \\
& \quad - \frac{1}{2}\left( \sqrt{2(n-1)^{2}-\left(\frac{1}{2}-\frac{1}{2^{m+1}}\right)^{2}} - \left(\frac{1}{2}-\frac{1}{2^{m}}\right) \right)\left(\frac{1}{2}-\frac{1}{2^{m+1}}\right) \\
& \quad - \frac{1}{2}\left( \sqrt{2(n-1)^{2}-\left(\frac{1}{2}-\frac{1}{2^{m}}\right)^{2}} - \left(\frac{1}{2}-\frac{1}{2^{m+1}}\right) \right)\left(\frac{1}{2}-\frac{1}{2^{m}}\right) \Bigg] \Bigg]\, d^{2}
\end{aligned}
\]
The area of the coverable regions for all the buckets is denoted by \(f(n,m)\). It can be found from the above formulas for the coverable region S as follows. Substituting \(p = \sqrt{2}\,d\) into the formula for \(S_{np,1}\):
\[ \sum_{n=1}^{L} f(n,1) = \left[ 4L^{2}\arctan\sqrt{8L^{2}-1} - \frac{1}{2}\sqrt{8L^{2}-1} \right] d^{2} \]
Similarly, from the formula for \(S_{np,3}\):
\[ \sum_{n=1}^{L} f(n,3) = \left[ \sum_{n=1}^{L}\left( 2\pi n^{2} + \sqrt{2}\,n \right) - \sum_{n=2}^{L}\left( 4(n-1)^{2}\arctan\sqrt{32(n-1)^{2}-1} - \frac{1}{8}\sqrt{32(n-1)^{2}-1} \right) \right] d^{2} \]


Table 3.2 Percentage of the pairs of cells that can be resolved under different levels of density maps (m) and total number of histogram buckets (L), computed with Mathematica 6.0

α(m+1)/α(m)   L=2      L=4      L=8      L=16     L=32     L=64     L=128    L=256
m=1           0.8068   0.8898   0.9413   0.9697   0.9846   0.9922   0.9961   0.9980
m=2           0.7596   0.7522   0.7505   0.7501   0.75002  0.75     0.75     0.75
m=3           0.6696   0.6670   0.6666   0.6666   0.6666   0.6666   0.6666   0.6666
m=4           0.7545   0.7510   0.7502   0.75     0.75     0.75     0.75     0.75
m=5           0.6677   0.6667   0.6666   0.6666   0.6666   0.6666   0.6666   0.6666
m=6           0.7521   0.7504   0.7502   0.75     0.75     0.75     0.75     0.75
m=7           0.6670   0.6666   0.6666   0.6666   0.6666   0.6666   0.6666   0.6666
m=8           0.7510   0.7502   0.75     0.75     0.75     0.75     0.75     0.75
m=9           0.6668   0.6666   0.6666   0.6666   0.6666   0.6666   0.6666   0.6666
m=10          0.7505   0.7501   0.75     0.75     0.75     0.75     0.75     0.75
m=11          0.6668   0.6666   0.6666   0.6666   0.6666   0.6666   0.6666   0.6666
m=12          0.7502   0.75     0.75     0.75     0.75     0.75     0.75     0.75
m=13          0.6667   0.6666   0.6666   0.6666   0.6666   0.6666   0.6666   0.6666
m=14          0.7501   0.75     0.75     0.75     0.75     0.75     0.75     0.75
m=15          0.6666   0.6666   0.6666   0.6666   0.6666   0.6666   0.6666   0.6666
m=16          0.75006  0.75     0.75     0.75     0.75     0.75     0.75     0.75

For the higher values of m:

$$
\begin{aligned}
\sum_{n=1}^{L} f(n, 2m+1) = \sum_{n=1}^{L}\Big[\; & 2\pi n^2 + 2\sqrt{2}\,n\Big(1-\frac{2}{2^{m+1}}\Big) + 2\sqrt{2}\,n\Big(1-\frac{2}{2^{m}}\Big) + \Big(1-\frac{2}{2^{m+1}}\Big)\Big(1-\frac{2}{2^{m}}\Big) \\
& - 4(n-1)^2\Big(\frac{\pi}{2} - \arctan\frac{\sqrt{2(n-1)^2-\big(\frac12-\frac{1}{2^{m+1}}\big)^2}}{\frac12-\frac{1}{2^{m+1}}} - \arctan\frac{\frac12-\frac{1}{2^{m}}}{\sqrt{2(n-1)^2-\big(\frac12-\frac{1}{2^{m}}\big)^2}}\Big) \\
& - \Big[\frac12\Big(\sqrt{2(n-1)^2-\big(\tfrac12-\tfrac{1}{2^{m+1}}\big)^2}-\Big(\tfrac12-\tfrac{1}{2^{m}}\Big)\Big)\Big(\tfrac12-\tfrac{1}{2^{m+1}}\Big)\Big] \\
& - \Big[\frac12\Big(\sqrt{2(n-1)^2-\big(\tfrac12-\tfrac{1}{2^{m}}\big)^2}-\Big(\tfrac12-\tfrac{1}{2^{m+1}}\Big)\Big)\Big(\tfrac12-\tfrac{1}{2^{m}}\Big)\Big]\Big]\,\delta^2
\end{aligned}
$$

As in [1], we can define Lemma 1 for this algorithm, which states that the chance that any pair of cells is not resolvable decreases to two thirds of its value at one level and to three quarters at the next level. The ratios α(m+1)/α(m) for different levels of density maps m and different numbers of histogram buckets L are given in table 3.2.

Suppose α, β and γ are the non-covering factors for three consecutive levels of the density map. Then


$$\frac{\beta}{\alpha} = \frac{2}{3}, \qquad \frac{\gamma}{\beta} = \frac{3}{4}$$

This gives the ratio of the values for every alternate level:

$$\frac{\gamma}{\alpha} = \frac{1}{2}$$

The ratio of the values for every alternate level is the same as the ratio of the values for two consecutive levels in the quad-tree approach, as discussed in the previous chapter. Therefore, we can view the level between two alternate levels as an intermediate level, say m + 0.5. As depicted in figure 3.7, α(m+0.5)/α(m) = 3/4 and α(m+1)/α(m+0.5) = 2/3 give α(m+1)/α(m) = 1/2.

With Lemma 1, we can calculate the time complexity of the algorithm as follows. Assume that there are $I$ pairs of cells to be resolved on $DM_i$. On the next level, the total number of cell pairs becomes $I\,2^d$. According to Lemma 1, two thirds of them remain unresolved, leaving $\frac{I}{3}2^{d+1}$ pairs to resolve. On level $DM_{i+2}$, three quarters of the $2^d\big[\frac{I}{3}2^{d+1}\big]$ pairs remain unresolved. Here the number becomes $I\,2^{2d-1}$, which is the same as the value for $DM_{i+1}$ in the earlier algorithm with the quad-tree implementation. In this way, the progression becomes:

$$I,\;\; \frac{I}{3}2^{d+1},\;\; I\,2^{2d-1},\;\; \frac{I}{3}2^{3d},\;\; I\,2^{2(2d-1)},\;\; \ldots,\;\; \frac{I}{3}2^{\,nd-\frac{n-3}{2}},\;\; I\,2^{\frac{n}{2}(2d-1)} \qquad (3.1)$$
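The progression above can be checked numerically. The following Python sketch is illustrative only and is not the thesis's C implementation; the function names `unresolved_pairs` and `closed_form` are hypothetical. It assumes that each level multiplies the number of cell pairs by 2^d and that the fraction left unresolved alternates between 2/3 and 3/4.

```python
from fractions import Fraction


def unresolved_pairs(I, d, levels):
    """Unresolved cell pairs on successive binary-tree density maps.

    Assumption: every level multiplies the pair count by 2**d, and the
    fraction that stays unresolved alternates 2/3, 3/4, 2/3, ...
    """
    keep = [Fraction(2, 3), Fraction(3, 4)]
    terms = [Fraction(I)]
    for k in range(levels):
        terms.append(terms[-1] * 2**d * keep[k % 2])
    return terms


def closed_form(I, d, n):
    """Closed-form n-th term of the progression (n = 0 is the start)."""
    if n % 2 == 0:
        # even positions: I * 2^((n/2)(2d-1))
        return Fraction(I) * Fraction(2) ** ((n // 2) * (2 * d - 1))
    # odd positions: (I/3) * 2^(n*d - (n-3)/2); n-3 is even here
    return Fraction(I, 3) * Fraction(2) ** (n * d - (n - 3) // 2)


if __name__ == "__main__":
    print([int(t) for t in unresolved_pairs(I=3, d=2, levels=6)])
```

For I = 3 and d = 2 the script prints `[3, 8, 24, 64, 192, 512, 1536]`. Every second term grows by 2^(2d-1) = 8, which is the per-level growth factor of the quad-tree analysis.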


$$\text{Sum} = \Big[I + I\,2^{2d-1} + I\,2^{2(2d-1)} + \ldots + I\,2^{\frac{n}{2}(2d-1)}\Big] + \Big[\frac{I}{3}2^{d+1} + \frac{I}{3}2^{3d} + \ldots + \frac{I}{3}2^{\,nd-\frac{n-3}{2}}\Big] \qquad (3.2)$$

For the first geometric progression, the first term is $I$ and the common ratio is $2^{2d-1}$, so we get the sum of the geometric progression as

$$T_{c1}(N) = \frac{I\big[2^{(2d-1)\frac{N+1}{2}} - 1\big]}{2^{2d-1} - 1} \qquad (3.3)$$

One more level of density map will be built when $N$ increases to $2^d N$. From equation 3.3 we can get

$$T_{c1}(2^d N) = 2^{2d-1}\,T_{c1}(N) - o(1) \qquad (3.4)$$

For the second geometric progression of equation 3.2, the first term is $\frac{I}{3}2^{d+1}$ and the common ratio is $2^{2d-1}$; hence the sum of the geometric progression is

$$T_{c2}(N) = \frac{\frac{I}{3}2^{d+1}\big[2^{(2d-1)\frac{N-1}{2}} - 1\big]}{2^{2d-1} - 1} \qquad (3.5)$$

$$T_{c2}(2^d N) = 2^{2d-1}\,T_{c2}(N) - o(1) \qquad (3.6)$$

Applying the master theorem to equation 3.4 and equation 3.6 separately, we get

$$T_{c1}(N) = \Theta\big(N^{\frac{2d-1}{d}}\big), \qquad T_{c2}(N) = \Theta\big(N^{\frac{2d-1}{d}}\big)$$

Since the total time spent is the sum of equation 3.3 and equation 3.5, and the time complexities of the two are the same, the time complexity of the operation is $\Theta\big(N^{\frac{2d-1}{d}}\big)$.

Our analysis says that the time complexity of the algorithm implementing the binary tree is the same as that of the algorithm implementing the quad-tree. It means that implementing the binary tree approach in the DM-SDH algorithm does not really save time compared with the quad-tree approach. The advantage of the binary tree approach is that, if the cells near the intermediate level (as in figure 3.7) are resolvable, the quad tree may lose them. So, in the case of the binary-tree-based algorithm, we can get a larger number of resolvable cells.

3.3 Experimental Results

The algorithms are implemented in the C programming language, and various tests on synthetic and real data are used in the experiments. A series of experiments was performed to compare and contrast the outputs of both approaches. For the experiments, we chose the bucket width as p = 6500. The time taken by the previous DM-SDH algorithm implementing the quad-tree and the time taken by the new algorithm implementing the binary tree are shown in table 3.3.

Table 3.3 Comparison between the running time of the previous DM-SDH algorithm implementing the quad-tree and the new DM-SDH algorithm implementing the binary tree for resolution p = 6500.0, maximum distance = 40000.0, minimum distance = 0.0, size of small chunk = 50.0 and uniform data distribution

No. of Atoms   Previous algorithm T_qPDH   New algorithm T_bPDH   T_qPDH / T_bPDH
100000         12.403473                   11.597128              1.069529715
200000         19.094154                   32.221161              0.59259671
400000         99.173138                   92.599468              1.070990365
800000         152.360474                  257.339976             0.592059098
1600000        790.500308                  738.939081             1.069777372
3200000        1219.687797                 2056.458933            0.593100974
6400000        6328.104035                 5908.555371            1.071006978
12800000       9767.872919                 16422.10191            0.594800408

Both the analysis and the experimental outcome indicate that the binary tree approach is similar to the quad tree approach in terms of time complexity, but in some cases it provides better performance and improves the running time by a constant factor. In table 3.3 the ratio T_qPDH / T_bPDH alternates between about 1.07 and about 0.59 as the number of atoms doubles. Since the binary tree is taller and less bushy than the quad tree, we may get a larger number of resolvable cells, which saves time by some constant factor.

Figure 3.8 shows the graph of the time taken by the quad-tree-implemented and binary-tree-implemented algorithms. In the graph, qPDH signifies the quad-tree-implemented algorithm and bPDH signifies the binary-tree-implemented algorithm. Figure 3.9 shows the ratio of the time taken by the previous algorithm to that of the new algorithm versus the total number of nodes in the graph. These figures also show that the binary tree and quad tree implementations take similar time to execute.
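The growth rate predicted by the analysis can be compared with the measurements. The Python sketch below is illustrative and not part of the thesis's C code; the helper name `growth_exponents` is hypothetical. It estimates the exponent e in T ~ N^e from every pair of rows of table 3.3 whose atom counts differ by a factor of 4. For d = 2 the analysis predicts e = (2d-1)/d = 1.5.

```python
import math

# (number of atoms, T_qPDH, T_bPDH), copied from table 3.3
ROWS = [
    (100000, 12.403473, 11.597128),
    (200000, 19.094154, 32.221161),
    (400000, 99.173138, 92.599468),
    (800000, 152.360474, 257.339976),
    (1600000, 790.500308, 738.939081),
    (3200000, 1219.687797, 2056.458933),
    (6400000, 6328.104035, 5908.555371),
    (12800000, 9767.872919, 16422.10191),
]


def growth_exponents(rows, column):
    """Estimate e in T ~ N**e from rows whose N differ by a factor of 4.

    column = 1 selects T_qPDH, column = 2 selects T_bPDH.
    """
    by_n = {row[0]: row[column] for row in rows}
    exponents = []
    for n, t in by_n.items():
        if 4 * n in by_n:
            exponents.append(math.log(by_n[4 * n] / t) / math.log(4))
    return exponents


if __name__ == "__main__":
    for name, column in (("qPDH", 1), ("bPDH", 2)):
        values = growth_exponents(ROWS, column)
        print(name, [round(e, 3) for e in values])
```

Rows are compared four-fold rather than two-fold because two binary-tree levels correspond to one quad-tree level, so the running times in the table follow the power law cleanly only over every second doubling.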


Input: Let x_min, y_min, x_max and y_max be the minimum and maximum x and y coordinates
Initialize x_span and y_span as x_max - x_min and y_max - y_min;
if x_span is greater than y_span then
    Partition the space horizontally;
end
else
    Partition the space vertically;
end
Initialize the x_min, x_max, y_min and y_max coordinates of the first density map as x_low, x_high, y_low and y_high respectively;
for all the levels do
    if the level is even then
        for all the nodes in that level do
            Have the coordinates of the parent node;
            Partition the space of the parent node into two equal halves vertically and assign each of the spaces to the two children nodes;
        end
    end
    else
        for all the nodes in that level do
            Have the coordinates of the parent node;
            Partition the space of the parent node into two equal halves horizontally and assign each of the spaces to the two children nodes;
            if there are other nodes in the level then
                Increment the currentnode pointer
            else
                Increment the parent pointer
            end
        end
    end
end

Figure 3.5 Build Binary Tree algorithm
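The alternating partition of figure 3.5 can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's C implementation: the function name `build_density_maps` and the tuple layout of a cell are hypothetical, and the sketch assumes that the first cut goes across the longer side of the bounding box and that the cut direction then alternates from level to level.

```python
def build_density_maps(x_min, y_min, x_max, y_max, levels):
    """Return one list of cells per level of a binary-tree density map.

    A cell is a tuple (x_low, y_low, x_high, y_high). Level 0 holds the
    whole space; each further level halves every cell of the level above.
    """
    maps = [[(x_min, y_min, x_max, y_max)]]
    # Assumption: cut across the longer side first, then alternate.
    cut_x = (x_max - x_min) >= (y_max - y_min)
    for _ in range(levels):
        children = []
        for x_low, y_low, x_high, y_high in maps[-1]:
            if cut_x:
                x_mid = (x_low + x_high) / 2
                children.append((x_low, y_low, x_mid, y_high))
                children.append((x_mid, y_low, x_high, y_high))
            else:
                y_mid = (y_low + y_high) / 2
                children.append((x_low, y_low, x_high, y_mid))
                children.append((x_low, y_mid, x_high, y_high))
        maps.append(children)
        cut_x = not cut_x  # alternate the cut direction per level
    return maps


if __name__ == "__main__":
    for level, cells in enumerate(build_density_maps(0, 0, 8, 8, 4)):
        print(level, len(cells), cells[0])
```

Starting from a square, every second level of this structure consists of square cells again, which is why two binary-tree levels can be compared with one quad-tree level.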


Figure 3.6 A is the cell with bucket width p, bucket 1 is the region bounded by curves C1 to C8 and bucket 2 is the region bounded by curves D1 to D8 [1]

Figure 3.7 Conceptual tree structure with three density maps where the hidden line signifies the intermediate density map


Figure 3.8 Time taken by the quad-tree-implemented algorithm qPDH and the binary-tree-implemented algorithm bPDH vs. number of nodes on the x axis

Figure 3.9 Ratio of the time taken by the previous algorithm and the new algorithm on the y axis vs. number of nodes on the x axis


CHAPTER 4
INSPECTING UNIFORM REGIONS HOPING TO IMPROVE THE PERFORMANCE

It is found that although we initially hoped to save some time by implementing the binary tree approach of SDH, it is the same as the quad-tree-based approach in terms of time complexity. In search of further improvements of the spatial histogram, this chapter proposes that if we could find some uniform regions in the tree, the algorithm can be improved further. Consider two cells A and B. If we could conclude that A and B are uniform regions in the tree, we do not need to figure out whether A and B are resolvable or not, and we do not need to compute the point-to-point distances among the particles in those uniform regions, even if A and B are irresolvable and at the highest resolution level of the density map. We can have the distance distribution function of those uniform cells and find the distance histogram using that distribution function. Since determining whether two cells resolve and finding the point-to-point distances among the particles are the two major operations of the DM-SDH algorithm, on which the algorithm spends a lot of time, time would definitely be saved if we could skip these operations for some of the cells.

4.1 The Goodness of Fit Test

The chi-square test is popularly used in experiments in which the data are frequencies or counts [16]. A goodness-of-fit test, such as the chi-square test or the Kolmogorov-Smirnov test, describes how well a statistical model fits a set of observations. In our case, we calculate the p-value with the chi-square test; if the p-value is greater than some specific constant value we consider the node uniform, otherwise not. If node B in figure 4.1 is uniform, all the children of node B, depicted in the figure by the triangle, are also uniform, and we do not need to calculate the point-to-point distances among those uniform nodes. It can also be described as in figure 4.2: if B is a uniform cell in the tree, all the cells inside B are also uniform.

Figure 4.1 If B is uniform, the children nodes of B are also uniform

Figure 4.2 If B is uniform, the cells in B are also uniform

The test statistic of the chi-square test can be represented by the equation given below:

$$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$$

where
$\chi^2$ = the test statistic that asymptotically approaches a chi-square distribution.
$O_i$ = an observed frequency.
$E_i$ = an expected (theoretical) frequency as given by the null hypothesis.
$n$ = the number of particles, in our terms, in each node.
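As a small worked example of the statistic, the Python sketch below evaluates it for particle counts in the child cells of a node. It is illustrative only: the function name `chi_square_statistic` is hypothetical, the expected count is assumed to be the same for every cell (total count divided by the number of cells), and the standard Pearson form is used, in which each squared deviation is divided by the expected count.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic against a uniform expectation.

    observed: list of particle counts, one per cell.
    Assumption: under the null hypothesis every cell expects the same
    count, total / number_of_cells.
    """
    total = sum(observed)
    expected = total / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


if __name__ == "__main__":
    print(chi_square_statistic([25, 25, 25, 25]))  # perfectly even counts
    print(chi_square_statistic([40, 20, 20, 20]))  # one crowded cell
```

Even counts give a statistic of 0, while the crowded example gives (15^2 + 3 * 5^2) / 25 = 12; larger values mean a larger departure from uniformity.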


4.2 R for Statistical Computing

R is a statistical computing tool which has been extensively used by statisticians and bio-statisticians. It is similar to the S language, which was developed at Bell Laboratories by John Chambers and colleagues [36]. R provides a wide range of statistical functions and is highly extensible as well. In this thesis, we have used R to implement the chi-square test. We have used the stand-alone version of the R math library from C.

Input: Let node be the input node, and level be the respective level of that node
for all the available nodes do
    totalcount = calculate the total number of particles;
end
DOF = 4^level - 1;
expected = totalcount / DOF;
t = i;
while t has children do
    for each child k of t do
        observed = the particles contained by k;
        chisqval = chisqval + (observed - expected)^2 / expected;
    end
    t = first child of t;
end
pval = pchisq(chisqval, DOF, TRUE, FALSE);
if pval ≥ 0.1 return 1;
else return 0;

Figure 4.3 Finding Uniform Regions algorithm

In algorithm 4.3, we have presented the idea of how we can find the uniform regions using the chi-square test. At every level of the density map, the degrees of freedom change to (total number of nodes on that level) - 1, because the total number of nodes also changes at every level. The constant we have chosen to compare the p-value against is 0.1. pchisq is a function of the stand-alone library of R. A series of experimental analyses could be done to find an appropriate constant. The general chi-square distribution table is shown in figure 4.4.

Figure 4.4 Chi-Square table

Table 4.1 shows the number of uniform regions detected for 500,000 input atoms. We can see that as we go to the lower levels, we find a higher percentage of uniform regions. Another discovery is that starting from level 3, with 64 cells, we can see a high percentage of uniform nodes, which gives a high percentage of uniform regions. This means that most of the high-level regions can be treated as a single entity in the approximate algorithm; that is, for the 61 uniform nodes in level 3 we do not need to calculate the point-to-point distances and we do not need to find out whether their cells are uniform or not. Thus, finding the uniform regions as shown in algorithm 4.3 greatly reduces the computation time. It would be very interesting to analyze the time complexity of our algorithm after implementing the idea of inspecting the uniform regions. It can be considered immediate future work of this research.

Table 4.1 Number of uniform regions for 500,000 atoms on different levels

Level   Total Number of Nodes   Number of Uniform Nodes   Percentage of Uniform Regions
0       1                       0                         0
1       4                       0                         0
2       16                      0                         0
3       64                      61                        95.32
4       256                     252                       98.43
5       1024                    1019                      99.51
6       4096                    4090                      99.85
7       16384                   16377                     99.95
8       65536                   65528                     99.98
9       262144                  262136                    99.99
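The uniformity check of a single node can be sketched as follows. This Python code is an illustration under stated assumptions, not the thesis's implementation, which calls `pchisq` from the stand-alone R math library in C. The names `chi2_sf` and `is_uniform` are hypothetical. The sketch assumes the node is split into a grid of equal cells, uses (number of cells) - 1 degrees of freedom with an expected count of (total count) / (number of cells), takes the p-value as the upper-tail probability of the chi-square distribution, and applies the 0.1 threshold mentioned above.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable.

    Uses the series for the regularized lower incomplete gamma function,
    P(a, z) = sum_{n>=0} exp(-z) z^(a+n) / Gamma(a+n+1), a=df/2, z=x/2.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    total, n = 0.0, 0
    while True:
        term = math.exp(-z + (a + n) * math.log(z) - math.lgamma(a + n + 1))
        total += term
        if n > z and term < 1e-15:  # past the largest term and converged
            break
        n += 1
    return max(0.0, 1.0 - total)


def is_uniform(points, x_min, y_min, x_max, y_max, grid, threshold=0.1):
    """Chi-square uniformity check of the points inside one node."""
    if not points:
        return False
    counts = [0] * (grid * grid)
    for x, y in points:
        col = min(int((x - x_min) / (x_max - x_min) * grid), grid - 1)
        row = min(int((y - y_min) / (y_max - y_min) * grid), grid - 1)
        counts[row * grid + col] += 1
    expected = len(points) / len(counts)
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return chi2_sf(stat, len(counts) - 1) >= threshold


if __name__ == "__main__":
    even = [(i + 0.5, j + 0.5) for i in range(4) for j in range(4)] * 10
    clustered = [(0.1, 0.1)] * 160
    print(is_uniform(even, 0, 0, 4, 4, grid=4))
    print(is_uniform(clustered, 0, 0, 4, 4, grid=4))
```

The series is simple rather than fast; a production implementation would more likely rely on a tested statistics library for the tail probability.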


CHAPTER 5
CONCLUSION AND FUTURE ENHANCEMENTS

5.1 Conclusion

SDH was first implemented with a quad-tree, and we did not know whether that was the optimal algorithm or not. In this thesis, we have shown another approach to the computation of SDH, using a binary tree. The binary tree approach has a time complexity of $\Theta\big(N^{\frac{2d-1}{d}}\big)$, where d is the dimension. In this thesis, we deal with two-dimensional data only, so the time complexity of the program for d = 2 is $\Theta\big(N^{\frac{3}{2}}\big)$. This approach is definitely faster than the brute-force approach, which computes the distance between every pair of particles. Our experimental results show that the time complexity of the algorithm using the quad-tree approach is the same as that of the algorithm using the binary tree approach. In some cases, the binary tree approach does provide an improvement in running time by a constant factor.

This thesis presents an idea of implementing a binary tree in the density maps, and hence of using a rectangular cell shape in the DM-SDH concept.

This thesis also deals with some statistical tests on the data contained in the tree. The uniform nodes are found using the chi-square test. Our intuition says that if we implemented the idea of the uniform regions in the DM-SDH algorithm, we could improve the algorithm in terms of time, since most of the high-level regions, due to the uniformly distributed particles, can be treated as a single entity in the approximate algorithm. Analyzing the time complexity after inspecting the uniform regions is beyond the scope of this thesis.

5.2 Future Enhancements

Although there are many relational database systems, none of them works perfectly in the field of storing and analyzing scientific particle data, because they are particularly designed to store, analyze and handle data in a business environment. There can be many ways to enhance the concept of "Spatial Distance Histogram". We can also implement this concept in various other types of trees to find the optimal solution. Space partitioning methods with cell shapes other than the square, as dealt with in the quad tree, and the rectangle, as dealt with in the binary tree, can also be implemented.

In the DM-SDH algorithm, we compute the distances between all the irresolvable particles at the highest resolution level. By implementing the novel idea of uniform regions, we may decrease the number of computations, and hence improve the efficiency. As already discussed in the previous chapter, calculating the time taken by the algorithm by experimental and analytic means would be considered immediate future work.

In this thesis, the optimization of the algorithm based upon I/O costs is not discussed. The algorithm could certainly be improved if the advantages of a pre-fetching mechanism could be exploited. Ways to improve the I/O performance of the algorithm can also be studied.


REFERENCES

[1] Y-C. Tu, S. Chen, and S. Pandit, "Computing Spatial Distance Histograms Efficiently in Scientific Databases", Department of Computer Science and Engineering, University of South Florida, 2008.
[2] H. Samet, "The Design and Analysis of Spatial Data Structures", University of Maryland, Addison-Wesley, Reading, MA, 1990.
[3] D. Frenkel and B. Smit, "Understanding Molecular Simulations from Algorithm to Applications", ser. Computational Science Series. Academic Press, 2002, vol. 1.
[4] J. Gray, D. Liu, M. Nieto-Santisteban, A. Szalay, D. DeWitt, and G. Heber, "Scientific Data Management in the Coming Decade", SIGMOD Record, vol. 34, no. 4, Dec. 2005.
[5] Y-C. Tu and S. Chen, "Performance Analysis of Dual-Tree Algorithms for Computing Distance Histograms".
[6] P. N. Yianilos, "Data Structures and Algorithms for Nearest Neighbor Search in Metric Spaces", in Proceedings of ACM-SIAM Symposium on Discrete Algorithms (SODA), 1993.
[7] M. A. Nieto-Santisteban, J. Gray, A. S. Szalay, J. Annis, A. R. Thakar, and W. J. O'Mullane, "When Database Systems Meet the Grid", Proceedings of the 2005 CIDR Conference.
[8] R. Agrawal, A. Ailamaki, P. A. Bernstein, E. A. Brewer, M. J. Carey, S. Chaudhuri, A. Doan, D. Florescu, M. J. Franklin, H. Garcia-Molina, J. Gehrke, L. Gruenwald, L. M. Haas, A. Y. Halevy, J. M. Hellerstein, Y. E. Ioannidis, H. F. Korth, D. Kossmann, S. Madden, R. Magoulas, B. C. Ooi, T. O'Reilly, R. Ramakrishnan, S. Sarawagi, M. Stonebraker, A. S. Szalay, and G. Weikum, "The Claremont report on database research", Commun. ACM, vol. 52, no. 6, pp. 56-65, 2009.
[9] M. Y. Eltabakh, M. Ouzzani, W. G. Aref, "bdbms - A Database Management System for Biological Data", 3rd Biennial Conference on Innovative Data Systems Research (CIDR), January 7-10, 2007, Asilomar, California, USA.
[10] J. M. Patel, "The Role of Declarative Querying in Bioinformatics", OMICS A Journal of Integrative Biology, Volume 7, Number 1, 2003, Mary Ann Liebert, Inc.
[11] B. Nam, A. Sussman, "A Comparative Study of Spatial Indexing Techniques for Multidimensional Scientific Datasets", Proceedings of the 16th International Conference on Scientific and Statistical Database Management (SSDBM'04), 1099-3371/04, 2004 IEEE.
[12] I. Narsky, "Goodness of Fit: What Do We Really Want to Know?", PHYSTAT2003, SLAC, Stanford, California, September 8-11, 2003.


[13] M. H. Ng, S. Johnston, B. Wu, S. E. Murdock, K. Tai, H. Fangohr, S. J. Cox, J. W. Essex, M. S. P. Sansom, P. Jeffreys, "BioSimGrid: Grid-enabled biomolecular simulation data storage and analysis", Future Generation Computer Systems 22, 657-664.
[14] C. W. Bachman, "DATA STRUCTURE DIAGRAMS", Journal of ACM SIGBDP, Vol 1, No 2 (March 1969), pages 4-10.
[15] B. Chan, J. Talbot, L. Wu, N. Sakunkoo, M. Cammarano, P. Hanrahan, "Vispedia: On-demand Data Integration for Interactive Visualization and Exploration", SIGMOD'09, June 29-July 2, 2009, Providence, Rhode Island, USA. ACM 978-1-60558-551-2/09/06.
[16] A. E. Maxwell, "Analysing Qualitative Data", 1961.
[17] S. L. Meyer, "Data Analysis for Scientists and Engineers", 1975.
[18] A. S. Szalay, J. Gray, A. Thakar, P. Z. Kunszt, T. Malik, J. Raddick, C. Stoughton, and J. vandenBerg, "The SDSS Skyserver: Public Access to the Sloan Digital Sky Server Data", in Proceedings of International Conference on Management of Data (SIGMOD), 2002, pp. 570-581.
[19] J. L. Stark and F. Murtagh, "Astronomical Image and Data Analysis". Springer, 2002.
[20] A. Filipponi, "The radial distribution function probed by X-ray absorption spectroscopy", J. Phys.: Condens. Matter, vol. 6, pp. 8415-8427, 1994.
[21] V. Springel, S. D. M. White, A. Jenkins, C. S. Frenk, N. Yoshida, L. Gao, J. Navarro, R. Thacker, D. Croton, J. Helly, J. A. Peacock, S. Cole, P. Thomas, H. Couchman, A. Evrard, J. Colberg, and F. Pearce, "Simulations of the Formation, Evolution and Clustering of Galaxies and Quasars", Nature, vol. 435, pp. 629-636, June 2005.
[22] J. A. Orenstein, "Multidimensional Tries used for Associative Searching", Information Processing Letters, vol. 14, no. 4, pp. 150-157, 1982.
[23] Y. Tao, J. Sun, and D. Papadias, "Analysis of predictive spatio-temporal queries", ACM Trans. Database Syst., vol. 28, no. 4, pp. 295-336, 2003.
[24] I. Csabai, M. Trencseni, L. Dobos, P. Jozsa, G. Herczegh, N. Purger, T. Budavari, and A. S. Szalay, "Spatial Indexing of Large Multidimensional Databases", in Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research (CIDR), 2007, pp. 207-218.
[25] M. Arya, W. F. Cody, C. Faloutsos, J. Richardson, and A. Toya, "QBISM: Extending a DBMS to Support 3D Medical Images", in ICDE, 1994, pp. 314-325.
[26] B. Hess, C. Kutzner, D. van der Spoel, and E. Lindahl, "GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation", Journal of Chemical Theory and Computation, vol. 4, no. 3, pp. 435-447, March 2008.
[27] M. Feig, M. Abdullah, L. Johnsson, and B. M. Pettitt, "Large Scale Distributed Data Repository: Design of a Molecular Dynamics Trajectory Database", Future Generation Computer Systems, vol. 16, no. 1, pp. 101-110, January 1999.
[28] M. P. Allen and D. J. Tildesley, "Computer Simulation of Liquids". Clarendon Press, 2002, vol. 1.


[29] J. M. Haile, "Molecular Dynamics Simulation: Elementary Methods". Wiley, New York, 1992.
[30] D. P. Landau and K. Binder, "A Guide to Monte Carlo Simulation in Statistical Physics". Cambridge University Press, Cambridge, 2000.
[31] M. Bamdad, S. Alavi, B. Najafi, and E. Keshavarzi, "A new expression for radial distribution function and infinite shear modulus of Lennard-Jones fluids", Chem. Phys., Vol. 325, pp. 554-562, 2006.
[32] Y-C. Tu, S. Chen, and S. Pandit, "Computing Spatial Distance Histograms Efficiently in Scientific Databases", in Proceedings of International Conference on Data Engineering (ICDE), pages 726-807, March 2009.
[33] J. K. Uhlman, "Metric Trees", Applied Mathematics Letters, vol. 4, no. 5, pp. 61-62, 1991.
[34] J. Barnes and P. Hut, "A Hierarchical O(N log N) Force Calculation Algorithm". Nature, 324:446-449, 1986.
[35] P. B. Callahan and S. R. Kosaraju, "A decomposition of multidimensional point sets with applications to k-nearest neighbors and n-body potential fields". Journal of ACM, 42:67-90, 1995.
[36] R - resources and tutorials, website link: http://www.r-project.org/
[37] S. Chen, Y. Tu. Personal communication.
[38] Chi-Square table, website link: http://www2.lv.psu.edu/jxm57/irp/chisquar.html