USF Libraries
USF Digital Collections

Combinatorial models for DNA rearrangements in ciliates

Material Information

Title:
Combinatorial models for DNA rearrangements in ciliates
Physical Description:
Book
Language:
English
Creator:
Angeleska, Angela
Publisher:
University of South Florida
Place of Publication:
Tampa, Fla
Publication Date:

Subjects

Subjects / Keywords:
Assembly graph
Assembly number
Assembly word
DNA recombination model
Polygonal path
Dissertations, Academic -- Mathematics and Statistics -- Doctoral -- USF   ( lcsh )
Genre:
non-fiction   ( marcgt )

Notes

Summary:
ABSTRACT: Motivated by genome rearrangements that take place in some species of ciliates, we introduce a combinatorial model for these processes based on spatial graphs. This model builds upon two earlier models for pointer-guided DNA recombination (the intramolecular model and the intermolecular model) and is influenced by a molecular model for RNA-guided DNA recombination. Despite their differences, the intermolecular and intramolecular models both formalize the recombination events through rewriting operations applied to formal words. Both models predict the same set of molecules as a result of correct rearrangement. Here, we give an algorithm that, for an input scrambled gene structure, outputs a set of strings representing the expected set of molecules after complete assembly. Moreover, we prove that both the set of all realistic words (words that model a possible gene structure) and the set of all nonrealistic words are closed under the rewriting operations in the intramolecular model. We investigate spatial graphs that consist of 4-valent rigid vertices, called assembly graphs. An assembly graph can be seen as a representation of a DNA molecule during certain recombination processes, in which 4-valent vertices represent molecular alignment of the recombination sites. We introduce the notion of a polygonal path in an assembly graph as a model for a single gene. Polygonal paths are defined as paths that make a "90° turn" at each vertex of the assembly graph, and they define smoothing of the vertices visited by the paths. Such vertex smoothing models a homologous DNA recombination. We investigate the minimal number of polygonal paths that visit all vertices of a given graph exactly once, called the assembly number. We prove that for every positive integer n there is an assembly graph with assembly number n. We also study the relationship between the number of vertices in an assembly graph and its assembly number. One of the results is that every assembly graph with assembly number n has at least 3n-2 vertices. In addition, we show that there is an embedding in three-dimensional space of each assembly graph with a given set of polygonal paths, such that smoothing of vertices with respect to the polygonal paths results in unlinked circles. We study recombination strategies given by subsets of vertices. Such a subset is called a successful set if smoothing of all vertices from the set with respect to a polygonal path results in a graph that contains the polygonal path in a single component. We characterize the successful sets in a given assembly graph by the notion of a complementary polygonal path. Furthermore, we define a smoothing strategy in an assembly graph relative to a polygonal path as a sequence of successful sets that models successive DNA recombinations for correct gene assembly. Recent experimental results suggest that there might be different pathways for unscrambling a gene. These results lead to a mathematical model for gene recombination that builds upon the intermolecular model. We introduce assembly words as a formalization of a set of linear and circular DNA molecules. Assembly words are partially ordered, so that any linearly ordered subset models a pathway for gene rearrangement. We suggest two different pathways for unscrambling of the actin I gene in O. trifallax and we prove that they are the only theoretically possible pathways.
Thesis:
Dissertation (Ph.D.)--University of South Florida, 2009.
Bibliography:
Includes bibliographical references.
System Details:
Mode of access: World Wide Web.
System Details:
System requirements: World Wide Web browser and PDF reader.
Statement of Responsibility:
by Angela Angeleska.
General Note:
Title from PDF of title page.
General Note:
Document formatted into pages; contains 139 pages.
General Note:
Includes vita.

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 002063282
oclc - 556064086
usfldc doi - E14-SFE0002998
usfldc handle - e14.2998
System ID:
SFS0027315:00001


Full Text
Co-advisor:
Natasha Jonoska, Ph.D.
Co-advisor:
Masahiko Saito, PhD
Host Item:
USF Electronic Theses and Dissertations.
Electronic Access:
http://digital.lib.usf.edu/?e14.2998



Combinatorial Models for DNA Rearrangements in Ciliates

by

Angela Angeleska

A dissertation submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Department of Mathematics and Statistics
College of Arts and Sciences
University of South Florida

Major Professor: Natasa Jonoska, PhD
Major Professor: Masahiko Saito, PhD
Gordon Fox, PhD
Xiang-Dong Hou, PhD
Gregory McColm, PhD

Date of Approval:
May 20, 2009

Keywords: assembly graph, assembly word, assembly number, DNA recombination model, polygonal path

© Copyright 2009, Angela Angeleska

Acknowledgements

This dissertation is written under supervision of my advisors Dr. Natasa Jonoska and Dr. Masahiko Saito. The impact that they made on my overall development as a scholar is substantial. Their professionalism and expertise successfully led me through every aspect of my entire research. They were very patient and encouraging, always available for an advice or a discussion, which made my academic journey very enjoyable experience. For me they will always exemplify professional, successful and devoted academics. I express my genuine gratitude for all of that.

I would like to acknowledge Dr. Laura Landweber for supporting my mathematical study with useful advice and experimental data. The collaboration with her provided biological motivation for parts of the research presented in this dissertation. I wish to thank the committee members Dr. McColm, Dr. Hou and Dr. Fox for their willingness to help in the completion of this dissertation. I truly respect their professional opinion and comments. I also recognize my undergraduate advisor Dr. Dimoski, who promoted my determination for graduate education in mathematics.

In addition, I would like to thank Dr. McWaters, Dr. Rimbey and all faculty members from the Department of Mathematics at USF for their collaboration. I extend my thanks to Beverly, Maryann, Sarina, Denise, Aya, Nancy and Barbara.

I would like to express my deepest gratitude towards my closest family; mom Blagica, dad Angele, sister Biljana and brother Darko. Even though miles away during my graduate studies, they were always present for me with their support. I thank them for the unconditional care and encouragement.

I am especially thankful to Vladimir, who was always there for me through the years that I spend pursuing my doctorate degree. His unreserved love, understanding and care helped me sustain in all of my endeavors. He has been not only best companion and friend, but also great colleague and critic.

I sincerely thank Ana, Elka, Enver, Ferenc, Irena, Mintie, Mende, Sase for being kind and supportive friends.

Special thanks to my special friend Adi.

Table of Contents

Abstract
1 Introduction
2 Biological Motivation
    2.1 Homologous DNA Recombination
    2.2 Gene Rearrangements in ciliates
        2.2.1 Characteristics of Ciliates
        2.2.2 Conjugation in ciliates
        2.2.3 From Micronuclear to Macronuclear DNA
3 Pointer Guided Models for DNA Rearrangements
    3.1 Intramolecular Model
        3.1.1 MDS-IES descriptors
        3.1.2 Reduction Strategies
    3.2 Intermolecular Model
    3.3 Invariant Properties
4 Template Guided Models for DNA Rearrangements
    4.1 Homologous Recombination with dsRNA Templates
    4.2 Homologous Recombination with ssRNA Templates
    4.3 Ld, Hi and Dlad through the Template Guided Model
        4.3.1 Loop Recombination with Templates
        4.3.2 Double Loop Recombination with Templates
        4.3.3 Hairpin Recombination with Templates
5 Assembly Graphs
    5.1 Gene Rearrangements via Assembly Graphs
    5.2 Definitions and Notations
    5.3 Polygonal Paths and Hamiltonian Sets
    5.4 Assembly Numbers
    5.5 Words and Assembly Graphs
    5.6 Smoothings of Assembly Graphs
6 Simultaneous DNA rearrangements through Assembly Graphs
    6.1 Simultaneous Smoothings
    6.2 Partial Smoothings
    6.3 Smoothing Strategies
7 Assembly Words
    7.1 Definitions and Notations
    7.2 Rewriting rules
    7.3 Ordering of Assembly Words
    7.4 Assembly Graphs and Assembly Words
    7.5 Application of Assembly Words to Experimental Data
8 Conclusion
References
About the Author

List of Figures

Figure 2.1: Double stranded DNA segment represented as a pair of complementary sequences over the alphabet {A, C, G, T}
Figure 2.2: Two double stranded DNA molecules (X - left, Y - right). X and Y contain identical segments
Figure 2.3: Step by step homologous recombination of two DNA molecules
Figure 2.4: Images of ciliates. 1) Paramecium, 2) Oxytricha trifallax
Figure 2.5: Step by step conjugation process
Figure 2.6: Schematic representation of the C2 gene of Oxytricha nova. (A) micronuclear gene. (B) macronuclear gene
Figure 2.7: Schematic representation of the scrambled and assembled pol gene of Oxytricha trifallax. (A) micronuclear gene, (B) macronuclear gene
Figure 2.8: An MDS nucleotide sequence of the USG1 micronuclear gene in Uroleptus (right) and its reverse (left)
Figure 2.9: Micronuclear actin I gene in Oxytricha nova
Figure 2.10: Macronuclear actin I gene in Oxytricha nova
Figure 2.11: A segment containing MDS_4 and MDS_5 of the USG1 micronuclear gene in Uroleptus
Figure 2.12: Micronuclear actin I gene in Oxytricha nova including the pointer sequences
Figure 3.1: Pointer recombination. (A) A pair of pointers 3 align side by side. (B) The pointers recombine as homologous sequences. (C) Two resulting molecules, such that one of them contains MDS_2 and MDS_3 connected through a copy of pointer 3
Figure 3.2: A hairpin recombination. (A) A pair of pointers 2 align side by side, by forming a hairpin. (B) The pointers recombine as homologous sequences. (C) The resulting molecule contains MDS_1 and MDS_2 connected through a copy of pointer 2
Figure 3.3: A double loop recombination. (A) A portion of the actin I micronuclear gene in O. trifallax. (B) The pair of pointers 6 and the pair of pointers 7 align, by forming a double loop, and they recombine. (C) The resulting molecule contains MDS_5, MDS_6 and MDS_7 joined correctly
Figure 3.4: Schematic representation of the micronuclear C2 gene including the pointers 2, 3 and 4
Figure 3.5: Schematic representation of the intermediate molecule obtained from the micronuclear C2 gene after loop recombination of pointer 3
Figure 3.6: Schematic representation of the intermolecular operation op1
Figure 3.7: Schematic representation of the intermolecular operation op2
Figure 4.1: Ladder representation of DNA segments X, Y and T, with labels marking MDS_i, MDS_{i+1}, pointer i+1 and the IESs
Figure 4.2: Step by step model for DNA recombination guided by double stranded RNA template
Figure 4.3: The positions of cuts in molecules X and Y corresponding to cut points c_1, c_2, c_3, c_4 in Figure 4.2(E)
Figure 4.4: The possible positions of cuts in molecules X and Y
Figure 4.5: The resulting molecules after the recombination
Figure 4.6: Schematic representation of step by step recombination guided by a ssRNA template
Figure 4.7: Step-by-step process of loop recombination with a template
Figure 4.8: Step-by-step process of double loop recombination guided by a template
Figure 4.9: Step-by-step process of hairpin recombination guided by a template
Figure 5.1: Schematic representation of the pointer alignment shown as a 4-valent vertex and the recombination result shown as smoothing of the vertex
Figure 5.2: Schematic representation of micronuclear Actin I gene in Oxytricha nova
Figure 5.3: Schematic representation of the simultaneous braiding and recombination process
Figure 5.4: Assembly Graphs
Figure 5.5: Polygonal Paths and Hamiltonian Sets
Figure 5.6: The assembly graph Γ_1 has assembly number 1 (left). The assembly number of Γ_2 is 2 (right)
Figure 5.7: Assembly graphs Γ_1 and Γ_2 such that An(Γ_1 ∘ Γ_2) ≠ An(Γ_2 ∘ Γ_1)
Figure 5.8: Assembly graph with assembly number n
Figure 5.9: Removing a vertex v = v_i = v_k, the case when i+1 ≠ k
Figure 5.10: Unrealizable graphs
Figure 5.11: Two types of smoothings, parallel (p-) smoothing (left) and non-parallel (n-) smoothing (right)
Figure 5.12: Smoothing at a vertex v with respect to a polygonal path
Figure 5.13: Smoothing the assembly graph (left) with respect to the Hamiltonian polygonal path γ. The result is the smoothed graph (right)
Figure 5.14: Assembly graph that corresponds to MDS_2 IES_1 MDS_4 IES_2 MDS_3 IES_3 MDS_1
Figure 5.15: Two Hamiltonian polygonal paths resulting in distinct smoothed embeddings
Figure 5.16: (A) Proper coloring of a vertex and smoothing corresponding to the proper coloring. (B) An example of a proper coloring for an assembly graph "representing" the trefoil and the resulting components after the smoothing
Figure 6.1: Four distinct occurrences of pointer i in G and their corresponding smoothings
Figure 6.2: An assembly graph modeling actin I gene of S. nova. A Hamiltonian path in this assembly graph is indicated with a thickened line
Figure 6.3: Three choices of partial smoothings. (A) Example of a linear set. (B) Example of a successful but not linear set. (C) Example of a non-linear non-successful set
Figure 6.4: Complementary paths for the Hamiltonian path depicted in Figure 6.2
Figure 6.5: Successful and unsuccessful smoothings. (A) Successful smoothing. (B) Unsuccessful smoothing
Figure 6.6: (A) An example of an assembly graph and the corresponding Hamiltonian polygonal path. (B) S_1-smoothing of the graph in (A), for the successful set S_1 = {2, 3}. (C) S_2-smoothing of the graph in (A), for the set S_2 = {5, 6}. S_2 is not successful and therefore cannot be the first smoothing set in a smoothing strategy
Figure 6.7: (A) An example of an assembly graph and the corresponding Hamiltonian polygonal path. (B) S_1-smoothing of the graph in (A), for the successful set S_1 = {2, 3, 4}. (C) S_2-smoothing of the graph in (B), for the successful set S_2 = {5, 6, 7}. (D) S_3-smoothing of the graph in (C), for the successful set S_3 = {8, 9}
Figure 7.1: (A) The rigid vertex v belongs to single transverse component C. (B) The rigid vertex v belongs to two transverse components C and C_1
Figure 7.2: Parallel smoothing of vertex v (left) and nonparallel smoothing of vertex v (right)
Figure 7.3: Schematic representation of the actin I micronuclear gene in O. trifallax
Figure 7.4: Two smoothing strategies ((A)(B)(D)(E)(F) and (A)(B)(C)(F)) as models of two different pathways for macronuclear actin I gene assembly
Figure 7.5: First proposed pathway for unscrambling actin I gene in O. trifallax, based on the possible intermediates observed in [39]
Figure 7.6: Second proposed pathway for unscrambling actin I gene in O. trifallax, based on the possible intermediates observed in [39]

Combinatorial Models for DNA Rearrangements in Ciliates

Angela Angeleska

ABSTRACT

Motivated by genome rearrangements that take place in some species of ciliates, we introduce a combinatorial model for these processes based on spatial graphs. This model builds upon two earlier models for pointer-guided DNA recombination (the intramolecular model introduced in [22, 23] and the intermolecular model introduced in [35, 36]) and is influenced by a molecular model for RNA-guided DNA recombination (introduced in [2]).

Despite their differences, the intermolecular and intramolecular models both formalize the recombination events through rewriting operations applied to formal words. Both models predict the same set of molecules as a result of correct rearrangement. Here, we give an algorithm that, for an input scrambled gene structure, outputs a set of strings representing the expected set of molecules after complete assembly. Moreover, we prove that both the set of all realistic words (words that model a possible gene structure) and the set of all nonrealistic words are closed under the rewriting operations in the intramolecular model.

We investigate spatial graphs that consist of 4-valent rigid vertices, called assembly graphs. An assembly graph can be seen as a representation of a DNA molecule during certain recombination processes, in which 4-valent vertices represent molecular alignment of the recombination sites. We introduce the notion of a polygonal path in an assembly graph as a model for a single gene. Polygonal paths are defined as paths that make a "90°-turn" at each vertex of the assembly graph, and they define smoothing of the vertices visited by the paths. Such vertex smoothing models a homologous DNA recombination. We investigate the minimal number of polygonal paths that visit all vertices of a given graph exactly once, called the assembly number. We prove that for every positive integer n there is an assembly graph with assembly number n. We also study the relationship between the number of vertices in an assembly graph and its assembly number. One of the results is that every assembly graph with assembly number n has at least 3n - 2 vertices. In addition, we show that there is an embedding in three-dimensional space of each assembly graph with a given set of polygonal paths, such that smoothing of vertices with respect to the polygonal paths results in unlinked circles.

We study recombination strategies given by subsets of vertices. Such a subset is called a successful set if smoothing of all vertices from the set with respect to a polygonal path results in a graph that contains the polygonal path in a single component. We characterize the successful sets in a given assembly graph by the notion of a complementary polygonal path. Furthermore, we define a smoothing strategy in an assembly graph relative to a polygonal path as a sequence of successful sets that models successive DNA recombinations for correct gene assembly.

Recent experimental results suggest that there might be different pathways for unscrambling a gene. These results lead to a mathematical model for gene recombination that builds upon the intermolecular model. We introduce assembly words as a formalization of a set of linear and circular DNA molecules. Assembly words are partially ordered, so that any linearly ordered subset models a pathway for gene rearrangement. We suggest two different pathways for unscrambling of the actin I gene in O. trifallax and we prove that they are the only theoretically possible pathways.

PAGE 13

1Introduction Topicsofthisdissertationaremathematicalmodelsforgen erearrangementin ciliates.Theanalysisofgenomerearrangementsstartedab out70yearsagowith theworkofDobzhanskyandSturtevant(see[18]),whocompar edthegenearrangementswithinachromosomeindierentspeciesofDroso philia.Sincethen, generearrangementshavebeenobservedinvarietyofcellsi ncludingeukaryotic cells(theciliates[11,48,52]),somaticcellssuchascanc ercells[8,10].Therefore,understandingtheprocessesofgenomerearrangement issignicantfrom bothevolutionaryanddevelopmentalpointofview.Studies ofthesecomplexbioprocessesapplytechniquesnotonlyfrombiologybutalsofr omothereldsincludingmathematicsandcomputerscience.Dierentmathematic almodelsforDNA recombinationandrearrangementhavebeendeveloped.Some ofthemmodelthe DNAsequencereorganizationthroughformallanguages[6,2 5,30],whileothers modelthespatialDNAstructurethroughtopologyandknotth eory[28,51]. ThemotivationforourstudyisthegenerearrangementandDN Arecombinationeventsthathappeninciliates,agroupofunicellula rorganisms.Abrief expositionofthebiologicalbackgroundandmotivationfor ourworkisincluded inChapter2.Certainspeciesofciliates,suchas Oxytricha and Stylonychia undergocomplexDNArearrangement(see[11,44,45]),which makestheman idealmodelorganismsforstudyingtheseprocesses.Ciliat espossestwotypesof nuclei:macronucleusandmicronucleus.Micronucleargene sareinterruptedby non-codingsegments(internaleliminatedsequences,IESs ),whichdividethegene intosegmentscalledmacronucleardestinedsequences(MDS s,forshort).During themacronucleardevelopment,whenthemicronuclearDNAis transformedinto 1

PAGE 14

macronuclearDNA,amassiverearrangementhappens,whichi nvolvesunscramblingoftheMDSs,IESsremovalandMDSinversion. ThemathematicalstudyofthesecomplexDNAoperationsinci liatesembarked withtheworkofLilaKariandLauraLandweberin[35].Theymo deltheDNA rearrangementinciliatesthroughrewritingoperationsde nedonformalwords, aspreviouslyintroducedin[25].Wereviewthisintermolec ularmodelinSection 3.2.OurmodelofassemblywordsintroducedinChapter7isbu iltuponthe intermolecularmodelfrom[35].Itwasprovenin[30]thatth ecomputational powerofthemodelin[35]isequivalenttothepoweroftheUni versalTuring Machine.ThemodelandtheresultsofKariandLandweberinit iatedacourseof researchinformallanguagetheoryfocusedonthecomplexit yandcomputational propertiesofdierenttheoreticalmodelsforgenerearran gementsinciliates[14, 16]. Ontheotherhand,A.Ehrenfeucht,T.Harju,I.Petre,D.M.Pr escottandG. Rozenberginitiatedcombinatorialstudyforgeneassembly inciliatesin[19,22, 46].Theirconcentrationisonthe\geneassemblyitself,in cludingtopicssuch asthepossibleformsofthegenesgeneratedduringgeneasse mblyandpossible strategiesforthegeneassembly"[19].Theymodelgeneasse mblythroughthree rewritingoperations ld hi and dlad .Thegenesareformalizedbywordscalled descriptors,andthecorrectgeneassemblyisachievedbyas equenceofsuccessive applicationsof ld hi and dlad operations,calledreductionstrategy.Wereview thismodelinSection3.2,whereweprovethatthesetofallde scriptorscanbe partitionedintotwosets:thesetofrealisticdescriptors (stringsthatpossiblycorrespondtonaturallyoccurringstructures)andthesetofno nrealisticdescriptors, suchthatbothsubsetsareclosedundertherewritingoperat ions ld hi and dlad Eventhoughtheintermolecularandintramolecularmodelgi verisetodierent assemblypathways(recombinationstrategies)forgenerea rrangementinciliates, theresultingsetsofstrings(molecules)isthesameinboth .InSection3.3we giveanalgorithmwhichtakesastringthatcorrespondstoas crambledgeneasan inputandoutputsthesetofallstringsobtainedafterappli cationofanyreduction 2

PAGE 15

strategyfromtheintermolecularorintramolecularmodel. Theresultingsetof stringscorrespondstotheexpectedsetofmoleculesobtain edaftercorrectgene assembly. ItisbelievedthattheDNArearrangementsareconductedbyh omologous recombinationbetweenpairsofidenticalsequences,calle dpointers.Pointersoccur inpairs;attheendofeach n thMDSandatthebeginningofeach( n +1)st MDS.Boththeinterandintramolecularmodelsassumethatac orrectpairof pointers\aligns"and\splices",andtherefore,wecallthe m\pointerguided" models.Someofthepointersequencesmightbeveryshort(se e[11]),andmight havemultipleappearances.Hence,ifthepointersaretheon lyassemblyguides, numerousincorrectalignmentsandrecombinationeventsco uldoccur.Totake intoconsiderationthecorrectalignmentofshortpointers ,Prescottetal.postulate thatDNAmolecules,originatingfromtheoldmacronucleus, serveastemplates toguidethecorrectpointerrecombination.Theyproposeam olecularmodel fortemplate-guidedrecombinationin stichotrichous ciliatesin[47].Ithasbeen observedin[42]thatRNAmoleculescanserveastemplatesfo rrecombinationand IESeliminationinthedevelopingmacronucleus.Theseresu ltssupportourexplicit modelsfor RNA-guidedDNArecombination [2],thatwedescribeinChapter4. TheRNA-guidedmodelsassumethatthetemplatesareeitherd oublestranded RNA(Section4.1)orsinglestrandedRNA(Section4.2).Weal soshowinSection 4.3thattheoperations ld hi and dlad canbesimulatedbytheRNAguidedmodel ofrecombination. Thismodelwasabaseforthegraphtheoreticalinvestigatio nofgenerearrangementsthatwepresentinChapter5.Weusespatialgraph sasaphysical representationoftheDNAatthetimeofrecombination.Theh omologousrecombinationofthepointersisrepresentedbya4-valentrigidv ertexinthespatial graph,andtheresultingmoleculesafterrecombinationare modeledbyremoval (smoothing)oftheappropriatevertex.Theedgesinthespat ialgraphrepresent MDSorIESsequences.InSection5.2weintroduceanotionofa n assemblygraph asaconnectedgraphwithrigidverticesofvalency4.Weden ea polygonalpath 3

PAGE 16

inanassemblygraphasapaththatmakes\90 turn"ateveryvertex.Apolygonalpathmodelsasinglemacronucleargeneandthereforeitd eterminesthetype ofthesmoothingofa4-valentvertex.Thecompleteassembly ofamacronuclear generequiresrecombinationofallpairsofpointers,i.e., smoothingofall4valent verticesintheassemblygraph.Thus,weconsidersetsofpol ygonalpathsthat visiteveryvertexintheassemblygraphonce.Wecallsuchse ts Hamiltoniansets Wealsodenethe assemblynumber asacharacteristicoftheassemblygraphs. TheassemblynumberisthecardinalityoftheminimalHamilt oniansetforthe assemblygraph.Thisnumbercorrespondstotheminimalnumb erofgenesthat canresultfromtheDNArearrangement.WeshowinSection5.6 thatforany naturalnumber n thereisanassemblygraphwithanassemblynumber n .The relationshipbetweenthenumberofverticesintheassembly graphanditsassembly numberisalsostudied.WeshowinSection5.5thatallgraphs withassembly number n haveatleast3 n 2vertices. Theassemblygraphcanbeviewedasaspatialrepresentation ofDNArearragements.Smoothingsoftherigidverticesrelativetodi erentsetsofpolygonal pathsmightleadtodistinctspatialembeddingsoftheresul tinggraph.Someof theresultingstructuresmightbeknottedorlinked.Usingt echniquesusuallyseen inknottheory(see[41])weshowinSection5.6thatanyassem blygraphwitha givenHamiltonianset H ofpolygonalpathscanbeembeddedin R 3 ,suchthat thesmoothedgraphrelativetothepathsin H isanunlink. Smoothingofthecrossingsinvirtualknotdiagramsbyprope rcoloringhas beenintroducedbyKaumanin[32].Suchsmoothingcanbeeas ilytranslated intosmoothingofverticesinassemblygraphs.Wecompareth esmoothingsdenedbyHamiltoniansetsofpolygonalpathswiththesmoothi ngsdenedby Kauman.Inparticular,weshowthattheassemblynumberisl essorequalto thenumberofonecoloredcomponentsinproperlycoloredsmo othedassembly graph.Wealsoprovethatforanypositiveintegers m and n thereisanassemblygraphsuchthatassemblynumberofis m andtheminimalnumberof monochromaticcomponentsofis n .Thisresultdemonstratesthatsmooth4

PAGE 17

ingsdenedbyHamiltoniansetsofpolygonalpathsandsmoot hingsbyproper coloringsdenedierentoperationsontheassemblygraph. Weshow,inChapter6,thatanassemblygraphwithasingleHam iltonian polygonalpathcancapturetheMDS-IESstructureofamicron ucleargene.A Hamiltonianpolygonalpathcanbeseenasaproperlyassembl edmacronuclear gene.Wealsoprovethatthesimultaneoussmoothingofallve rticeswithrespectto aHamiltonianpolygonalpathresultsinagraphwithacompon entthatcontains theHamiltonianpolygonalpath.Itwasexperimentallyobse rvedin[39,53]that allrecombinationeventsmightnothappensimultaneously. Thus,webroaden thestudywiththenotionofpartialsmoothingsofassemblyg raphs.Partial smoothingsthatkeepthepolygonalpathsconnectedmightbe consideredasrearrangementsthatdonotdispersetheMDSsondierentmolec ules.Suchpartial smoothingsarecalled successful .Wecharacterizethesuccessfulsmoothingsina givenassemblygraphandweincorporatethemtodenepossib lerecombination strategiesinSection6.3. Theresultsin[39]suggestthattherearerearrangementsth atarelikelyto happenbeforeothers,andfurthermore,theremightbedier entpathwaysfor unscramblingagene.Theseresultsleadtoadevelopmentofo urmathematical modelforrecombinationthatusesformalwordsandcircular strings.InChapter 7wedene assemblywords assetsconsistoflinearandcircularwordstomodel thesetofmoleculespresentatacertaininstanceofthegene assembly.The pointerrecombinationisformalizedbyintroducingthreet ypesoftransformations (rewritingrules)ofassemblywords: deletion,insertion and inversion .Notethat deletionandinsertionoperationsinthismodelcoincidewi thoperationsdened in[35]. Wefollowtheconsecutivestepsintheprocessofgeneassemb lybyorderingthe assemblywords.Namely,astrictpartialorderisdenedont hesetofassembly words,suchthattwoassemblywordsarerelatedbythisorder ifoneassemblyword canbeobtainedfromtheotheronebyapplicationofacomposi tionof deletion, insertion or inversion operations.Wedeneanassemblystrategytobelinearly 5


ordered subset of assembly words, where the minimal element is the scrambled micronuclear gene and the maximal element is the assembled macronuclear gene. Based on the intermediate molecules that were experimentally detected in [39], we postulate two different pathways for descrambling the actin I gene in O. trifallax. Each of these pathways can be modeled through an assembly strategy. Therefore, we view each assembly strategy as a possible pathway of a gene rearrangement.

We summarize our investigation in Chapter 8, where we conclude with open problems and ideas for future research.


2 Biological Motivation

2.1 Homologous DNA Recombination

A DNA (deoxyribonucleic acid) molecule is a macromolecule composed of two polynucleotide chains, known as DNA strands (see [1]). Each strand consists of smaller building blocks, called nucleotides. A nucleotide has three components: a phosphate group, a sugar molecule and a nitrogenous base.

Nucleotides link together through phosphodiester bonds, forming a DNA strand which is oriented. The orientation is indicated by the 5'-end on one end and the 3'-end on the other end.

There are four types of nitrogenous bases: adenine (A), cytosine (C), guanine (G) and thymine (T). Depending on the nucleotide base, we distinguish four types of nucleotides A, C, G, and T. Therefore, each DNA strand can be considered as a sequence over the four letter alphabet {A, C, G, T}, usually written from the 5'-end to the 3'-end. For more details about the DNA structure see [9].

The double helical structure of the DNA molecule is obtained by two polynucleotide DNA strands twisted together and hybridized by hydrogen bonds between the bases from different strands. This base hydrogen bonding is specific: an adenine bonds with a thymine and a guanine bonds with a cytosine. It is said that A is complementary to T and G is complementary to C. As a consequence of such complementary base pairing, one DNA strand of the double helix is complementary to its partner as a sequence.

For simplicity the double stranded DNA segments are often abstracted as a pair of complementary sequences over the alphabet {A, C, G, T} (see Figure 2.1). The length of the DNA molecule (strand) is measured by the number of base


pairs (nucleotides) contained in the segment. The unit is often written as bp (short for base pairs) or nt (short for nucleotides). For instance, we say that the DNA segment given in Figure 2.1 is 14 bp long.

Figure 2.1: Double stranded DNA segment represented as a pair of complementary sequences over the alphabet {A, C, G, T}.

The DNA molecules can recombine, enabling an exchange of genetic material. There are several types of DNA recombination mechanisms, among which the homologous recombination is widely observed.

Two DNA segments are called homologous if they are identical or very similar as sequences of base pairs. Homologous recombination is a process in which segments are exchanged between two homologous DNA portions. Even though the homologous recombination is not well understood, a few central steps in the process are known.

- The homologous segments of double stranded DNA molecules "recognize" each other and they align.
- Both strands in each double helix are cut.
- The broken DNA strands migrate towards the broken ends of the opposite DNA molecule and they join with each other.

Figure 2.2: Two double stranded DNA molecules (X left, Y right). X and Y contain identical segments.
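As a rough illustration of this abstraction, the following Python sketch represents each double stranded molecule by one of its strands over {A, C, G, T}, computes the base-by-base complementary strand, and exchanges the segments that follow a shared homologous region. The function names `complement` and `recombine` and the sample sequences are hypothetical, chosen only for illustration. The sketch assumes that the cut falls at the end of the homologous segment in both molecules, and it does not model the single stranded overhangs.

```python
# Watson-Crick base pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the partner strand, base by base (same left-to-right order)."""
    return "".join(COMPLEMENT[base] for base in strand)


def recombine(x, y, homolog):
    """Exchange the tails of x and y after a shared homologous segment.

    Simplifying assumption: both molecules are cut right after the first
    occurrence of `homolog`, and the pieces are rejoined crosswise.
    """
    i = x.find(homolog)
    j = y.find(homolog)
    if i < 0 or j < 0:
        raise ValueError("both molecules must contain the homologous segment")
    cut_x = i + len(homolog)
    cut_y = j + len(homolog)
    return x[:cut_x] + y[cut_y:], y[:cut_y] + x[cut_x:]


if __name__ == "__main__":
    x, y = "GGACGTACTT", "CCACGTACAA"   # hypothetical molecules sharing ACGTAC
    print(complement(x))
    print(recombine(x, y, "ACGTAC"))
```

Each resulting molecule keeps its own head and one copy of the homologous segment, and receives the tail of the other molecule.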


Figure 2.3: Step by step homologous recombination of two DNA molecules.

The introduced cuts are assumed to be at the same positions in both DNA molecules. Therefore, the created single stranded overhangs are identical. The DNA molecules exchange segments and two DNA molecules are obtained, each containing a segment of the other one.

The DNA molecules (X and Y) in Figure 2.2 contain 12 bp segments of identical sequences. A step by step homologous recombination of the DNA molecules X and Y is depicted in Figure 2.3. The alignment of the molecules X and Y along their homologous segments is shown in Figure 2.3(A). Backbone cuts are introduced on each DNA strand at the same position in both molecules. The location of the cuts is designated by arrows in Figure 2.3(B). Four DNA segments labeled by 1, 2, 3 and 4 are obtained, each of which has a 7 bp long overhang (see Figure 2.3(C)). The single stranded overhang of the segment labeled by 1 is complementary to the single stranded overhang of the segment 4. Therefore, they can bond and form a new DNA molecule that contains segments from both X and Y. Similarly, segments 2


and 3 recombine. The resulting DNA molecules are given in Figure 2.3(D). More details about the homologous recombination can be found in [1, 9].

2.2 Gene Rearrangements in ciliates

In this section the basic characteristics of ciliates are introduced. Ciliates are microorganisms that are fascinating not only from a biological, but also from a computational point of view. The process of massive DNA rearrangement that takes place in some species of ciliates is the motivation for our entire study.

2.2.1 Characteristics of Ciliates

Ciliates are unicellular eukaryotic organisms. A single celled organism of a ciliate performs all of the functions that multicellular organisms perform but without specialized cells (see [43]). This characteristic of ciliates, together with their abundance and short generation time, makes them convenient model organisms for different environmental and evolutionary studies and teaching purposes.

Figure 2.4: Images of ciliates. (1) Paramecium (courtesy of the English Wikipedia), (2) Oxytricha trifallax (courtesy of the Human Genome Research Institute).

Ciliates inhabit almost all water environments on earth including freshwater, saltwater and even all types of moist soil. They can be found in ponds, rivers, lakes, seas or oceans at each latitude on earth, from the poles to the equator.


Ciliates are one of the oldest known groups of "animal like" beings that consume and transform food into energy. They mainly feed on bacteria and other unicellular organisms. It is believed that ciliates originated about two billion years ago, which makes them an ancient group of organisms. They are also considered, from an evolutionary point of view, a very successful group of organisms.

Through their long existence and evolution, ciliates developed into about 8,000 different species known today. Images of some species of ciliates are given in Figure 2.4. Their body size varies from a few μm to a few mm. Besides their morphological variance, ciliates are also known for their genetic diversity.

Despite the large diversity of ciliates, there are two main characteristics that are common to all members of the group. The first feature is the possession of cilia (see [43]). All ciliates possess cilia at some stage of their life cycle. The cilia are hair-like organelles that can be attached to different parts of the body (see Figure 2.4). They are generally used for locomotion and feeding. The second common feature is the nuclear dimorphism (see [50]). The single cell of a ciliate possesses two types of nuclei: macronucleus (big nucleus, or MAC for short) and micronucleus (small nucleus, or MIC for short). There can be multiple copies of the macronucleus or the micronucleus in the cell. The macronucleus and the micronucleus differ in their morphology, functionality and genetic structure.

The macronucleus is responsible for the vegetative life of the cell. Its functions are analogous to those of a nucleus of a somatic cell in a multicellular organism and therefore it is also called the somatic nucleus. The macronuclear genes are intensely transcribed into RNA. The RNA synthesized in the macronucleus encodes all of the proteins needed for the cell structure and metabolism.

On the other hand, the micronucleus is latent through most of the life cycle of the cell. It is involved only in the genome exchange during sexual reproduction. The micronucleus is diploid (has two copies of each chromosome). The micronuclear DNA is not transcribed or translated, it is only carried to the next generation. These characteristics resemble a germ cell in a multicellular organism.
Therefore, the micronucleus is often referred to as a germline. More details regarding the nuclear dimorphism in ciliates can be found in [50].

2.2.2 Conjugation in ciliates

The ciliates reproduce both asexually and sexually (see [38]). The asexual reproduction is by binary fission. In this case the ciliate divides into two parts (offspring), each of which is a complete individual containing the genetic material of the parent.

The sexual reproduction of ciliates consists of two general steps:

- Conjugation. During the process of conjugation, two cells (the parents) pair and they interexchange the DNA content of their micronuclei. After mixing the germline material, the parents separate to form two "identical" adult individuals, different from their parents.
- Division. After conjugation the two newly formed adult cells are divided by binary fission. The division yields four offspring.

The mating process is usually triggered by starvation (or other forms of stress) of the ciliates. Mixing the genomes through conjugation enables many changes in the parental DNA. The DNA rearrangements that happen during conjugation are of great interest for this study, so we describe the whole process in more detail.

In Figure 2.5, the conjugation of two organisms (representatives of the same type of ciliate) is depicted. We assume that both cells have one micronucleus and one macronucleus, even though some ciliates may possess multiple copies of each.

The two cells come together side by side and attach (Figure 2.5(1)). Their membranes fuse at the place of attachment, and a cytoplasmic bridge is formed between the cells (see Figure 2.5(2)). The micronucleus of each cell undergoes meiosis, as shown in Figure 2.5(3). This means that each diploid germline is divided into four haploid (containing only one copy of each chromosome) micronuclei. At this point there are eight haploid micronuclei, four per cell. Two of the resulting haploid nuclei and the macronucleus from each organism disintegrate. This is depicted in Figure 2.5(4). In the next step, the two cells exchange


a haploid micronucleus through the cytoplasmic bridge (see Figure 2.5(5)).

Figure 2.5: Step by step conjugation process.

The cells separate. Each of the daughter cells has two types of haploid micronuclei: one micronucleus type inherited from one parent and the other micronucleus type inherited from the other parent. The two haploid nuclei, with different origin, fuse. Therefore, a new diploid micronucleus is formed in each daughter cell, which is illustrated in Figure 2.5(6). The micronucleus divides into two diploid germlines. At this point, the daughter cells possess two copies of the germline (see Figure 2.5(7)). Further in the development, one copy of the micronucleus remains, while the other one transforms into a macronucleus. With the development of the somatic nucleus, the conjugation process has been completed. The resulting cells


are given in Figure 2.5(8).

To complete the reproduction cycle, each of the daughter cells undergoes binary division. For more details about the reproduction of ciliates see [38].

The formation of the macronucleus from a copy of the micronucleus, in some species of ciliates, involves massive DNA rearrangement events. Similar rearrangements have been observed in many other cells and more complex organisms. Therefore, the ciliates provide an excellent biological system for studying those processes.

2.2.3 From Micronuclear to Macronuclear DNA

In this section we focus on the genome rearrangement events that accompany the macronucleus development from a diploid copy of the micronucleus. Such events have been observed in different types of ciliates [11, 45]. We primarily use Stylonychia and Oxytricha as model organisms for this study. In order to better comprehend the DNA reorganization process that is performed during this transformation, it is necessary to understand the DNA structure of both the micronucleus and the macronucleus. In particular, we want to consider the genomic organization before and after the process.

The micronuclear DNA is organized into chromosomes, such that each chromosome contains one DNA molecule (see [33]). These molecules are very long and contain abundant repetitive sequences. On the other hand, the macronuclear DNA is homogenous. Unlike the micronuclear DNA, the macronuclear DNA molecules are short, linear molecules (often called "nanochromosomes"). Each macronuclear DNA molecule in a spirotrichous ciliate typically contains a single gene.

The genes expressed in the macronucleus are also present in the micronucleus, but with dramatically different structure. The micronuclear genes are divided into numerous smaller fragments. The gene fragments are dispersed along a micronuclear DNA molecule, separated by long portions of non-coding (junk) DNA. Moreover, there is evidence (see [11]) that some of the fragments can be


even spread on different loci within the same molecule, far from each other.

The gene fragments in the germline are called macronuclear destined segments, or MDSs for short. The non-coding DNA segments that flank the MDSs are called internal eliminated segments, or IESs for short.

Figure 2.6: Schematic representation of the C2 gene of Oxytricha nova. (A) micronuclear gene. (B) macronuclear gene.

In Figure 2.6(A) a diagram of the micronuclear gene known as C2 in Oxytricha nova is given (see [44] for more details). It is composed of four MDSs, represented as rectangular blocks which are labeled by MDS 1, MDS 2, MDS 3 and MDS 4. The gene is interrupted by three IESs (represented by line segments), which are eliminated during the development of the macronuclear gene. The functional macronuclear C2 gene, composed only of MDSs, is depicted in Figure 2.6(B).

The IESs can be of length ranging from 5 bp to over 500 bp (see [45]), but are often less than 100 base pairs long. The IESs can appear anywhere in the micronuclear gene. The number of the IESs also varies. The pol α gene in Oxytricha nova, for instance, is divided into 45 segments. Since the IESs are eliminated during the macronuclear development, a major DNA portion (about 95% in some species) is deleted from the germline DNA.

In addition, the scattered MDSs might be scrambled in some micronuclear genes [44, 45, 11]. This means that the MDS order does not coincide with the order in which they are arranged in the macronuclear gene. Therefore, the transformation into a functional MAC gene involves MDS rearrangement. An example of a scrambled gene is given in Figure 2.7. The micronuclear version of the αTP gene in Oxytricha trifallax is composed of 17 scrambled MDSs: 1 3 5 7 10


12 2 4 6 8 9 11 13 14 15 16 17.

Figure 2.7: Schematic representation of the scrambled and assembled pol α gene of Oxytricha trifallax. (A) micronuclear gene, (B) macronuclear gene.

Moreover, some of the MDSs can appear inverted in the micronuclear DNA. Namely, some MDSs (considered as nucleotide sequences) are reversed relative to the others. As a consequence, the formation of the macronuclear genes requires inversion of DNA segments. An MDS sequence and its inverted segment are illustrated in Figure 2.8.

Figure 2.8: An MDS nucleotide sequence of the USG1 micronuclear gene in Uroleptus (right) and its reverse (left).

We label the inverted sequences by the same symbol as the original sequence with an overhead bar. In the micronuclear version of the actin I gene in Oxytricha nova, MDS 2 is inverted (see Figure 2.9). Thus, it needs to be rotated by 180 degrees first and then connected to MDS 1 and MDS 3 when the MAC actin I gene is formed.

Figure 2.9: Micronuclear actin I gene in Oxytricha nova.
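The inversion of an MDS described above can be sketched in Python under the usual single strand abstraction: rotating a double stranded segment by 180 degrees means that the strand read from left to right is the reverse of the complement of the original one. The function name `invert` is hypothetical, and the short fragment ATAACTCTAAA serves only as a sample input.

```python
# Watson-Crick base pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def invert(strand):
    """Model a 180 degree rotation of a double stranded segment.

    After the rotation, the strand read left to right is the complement
    of the original strand, read in the opposite direction.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


if __name__ == "__main__":
    fragment = "ATAACTCTAAA"
    print(invert(fragment))          # TTTAGAGTTAT
    print(invert(invert(fragment)))  # inverting twice restores the fragment
```

Inverting twice returns the original sequence, which matches the convention that the bar applied twice gives back the original symbol.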


The micronuclear and the macronuclear structures of the genes are an indicator of the complexity of the genome processing that takes place during conjugation in ciliates. Based on the MDS-IES gene organization described previously, the assembly of the MAC genes from their MIC counterparts consists of a composition of the following events: DNA elimination, DNA permutation and DNA inversion. The macronuclear formation of the actin I gene in Oxytricha nova given in Figure 2.9 requires all three processing events. In particular, the DNA sequences IES 1, IES 2, ..., IES 8 are removed, MDS 2 is reversed and the MDSs are rearranged in consecutive order (MDS 1-2-3-4-5-6-7-8-9). The assembled MAC actin I gene is schematically represented in Figure 2.10.

Figure 2.10: Macronuclear actin I gene in Oxytricha nova.

The procedure of vast DNA modification that happens during macronuclear development in ciliates is not well understood. However, there are special nucleotide sequences that guide the process. Those sequences appear as identical pairs at the end of the $n$th MDS and the beginning of the $(n+1)$st MDS. They are called pointers, since they resemble the pointer data type used in programming languages.

In Figure 2.11 a pair of pointer sequences (one copy occurring at the end of MDS 4 and the other copy occurring at the beginning of MDS 5) in the USG1 micronuclear gene in Uroleptus is depicted. The pointer is the following nucleotide sequence, ATAAACTCTAAA over TATTTGAGATTT, represented by underlined repeats in Figure 2.11.

The length of the pointer sequences can vary from a few nucleotides to about twenty nucleotides. Actually, they are subsequences of the MDSs. Therefore, in a diagram representation of the micronuclear genes, we sketch them as smaller rectangular blocks within the MDS rectangle (see Figure 2.12). We label the pointers by numbers such that the pointer appearing at the end of the $n$th MDS and the beginning of the $(n+1)$st MDS is labeled by $(n+1)$. If MDS $n$ is inverted,


then its pointers are denoted by $\bar{n}$ and $\overline{n+1}$.

Figure 2.11: A segment containing MDS 4 and MDS 5 of the USG1 micronuclear gene in Uroleptus.

Figure 2.12: Micronuclear actin I gene in Oxytricha nova including the pointer sequences.

The pointers play a crucial role in the macronuclear gene assembly. Since each pointer consists of a pair of homologous (identical) sequences, it guides the joining of two consecutive MDSs. In particular, the DNA molecule folds in space, so that the homologous pointers are brought in close proximity to each other. They are ready to align and recombine, as explained in Section 2.1. In the recombination process, one copy of the pointer pair is kept in the macronuclear gene, while the other one is removed.

Although the pointers guide the gene assembly, there are additional factors that add to the precision and effectiveness of this complex process. Theoretical models for DNA elimination and rearrangement during macronuclear development are discussed in detail in Chapters 3 and 4.
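The way pointers determine the macronuclear order can be sketched with a small Python example. It is only an illustration of the labeling convention, not a model of the molecular process: a micronuclear gene is given as a list of signed MDS numbers (a negative number marks an inverted MDS, IESs are omitted), and the assembled order is recovered by following the pointer labels from the beginning marker to the end marker. The names `pointer_labels` and `assemble`, and the use of the strings "b" and "e" for the markers, are assumptions made for this sketch. The sample input is the MDS order of the actin I gene, with MDS 2 inverted.

```python
def pointer_labels(n, k):
    """Labels at the two ends of MDS n in a gene with k MDSs.

    MDS n starts with pointer n and ends with pointer n + 1; the first MDS
    starts with the beginning marker and the last one ends with the end marker.
    """
    incoming = "b" if n == 1 else n
    outgoing = "e" if n == k else n + 1
    return incoming, outgoing


def assemble(mic_order):
    """Follow the pointers from "b" to "e".

    Returns the macronuclear MDS order and the list of MDSs that had to be
    inverted on the way.
    """
    k = len(mic_order)
    by_incoming = {pointer_labels(abs(m), k)[0]: m for m in mic_order}
    assembled, inverted = [], []
    label = "b"
    while label != "e":
        m = by_incoming[label]
        if m < 0:
            inverted.append(-m)
        assembled.append(abs(m))
        label = pointer_labels(abs(m), k)[1]
    return assembled, inverted


if __name__ == "__main__":
    actin_mic = [3, 4, 6, 5, 7, 9, -2, 1, 8]
    print(assemble(actin_mic))
```

For the actin I input the sketch reports the consecutive order 1 through 9 and a single inversion, that of MDS 2.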


3 Pointer Guided Models for DNA Rearrangements

There are different molecular and theoretical models that describe the process of DNA rearrangement in ciliates. Some of the models solely rely on the pointers as guides through the process, while others consider RNA templates in addition.

In this chapter we present two "pointer guided" models for gene rearrangements in ciliates introduced in [22, 23, 35, 36]. In both models the micronuclear and macronuclear DNA molecules are abstracted by linear and circular strings over formal alphabets. In Section 3.1.1, we present the formalization by strings, called MDS-IES descriptors, as proposed and studied in [19]. The DNA rearrangement events in both models are formalized by rewriting rules (operations) applied to the MDS-IES descriptors. The complete assembly of a MAC gene from its MIC gene counterpart is viewed as a composition of rewriting operations. One of the models (Section 3.1) is intramolecular, since it assumes recombinations only within a single DNA molecule. The other model (Section 3.2), called intermolecular, permits recombination of different molecules. The authors of [19] define realistic descriptors as a special type of MDS-IES descriptors whose structure may be realized as a MIC gene. In Section 3.1.2 we give a complete characterization of the realistic descriptors by Theorem 3.1.18. We use this characterization to prove that it is not possible to derive a realistic descriptor from a nonrealistic one by applying the rewriting operations.

It was proven in [24] that the two models agree on the structure of the resulting molecules after the correct assembly of a micronuclear into a macronuclear gene (see Theorem 3.3.1). In Section 3.3, we give an algorithm which outputs the set of MDS-IES descriptors obtained by assembly of a given MIC gene descriptor.


3.1 Intramolecular Model

The intramolecular model for gene assembly in ciliates was introduced by A. Ehrenfeucht, T. Harju, D. M. Prescott and G. Rozenberg (see [22, 23, 19]). This model is based on three molecular operations: loop recombination, hairpin recombination and double-loop recombination.

Figure 3.1: Pointer recombination. (A) A pair of pointers 3 align side by side. (B) The pointers recombine as homologous sequences. (C) Two resulting molecules, such that one of them contains MDS 2 and MDS 3 connected through a copy of pointer 3.

A loop recombination is an operation that is applicable to a micronuclear gene if there is a pair of pointers separated by an IES only. The pointers recombine, so they guide the removal of the IES that is between them. In addition, the consecutive MDSs that contain a pair of pointers are spliced. In Figure 3.1 a loop recombination of pointer 3 is shown. In the first step (see Figure 3.1(A)), the pointers align in parallel by forming a loop. Next, a homologous recombination


takes place, as shown in Figure 3.1(B), (C). The IES 2 is removed from the gene along with one copy of the pointer as a circular molecule. In addition, MDS 2 and MDS 3 are joined through a copy of the pointer 3, forming a composite MDS. The composite MDS contains two pointers: 2 at the beginning and 4 at the end of the segment, which can serve for further assembly.

Figure 3.2: A hairpin recombination. (A) A pair of pointers 2 align side by side, by forming a hairpin. (B) The pointers recombine as homologous sequences. (C) The resulting molecule contains MDS 1 and MDS 2 connected through a copy of pointer 2.

The hairpin recombination is a molecular operation applicable to a micronuclear gene that contains reversed segments. In particular, for a hairpin recombination to happen, the DNA molecule should contain a pair of pointers such that


one is an inverted copy of the other. The alignment of the oppositely oriented pointers is enabled by folding of the molecule in a hairpin form. This allows them to recombine. The hairpin recombination yields one linear molecule that contains a composite of two MDSs. One copy of the pointer remains between the composite MDSs, while the other copy is removed.

Figure 3.3: A double loop recombination. (A) A portion of the actin I micronuclear gene in O. trifallax. (B) The pair of pointers 6 and the pair of pointers 7 align, by forming a double loop, and they recombine. (C) The resulting molecule contains MDS 5, MDS 6 and MDS 7 joined correctly.

The hairpin recombination is schematically illustrated in Figure 3.2. The portion that contains MDS 1 and the inverted copy of MDS 2 of the actin I gene is considered. This


provides two inversely oriented occurrences of the pointer 2. Therefore, the hairpin recombination can be performed (see Figure 3.2(B)). In the resulting molecule, given in Figure 3.2(C), MDS 1 and MDS 2 are spliced in a consecutive order, while IES 7, IES 8 and a copy of the pointer 2 have been excised.

The third operation, called double-loop recombination, is applicable to a micronuclear segment that contains two pairs of overlapping pointers. The pointers overlap when the DNA sequence enclosed between one pair of pointers overlaps with the DNA sequence enclosed between another pair of pointers. Such a segment is shown in Figure 3.3(A), where pointers 6 and 7 overlap. The molecule folds as shown in Figure 3.3(B), so the two pairs of pointers align. Thus, two recombination events can be performed. The recombination of the pair of pointers 6 in Figure 3.3(C) leads to a composite of MDS 5 and MDS 6 connected through a copy of pointer 6, while the recombination of the pair of pointers 7 joins MDS 6 to MDS 7 through a copy of 7. The double-loop recombination yields a single molecule that contains a composite of three consecutive MDSs.

The authors of [19] show that the assembly of all experimentally observed micronuclear gene patterns may be modeled by applying a composition of these three molecular operations. The application of each molecular operation is viewed as a process of a pointer removal. Thus, all the pointers are removed by successive application of one or more molecular operations. At the same time, the IESs are excised and all MDSs are joined in the correct order. The molecules obtained in the process (after each application of an operation) are called intermediates.

3.1.1 MDS-IES descriptors

The model of gene assembly based on the three molecular operations is further formalized. Namely, the MDS-IES gene structures are represented by formal strings and the molecular operations are represented by rewriting rules applicable to the strings [22, 23].

The first step of formalization is by representing the micronuclear genes or the intermediates as sequences of MDSs, called MDS arrangements. This representation carries the structural organization of the MDSs within the molecule, disregarding the IESs.

Example 3.1.1 The C2 gene given in Figure 3.4 has the following MDS structure: $M_1 M_2 M_3 M_4$. After applying a loop recombination on pointer 3, the intermediate in Figure 3.5 is obtained. The MDS structure of the intermediate can be written as: $M_1 M_{2,3} M_4$.

The alphabets $\mathcal{M}_k = \{M_{i,j} \mid 1 \le i \le j \le k\}$ and $\overline{\mathcal{M}}_k = \{\overline{M}_{i,j} \mid 1 \le i \le j \le k\}$ for every natural number $k$ are used. Each symbol (element) of the alphabet $\mathcal{M}_k$ corresponds to an MDS or a block of MDSs. Namely, $M_{i,i}$ stands for MDS $i$ and is often written $M_i$, while $M_{i,j}$ is a block composed of MDS $i$, MDS $i+1$, ..., MDS $j$ assembled in the correct order. The elements of $\overline{\mathcal{M}}_k$ represent the inverted MDSs. A string over the alphabet $\mathcal{M}_k \cup \overline{\mathcal{M}}_k$ is called an MDS arrangement of size $k$.

A signed permutation of a string $v = v_1 v_2 \cdots v_n$ over the alphabet $\mathcal{M}_k$ is each string that is a permutation of a string $u = u_1 u_2 \cdots u_n$, where $u_j = v_j$ or $u_j = \bar{v}_j$ for every $j \in \{1, 2, \ldots, n\}$. Our convention is that $\bar{\bar{v}} = v$. For a string $v = v_1 v_2 \cdots v_n$ its inverse is defined as $\bar{v} = \bar{v}_n \cdots \bar{v}_2 \bar{v}_1$.

Definition 3.1.2 Let $\alpha = M_{1,i_2-1} M_{i_2,i_3-1} \cdots M_{i_n,k}$ be a string over $\mathcal{M}_k$. A signed permutation of $\alpha$ is called a realistic arrangement.

The realistic arrangements model possible MDS structures of micronuclear genes, macronuclear genes or the intermediate molecules obtained in the process of rearrangement.

This model is based on the assumption that only the pointers guide the process of gene rearrangement. Therefore, the MDS-IES DNA structures are represented as strings of pointers and IESs, called MDS-IES descriptors.

Definition 3.1.3 Let $D_k = \{(b,e), (b,i), (i,e), (\bar{e},\bar{b}), (\bar{i},\bar{b}), (\bar{e},\bar{i}) \mid 2 \le i \le k\} \cup \{(i,j), (\bar{j},\bar{i}) \mid 2 \le i < j \le k\}$. [...] $I_k \cup \bar{I}_k$ is called an IES descriptor of size $k$. A string over $D_k \cup I_k \cup \bar{I}_k$ is called an MDS-IES descriptor of size $k$. An element from the set $M = \{b, e, \bar{b}, \bar{e}\}$ is called a marker.

We omit the size of the descriptors if it is understood from the context or it is irrelevant in the context.

A signed permutation of a string $v = v_1 v_2 \cdots v_n$ over the alphabet $\{(b,e), (b,i), (i,e) \mid 2 \le i \le k\} \cup \{(i,j) \mid 2 \le i < j \le k\}$ [...]

Figure 3.4: Schematic representation of the micronuclear C2 gene including the pointers 2, 3 and 4.

Figure 3.5: Schematic representation of the intermediate molecule obtained from the micronuclear C2 gene after loop recombination of pointer 3.

[...] recombination on pointer 3, the intermediate in Figure 3.5 is obtained. The MDS-IES descriptor of the intermediate is $(b,2) I_1 (2,4) I_3 (4,e)$.

The MDS-IES descriptor of the actin I micronuclear gene (see Figure 2.12) is $(3,4) I_1 (4,5) I_2 (6,7) I_3 (5,6) I_4 (7,8) I_5 (9,e) I_6 (\bar{3},\bar{2}) I_7 (b,2) I_8 (8,9)$.

Note that it is not the case that every MDS-IES descriptor corresponds to an MDS-IES gene structure.

Example 3.1.5 The MDS-IES descriptors $\delta_1 = (b,2) I_1 (b,3) I_2 (2,3)$ and $\delta_2 = (b,2)(2,3) I_1 (3,4) I_2 (3,5) I_3 (4,5) I_4 (5,e)$, for example, cannot be formalizations of MDS-IES structures. The string $\delta_1$ contains more than one beginning marker $b$, which means that the corresponding gene structure would have more than one MDS 1, and this is not possible. The pointer 3 occurs three times in $\delta_2$, which cannot happen in a micronuclear gene or an intermediate.

The MDS-IES descriptors that model actual MDS-IES gene structures are called realistic. The realistic MDS descriptors can be characterized through realistic arrangements. For that purpose, a morphism $\varphi_k$, between the set of all MDS arrangements over $\mathcal{M}_k \cup \overline{\mathcal{M}}_k$ and the set of all MDS descriptors over $D_k$, is defined. The morphism $\varphi_k$ is defined on the generators as follows:

$\varphi_k(M_{i,j}) = (i, j+1)$ for $2 \le i \le j \le k-1$,


$\varphi_k(\overline{M}_{i,j}) = (\overline{j+1}, \bar{i})$ for $2 \le i \le j \le k-1$,
$\varphi_k(M_{1,j}) = (b, j+1)$ for $j \le k-1$,
$\varphi_k(M_{i,k}) = (i, e)$ for $2 \le i \le k$,
$\varphi_k(\overline{M}_{1,j}) = (\overline{j+1}, \bar{b})$ for $j \le k-1$,
$\varphi_k(\overline{M}_{i,k}) = (\bar{e}, \bar{i})$ for $2 \le i \le k$.

Example 3.1.6 For example, we have $\varphi_4(M_1 M_{2,3} M_4) = (b,2)(2,4)(4,e)$, and $\varphi_9(M_3 M_4 M_6 M_5 M_7 M_9 \overline{M}_2 M_1 M_8) = (3,4)(4,5)(6,7)(5,6)(7,8)(9,e)(\bar{3},\bar{2})(b,2)(8,9)$.

Definition 3.1.7 An MDS descriptor $\delta$ is called realistic if there is a realistic MDS arrangement $\alpha$ such that $\varphi_k(\alpha) = \delta$ for some $k \ge 2$. Moreover, an MDS-IES descriptor is called realistic if it is composed of a realistic MDS descriptor.

Let $\delta$ be an MDS descriptor and $p$ a pointer that occurs in $\delta$. We say that $p$ is positive if both $p$ and $\bar{p}$ appear in $\delta$, otherwise $p$ is negative. The pointer $p$ has a left (right, respectively) occurrence in $\delta$ if $\delta = \delta_1 (p,q) \delta_2$ or $\delta = \delta_1 (\bar{p},\bar{q}) \delta_2$ ($\delta = \delta_1 (q,p) \delta_2$ or $\delta = \delta_1 (\bar{q},\bar{p}) \delta_2$, respectively) for some descriptors $\delta_1, \delta_2$ and a pointer $q$. For pointers $p$ and $q$ that occur in $\delta$, we say that $p$ is paired with $q$ in $\delta$ if one of $(p,q)$, $(q,p)$, $(\bar{p},\bar{q})$, $(\bar{q},\bar{p})$ is a symbol in $\delta$.

Lemma 3.1.8 follows from the definition of an MDS descriptor and the definition of left and right occurrence of a pointer.

Lemma 3.1.8 Let $\delta$ be an MDS descriptor and $p$ a pointer that occurs in $\delta$. If $p$ ($\bar{p}$, respectively) has a left (right, respectively) occurrence in $\delta$, then it is paired with some pointer $q$ such that $q > p$. Similarly, if $p$ ($\bar{p}$, respectively) has a right (left, respectively) occurrence in $\delta$, then it is paired with some pointer $r$ such that $r <$ [...] $p$, for each pointer $p$.

Theorem 3.1.9 was proven in [19].
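The morphism $\varphi_k$ can be made concrete with a short Python sketch. The encoding is an assumption made only for this illustration: a symbol $M_{i,j}$ is the tuple `(i, j, False)`, its inverted version is `(i, j, True)`, a barred pointer is a negative integer, and the markers are the strings "b" and "e" (barred: "-b" and "-e"). The helper names `bar`, `phi` and `descriptor` are hypothetical. The sketch also maps $M_{1,k}$ to $(b,e)$, a case that is not listed among the rules above but is consistent with the alphabet $D_k$.

```python
def bar(x):
    """Bar a pointer (negate the integer) or a marker (toggle a '-' prefix)."""
    if isinstance(x, int):
        return -x
    return x[1:] if x.startswith("-") else "-" + x


def phi(symbol, k):
    """Image of one MDS symbol (i, j, inverted) under the morphism phi_k."""
    i, j, inverted = symbol
    left = "b" if i == 1 else i          # MDS 1 starts with the marker b
    right = "e" if j == k else j + 1     # MDS k ends with the marker e
    if inverted:
        # An inverted block is read backwards, with both labels barred.
        return (bar(right), bar(left))
    return (left, right)


def descriptor(arrangement, k):
    """Apply phi_k symbol by symbol to an MDS arrangement."""
    return [phi(symbol, k) for symbol in arrangement]


if __name__ == "__main__":
    # The two arrangements of Example 3.1.6.
    c2_intermediate = [(1, 1, False), (2, 3, False), (4, 4, False)]
    actin = [(3, 3, False), (4, 4, False), (6, 6, False), (5, 5, False),
             (7, 7, False), (9, 9, False), (2, 2, True), (1, 1, False),
             (8, 8, False)]
    print(descriptor(c2_intermediate, 4))
    print(descriptor(actin, 9))
```

Under this encoding the two printed lists correspond to the two descriptors of Example 3.1.6, with `(-3, -2)` standing for $(\bar{3},\bar{2})$.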


Theorem 3.1.9 If a descriptor $\delta$ is realistic then it satisfies the following conditions:

(A) One of the markers $b$ or $\bar{b}$ occurs exactly once in $\delta$, and one of the markers $e$ or $\bar{e}$ occurs exactly once in $\delta$.
(B) Each pointer $p$ (except for $b$ and $e$) that occurs in $\delta$ occurs exactly twice.
(C) For any negative pointer $p$ that occurs in $\delta$, one occurrence is left and one occurrence is right.
(D) For any positive pointer $p$ that occurs in $\delta$, both occurrences are either left or right.

We formulate the following definition in order to prove the converse of Theorem 3.1.9.

Definition 3.1.10 Let $\delta$ be an MDS descriptor and $p$ a pointer that occurs in $\delta$. A well paired partition $P_p$ relative to a pointer $p$ is a partition of the symbols in $\delta$ containing $p$ whose elements contain exactly two symbols and $\{(i,j), (s,t)\} \in P_p$ iff one of the following is true:

(i) $j = s = p$ or $j = s = \bar{p}$,
(ii) $j = p$ and $t = \bar{p}$, or $i = p$ and $s = \bar{p}$.

In addition, the two occurrences of the pointer $p$ in the symbols $(i,j)$ and $(s,t)$ in $\delta$ are called well paired pointers if $(i,j)$ and $(s,t)$ satisfy (i) or (ii).

We introduce a generalized statement (C') that includes the conditions (C) and (D) from Theorem 3.1.9 as:

(C') For every pointer $p$ that occurs in $\delta$, there is a well paired partition $P_p$ on the symbols of $\delta$ containing $p$.

Note that if both $p$ and $\bar{p}$ occur in a single element of a well paired partition $P_p$ then they satisfy (D). Otherwise, they satisfy (C). In other words, if a descriptor $\delta$ satisfies (C') then the occurrences of each pointer $p$ in $\delta$ can be grouped in pairs, such that each pair satisfies either (C) or (D).
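Conditions (A) through (D) of Theorem 3.1.9 can be checked mechanically. The following Python sketch does so under an encoding assumed only for this illustration: an MDS descriptor is a list of pairs, pointers are integers (negative when barred), and markers are the strings "b", "e", "-b" and "-e". An occurrence is counted as left or right according to whether the pointer is the first or the second coordinate of its symbol, and barred and unbarred occurrences are counted for the same pointer. The function name `satisfies_conditions` is hypothetical. The check concerns only these four conditions and is not meant as an implementation of the later characterization theorem.

```python
from collections import defaultdict

MARKERS = {"b", "e", "-b", "-e"}


def satisfies_conditions(descriptor):
    """Check conditions (A)-(D) of Theorem 3.1.9 for an MDS descriptor."""
    flat = [x for symbol in descriptor for x in symbol]

    # (A): exactly one beginning marker and exactly one end marker.
    if sum(x in ("b", "-b") for x in flat) != 1:
        return False
    if sum(x in ("e", "-e") for x in flat) != 1:
        return False

    # Collect, for every pointer, the side and the sign of each occurrence.
    occurrences = defaultdict(list)
    for symbol in descriptor:
        for side, x in zip(("left", "right"), symbol):
            if x not in MARKERS:
                occurrences[abs(x)].append((side, x < 0))

    for occ in occurrences.values():
        # (B): every pointer occurs exactly twice.
        if len(occ) != 2:
            return False
        (side1, barred1), (side2, barred2) = occ
        positive = barred1 != barred2    # both p and its barred copy appear
        # (D): a positive pointer has both occurrences on the same side.
        if positive and side1 != side2:
            return False
        # (C): a negative pointer has one left and one right occurrence.
        if not positive and side1 == side2:
            return False
    return True


if __name__ == "__main__":
    actin = [(3, 4), (4, 5), (6, 7), (5, 6), (7, 8), (9, "e"),
             (-3, -2), ("b", 2), (8, 9)]
    delta_1 = [("b", 2), ("b", 3), (2, 3)]                        # Example 3.1.5
    delta_2 = [("b", 2), (2, 3), (3, 4), (3, 5), (4, 5), (5, "e")]
    print(satisfies_conditions(actin))
    print(satisfies_conditions(delta_1), satisfies_conditions(delta_2))
```

With the IESs dropped, the actin I descriptor passes all four checks, while the two descriptors of Example 3.1.5 fail on (A) and (B), respectively.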


Example 3.1.11 The MDS descriptor $\delta_1 = (b,2)(3,e)(b,3)(\bar{3},\bar{2})(3,4)(4,e)$ satisfies the condition (C'). Namely, $P_2 = \{\{(b,2), (\bar{3},\bar{2})\}\}$, $P_3 = \{\{(b,3), (3,e)\}, \{(3,4), (\bar{3},\bar{2})\}\}$ and $P_4 = \{\{(3,4), (4,e)\}\}$ are well defined partitions for pointers 2, 3 and 4, respectively.

On the other hand, the MDS descriptor $\delta_2 = (b,2)(2,5)(2,3)(3,5)(2,e)$ does not satisfy (C'). One can easily check that none of the possible two-element partitions on the symbols containing pointer 2 ($P_2 = \{\{(b,2), (2,5)\}, \{(2,3), (2,e)\}\}$, $P'_2 = \{\{(b,2), (2,3)\}, \{(2,5), (2,e)\}\}$ and $P''_2 = \{\{(b,2), (2,e)\}, \{(2,3), (2,5)\}\}$) is a well paired partition. Therefore, $\delta_2$ violates (C').

It is clear that (C) and (D) imply (C'), and the statement (C') implies (C) and (D) provided that there are exactly two occurrences of each pointer in a descriptor $\delta$. In addition, we prove in Lemma 3.1.12 that the condition (B) from Theorem 3.1.9 is a consequence of (A) and (C').

Lemma 3.1.12 Let $\delta$ be an MDS descriptor that satisfies:

(A) One of the markers $b$ or $\bar{b}$ occurs exactly once in $\delta$, and one of the markers $e$ or $\bar{e}$ occurs exactly once in $\delta$, and
(C') For every pointer $p$ that occurs in $\delta$, there is a well paired partition $P_p$ on the symbols of $\delta$ containing $p$.

Then, each pointer that occurs in $\delta$ occurs exactly twice.

Proof: Let $\delta$ be a descriptor that satisfies the conditions (A) and (C').

Note that the condition (C') implies that each pointer in $\delta$ has an even number of occurrences. By Lemma 3.1.8 and the definition of the alphabet $D_k$, for each pointer $p$ and each $\{(i,j), (s,t)\} \in P_p$, where $P_p$ is a fixed well paired partition, we have that in one of the symbols $(i,j)$, $(s,t)$, the pointer $p$ is paired with a pointer $q$ such that $q <$ [...] $p$. Note that $q$ might be $b$ and $r$ might be $e$. Let $x_1, x_2, \ldots, x_n$ be the list of all different pointers that occur in $\delta$ and $x_1 <$ [...]
Next, we show by induction that there are exactly two occurrences of the pointer $x_i$, such that one occurrence is paired with the pointer $x_{i-1}$ in $\delta$ for each $i \in \{1,2,\ldots,n\}$.

First, we show that one occurrence of the pointer $x_1$ is paired with $b$ in $\delta$. For a contradiction, assume that $b$ is paired with a pointer $x_k$ in $\delta$ for some $k > 1$. Let $\{(i,j),(s,t)\} \in P_{k-1}$. Then both symbols $(i,j)$ and $(s,t)$ contain the pointer $x_{k-1}$, but one of them, say $(i,j)$, contains $x_{k-1}$ paired with some pointer $x_{l_1}$ such that $x_{k-1} > x_{l_1}$. In particular, $(i,j) = (x_{l_1}, x_{k-1})$ or $(i,j) = (\bar{x}_{k-1}, \bar{x}_{l_1})$. Then there is a symbol $(s',t')$ that contains $x_{l_1}$ in $\delta$ such that $\{(i,j),(s',t')\} \in P_{l_1}$. Since $x_{l_1}$ is paired with a pointer greater than itself in the symbol $(i,j)$, $x_{l_1}$ is paired with some $x_{l_2}$ where $x_{l_1} > x_{l_2}$ in $(s',t')$. We continue the sequence $x_{k-1} > x_{l_1} > x_{l_2} > \cdots > x_{l_m}$ until a pointer $x_{l_m}$ that is paired with the beginning marker $b$ is obtained. Since one of $b$ or $\bar{b}$ occurs exactly once in $\delta$, $x_{l_m} = x_k$. This contradicts the fact that $x_k > x_{k-1}$. Hence, $k = 1$. Similarly, one can show that one occurrence of the pointer $x_n$ is paired with the ending marker $e$.

We proved that one occurrence of the pointer $x_1$ is paired with the marker $b$. There is a second occurrence of $x_1$ paired with a pointer $x_i > x_1$ such that one of $\{(b,x_1),(x_1,x_i)\}$, $\{(b,x_1),(\bar{x}_i,\bar{x}_1)\}$, $\{(\bar{x}_1,\bar{b}),(x_1,x_i)\}$, $\{(\bar{x}_1,\bar{b}),(\bar{x}_i,\bar{x}_1)\}$ is in $P_1$.

Assume that $x_1$ occurs more than twice in $\delta$. Then there is at least one more element $\{(i,j),(s,t)\} \in P_1$. In one of the symbols $(i,j)$ or $(s,t)$, the pointer $x_1$ is paired with a pointer less than itself. But that cannot happen, since there are no such pointers except the beginning marker $b$, which is already paired with an occurrence of $x_1$. Therefore, $x_1$ has only two occurrences in $\delta$.

Now, assume that there are exactly two occurrences of each pointer $x_i$, such that one occurrence is paired with the pointer $x_{i-1}$ in $\delta$ for each $i \in \{1,2,\ldots,s-1\}$. Consider the pointer $x_s$. There is an occurrence of $x_s$ in $\delta$ paired with some pointer $x_k$ such that $x_k < x_s$.
By the induction hypothesis, it follows that the pointer $x_s$ in $\delta$ can be paired with only one pointer less than $x_s$, and that is $x_{s-1}$. Hence, $x_s$ can occur exactly one more time in $\delta$ (paired with a pointer greater than itself). This shows that $x_s$ has exactly two occurrences in $\delta$.

Therefore, each pointer $x_i$ has exactly two occurrences in $\delta$ for each $i \in \{1,2,\ldots,n\}$. At the same time we proved that each $x_i$ is paired with the pointers $x_{i-1}$ and $x_{i+1}$ in $\delta$ for each $i \in \{2,\ldots,n-1\}$. $\Box$

We can strengthen the result of Theorem 3.1.9 by proving its converse in Theorem 3.1.13. With these two theorems the structure of realistic descriptors is completely characterized. We use this characterization later to prove Theorem 3.1.18.

Theorem 3.1.13 A descriptor $\delta$ is realistic iff it satisfies the following conditions:
(A) Precisely one of the markers $b$, $\bar{b}$ occurs exactly once in $\delta$ and precisely one of the markers $e$, $\bar{e}$ occurs exactly once in $\delta$.
(C') For every pointer $p$ that occurs in $\delta$, there is a well paired partition $P_p$ relative to $p$.

Proof: If $\delta$ is a realistic descriptor then, by Theorem 3.1.9, it satisfies the conditions (A), (B), (C) and (D). Since each pointer has exactly two occurrences in $\delta$, the conditions (C) and (D) imply (C'). Thus, $\delta$ satisfies (A) and (C'). Conversely, let $\delta$ be a descriptor that satisfies (A) and (C'). Assume that $x_1, x_2, \ldots, x_n$ are all different pointers that occur in $\delta$ and $x_1 < x_2 < \cdots < x_n$. [...]
3.1.2 Reduction Strategies

Three types of rewriting rules were defined on the set of all realistic MDS-IES descriptors in [19]. Each of the rewriting operations models one of the three molecular operations defined earlier. Depending on the MDS-IES descriptor to which the rewriting rule is applicable, there are a few cases for each rewriting rule. For any string $u = u_1 u_2 \cdots u_n$, any element from the set $\{u_2 u_3 \cdots u_n u_1,\; u_3 u_4 \cdots u_n u_1 u_2,\; \ldots,\; u_n u_1 \cdots u_{n-1}\}$ is denoted by $[u]$ and we call $[u]$ a circular string. Circular strings model circular DNA molecules. In the equations that define the rewriting rules below, the pointer to which the rule is applied is set in bold for easier reading.

1. The first rule is called an ld rule. It corresponds to the molecular loop recombination (see Figure 3.1). It is applicable to a negative pointer $p$ that occurs in an MDS-IES descriptor of the following form: $\delta_1 (q,p)\, l_1\, (p,r) \delta_2$ or $l_1 (p,m)\, l_2\, (m',p)\, l_3$, i.e., a descriptor that contains two copies of the same symbol divided by a string over $\mathcal{I}_k \cup \bar{\mathcal{I}}_k$, and possibly markers. The ld rule for a pointer $p$ is defined as follows:
$ld_p(\delta_1 (q,\mathbf{p})\, l_1\, (\mathbf{p},r) \delta_2) = \{\delta_1 (q,r) \delta_2,\; [l_1]\}$
$ld_p(l_1 (\mathbf{p},m)\, l_2\, (m',\mathbf{p})\, l_3) = \{l_1 l_3,\; [(m',m)\, l_2]\}$
where $q, r$ are pointers, $\delta_1, \delta_2$ are MDS-IES descriptors, $l_1, l_2, l_3$ are IES descriptors and $m, m'$ are markers.

2. The second rule, called an hi rule, formalizes the hairpin recombination (see Figure 3.2). It is applicable to a portion of an MDS-IES descriptor that contains both symbols $p$ and $\bar{p}$. In other words, hi is applicable to positive pointers only. The rewriting rule hi for a positive pointer $p$ is defined with:
$hi_p(\delta_1 (\mathbf{p},q) \delta_2 (\bar{\mathbf{p}},\bar{r}) \delta_3) = \delta_1 \bar{\delta}_2 (\bar{q},\bar{r}) \delta_3$

$hi_p(\delta_1 (q,\mathbf{p}) \delta_2 (\bar{r},\bar{\mathbf{p}}) \delta_3) = \delta_1 (q,r) \bar{\delta}_2 \delta_3$
where $q$ and $r$ are pointers and $\delta_1, \delta_2, \delta_3$ are MDS-IES descriptors.

3. We say that two pointers $p$ and $q$ overlap if they appear in the MDS-IES descriptor in the order $p\, q\, p\, q$. The third rewriting rule, called a dlad rule, is applicable to an MDS-IES descriptor that contains overlapping negative pointers. It corresponds to the double-loop molecular operation (see Figure 4.8). In terms of MDS-IES descriptors, dlad can be applied in four ways according to the left or right appearance of the negative pointers $p$ and $q$ in the descriptor. It is defined with:
$dlad_{p,q}(\delta_1 (\mathbf{p},r_1) \delta_2 (\mathbf{q},r_2) \delta_3 (r_3,\mathbf{p}) \delta_4 (r_4,\mathbf{q}) \delta_5) = \delta_1 \delta_4 (r_4,r_2) \delta_3 (r_3,r_1) \delta_2 \delta_5$
$dlad_{p,q}(\delta_1 (\mathbf{p},r_1) \delta_2 (r_2,\mathbf{q}) \delta_3 (r_3,\mathbf{p}) \delta_4 (\mathbf{q},r_4) \delta_5) = \delta_1 \delta_4 \delta_3 (r_3,r_1) \delta_2 (r_2,r_4) \delta_5$
$dlad_{p,q}(\delta_1 (r_1,\mathbf{p}) \delta_2 (\mathbf{q},r_2) \delta_3 (\mathbf{p},r_3) \delta_4 (r_4,\mathbf{q}) \delta_5) = \delta_1 (r_1,r_3) \delta_4 (r_4,r_2) \delta_3 \delta_2 \delta_5$
$dlad_{p,q}(\delta_1 (r_1,\mathbf{p}) \delta_2 (r_2,\mathbf{q}) \delta_3 (\mathbf{p},r_3) \delta_4 (\mathbf{q},r_4) \delta_5) = \delta_1 (r_1,r_3) \delta_4 \delta_3 \delta_2 (r_2,r_4) \delta_5$
where $p, q, r_1, r_2, r_3, r_4$ are pointers and $\delta_1, \delta_2, \delta_3, \delta_4, \delta_5$ are MDS-IES descriptors.

The set of all ld, hi and dlad rules is denoted by $\Phi = Ld \cup Hi \cup Dlad$. An operation $\rho \in \Phi$ is applicable to a set $\Delta = \{\delta_1, \delta_2, \ldots, \delta_n\}$, where $\delta_i$ is an MDS, IES or MDS-IES descriptor for all $1 \le i \le n$, if there is some $s \in \{1,2,\ldots,n\}$ such that $\rho$ is applicable to $\delta_s$.

Lemma 3.1.14 follows directly from the definition of well paired pointers and the definitions of the rewriting rules.

Lemma 3.1.14 Operations from $\Phi$ are applicable to well paired pointers only.

Definition 3.1.15 A composition $\varphi$ of ld, hi and dlad rules is called a reduction strategy in this model. Furthermore, $\varphi$ is a successful strategy for an MDS-IES

descriptor $\delta$ if one of the following is true:
(1) $\varphi(\delta) = \{l_1 (b,e)\, l_2,\; [l_3], \ldots, [l_t]\}$
(2) $\varphi(\delta) = \{l_1 (\bar{e},\bar{b})\, l_2,\; [l_3], \ldots, [l_t]\}$
(3) $\varphi(\delta) = \{[(b,e)\, l_1],\; l_2,\; [l_3], \ldots, [l_t]\}$
where $l_1, l_2, \ldots, l_t$, for some positive integer $t$, are IES descriptors.
If a reduction strategy $\varphi$ satisfies the condition (i) for some $i \in \{1,2,3\}$ from Definition 3.1.15, then we say that $\varphi$ is a successful strategy of type (i).

Example 3.1.16 Let $\varphi_1 = ld_2 \circ ld_3$. Then $\varphi_1$ is a reduction strategy for the descriptor $\delta = (b,2)\, I_1\, (2,3)\, I_2\, (3,4)\, I_3\, (4,e)$, since $ld_3$ is applicable to $\delta$ and $ld_2$ is applicable to $ld_3(\delta)$. We have that
$\varphi_1(\delta) = ld_2(\{(b,2)\, I_1\, (2,4)\, I_3\, (4,e),\; [I_2]\}) = \{(b,4)\, I_3\, (4,e),\; [I_1],\; [I_2]\}$.
Furthermore, $\varphi_2 = ld_4 \circ ld_2 \circ ld_3$ is a successful reduction strategy of type (1) for $\delta$. Namely, $\varphi_2(\delta) = \{(b,e),\; [I_1],\; [I_2],\; [I_3]\}$.

Let $\varphi_3 = hi_3 \circ dlad_{8,9} \circ ld_5 \circ dlad_{6,7} \circ hi_2 \circ ld_4$. Then $\varphi_3$ is a successful reduction strategy of type (2) for the descriptor
$\delta_1 = (3,4)\, I_1\, (4,5)\, I_2\, (6,7)\, I_3\, (5,6)\, I_4\, (7,8)\, I_5\, (9,e)\, I_6\, (\bar{3},\bar{2})\, I_7\, (b,2)\, I_8\, (8,9)$,
since $\varphi_3(\delta_1) = \{\bar{I}_6 (\bar{e},\bar{b}) \bar{I}_7 I_8 I_5,\; [I_1],\; [I_2 I_4 I_3]\}$.

Lemma 3.1.17 Let $\delta'$ be a descriptor obtained from a descriptor $\delta$ by application of a rule from $\Phi$. Then for each pointer $k$ that occurs in both $\delta$ and $\delta'$, there is a one-to-one correspondence between the set of well paired partitions $P_k$ on the symbols containing $k$ in $\delta$ and the set of well paired partitions $P'_k$ on the symbols containing $k$ in $\delta'$.

Proof: Let $\delta'$ be a descriptor obtained from a descriptor $\delta$ by application of a rule from $\Phi$ and let $k$ be a pointer that occurs in both $\delta$ and $\delta'$. The construction of a one-to-one correspondence $\psi$ between the set of well paired partitions $P_k$ on the

symbols containing $k$ in $\delta$ and the set of well paired partitions $P'_k$ on the symbols containing $k$ in $\delta'$ is described below.

First, assume that $\delta' = ld_p(\delta)$ for some pointer $p$ in $\delta$. Then, by the definition of the ld rule, $\delta = \delta_1 (q,p)\, l_1\, (p,r) \delta_2$ and $\delta' = \delta_1 (q,r) \delta_2$, where $q, r$ are pointers, $\delta_1, \delta_2$ are MDS-IES descriptors and $l_1$ is an IES descriptor. Let $P_k$ be a well paired partition on the symbols containing $k$ in $\delta$. We consider four cases:

(1) Let $k \ne p$, $k \ne q$ and $k \ne r$. Then $P_k$ is a well paired partition on the symbols containing $k$ in $\delta'$. Therefore, $\psi(P_k) = P_k$.

(2) Let $k \ne p$, $k \ne q$ and $k = r$. We construct $P'_k$ in the following way:
$\{(q,k),(s,t)\} \in P'_k$ iff $\{(i,j),(s,t)\} \in P_k$ and $(i,j) = (p,k)$,
$\{(i,j),(q,k)\} \in P'_k$ iff $\{(i,j),(s,t)\} \in P_k$ and $(s,t) = (p,k)$,
$\{(i,j),(s,t)\} \in P'_k$ iff $\{(i,j),(s,t)\} \in P_k$ and $(p,k) \notin \{(i,j),(s,t)\}$.
In other words, $P'_k$ is obtained from $P_k$ by simply replacing the symbol $(p,k)$ by $(q,k)$. It is clear that $P'_k$ is a well paired partition on the symbols containing $k$ in $\delta'$. In this case, we set $\psi(P_k) = P'_k$.

(3) Let $k \ne p$, $k = q$ and $k \ne r$. Similarly as in case (2), we construct $P'_k$ from $P_k$ and we let $\psi(P_k) = P'_k$:
$\{(k,r),(s,t)\} \in P'_k$ iff $\{(i,j),(s,t)\} \in P_k$ and $(i,j) = (k,p)$,
$\{(i,j),(k,r)\} \in P'_k$ iff $\{(i,j),(s,t)\} \in P_k$ and $(s,t) = (k,p)$,
$\{(i,j),(s,t)\} \in P'_k$ iff $\{(i,j),(s,t)\} \in P_k$ and $(k,p) \notin \{(i,j),(s,t)\}$.
In this case, $P'_k$ is obtained from $P_k$ by replacing the symbol $(k,p)$ by $(k,r)$. It is clear that $P'_k$ is a well paired partition on the symbols containing $k$ in $\delta'$.

(4) Let $k = p$. Then there are symbols $(i,j)$ and $(s,t)$ in $\delta$ such that $\{(i,j),(q,k)\} \in P_k$ and $\{(k,r),(s,t)\} \in P_k$. Let $P'_k = (P_k \setminus \{\{(i,j),(q,k)\},\{(k,r),(s,t)\}\}) \cup \{\{(i,j),(s,t)\}\}$. Since $P_k$ is a well paired partition and both $\{(i,j),(q,k)\}$ and

$\{(k,r),(s,t)\}$ satisfy (i) or (ii) from the definition of well paired partition, $\{(i,j),(s,t)\}$ will satisfy (i) or (ii). Thus, $P'_k$ is a well paired partition on the symbols containing $k$ in $\delta'$. Therefore, $\psi(P_k) = P'_k$.

If $\delta'$ is obtained from $\delta$ by application of some hi or dlad rule, then one can similarly construct a well paired partition $P'_k$ on the symbols containing $k$ in $\delta'$ from a well paired partition $P_k$ on the symbols containing $k$ in $\delta$. For such partitions, we let $\psi(P_k) = P'_k$. Note that given a well paired partition $P'_k$ for a pointer $k$ in $\delta'$, a well paired partition $P_k$ of $k$ in $\delta$ can be easily constructed (by reversing the construction described above) such that $\psi(P_k) = P'_k$. $\Box$

The MDS-IES descriptors are a well defined formalization of the MDS-IES structures. It is clear that the set of all MDS-IES descriptors is closed under the rewriting rules ld, hi and dlad. In addition, we prove in Theorem 3.1.18 that applying any strategy composed of ld, hi and dlad rules to a nonrealistic descriptor cannot result in a realistic descriptor. This means that it is impossible to obtain a micronuclear gene structure, a macronuclear gene structure or an intermediate molecule of the rearrangement process from an arbitrary DNA molecule with randomly repeated homologous segments by a series of homologous loop, hairpin and double loop recombinations.

Theorem 3.1.18 Let $\delta$ be a nonrealistic MDS descriptor. For every reduction strategy $\varphi = \varphi_n \circ \cdots \circ \varphi_2 \circ \varphi_1$, where $\varphi_i \in \Phi$, $\varphi(\delta)$ is a nonrealistic descriptor.

Proof: Let $\delta$ be a nonrealistic descriptor and let $\varphi$ be a reduction strategy. Then, by Theorem 3.1.13, $\delta$ violates some of the properties (A) and (C'). As a consequence, we show that $\varphi(\delta)$ violates some of the properties (A) and (C') and therefore $\varphi(\delta)$ is a nonrealistic descriptor.

Suppose that $\delta$ does not satisfy (A). Then some of the markers $b$ or $e$ occur more than once in $\delta$. None of the operations from $\Phi$ can be applied to a marker. Thus, application of any rewriting rule would result in a descriptor with the same number of occurrences of $b$ and $e$ as $\delta$. Therefore, $\varphi(\delta)$ has more than one occurrence of a marker. Hence, $\varphi(\delta)$ is not realistic.

If the condition (C') is not satisfied, then there is a pointer $p$ in $\delta$ such that every partition $P_p$ on the symbols containing $p$ violates (C'). In other words, for every two-element partition on the symbols containing $p$, there is a pair of symbols containing $p$ which satisfies neither condition (i) nor condition (ii). Let $(i,j)$ and $(s,t)$ be a fixed pair of symbols in $\delta$ that contain the pointer $p$ and satisfy neither (i) nor (ii). This means that the occurrences of $p$ within $\{(i,j),(s,t)\}$ are not well paired. Then, by Lemma 3.1.14, none of the rewriting rules is applicable to this pair. Therefore, $\varphi_1(\delta)$ contains the symbols $(i,j)$ and $(s,t)$.

Next, we show by induction that the pointer $p$ occurs in $\varphi(\delta)$ and there is no well paired partition on the symbols of $\varphi(\delta)$ that contain $p$.

Let $\delta_1 = \varphi_1(\delta)$. Then $\delta_1$ contains the symbols $(i,j)$ and $(s,t)$. Assume that there is a well paired partition $P'_p$ relative to $p$ in $\delta_1$. Then, by Lemma 3.1.17, there is a well paired partition $P_p$ relative to $p$ in $\delta$ such that $\psi(P_p) = P'_p$. This contradicts the fact that the pointer $p$ cannot be well paired in $\delta$. Therefore, there is no well paired partition on the symbols of $\delta_1$ that contain $p$.

Assume that $p$ occurs in $\delta_k = \varphi_k \circ \cdots \circ \varphi_2 \circ \varphi_1(\delta)$ and there is no well paired partition on the symbols of $\delta_k$ that contain $p$. There is a pair of symbols containing $p$ which satisfies neither (i) nor (ii). Let $(i_k,j_k)$ and $(s_k,t_k)$ be a fixed pair of symbols in $\delta_k$ that contain the pointer $p$ and satisfy neither (i) nor (ii). It is clear that no rewriting rule is applicable to this pair of pointers and therefore $(i_k,j_k)$ and $(s_k,t_k)$ are present in $\delta_{k+1} = \varphi_{k+1}(\delta_k)$. Assume that there is a well paired partition $P'_p$ on the symbols in $\delta_{k+1}$ containing $p$. Then, by Lemma 3.1.17, there is a well paired partition $P_p$ on the symbols in $\delta_k$ containing $p$ such that $\psi(P_p) = P'_p$, which is a contradiction. Therefore, there is no well paired partition on the symbols of $\delta_{k+1}$ that contain $p$.

This shows that the pointer $p$ occurs in $\varphi(\delta) = \varphi_n \circ \cdots \circ \varphi_2 \circ \varphi_1(\delta)$ and no two-element partition on the symbols containing $p$ is well paired. Thus, $\varphi(\delta)$ is a descriptor that does not satisfy the condition (C'), which implies that $\varphi(\delta)$ is a nonrealistic descriptor.

In any case, we derive that $\varphi(\delta)$ is a nonrealistic descriptor. $\Box$
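
To make the rewriting rules of Section 3.1.2 concrete, the first form of the ld rule can be sketched in a few lines of Python. This is an illustration only, under an encoding that is not part of the text: a descriptor is a list in which MDS symbols are pairs of strings and IES symbols are plain strings, and the function name ld is hypothetical. The sketch replays the strategy $ld_4 \circ ld_2 \circ ld_3$ of Example 3.1.16.

```python
def ld(descriptor, p):
    """Apply ld_p to a descriptor of the form d1 (q,p) l1 (p,r) d2.

    Returns the pair (linear, circular): d1 (q,r) d2 and the excised [l1].
    """
    for a, tok in enumerate(descriptor):
        if isinstance(tok, tuple) and tok[1] == p:
            c = a + 1
            # l1 must consist of IES symbols only (strings in this encoding).
            while c < len(descriptor) and isinstance(descriptor[c], str):
                c += 1
            if c < len(descriptor) and descriptor[c][0] == p:
                q, r = tok[0], descriptor[c][1]
                linear = descriptor[:a] + [(q, r)] + descriptor[c + 1:]
                return linear, descriptor[a + 1:c]
    raise ValueError(f"ld is not applicable to pointer {p}")


delta = [("b", "2"), "I1", ("2", "3"), "I2", ("3", "4"), "I3", ("4", "e")]
circles = []
for pointer in ("3", "2", "4"):      # ld_3 first, then ld_2, then ld_4
    delta, loop = ld(delta, pointer)
    circles.append(loop)
print(delta, circles)
```

The loop ends with the single symbol (b, e) as the linear descriptor and the three excised IES symbols as circular strings, in agreement with the type (1) outcome stated in Example 3.1.16.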

Theorem 3.1.19 follows directly from Theorem 3.1.18, Theorem 3.1.13 and the definition of the rewriting rules from $\Phi$.

Theorem 3.1.19 The set of all MDS descriptors $D$ is partitioned into two subsets, the subset of all realistic descriptors $R$ and the subset of all nonrealistic descriptors $N$, such that both $R$ and $N$ are closed under the operations from $\Phi$.

3.2 Intermolecular Model

The topic of this section is the intermolecular model for gene rearrangements in ciliates, introduced by L. Kari and L. Landweber in [35]. This model is also based on the fact that special repeated segments (pointers) are present in the MIC genes. Two molecular operations were proposed for gene assembly in this model. Unlike the intramolecular model, the operations in the intermolecular model can be performed on several molecules at once. The DNA molecules are viewed as sequences of nucleotides, i.e., strings over the alphabet $\Sigma = \{A, C, G, T\}$, while the molecular operations are formalized by rewriting rules. The set of all strings over $\Sigma$ is denoted by $\Sigma^*$.

Figure 3.6: Schematic representation of the intermolecular operation op1.

1. The first operation (op1) is applicable to a single DNA molecule that contains two copies of the same segment (pointer) $x$. The result is one linear and one circular molecule (see Figure 3.6). Note that this operation is very similar to the loop recombination. The only difference is that the applicability of the loop recombination requires that the pointers are divided by IESs only, while in the intermolecular model that is not a necessary condition. This recombination is considered to be reversible and therefore it is applicable to

a pair of linear and circular molecules which contain the same segment $x$. The reverse of op1 is denoted by $op1_R$. Formally, op1 and $op1_R$ are defined as:
$op1(uxwxv) = \{uxv,\; [wx]\}$
$op1_R(\{uxv,\; [wx]\}) = uxwxv$
where $u, v, w, x \in \Sigma^*$ and $[wx]$ is a circular string. The authors in [35] also introduce a circular version of op1, denoted by $op1^{\bullet}$. The input of $op1^{\bullet}$ is one circular molecule that contains a repeated segment (pointer $x$) and the output consists of two circular molecules. This operation and its reverse are defined as follows:
$op1^{\bullet}([uxwxv]) = \{[uxv],\; [wx]\}$
$op1^{\bullet}_R(\{[uxv],\; [wx]\}) = [uxwxv]$

Figure 3.7: Schematic representation of the intermolecular operation op2.

2. The second recombination (op2) is intermolecular. It is applicable to two linear DNA molecules such that each of them contains a copy of the same pointer $x$. As a result two linear molecules are obtained (see Figure 3.7). This recombination is also reversible and it is formally defined as:
$op2(\{uxv,\; u'xv'\}) = \{uxv',\; u'xv\}$
where $u, v, u', v', x \in \Sigma^*$. The reverse operation $op2_R$ has the same definition as op2. Thus, it is not listed separately.
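
The string operations above translate directly into code. The following Python sketch is an illustration under assumptions that are not in the text: molecules are plain Python strings, a circular string $[wx]$ is represented by the single rotation that ends with the pointer $x$, the first two non-overlapping occurrences of $x$ are the ones that recombine, and the function names are hypothetical.

```python
def op1(s, x):
    """op1(uxwxv) = {uxv, [wx]}; returns (linear, circular representative)."""
    first = s.find(x)
    second = s.find(x, first + len(x)) if first != -1 else -1
    if second == -1:
        raise ValueError("op1 needs two copies of the pointer")
    u = s[:first]
    w = s[first + len(x):second]
    v = s[second + len(x):]
    return u + x + v, w + x


def op1_reverse(linear, circular, x):
    """op1_R({uxv, [wx]}) = uxwxv; circular must be the rotation ending in x."""
    pos = linear.find(x)
    if pos == -1 or not circular.endswith(x):
        raise ValueError("both molecules must contain the pointer")
    u, v = linear[:pos], linear[pos + len(x):]
    w = circular[:-len(x)]
    return u + x + w + x + v


def op2(s1, s2, x):
    """op2({uxv, u'xv'}) = {uxv', u'xv}; the two molecules swap their tails."""
    p1, p2 = s1.find(x), s2.find(x)
    if p1 == -1 or p2 == -1:
        raise ValueError("both molecules must contain the pointer")
    tail1, tail2 = s1[p1 + len(x):], s2[p2 + len(x):]
    return s1[:p1] + x + tail2, s2[:p2] + x + tail1


linear, circle = op1("AAGTCCGTTT", "GT")
print(linear, circle)
print(op1_reverse(linear, circle, "GT"))
print(op2("AAGTCC", "TTGTAA", "GT"))
```

Applying op2 a second time to its own output restores the two input strings, which mirrors the remark that $op2_R$ has the same definition as op2.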

The operations op1, $op1_R$ and op2 are translated in terms of MDS-IES descriptors through five rewriting rules ($o1$, $o2$, $o3$, $o4$ and $o5$) different from the operations in $\Phi$. They are defined below, applied to a pointer $p$:
$o1_p(\delta_1 (q,p) \delta_2 (p,r) \delta_3) = \{\delta_1 (q,r) \delta_3,\; [\delta_2]\}$
$o2_p(\delta_1 (p,q) \delta_2 (r,p) \delta_3) = \{\delta_1 \delta_3,\; [\delta_2 (r,q)]\}$
$o3_p(\{\delta_1 (p,q) \delta_2,\; [(r,p) \delta_3]\}) = \delta_1 \delta_3 (r,q) \delta_2$
$o4_p(\{\delta_1 (p,q) \delta_2,\; [(p,r) \delta_3]\}) = \delta_1 \delta_3 (r,q) \delta_2$
$o5_p(\{\delta_1 (p,q) \delta_2,\; \delta_3 (r,p) \delta_4\}) = \{\delta_1 \delta_4,\; \delta_3 (r,q) \delta_2\}$
where $\delta_1, \delta_2, \delta_3, \delta_4$ are MDS-IES descriptors and $p$, $q$ and $r$ are pointers. Note that each of the rewriting operations $o1$, $o2$, $o3$, $o4$ and $o5$ is considered as nonreversible. The rewriting rules $o1$ and $o2$ correspond to op1, the rules $o3$ and $o4$ correspond to $op1_R$, and the rule $o5$ corresponds to op2.

The set of all rewriting rules of type $o_i$ for $i \in \{1,2,3,4,5\}$ is denoted by $\Psi$. Similarly as in the intramolecular model, a composition $\varphi$ of reduction rules from $\Psi$ is called a reduction strategy in the intermolecular model. Furthermore, $\varphi$ is a successful strategy for an MDS-IES descriptor $\delta$ if one of the following is true:
(1) $\varphi(\delta) = \{l_1 (b,e)\, l_2,\; [l_3], \ldots, [l_t]\}$
(2) $\varphi(\delta) = \{l_1 (\bar{e},\bar{b})\, l_2,\; [l_3], \ldots, [l_t]\}$
(3) $\varphi(\delta) = \{[(b,e)\, l_1],\; l_2,\; [l_3], \ldots, [l_t]\}$
where $l_1, l_2, \ldots, l_t$ are IES descriptors.

It has been shown in [19] that the ld, hi and dlad rules can be simulated by rules from $\Psi$, provided that the input MDS-IES descriptor is available in two copies. Namely, ld is the special case of $o1$ and $o2$ where $\delta_2$ is an IES descriptor. Let $\delta = \delta_1 (p,r_1) \delta_2 (q,r_2) \delta_3 (r_3,p) \delta_4 (r_4,q) \delta_5$ be an MDS-IES descriptor to which $dlad_{p,q}$ is applicable. Then, $dlad_{p,q}(\delta) = o4_q \circ o2_p(\delta)$. Similarly, $dlad_{p,q}$ can be simulated for all different structures of MDS-IES descriptors to which $dlad_{p,q}$ is

applicable. Next, let $\delta = \delta_1 (p,q) \delta_2 (\bar{p},\bar{r}) \delta_3$ be an MDS-IES descriptor such that $hi_p$ is applicable to $\delta$. Provided there are two copies of $\delta$, in particular $\delta$ and $\bar{\delta}$ (the inverse of $\delta$), $hi_p$ can be simulated as follows: $\{hi_p(\delta),\; hi_{\bar{p}}(\bar{\delta})\} = o5_{\bar{p}} \circ o5_p(\{\delta, \bar{\delta}\})$.

The following results are proven in [19, 24]:

Theorem 3.2.1 Every realistic MDS-IES descriptor has a successful reduction strategy in the intramolecular model (successful reduction strategy in $\Phi$).

Theorem 3.2.2 Every MDS-IES descriptor available in at least two copies has a successful reduction strategy in the intermolecular model (successful reduction strategy in $\Psi$).

3.3 Invariant Properties under the Intramolecular and Intermolecular Models

There are two main differences between the models for gene assembly described in Section 3.1 and Section 3.2:
1. The first model is intramolecular and the second one is intermolecular;
2. In the second model the proposed operations are reversible, unlike ld, hi and dlad.

Even though the two models for gene assembly might give different successful reduction strategies for the same MDS-IES descriptor, the final result of the assembly is the same. The following theorem, proven in [19, 24], states that for any successful reduction strategy the resulting sets of MDS-IES descriptors are the same.

Theorem 3.3.1 (Invariance Theorem) Let $\delta$ be an MDS-IES descriptor. If $\varphi_1$ and $\varphi_2$ are any two successful reduction strategies for $\delta$, intramolecular or intermolecular, then
(a) $\varphi_1$ is a successful strategy of type (i) for $i \in \{1,2,3\}$ iff $\varphi_2$ is a successful strategy of type (i).

(b) There is an equal number of circular strings in $\varphi_1(\delta)$ and $\varphi_2(\delta)$.
(c) The IES descriptors $l_1, l_2, \ldots, l_t$ are the same in $\varphi_1(\delta)$ and $\varphi_2(\delta)$.

The Invariance Theorem implies Theorem 3.3.2.

Theorem 3.3.2 Let $\delta$ be an MDS-IES descriptor. If $\varphi_1$ and $\varphi_2$ are any two successful reduction strategies for $\delta$, intramolecular or intermolecular, then $\varphi_1(\delta) = \varphi_2(\delta)$.

The observations discussed below were used in [19] to prove the Invariance Theorem.

Let $\delta = I_1 (p_1,p_2)\, I_2\, (p_3,p_4)\, I_3 \cdots (p_{2k-1},p_{2k})\, I_{k+1}$ be an MDS-IES descriptor and let $\mathcal{I}_{k+1} = \{I_1, \ldots, I_{k+1}\}$, $\bar{\mathcal{I}}_{k+1} = \{\bar{I}_1, \ldots, \bar{I}_{k+1}\}$.
A symbol $I_m$ from $\mathcal{I}_{k+1} \cup \bar{\mathcal{I}}_{k+1}$ is right bordered by pointer $p$ in $\delta$ if $I_m (p,q)$ is a substring of $\delta$. In this case we say that $\bar{I}_m$ is left bordered by pointer $\bar{p}$. Similarly, a symbol $I_m$ is left bordered by pointer $p$ in $\delta$ if $(q,p) I_m$ is a substring of $\delta$. In this case we say that $\bar{I}_m$ is right bordered by pointer $\bar{p}$.

Remark 3.3.3 Let $\delta$ be an MDS-IES descriptor and $\varphi$ a reduction strategy. For any symbol $I_m$ and any pointer $p$ that appears in both $\delta$ and $\varphi(\delta)$, $I_m$ is left (right) bordered by $p$ in $\delta$ iff $I_m$ is left (right) bordered by $p$ in $\varphi(\delta)$. In addition, if $p$ is a pointer that occurs in $\delta$ but not in $\varphi(\delta)$, then the symbol from $\mathcal{I}_{k+1} \cup \bar{\mathcal{I}}_{k+1}$ that is right bordered by $p$ in $\delta$ is concatenated in $\varphi(\delta)$ with the symbol that is left bordered by $p$ in $\delta$. The symbol that is left (right) bordered by the marker $b$ ($e$) in $\delta$ stays left (right) bordered by the marker $b$ ($e$) in $\varphi(\delta)$.

Example 3.3.4 In the MDS-IES descriptor $\delta = I_1 (2,3)\, I_2\, (\bar{4},\bar{3})\, I_3\, (b,2)\, I_4\, (\bar{e},\bar{4})\, I_5$ the symbol $I_1$ is right bordered by 2 and the symbol $I_4$ is left bordered by 2. Therefore, $I_1 I_4$ is a substring of a string from $\varphi(\delta)$ for any successful strategy $\varphi$. Furthermore, $I_3$ is left bordered by $\bar{3}$, thus $\bar{I}_3$ is right bordered by 3. Since $I_2$ is left bordered by 3, $\bar{I}_3 I_2$ is a substring of a string from $\varphi(\delta)$ for any successful strategy $\varphi$.

Next, we give an algorithm, called the algorithm for MDS-IES descriptor assembly, which outputs the set of MDS-IES descriptors obtained as a result of applying an arbitrary successful strategy to an input realistic MDS-IES descriptor.

Algorithm for MDS-IES descriptor assembly

The algorithm is based on the observations discussed in Remark 3.3.3. Let the input be an MDS-IES descriptor
$\delta = I_1 (p_1,p_2)\, I_2\, (p_3,p_4)\, I_3 \cdots (p_{2k-1},p_{2k})\, I_{k+1}$.
In each cycle of the algorithm either a linear string $s$ or a circular string $[s]$ over $\mathcal{I}_{k+1} \cup \bar{\mathcal{I}}_{k+1} \cup \{(b,e), (\bar{e},\bar{b})\}$ is added to the resulting set $R$. The set $U_I$ is the set of all symbols from $\mathcal{I}_{k+1} \cup \bar{\mathcal{I}}_{k+1}$ appearing in some string from $R$, and $P$ is the set of pointers from $\delta$ which border some symbol from $U_I$. The construction of a string $s$ stops when a symbol $I$ that is not right bordered by any pointer in $\delta$, or that is bordered by a pointer already in $P$, is encountered. In the latter case, the string is circular and $[s]$ is added to $R$. In the next cycle, a symbol $I \in \mathcal{I}_{k+1} \setminus U_I$ is randomly chosen and, starting with $s = I$, a new string to be added to $R$ is constructed.

The strings are constructed as follows. Let $s = s'J$ be a string over $\mathcal{I}_{k+1} \cup \bar{\mathcal{I}}_{k+1} \cup \{(b,e), (\bar{e},\bar{b})\}$ that is already constructed and ends with a symbol $J \in \mathcal{I}_{k+1} \cup \bar{\mathcal{I}}_{k+1}$. If $J$ is right bordered by a pointer $p$ (marker $b$, $\bar{e}$, respectively) in $\delta$ then the algorithm searches for a symbol $I$ that is left bordered by $p$ ($e$, $\bar{b}$, respectively) in $\delta$. By Remark 3.3.3 the symbols $J$ and $I$ appear concatenated in $R$ (concatenated through the symbol $(b,e)$, $(\bar{e},\bar{b})$, respectively) and therefore the string $s = s'J$ is recursively extended by concatenating:
- the symbol $I$, if $p \ne b, \bar{e}$;
- the string $(b,e) I$, if $p = b$;
- the string $(\bar{e},\bar{b}) I$, if $p = \bar{e}$.
The algorithm halts when every symbol from $\mathcal{I}_{k+1}$ is exhausted, i.e., $U_I = \mathcal{I}_{k+1}$.

1. $s = I_1$, $P = \{p_1\}$, $U_I = \{I_1\}$, $p \leftarrow p_1$ and $R = \emptyset$.
2. (a) If $p \ne b, \bar{e}$, then find $I_j \in \mathcal{I}_{k+1} \cup \bar{\mathcal{I}}_{k+1}$ such that $I_j$ is left bordered by $p$. Set $s \leftarrow s I_j$.
   (b) If $p = b$, then find $I_j \in \mathcal{I}_{k+1} \cup \bar{\mathcal{I}}_{k+1}$ such that $I_j$ is left bordered by $e$. Set $s \leftarrow s (b,e) I_j$.
   (c) If $p = \bar{e}$, then find $I_j \in \mathcal{I}_{k+1} \cup \bar{\mathcal{I}}_{k+1}$ such that $I_j$ is left bordered by $\bar{b}$. Set $s \leftarrow s (\bar{e},\bar{b}) I_j$.
3. $U_I = U_I \cup \{I_j\}$ if $I_j \in \mathcal{I}_{k+1}$, or $U_I = U_I \cup \{\bar{I}_j\}$ if $I_j \in \bar{\mathcal{I}}_{k+1}$.
4. If there is no pointer $q$ such that $I_j$ is right bordered by $q$, set $R = R \cup \{s\}$ and go to 7.
5. If $q$ is a pointer such that $I_j$ is right bordered by $q$, then go to 6.
6. (a) If $q \in P$, then set $R = R \cup \{[s]\}$ and go to 7.
   (b) If $q \notin P$, then set $P \leftarrow P \cup \{q\}$, $p \leftarrow q$ and go to 2.
7. If $U_I \ne \mathcal{I}_{k+1}$ then choose $I_m \in \mathcal{I}_{k+1} \setminus U_I$, otherwise go to 11.
8. Find $p_j$ and $p_m$ such that $I_m$ is right bordered by $p_j$ and $I_m$ is left bordered by $p_m$.
9. Set $s = I_m$, $P \leftarrow P \cup \{p_j\}$, $U_I \leftarrow U_I \cup \{I_m\}$ and $p \leftarrow p_j$.
10. If $p_m = p_j$, then set $R = R \cup \{[s]\}$ and go to 7, otherwise go to 2.
11. Output $R$.

The algorithm above outputs the set of all MDS-IES descriptors $R = \varphi(\delta)$, which is a result of application of any successful strategy $\varphi$ to $\delta$. Therefore, given an MDS-IES descriptor, the algorithm provides the number of all resulting descriptors $|R|$, their structure and the number of circular descriptors ($|C|$, where $C = \{[s] \mid [s] \in R\}$).
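
The border-following idea behind the algorithm can be sketched in Python as follows. This is a simplified illustration rather than a step-by-step transcription of the listing above: it assumes a realistic input descriptor that strictly alternates IES symbols and MDS symbols, it encodes barred symbols with a leading "-", it detects a circular string by returning to the starting symbol instead of maintaining the set $P$, and the names bar and assemble are hypothetical.

```python
def bar(x):
    """Barred (inverted) form of a symbol; assumed encoding: leading '-'."""
    return x[1:] if x.startswith("-") else "-" + x


def assemble(tokens):
    """Return a list of (is_circular, string) pairs for an MDS-IES descriptor.

    tokens alternates IES names and MDS pairs: I1 (p1,p2) I2 ... I_{k+1}.
    """
    right, by_left = {}, {}
    for n, tok in enumerate(tokens):
        if not isinstance(tok, str):
            continue
        if n + 1 < len(tokens):          # tok (p, q): tok is right bordered by p
            p = tokens[n + 1][0]
            right[tok] = p
            by_left[bar(p)] = bar(tok)   # so bar(tok) is left bordered by bar(p)
        if n > 0:                        # (q, p) tok: tok is left bordered by p
            p = tokens[n - 1][1]
            by_left[p] = tok
            right[bar(tok)] = bar(p)     # so bar(tok) is right bordered by bar(p)

    used, result = set(), []
    for start in (t for t in tokens if isinstance(t, str)):
        if start in used:
            continue
        used.add(start)
        s, cur, circular = [start], start, False
        while cur in right:
            p = right[cur]
            if p == "b":                 # step 2(b): join through (b, e)
                s.append(("b", "e"))
                p = "e"
            elif p == "-e":              # step 2(c): join through (e-bar, b-bar)
                s.append(("-e", "-b"))
                p = "-b"
            nxt = by_left[p]
            if nxt == start:             # the string closes up: circular
                circular = True
                break
            s.append(nxt)
            used.add(nxt.lstrip("-"))
            cur = nxt
        result.append((circular, s))
    return result


actin = ["I1", ("3", "4"), "I2", ("4", "5"), "I3", ("6", "7"), "I4",
         ("5", "6"), "I5", ("7", "8"), "I6", ("9", "e"), "I7",
         ("-3", "-2"), "I8", ("b", "2"), "I9", ("8", "9"), "I10"]
for is_circular, string in assemble(actin):
    print("circular" if is_circular else "linear", string)
```

Because the first IES symbol has no left border, starting the outer loop there yields the linear string first; every remaining unused symbol then lies on a circular string. The number of circular descriptors $|C|$ is the number of pairs whose first component is true.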

In terms of gene assembly, given a micronuclear MDS-IES gene structure, the algorithm predicts the number of molecules obtained, the number of circular excised molecules and the MDS-IES structure of each resulting molecule. Here is an example.

Example 3.3.5 Let
$\delta = I_1 (3,4)\, I_2\, (4,5)\, I_3\, (6,7)\, I_4\, (5,6)\, I_5\, (7,8)\, I_6\, (9,e)\, I_7\, (\bar{3},\bar{2})\, I_8\, (b,2)\, I_9\, (8,9)\, I_{10}$.
Then $\mathcal{I}_{10} = \{I_i \mid 1 \le i \le 10\}$. When we apply our algorithm, we obtain the following steps.

We start with $s = I_1$, and $I_1$ is right bordered by 3. Since $I_7$ is right bordered by $\bar{3}$, $\bar{I}_7$ is left bordered by 3. So, after an execution of step 2 of the algorithm we have the string $s = I_1 \bar{I}_7$ and $U_I = \{I_1, I_7\}$. Next, $\bar{I}_7$ is right bordered by $\bar{e}$ and $\bar{I}_8$ is left bordered by $\bar{b}$. Hence, by repeatedly applying step 2 of the algorithm, we extend the string as $s = I_1 \bar{I}_7 (\bar{e},\bar{b}) \bar{I}_8$, with $U_I = \{I_1, I_7, I_8\}$ and $P = \{3\}$. After three more steps, the string $I_1 \bar{I}_7 (\bar{e},\bar{b}) \bar{I}_8 I_9 I_6 I_{10}$ is obtained and the process stops, since there is no pointer that $I_{10}$ is right bordered by. The string $s = I_1 \bar{I}_7 (\bar{e},\bar{b}) \bar{I}_8 I_9 I_6 I_{10}$ is added to $R$.

At this point, $U_I = \{I_1, I_6, I_7, I_8, I_9, I_{10}\} \ne \mathcal{I}_{10}$ and $P = \{2, 3, 8, 9\}$.

If we choose $I_2 \in \mathcal{I}_{10} \setminus U_I$, then $s = I_2$, $U_I = \{I_1, I_2, I_6, I_7, I_8, I_9, I_{10}\}$ and $P = \{2, 3, 4, 8, 9\}$. Since $I_2$ is right and left bordered by 4, $[I_2]$ is added to $R$.

If we choose $I_3 \in \mathcal{I}_{10} \setminus U_I$, then we start building a new string from $s = I_3$. The symbol $I_3$ is right bordered by 6, $I_5$ is left bordered by 6 and $I_5$ is right bordered by 7, $I_4$ is left bordered by 7, so the string $s = I_3 I_5 I_4$ is constructed. We have $U_I = \{I_1, I_2, I_3, I_4, I_5, I_6, I_7, I_8, I_9, I_{10}\}$ and $P = \{2, 3, 4, 5, 6, 7, 8, 9\}$. Since $I_4$ is right bordered by 5 and $5 \in P$, we add the circular string $[I_3 I_5 I_4]$ to $R$.

The output is $R = \{I_1 \bar{I}_7 (\bar{e},\bar{b}) \bar{I}_8 I_9 I_6 I_{10},\; [I_3 I_5 I_4],\; [I_2]\}$.

Note that the MDS-IES descriptor $\delta$ describes the MDS-IES structure of the micronuclear actin I gene in O. nova. In the process of gene rearrangement

a molecule containing the correctly assembled MAC gene and a few molecules composed of IESs only are obtained. According to the algorithm, two circular molecules are excised, one composed of $IES_2$ only and the second one composed of $IES_3$, $IES_4$ and $IES_5$. The left (right) context of the macronuclear actin I gene is expected to be $I_1 \bar{I}_7$ ($\bar{I}_8 I_9 I_6 I_{10}$).

4 Template Guided Models for DNA Rearrangements

The models for homologous DNA recombination described in Chapter 3 assume that the correct pairs of pointers align and splice. Even though the pointers are sequences up to 20 bp long, some of them may be as short as 2 bp (see [11, 45]). When a pointer sequence is very short, it may repeat more than twice within a micronuclear gene. Therefore, the recognition and alignment of the correct pairs of short pointers cannot be accurate.

To account for the correct alignment and splicing of the short pointers we proposed a template guided recombination model in [2], which is the main subject of this chapter. In Section 4.1 and Section 4.2, we describe the model for the cases where the templates are dsRNA or ssRNA, respectively. These template guided models can simulate the three molecular operations from the intramolecular model discussed in Section 3.1. In Section 4.3, we translate the RNA guided DNA recombination model to simulate loop recombination, double loop recombination and hairpin recombination. In addition, we postulate that an extra twist occurs in the double stranded DNA molecule after a pointer recombination is performed. This observation may support the topological mechanism for gene descrambling introduced in [13].

4.1 Homologous Recombination with dsRNA Templates

In the model described in this section, our assumption is that the templates are dsRNA molecules, i.e., the portion of the molecule that plays the role of a template is double-stranded.

First, we set up the notation and the labeling of the molecules involved in

the recombining process. Although the DNA molecules have a helical structure, for simplicity we represent them as ladders. The upper and the lower boundary of the ladder correspond to the complementary DNA or RNA strands.

Let $T$ be the dsRNA molecule that plays the role of a template, and let $X$ and $Y$ be two portions of DNA molecule(s) that contain the same pointer (see Figure 4.1).

Figure 4.1: Ladder representation of DNA segments $X$, $Y$ and $T$, where $\alpha$ corresponds to MDS $i$, $\gamma$ to MDS $i+1$, $\beta$ to pointer $i+1$ and $\delta$ and $\varepsilon$ correspond to IESs.

The portions $X$, $Y$ and $T$ of the molecules are represented in the figures as ribbons whose sugar-phosphate backbones are oriented from 5' to 3'. The base pairing is represented with vertical bars connecting the backbones. Figure 4.1 introduces the notation used below. The "upper" strand of $X$ in direction 5'-3' (denoted $uX$) has a block $\alpha\beta\delta$ composed of nucleotide sequences $\alpha$, $\beta$ and $\delta$. We assume that $\alpha$ is a portion of the $(i-1)$st MDS, $\beta$ is the $i$th pointer and $\delta$ is a portion of an IES. The "upper" strand of $Y$ in direction 3'-5' (denoted $uY$) contains a block $\bar{\varepsilon}\bar{\beta}\bar{\gamma}$ composed of nucleotide sequences $\bar{\varepsilon}$, $\bar{\beta}$ and $\bar{\gamma}$. In this case $\gamma$ is a portion of the $i$th MDS and $\varepsilon$ is a portion of an IES. The barred symbols indicate that the sequence is the Watson-Crick complement of the sequence denoted by the unbarred symbol and vice versa.

We propose a dsRNA template $T$, such that its "upper" strand in direction 3'-5' (denoted $uT$) has a block $\bar{\alpha}\bar{\beta}\bar{\gamma}$ composed of sequences $\bar{\alpha}$, $\bar{\beta}$ and $\bar{\gamma}$. Clearly, the template contains the $i$th pointer, and portions of both $MDS_{i-1}$ and $MDS_i$. The lower strands of $T$, $X$ and $Y$ (denoted $lT$, $lX$, $lY$) are complementary to the upper strands. The proposed steps of the homologous recombination are as follows:

A. The molecules containing portions $X$, $Y$ and $T$ are present in the environment at the same time. Since they are composed of homologous sequences ($\alpha\beta$ is the homologous portion of $T$ and $X$, and $\beta\gamma$ is the homologous portion of $T$ and $Y$) they align in parallel as shown in Figure 4.2(A).
Even if the pointer sequence $\beta$ is as short as two nucleotides and occurs more than twice in the DNA sequence, the context of $\beta$ in $T$ ($\alpha\beta\gamma$), the left context in $X$ ($\alpha$) and the right context in $Y$ ($\gamma$) would be sufficient to lead to the alignment of the correct pointer sequences.

Figure 4.2: Step by step model for DNA recombination guided by a double stranded RNA template.

B. An unzipping of the three double-stranded stripes occurs, from point $a_1$ to

$a_2$ on $X$, from $b_1$ to $b_2$ on $Y$ and from $a_1$ to $b_2$ on $T$ (see Figure 4.1). The sequence between $a_1$ and $a_2$ from $uT$ is complementary to the sequence between $a_1$ and $a_2$ from $uX$. So, they anneal. Similarly, the portion on $lT$ that is between $b_1$ and $b_2$ pairs with the portion of $uY$ between $b_1$ and $b_2$, since they are complementary sequences. A portion of $lX$ containing $\bar{\beta}$ and a portion of $lY$ containing a subsequence $\beta$ remain single stranded. Since they are in close proximity of each other and single-stranded, hydrogen bonds form between the complementary regions connecting $lX$ and $lY$, as shown in Figure 4.2(B).

Note that the points $a_1$ and $a_2$ on $X$, and $b_1$ and $b_2$ on $Y$, where the unzipping and later on the pairing occurs are not fixed. They are considered to be probabilistic.

At some point during this process, cuts are made on the lower and upper backbones of $X$ and $Y$: between $\beta$ and $\delta$ on $uX$, between $\bar{\alpha}$ and $\bar{\beta}$ on $lX$, between $\bar{\varepsilon}$ and $\bar{\beta}$ on $uY$ and between $\beta$ and $\gamma$ on $lY$. Labeled by $c_1$, $c_2$, $c_3$ and $c_4$, the cut positions are shown in Figure 4.3. The cuts may be symmetrically reversed (see Figure 4.4).

Figure 4.3: The positions of cuts in molecules $X$ and $Y$ corresponding to cut points $c_1$, $c_2$, $c_3$, $c_4$ in Figure 4.2(E).

C. The next step is depicted in Figure 4.2(C). The single-stranded subsequences $\bar{\beta}$ and $\beta$ from $T$ form hydrogen bonds with the corresponding complementary sequences $\beta$ of $uX$ and $\bar{\beta}$ from $uY$, respectively.

D. In Figure 4.2(D), the hydrogen bonds between $uX$ and $uT$ on one hand and $lT$ and $uY$ on the other hand start to dissociate. Since the RNA duplexes

are more stable than RNA-DNA duplexes, the template strands release $uX$ and $uY$. The single stranded sequences at $uX$ and $uY$ are complementary, so they start to hybridize.

Figure 4.4: The possible positions of cuts in molecules $X$ and $Y$.

E. The final result is shown in Figure 4.2(E). The pointer sequence $\beta$ of $uX$ binds with the $\bar{\beta}$ sequence in $uY$. With that the template is released and its hydrogen bonds are repaired. Thus, the template remains unchanged and may serve for further recombination.

F. Figure 4.2(F) shows the resulting molecules obtained after recombination. The cuts are introduced at locations labeled by $c_1, \ldots, c_4$. The blue portion of the braiding molecule indicates the new recombined molecule containing the sequence $\alpha\beta\gamma$, i.e., the molecule containing a block of consecutive MDSs. Assuming that the portions that have undergone recombination, portions $X$ and $Y$, belong to the same DNA molecule, after recombination the remaining fragments (containing sequence $\varepsilon\beta\delta$) could be released as a circular molecule, indicated with green in Figure 4.2(F).

The final step of the recombination, after the cuts have been introduced, is schematically shown in Figure 4.5.

The right portion of molecule $Y$ rotates towards molecule $X$ ("falls down") and the left portion of molecule $X$ rotates towards molecule $Y$ (also "falls down"), permitting the nicks to be ligated. Similarly, the backbones of the other DNA portion are ligated.

The whole process of homologous recombination described above is irreversible. One of the resulting molecules always contains a pair of correctly joined MDSs,


while the IESs are removed. The RNA template is not physically incorporated in the resulting molecule and it can guide other recombination events if necessary.

Figure 4.5: The resulting molecules after the recombination.

4.2 Homologous Recombination with ssRNA Templates

In this section a model for DNA homologous recombination guided by a single-stranded template is proposed. The process is very similar to the one described in Section 4.1, where the template was double-stranded RNA. Here we describe a model based on the assumption that the template is ssRNA.

As in the case of dsRNA, we assume that the ssRNA contains a segment $\alpha\beta\gamma$ for some sequences $\alpha$, $\beta$ and $\gamma$, where $\beta$ corresponds to a pointer. The segments $\alpha$ and $\gamma$ correspond to portions of two consecutive MDSs. Since the template is single stranded, we represent it by a line segment denoted by $T$ (see Figure 4.6). We consider two double-stranded DNA segments $X$ and $Y$, possibly parts of the same DNA molecule. Both $X$ and $Y$ contain the pointer sequence $\beta$.

We use the same notation as introduced in Figure 4.1. The double-stranded DNA molecules are represented as ribbons. The base pairing is represented by vertical bars. Upper strands are labeled by $uX$ and $uY$, while the lower strands are labeled by $lX$ and $lY$. The step-by-step recombination process is given in Figure 4.6.

A. The three molecules align in parallel, as shown in Figure 4.6(A). The segment $\alpha\beta$ from the template $T$ is homologous to the $\alpha\beta$ block in $X$, and the template segment $\beta\gamma$ is homologous to $\beta\gamma$ in $Y$. Therefore, the alignment is possible even if the pointer sequence $\beta$ is very short.


Figure4.6:Schematicrepresentationofstepbysteprecomb inationguidedbyassRNA template.B.Thehydrogenbondsstarttobrakein X and Y .Theunzippingoccurs betweenpoints a 1 and a 2 in X alongtheportion( )where X and T are homologous.Similarly,thehydrogenbondsin Y starttobreakbetween points b 1 and b 2 alongtheportion( r )where Y and T arehomologous. Thesequence from lX iscomplementaryto portionfrom T ,sothey hybridize.Ontheotherhand,hydrogenbondsstarttoformbe tweenthe sequence r from uY anditscomplementary r from T Thisprocessisprobabilistic,suchthatnotnecessarilyal lof and from X arecapturedinthedouble-strandedportionbetween T and X ,andnot 53


necessarilyallof r and from Y arecapturedinthedouble-strandedportion between T and Y .Weprobabilisticallydividethe sequenceintotwoparts labeledby 1 and 2 ,suchthat 1 istheportionwhere lX and T hybridize and 2 istheportionwhere uY and T hybridize.Therefore, uX and uY arecomplementaryalong 1 ,while lX and lY arecomplementaryalong 2 Hydrogenbondsareformedbetweenthosecomplementaryport ionsasshown inFigure4.6(B). C.TheRNA-DNAduplexesarenotstable,sothebondsbetween T and X and between T and Y brakealongportions and r ,respectively.Thehydrogen bondsinthe portionof X andthe r portionof Y arereestablishedas showninFigure4.6(C). D.Next,thehydrogenbondsbetween T and X and T and Y begintobrake along .Aftercompletedissociationofthesebonds,thetemplatei sentirely released.Itisunchangedandreadytoguideotherrecombina tionevents.At thispoint,the 2 portionsofboth uX and uY arefreeandcomplementary toeachother.Thus,aformationofhydrogenbondisenabled. Similarly, lX and lY bindalong 1 Duringthisstep,fourcutsareintroducedonthestrandsof X and Y .The cuttingpositionsarelabeledby c 1 ;c 2 ;c 3 and c 4 asdepictedinFigure4.6(D). ForanotherpossibilityofcuttingpositionsseeFigure4.4 Theresultingmoleculesofthisprocessareassembledasexp lainedinSection4.TheyaregiveninFigure4.5.SameasthecaseofdsRNAt emplate,the recombinationguidedbyssRNAtemplateisireversable. Thebiologicalexplanationoftheboth(ssRNAanddsRNA)tem plateguided modelsforDNArecombinationcanbefoundin[2]. 54


4.3Loop,HairpinandDoubleLoopRecombinationthroughthe TemplateGuidedModel Threemolecularoperations,looprecombination,hairpinr ecombinationanddoublelooprecombination,wereproposedinthepointerguided intramolecularmodel forgenerearrangementsdiscussedinSection3.1.Inthisse ction,wetranslatethe modelofDNArecombinationguidedbyanRNAtemplate,asdesc ribedinSection4.1andSection4.2tosimulateeachoftheseoperations .Furthermore,we observethatanextratwistisformedinthedoublestrandedD NAmoleculeafter correctrecombinationisperformed. 4.3.1LoopRecombinationwithTemplates Let G betheMICgenethatencodestheproteinknownas -telomerebinding protein( TBP ).Itcontainstheportion MDS 1 IES 1 MDS 2 IES 2 MDS 3 IES 3 (seeFigure4.7(A)).Itisclearthatalooprecombinationis applicableto G atthe pointer3.Asaresult(afterrecombinationofpointer3)one copyof3andIES 2 areexcisedfromthemolecule,andMDS 2 issplicedwithMDS 3 throughonecopy of3.Theblockthatisunderconsiderationbecomes MDS 1 IES 1 MDS 2 ; 3 IES 3 andtherestofthegeneisunchanged. Atemplatethatcontainsaportioncorrespondingto MDS 2 ; 3 guidesthe alignmentofthepointers3.Uponthisalignmentthegenefor msaloopasshown inFigure4.7(B).Now,lettheupperstrand(thatis5 0 -endto3 0 -endstrand)of MDS 2 (withoutthepointersthatMDS 2 isborderedwith)bedenotedby and thelowerstrand(3 0 -endto5 0 -endstrand)by .Similarly,lettheupperstrandof pointersequence3bedenotedby andtheloweroneby ,theupperstrandof 55


MDS 1 IES 1 IES 2 MDS 4 IES 3 22 MDS 2 33 MDS 3 44 MDS 1 IES 1 IES 2 22 MDS 2 3 IES 2 MDS 4 IES 3 3 MDS 3 4 4(A)(B)(C)(D)(E)IES 2MDS 2 3 3 MDS 3 Figure4.7:Step-by-stepprocessoflooprecombinationwit hatemplate.MDS 3 (excludingpointersequences3and4)by r ,thelowerstrandby r andthe upperstrandofIES 2 by ,thelowerstrandby .Inaddition,letthe partof IES 2 bedenotedbyIES 02 and partbyIES 002 .Inthesetermsthethreeportions of G :MDS 2 3 IES 02 ,IES 002 3 MDS 3 ,andMDS 2 MDS 3 translateintothe ribbons X Y and T fromFigure4.1,respectively. TherecombinationcanbeperformedexactlyasexplainedinS ections4.1, 4.2andtheobtainedresultisillustratedinFigure4.7(C), withthetemplate alreadygone.Introductionofthecuts c 1 c 2 c 3 and c 4 andtheappropriate strandpairingseparatesthemolecule G intwopartsthatareunlinkedasshown inFigure4.7(D).Thedoublehelixformsineachoftheseregi onswhereligation takesplaceconnectingpoint c 1 with c 4 andpoint c 2 with c 3 .Notethatinorder fortheligationtooccur,points c 1 and c 4 ; c 2 and c 3 mustbebroughtnexttoeach 56


other, which introduces an extra twist in the double-stranded DNA. As a final result two molecules are obtained, one circular and one linear, exactly as expected from either model. The circular molecule is excised from the gene and contains one excised copy of pointer 3 and $\mathrm{IES}_2$. The linear molecule is the same as $G$ except that $\mathrm{MDS}_2$ and $\mathrm{MDS}_3$ are spliced together through (a copy of) pointer 3.

4.3.2 Double Loop Recombination with Templates

Figure 4.8: Step-by-step process of double loop recombination guided by a template.


Themicronuclearactingene G in Oxytrichanova containsasegment MDS 4 IES 2 MDS 6 IES 3 MDS 5 IES 4 asshowninFigure4.8(A).Themoleculesatisestherequire mentsforthedoublelooprecombinationtobeperformedonpointers5and6sin cethesequence betweenthetwooccurrencesof5overlapsthesequencebetwe enthetwooccurrencesof6.Afterapplyingdoublelooprecombinationon5an d6,amoleculethat containsthesegment IES 1 MDS 4 ; 5 ; 6 IES 3 5 IES 2 6 IES 4 isobtained,whereMDS 4 ,MDS 5 andMDS 6 aresplicedtogether.Notethatout ofthisblock, G remainsunchanged.Themoleculefoldsintotwoloops(Figur e 4.8(B))sothatthetwocopiesofpointer5alignwitheachoth erinoneloopand thetwocopiesofpointer6alignwitheachotherintheotherl oop.Inthiscasethe alignmentisguidedeitherbyoneortwotemplates;oneforea chpairofpointers. Ifthetemplateconsistsofonemolecule,thenitcontainsth esegmentMDS 4 ; 5 ; 6 andiftherearetwoshorttemplates,thanonemustcontainMD S 4 ; 5 andtheother onemustcontainMDS 5 ; 6 Lettheupperstrand(labeled5 0 to3 0 )ofMDS 4 (notincludingthepointer sequencesrankingMDS 4 )bedenotedby andthelowerstrand(3 0 to5 0 )by Similarly,lettheupperstrandofpointer5bedenotedby andthelowerone by ,theupperstrandofMDS 5 by r ,thelowerstrandby r ;theupperstrandof IES 2 isdenotedby ,thelowerstrandby ,theupperstrandof IES 3 by ,the loweroneby .Also,denotetheupperstrand(5 0 to3 0 )ofMDS 6 by andthe lowerstrand(3 0 to5 0 )by .Theupperstrandofpointer6isdenotedby and theloweroneby ;theupperstrandof IES 4 by ,thelowerstrandby Inthissettingtworecombinationsareapplied,oneatthe -partandoneatthe -partofthemolecule,asshowninFigure4.8(C).Points c 1 ;c 2 ;c 3 ;c 4 ;s 1 ;s 2 ;s 3 and s 4 representthecutsthatoccur.TheresultisdepictedinFigu re4.8(D),whichis 58


thesameastheoneinFigure4.8(E)withthemoleculeunloope d.Atthispoint thebackbonesofthemoleculeareligated.Theligationshap penbetween: c 1 and c 4 ; c 2 and c 3 ; s 1 and s 4 ; s 2 and s 3 ;asshowninFigure4.8(E),suchthatoneextra twistoccurs.TheresultingmoleculeinFigure4.8(F)conta ins MDS 4 ; 5 ; 6 IES 3 5 IES 2 6 IES 4 whichisthesamemoleculethatshouldbeobtainedafterdoub lelooprecombinationof G 4.3.3HairpinRecombinationwithTemplates Theactingene G containsthesegment MDS 2 IES 1 MDS 1 IES 2 .InFigure 4.9(A)MDS 2 isinvertedrelativetoMDS 1 .Hence,thehairpinrecombination operationisapplicabletopointer2.Itsapplicationleads toinversionofMDS 2 andsplicingittogetherwithMDS 1 throughonecopyof2.Thisisaccomplished byaligningthetwocopiesofpointer2suchthatthemolecule formsahairpinas showninFigure4.9(B). Thealignmentofthecorrectpointersisguidedbyatemplate ,whichcontains thesegmentMDS 1 ; 2 .Inordertoconvertthissettingintothenotationsusedin Section4.1andSection4.2,lettheupperstrand(5 0 to3 0 )ofMDS 1 (excluding thepointers)bedenotedby andthelowerstrand(3 0 to5 0 )by .Similarly,let theupperstrandofpointer2bedenotedby andtheloweroneby ,theupper strandofMDS 2 withoutpointersequences2and3by r ,thelowerstrandby r theupperstrandofIES 2 isdenotedby ,andthelowerstrandby ,theupper strandofIES 1 is ,thelowerone Inthissetting,therecombinationbetween -regions(Figure4.9(C))yields theoutcomeillustratedinFigure4.9(D).Next,thebackbon esofthemoleculeare ligatedinfourplaces;namelybetweenpoints c 1 and c 4 andbetweenpoints c 2 and c 3 .TheassembledmoleculecontainstheexpectedsegmentMDS 1 ; 2 IES 2 2 IES 1 (Figure4.9(E)). 59


Figure 4.9: Step-by-step process of hairpin recombination guided by a template.

Remark 4.3.1 M. Daley et al. propose a topological mechanism for gene descrambling in [13]. They consider the contribution of the 3-dimensional DNA structure in the process of gene assembly. Their model hypothesizes that each recombination event in the process of gene assembly alters the 3-dimensional position of the DNA molecule as a result of supercoiling. The new DNA topology brings together the MDSs that are to be descrambled next.

As explained in Section 4.3.1 and Section 4.3.2, the correct ligation of the recombining DNA molecule can occur only if an extra twist in the double-stranded DNA is introduced. According to our RNA-guided DNA model for gene rearrangements, a few pointers might recombine simultaneously. Therefore, a few twists would be added to the DNA molecule at once. Increasing the number of twists increases the supercoiling, according to the formula $Lk = Tw + Wr$, where $Lk$ is the linking number, $Tw$ is the twisting number and $Wr$ is the writhe. For more details see [41]. Therefore, the extra twist that we propose might serve as an explanation for the DNA supercoiling, which is expected to change the DNA secondary structure and contribute to the further assembly process as postulated in [13].
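As a small worked illustration of the formula $Lk = Tw + Wr$, suppose (purely as an assumption for this example, not a claim of the model) that $m$ pointers recombine simultaneously, that each recombination contributes exactly one extra twist, and that the writhe is unchanged at the moment of ligation. Taking differences on both sides of the formula gives

\[
\Delta Lk \;=\; \Delta Tw + \Delta Wr \;=\; m + 0 \;=\; m .
\]

Under these assumptions the linking number changes by $m$. Once the backbones are closed, $Lk$ is fixed, so any later relaxation of twist has to be compensated by a change in writhe, which is one way to read the supercoiling mentioned above.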


5 Assembly Graphs

The proposed model for DNA recombination described in Chapter 4 is the motivation for introducing and studying the assembly graphs. In this chapter we consider spatial graphs as a representation of the DNA structures in space during the recombination processes. In Section 5.1 we explain how an MDS-IES micronuclear gene structure translates into a graph with rigid vertices (assembly graph). The definitions and the notations used in building the assembly graph model can be found in Section 5.2. In Section 5.3 a polygonal path is defined as a model for a single macronuclear gene, and a Hamiltonian set is defined as a set of polygonal paths whose union contains every vertex of the assembly graph exactly once. We introduce the notion of an assembly number in Section 5.4. The assembly number corresponds to the minimal number of macronuclear genes obtained after correct assembly of the micronuclear DNA molecule and therefore, it is defined as the cardinality of the minimal Hamiltonian set. We prove that for every natural number $n$, there is an assembly graph with assembly number $n$. We also characterize the assembly number of a composition of assembly graphs. The minimum realization number, $R_{\min}(n)$, for a natural number $n$ is defined in Section 5.5 as the cardinality of the minimal assembly graph with assembly number $n$. Furthermore, we represent the assembly graphs through double occurrence words in Section 5.5, similarly to the way knot diagrams have been represented through Gauss codes in knot theory. Using such a representation, we prove that $R_{\min}(n)$

smoothing of all vertices of $\Gamma$ corresponds to the resulting set of DNA molecules obtained by correct gene assembly of the micronuclear gene(s) represented by $\Gamma$. We show that the smoothings with respect to polygonal paths studied in Section 5.6 and the smoothings by proper coloring introduced by Kauffman in [32] define different operations on the assembly graphs.

5.1 Gene Rearrangements via Assembly Graphs

In this section we describe a construction of a spatial graph that models a given MIC gene.

Figure 5.1: Schematic representation of the pointer alignment shown as a 4-valent vertex and the recombination result shown as smoothing of the vertex.

Consider the situation where two segments of DNA molecules have aligned pointers and are ready for recombination, as presented in Figure 5.1 (left). Assume that $X$ and $Y$ are parts of the precursor DNA molecule from a micronuclear gene. The left (blue) portion of $X$ and the right (blue) portion of $Y$ are regarded as sequences of the $i$th and $(i+1)$st MDS, respectively. The middle (pink) region of both molecules denotes the sequence of the $(i+1)$st pointer. The rest of $X$ and $Y$ belong to IES sequences.

The stage at which the pointers align, just before the homologous recombination takes place, is depicted by an intersection of the two molecules at the pointer


region (shown in pink in Figure 5.1, left). The resulting product of homologous recombination is depicted on the right of Figure 5.1. After the recombination, one of the resulting molecules contains two newly ordered MDSs, and the other contains the excised IES(s).

Figure 5.2: Schematic representation of the micronuclear Actin I gene in Oxytricha nova.

Consider the example in Figure 5.2. We associate a graph to the MIC sequence depicted in Figure 5.2 in the following way. For each pair of pointers that occurs in the MIC segment, we place a 4-valent vertex in the plane. Label the vertices with the corresponding number. There are 8 such vertices in this example, labeled 2 through 9 (see Figure 5.3(A)). Next, we pick a point in the plane, called the base point (the black dot to the left of point 3), and as we read the string of pointers in the MIC gene (3-4-4-5-6-7-5-6-7-8-9-3-2-2-8-9), following a chosen direction, we start connecting the vertices: vertex 3 connects to vertex 4, loops back to vertex 4, connects to vertex 5, etc. When all vertices (pointers in the string) are exhausted, we connect the path back to the base point. In this way, we obtain a graph $\Gamma$ with 4-valent vertices. The resulting 4-valent graph is called a spatial graph when the graph is considered to be embedded in space.

We label each edge connecting a pair of vertices in $\Gamma$ by $\mathrm{MDS}_i$ or $\mathrm{IES}_j$ such that, starting at the base point, if one travels $\Gamma$ following its orientation, the edge labels follow the appearance of the MDSs and IESs as they appear in the scrambled micronuclear gene. For the example shown in Figure 5.2, after visiting vertex 3 the label of the next edge is $\mathrm{MDS}_3$, followed by $\mathrm{IES}_1$, then by $\mathrm{MDS}_4$ and so on. This labeling is shown in Figure 5.3(A).

Note that the representation of the Actin I micronuclear gene from Figure 5.2 as a spatial graph $\Gamma$ in Figure 5.3(A) completely captures its MDS-IES structure, such that each pair of pointers that occurs in the gene is represented by a vertex in $\Gamma$. The DNA recombination can appear at every alignment of the pointers.
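The construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pointer_graph` is hypothetical, and the base point is treated as a marker on the closing edge rather than as a vertex, so the pointer string is read cyclically.

```python
from collections import Counter

def pointer_graph(pointers):
    """Build the 4-valent multigraph traced by a closed walk through the pointers.

    Assumption: the base point is a marker on the closing edge, not a vertex,
    so the last pointer is joined back to the first one.
    """
    counts = Counter(pointers)
    if any(c != 2 for c in counts.values()):
        raise ValueError("every pointer must occur exactly twice")
    n = len(pointers)
    # One edge per consecutive pair, including the edge that closes the walk.
    edges = [(pointers[k], pointers[(k + 1) % n]) for k in range(n)]
    valency = Counter()
    for u, v in edges:
        valency[u] += 1
        valency[v] += 1  # a loop (u == v) is counted twice at its vertex
    return sorted(counts), edges, dict(valency)

# Pointer string of the Actin I example read from the text.
actin_pointers = [3, 4, 4, 5, 6, 7, 5, 6, 7, 8, 9, 3, 2, 2, 8, 9]
vertices, edges, valency = pointer_graph(actin_pointers)
print(vertices)                # the eight vertices, labeled 2 through 9
print(len(edges))              # one edge per step of the closed walk
print(set(valency.values()))   # every vertex should have valency 4
```

The sketch records only which vertices are joined. It does not store the cyclic order of the edges at a vertex, so it captures the underlying graph but not the rigid-vertex structure.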


Figure 5.3: Schematic representation of the simultaneous braiding and recombination process. Panel (A) shows the labeled spatial graph and panel (B) the result of smoothing.

As shown in Figure 5.1, this is represented by smoothing of the vertices of $\Gamma$, according to the MDS-IES structure. There are two types of vertex smoothings relative to the orientation of the graph. The smoothings are studied in more detail in Section 5.6. The appropriate type of smoothing can be performed on every vertex of $\Gamma$ at once. For the example of the Actin I gene from Figure 5.2, the result of the simultaneous smoothing of each vertex is shown in the diagram depicted in Figure 5.3(B). As shown, this result is composed of three connected components. Two of them are labeled only with IESs, which indicates the IES excision. One of the components contains $\mathrm{MDS}_1\,\mathrm{MDS}_2\,\mathrm{MDS}_3 \cdots \mathrm{MDS}_9$, which represents the assembled macronuclear gene in correct MDS order.

Following this example, we can associate a spatial graph to any MIC DNA segment. In order to precisely formulate and study the MDS-IES micronuclear gene structures and their recombinations, we introduce the notion of assembly graphs in Section 5.2 as a special type of graphs.

5.2 Definitions and Notations

For an $n$-tuple $x = (x_1, \ldots, x_n)$ of elements of a set $X$, we define its reverse to be $(x_n, \ldots, x_1)$, which is denoted by $x^R = (x_1, \ldots, x_n)^R$. Also define the set $x^{rev} = (x_1, \ldots, x_n)^{rev} = \{x, x^R\}$. Two $n$-tuples $x$ and $y$ are called reverse equivalent if


$x^{rev} = y^{rev}$, and this is denoted by $x \sim_{rev} y$. Given a $k$-tuple $x = (x_1, \ldots, x_k)$, we denote by $x^{cyc} = (x_1, \ldots, x_k)^{cyc}$ the set

$$(x_1, \ldots, x_k)^{cyc} = \{ (x_1, \ldots, x_k),\ (x_2, \ldots, x_k, x_1),\ \ldots,\ (x_k, x_1, \ldots, x_{k-1}),\ (x_k, \ldots, x_1),\ (x_{k-1}, \ldots, x_1, x_k),\ \ldots,\ (x_1, x_k, \ldots, x_2) \},$$

which consists of all compositions of cyclic permutations and reverses of the $k$-tuple $(x_1, \ldots, x_k)$. Two $k$-tuples $x$ and $y$ are called cyclically equivalent if $x^{cyc} = y^{cyc}$, which is denoted by $x \sim_{cyc} y$.

Let $\Gamma = (V, E)$ be a finite graph with a set of vertices $V$ and a set of edges $E$. We allow multiple edges and loops. Denote by $E(v)$ the set of edges that are incident to a vertex $v \in V$. Every loop at vertex $v$ is counted as two different edges incident to $v$. The cardinality of $E(v)$ (counting the loops twice) is called the valency of $v$.

Let $v \in V$ and $E(v) = \{e_1, \ldots, e_k\}$. For each $v$ we fix an order of $E(v)$ written as $(e_{i_1}, \ldots, e_{i_k})$. We denote by $E^{cyc}(v)$ the set $(e_{i_1}, \ldots, e_{i_k})^{cyc}$. Diagrammatically, the vertex $v$ can be considered as a small disk such that each incident edge $e$ is attached to the boundary of the disk at a point called the "entering point" of $e$. The edges incident to $v$ are sketched such that if one traces the boundary of the disk clockwise, the entering points of the edges from $E(v)$ are encountered in the order found in $E^{cyc}(v)$.

Definition 5.2.1 A rigid vertex in a finite graph $\Gamma = (V, E)$ is a pair $(v, E^{cyc}(v))$. If $(v, (e_1, \ldots, e_k)^{cyc})$ is a rigid vertex, then the edges $e_{i-1}$ and $e_{i+1}$ are called neighbors of $e_i$, for $i = 2, \ldots, k-1$; the neighbors of $e_k$ are $e_{k-1}$ and $e_1$; the neighbors of $e_1$ are $e_k$ and $e_2$.

Example 5.2.2 Consider the graph $\Gamma_1$ given in Figure 5.4. We have $E(v_3) = \{e_3, e_4, e_6, e_7\}$ and, according to the cyclic order of the edges, the 4-tuple $(e_3, e_6, e_4, e_7)$ can be associated to vertex $v_3$. Therefore, $(v_3, (e_3, e_6, e_4, e_7)^{cyc})$ is a rigid vertex in $\Gamma_1$.
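The set $x^{cyc}$ and the neighbor relation at a rigid vertex can be made concrete with a short Python sketch. The function names are hypothetical, and the sketch assumes that the edge labels in a cyclic order are pairwise distinct (a loop would need two distinct labels, one per end).

```python
def cyc_class(t):
    """Return the set t^cyc: all rotations of t and of its reverse."""
    t = tuple(t)
    k = len(t)
    rev = t[::-1]
    rotations = {t[s:] + t[:s] for s in range(k)}
    reversed_rotations = {rev[s:] + rev[:s] for s in range(k)}
    return rotations | reversed_rotations

def cyclically_equivalent(x, y):
    """Two tuples are cyclically equivalent when their cyc sets coincide."""
    return cyc_class(x) == cyc_class(y)

def neighbors(order, e):
    """Edges adjacent to e in the cyclic order of a rigid vertex."""
    k = len(order)
    i = order.index(e)
    return {order[(i - 1) % k], order[(i + 1) % k]}

# Cyclic order at v3 taken from Example 5.2.2.
order_v3 = ("e3", "e6", "e4", "e7")
print(neighbors(order_v3, "e3"))
print(cyclically_equivalent(order_v3, ("e7", "e4", "e6", "e3")))
print(cyclically_equivalent(order_v3, ("e3", "e6", "e7", "e4")))
```

The last two calls compare the order at $v_3$ with its reverse, which lies in the same class, and with an order in which two adjacent entries are swapped, which does not.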


Definition 5.2.3 An assembly graph is a finite connected graph where all vertices are rigid vertices of valency 1 or 4. A vertex of valency 1 is called an endpoint.

Further in the dissertation, when we refer to vertices of an assembly graph we always consider them as rigid vertices.

The number of 4-valent vertices in $\Gamma$ is denoted by $|\Gamma|$. The assembly graph is called trivial if $|\Gamma| = 0$.

Definition 5.2.4 Two assembly graphs $\Gamma_1 = (V_1, E_1)$ and $\Gamma_2 = (V_2, E_2)$ are isomorphic if there is a graph isomorphism that preserves the cyclic order of each rigid vertex. More specifically, for a graph isomorphism $\Phi = (\Phi_v, \Phi_e) : \Gamma_1 \to \Gamma_2$ with $\Phi_v : V_1 \to V_2$ and $\Phi_e : E_1 \to E_2$, for every rigid vertex $(v, (e_1, e_2, e_3, e_4)^{cyc})$ in $\Gamma_1$, we have

$$(\Phi_v(v), (\Phi_e(e_1), \Phi_e(e_2), \Phi_e(e_3), \Phi_e(e_4))^{cyc}) = (\Phi_v(v), E^{cyc}(\Phi_v(v))).$$

Instead of writing $(\Phi_v, \Phi_e)$, we will use the notation $\Phi$ for an assembly graph isomorphism whenever it is clear from the context which map is used.

Definition 5.2.5 A transverse path in $\Gamma$ is a sequence $\gamma = (v_0, e_1, v_1, e_2, \ldots, e_n, v_n)$ if $v_0, v_n$ are endpoints, or $(v_0, e_1, v_1, e_2, \ldots, e_n)$ if $v_0$ is a 4-valent vertex and $e_n \in E(v_0)$, satisfying the following conditions:

(1) $v_0, \ldots, v_n$ is a sequence of a subset of vertices of $\Gamma$, with possible repetition of the same vertex at most twice,

(2) $\{e_1, \ldots, e_n\}$ is a set of distinct edges, and

(3) each $e_i$ is not a neighbor of $e_{i-1}$ with respect to the rigid vertex $v_{i-1}$, $i = 2, \ldots, n$, and in the case where $v_0$ is a 4-valent vertex, $e_1$ is not a neighbor of $e_n$ with respect to the rigid vertex $v_0$.

A transverse path can be considered as an image of a map from the unit interval $[0, 1]$ to $\Gamma$, where the image of the boundary points ($\{0\} \cup \{1\}$) consists of either two endpoints of $\Gamma$, or a single 4-valent vertex.
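Condition (3) says that a transverse path leaves a 4-valent rigid vertex by the one edge that is not a neighbor of the edge it arrived on, that is, by the edge two steps away in the cyclic order. A minimal Python sketch of that rule follows; the function name `transverse_exit` is hypothetical and the edge labels at a vertex are assumed to be distinct.

```python
def transverse_exit(order, incoming):
    """Edge by which a transverse path leaves a 4-valent rigid vertex.

    `order` is any representative of the cyclic order of the four incident
    edges. The exit edge is the unique non-neighbor of `incoming`, which sits
    two positions away; rotating or reversing `order` does not change it.
    """
    if len(order) != 4:
        raise ValueError("expected a 4-valent vertex")
    i = order.index(incoming)
    return order[(i + 2) % 4]

# Cyclic order at v3 from Example 5.2.2.
order_v3 = ("e3", "e6", "e4", "e7")
print(transverse_exit(order_v3, "e3"))
print(transverse_exit(order_v3, "e6"))

# A different cyclic order on the same four edges, used only for contrast.
other_order = ("e3", "e6", "e7", "e4")
print(transverse_exit(other_order, "e3"))
```

With the first order a path arriving on $e_3$ continues on $e_4$, while with the second order it continues on $e_7$, so the same underlying graph can carry different transverse paths depending on the rigid-vertex structure.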


Definition 5.2.6 Two transverse paths with endpoints are equivalent if they are either identical, or one is the reverse of the other. Two transverse paths $\gamma, \gamma'$ without endpoints are equivalent if they have the same cyclic order: $\gamma^{cyc} = \gamma'^{cyc}$.

Definition 5.2.7 An assembly graph $\Gamma$ is called simple if there is a transverse Eulerian path in $\Gamma$, meaning, there is a transverse path $\gamma$ that contains every edge from $\Gamma$ exactly once.

With the following two lemmas we show that:

• In a simple assembly graph, there is a unique equivalence class of transverse Eulerian paths (Lemma 5.2.8).

• The simple assembly graphs are uniquely determined (up to isomorphism) by the transverse Eulerian path (Lemma 5.2.9).

Lemma 5.2.8 In a simple assembly graph, there is a unique equivalence class of transverse Eulerian paths.

Proof. By the definition of transverse paths, an Eulerian path $\gamma$ in a simple assembly graph $\Gamma$ either starts and ends with two endpoints, or none of the vertices in $\gamma$ are endpoints. If $\gamma$ has two endpoints, then every Eulerian path in $\Gamma$ has to start and end at the endpoints. Starting from an endpoint, following its incident edge, at every consecutive vertex there is only one edge that is not a neighbor of the edge just passed. Hence there is only one transverse Eulerian path starting at one endpoint. Since the reverse of this path is a transverse Eulerian path from the other endpoint, the two transverse Eulerian paths are equivalent.

Now consider the case when $\Gamma$, and therefore $\gamma$, has no endpoints. We will show that every Eulerian path in $\Gamma$ is equivalent to $\gamma$, meaning, it belongs to $\gamma^{cyc}$. Let $\gamma = (v_0, e_1, v_1, e_2, \ldots, e_n)$. Since none of the vertices are endpoints, all of them are 4-valent. Therefore, each vertex $v$ appears in $\gamma$ exactly twice. Denote the positions of $v$ in $\gamma$ by $i$ and $j$ (i

r =( 1 ;v i ; 2 ;v j ; 3 ).TherearefourpossibilitiesforanEulerianpathtostart at v = v i = v j choosingeitheroneoftheedges e i ;e i +1 ;e j ;e j +1 .Notethatineither oneofthesechoicesthesuccessiveedgesareuniquelydeter mined,sinceateach vertexthereisonlyonenon-neighboredgetotheedgejustpa ssed.Hencethe choiceoftherstedgeuniquelydeterminestheEulerianpat h.Thereforethere areonlyfourEulerianpathsthatstartat v :( v i ; 2 ;v j ; 3 ; 1 ),( v i ; R 1 ; R 3 ;v j ; R 2 ), ( v j ; 3 ; 1 ;v i ; 2 ),and( v j ; R 2 ;v i ; R 1 ; R 3 )andallfourofthesepathsbelongto r cyc 2 Lemma5.2.9 Let =( V;E ) and 0 =( V 0 ;E 0 ) betwosimpleassemblygraphs with r and r 0 theirtransverseEulerianpaths,respectively.Then and 0 are isomorphicifandonlyifthereisamap =( v ; e ): 0 withbijections v : V V 0 and e : E E 0 suchthat ( r ) isequivalentto r 0 Proof. Ifand 0 areisomorphicbyanisomorphism,then( r )isatransverse Eulerianpathin 0 ,andbyLemma5.2.8, r 0 and( r )areequivalent.Conversely, thepairofbijections=( v ; e )providesagraphisomorphismasthetransverse Eulerianpath r mapstoatransverseEulerianpath( r ).Henceitsucestoshow thattherigidityoftheverticesispreserved: ( v ( v ) ; ( e ( e 1 ) ; e ( e 2 ) ; e ( e 3 ) ; e ( e 4 )) cyc )=( v ( v ) ;E cyc ( v ( v ))) foreveryvertex v in.Let v bea4-valentvertexin,andlet r =( v 0 ;e 1 ;v 1 ;e 2 ;:::;e n ; ( v n )) : Thenthereare i;k 2f 1 ; 2 ;:::;n g suchthat v = v i = v k .Then v ( v )= v ( v i )= v ( v k )andwehave E cyc ( v )=( e i 1 ;e k 1 ;e i ;e k ) cyc and E cyc ( v ( v ))= ( e ( e i 1 ) ; e ( e k 1 ) ; e ( e i ) ; e ( e k )) cyc ,therefore ( v ( v ) ; ( e ( e 1 ) ; e ( e 2 ) ; e ( e 3 ) ; e ( e 4 )) cyc )=( v ( v ) ;E cyc ( v ( v ))) holds. 2 69


Next we consider directed assembly graphs and we define their composition. Given a simple assembly graph $\Gamma$ with two endpoints, choose one of them to be initial ($i$) and the other to be terminal ($t$). We call such $\Gamma$ a directed simple assembly graph with direction from $i$ to $t$. We consider the transverse path of a directed simple assembly graph as a path starting at the vertex $i$ and terminating at the vertex $t$.

Definition 5.2.10 A composition $\Gamma_1 \circ \Gamma_2$ of two (directed simple) assembly graphs $\Gamma_1$ and $\Gamma_2$ is the directed simple assembly graph obtained by identifying the terminal vertex of $\Gamma_1$ with the initial vertex of $\Gamma_2$.

Note that the initial vertex of $\Gamma_1 \circ \Gamma_2$ is the initial vertex of $\Gamma_1$ and the terminal vertex of $\Gamma_1 \circ \Gamma_2$ is the terminal vertex of $\Gamma_2$.

Definition 5.2.11 If $\Gamma = \Gamma_1 \circ \Gamma_2$ for some non-trivial directed assembly graphs $\Gamma_1$ and $\Gamma_2$, then $\Gamma$ is called reducible. Otherwise it is called irreducible.

Example 5.2.12 Each graph in Figure 5.4 is an assembly graph.

1. The assembly graphs $\Gamma_1$ and $\Gamma_2$ in Figure 5.4 are not isomorphic as assembly graphs, even though they are isomorphic as graphs. Namely, the rigid vertex $v_3$ has a cyclic order $(e_3, e_6, e_4, e_7)^{cyc}$ in $\Gamma_1$ and a cyclic order $(e_3, e_6, e_7, e_4)^{cyc}$ in $\Gamma_2$. Since $(e_3, e_6, e_4, e_7)^{cyc} \neq (e_3, e_6, e_7, e_4)^{cyc}$, we have $(v_3, (e_3, e_6, e_4, e_7)^{cyc}) \neq (v_3, (e_3, e_6, e_7, e_4)^{cyc})$.

2. The assembly graph $\Gamma_1$ is a simple assembly graph. The transverse Eulerian path is $(v_1, e_1, v_2, e_2, v_2, e_3, v_3, e_4, v_4, e_5, v_4, e_6, v_3, e_7, v_5)$. The assembly graphs $\Gamma_3$ and $\Gamma_4$ are simple assembly graphs as well.

3. The assembly graph $\Gamma_2$ is not a simple assembly graph. There is no single transverse path that contains every edge exactly once. There are two transverse paths: $(v_1, e_1, v_2, e_2, v_2, e_3, v_3, e_7, v_5)$ and $(v_3, e_4, v_4, e_5, v_4, e_6, v_3)$.

4. The assembly graph $\Gamma_1$ is reducible, since $\Gamma_1 = \Gamma_4 \circ \Gamma_3$. The assembly graphs $\Gamma_3$ and $\Gamma_4$ are irreducible.
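Composition and reducibility can be illustrated with a hedged Python sketch. It assumes, as an encoding chosen only for this example, that a directed simple assembly graph is recorded by the sequence of 4-valent vertices met along its transverse Eulerian path from $i$ to $t$, so each vertex label occurs exactly twice. The names `compose`, `split_points` and `is_reducible` are hypothetical.

```python
def compose(first, second):
    """Vertex sequence of the composition, with the two label sets kept disjoint."""
    return [("A", v) for v in first] + [("B", v) for v in second]

def split_points(seq):
    """Prefix lengths at which seq splits into two non-trivial composable parts."""
    open_vertices = set()
    points = []
    # Only proper prefixes are examined, so both parts are non-empty.
    for pos, v in enumerate(seq[:-1], start=1):
        # Toggle membership: a vertex is 'open' between its two visits.
        open_vertices ^= {v}
        if not open_vertices:
            points.append(pos)
    return points

def is_reducible(seq):
    return bool(split_points(seq))

# 4-valent vertices along the transverse Eulerian path of the graph in Example 5.2.12(2).
gamma1 = ["v2", "v2", "v3", "v4", "v4", "v3"]
print(split_points(gamma1))
print(is_reducible(["v3", "v4", "v4", "v3"]))
print(is_reducible(compose(["v2", "v2"], ["v3", "v4", "v4", "v3"])))
```

In this encoding the graph of Example 5.2.12(2) splits after its first two entries, which matches the decomposition stated in item 4 of that example, while neither of the two pieces splits further.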


Figure 5.4: Assembly graphs (panels $\Gamma_1$, $\Gamma_2$, $\Gamma_3$, $\Gamma_4$).

5.3 Polygonal Paths and Hamiltonian Sets

In this section we define polygonal paths in assembly graphs as a model for a single macronuclear gene. Therefore, a polygonal path represents an ordered MDS sequence ($\mathrm{MDS}_1\,\mathrm{MDS}_2\,\mathrm{MDS}_3 \cdots \mathrm{MDS}_n$), where each edge corresponds to an MDS. The vertices of a polygonal path where the edges meet model the pointer sequences which guide the assembly of the gene. A polygonal path makes a "90 degree turn" at every vertex, such that the consecutive edges of the polygonal path are consecutive MDSs (say, $\mathrm{MDS}_i$ and $\mathrm{MDS}_{i+1}$). Thus, a polygonal path determines a smoothing of the 4-valent vertices. The complete assembly of a single MAC gene requires recombination of all pairs of pointer sequences, i.e., smoothing of all 4-valent vertices in the assembly graph. Therefore, we consider sets of polygonal paths that visit every vertex in the assembly graph once. We call such sets Hamiltonian.


Recent studies show that a micronuclear DNA sequence may contain scrambled MDS portions of several genes, or even a single gene may appear on different loci [34]. Thus, a set of polygonal paths which cover all vertices in an assembly graph models the set of corresponding scrambled macronuclear genes present in the DNA molecule.

Definition 5.3.1 Let $\Gamma$ be an assembly graph. An open path in $\Gamma$ is a homeomorphic image of the open interval $(0, 1)$ in $\Gamma$.

An open path is also represented by a sequence

$$(e_1 \setminus v_0),\ v_1,\ e_2,\ v_2,\ e_3,\ \ldots,\ v_{m-1},\ e_m,\ v_m,\ (e_{m+1} \setminus v_{m+1}),$$

where the $v_i$ are vertices in $\Gamma$ for $i \in \{1, 2, \ldots, m\}$ such that $v_i \neq v_j$ when $i \neq j$, and the $e_i$ are edges in $\Gamma$ for $i \in \{1, 2, \ldots, m\}$ with endpoints $v_{i-1}$ and $v_i$ respectively, such that the initial vertex of $e_1$ (and possibly part of $e_1$) and the terminal vertex of $e_{m+1}$ (and possibly part of $e_{m+1}$) are not included; this is denoted by $(e_1 \setminus v_0)$ and $(e_{m+1} \setminus v_{m+1})$, respectively. We say that the open path is a cycle if $e_1 = e_{m+1}$. Two open paths are disjoint if they do not have a vertex in common.

Definition 5.3.2 A set of pairwise disjoint open paths $\{\gamma_1, \ldots, \gamma_k\}$ in $\Gamma$ is called Hamiltonian if their union contains all 4-valent vertices of $\Gamma$. An open path $\gamma$ is called Hamiltonian if the set $\{\gamma\}$ is Hamiltonian.

Definition 5.3.3 A polygonal path is an open path $\gamma$: $(e_1 \setminus v_0), v_1, e_2, \ldots, v_{m-1}, e_m, v_m, (e_{m+1} \setminus v_{m+1})$, such that $e_i$ and $e_{i+1}$ are neighbors for every $i \in \{1, 2, \ldots, m\}$.

Hamiltonian polygonal paths are of special interest, since they model the correct order of the MDSs in the macronuclear DNA.

Example 5.3.4 In Figure 5.5(A) an assembly graph is given with a Hamiltonian polygonal path $\gamma$. Two open paths, $\gamma_1$ and $\gamma_2$, are depicted in the assembly graph


in Figure 5.5(B). The path $\gamma_1$ is a polygonal path, but it is not Hamiltonian since it does not visit all vertices of the assembly graph. The path $\gamma_2$ does not visit every vertex of the assembly graph and also connects non-neighboring edges at vertex 5. Therefore, $\gamma_2$ is neither polygonal nor Hamiltonian.

Figure 5.5: Polygonal Paths and Hamiltonian Sets.

5.4 Assembly Numbers

Definition 5.4.1 Let $\Gamma$ be an assembly graph. The assembly number of $\Gamma$, denoted by $\mathrm{An}(\Gamma)$, is defined by

$$\mathrm{An}(\Gamma) = \min \{\, k \mid \text{there exists a Hamiltonian set of polygonal paths } \{\gamma_1, \ldots, \gamma_k\} \text{ in } \Gamma \,\}.$$

As mentioned earlier, an assembly graph models a micronuclear DNA sequence, while each polygonal path in the assembly graph corresponds to a macronuclear gene. Therefore, the assembly number represents the minimal number of macronuclear genes that can be obtained after complete rearrangement of the micronuclear DNA sequence.

Motivated by realizable words discussed in [19], a simple assembly graph $\Gamma$ is called realizable if $\mathrm{An}(\Gamma) = 1$; otherwise it is called unrealizable.

Example 5.4.2 The assembly graph in Figure 5.5(A) is realizable.

Example 5.4.3 The assembly graph $\Gamma_1$ depicted in Figure 5.6 (left) has a Hamiltonian polygonal path $\gamma$. Therefore, $\mathrm{An}(\Gamma_1) = 1$ and $\Gamma_1$ is realizable. The assembly graph $\Gamma_2$ in Figure 5.6 (right) has a Hamiltonian set of two polygonal paths


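$\{\gamma_1, \gamma_2\}$. By exhaustive search one can deduce that there is no single polygonal Hamiltonian path in $\Gamma_2$, and therefore $\mathrm{An}(\Gamma_2) = 2$. Hence, $\Gamma_2$ is unrealizable.

Figure 5.6: The assembly graph $\Gamma_1$ has assembly number 1 (left). The assembly number of $\Gamma_2$ is 2 (right).

Lemma 5.4.4 For each pair of directed simple assembly graphs $\Gamma_1$ and $\Gamma_2$, one of the following equalities holds: $\mathrm{An}(\Gamma_1 \circ \Gamma_2) = \mathrm{An}(\Gamma_1) + \mathrm{An}(\Gamma_2)$, or $\mathrm{An}(\Gamma_1 \circ \Gamma_2) = \mathrm{An}(\Gamma_1) + \mathrm{An}(\Gamma_2) - 1$.

Proof. Let $\mathrm{An}(\Gamma_1) = k_1$ and $\mathrm{An}(\Gamma_2) = k_2$. There are Hamiltonian sets of polygonal paths $S_1$ for $\Gamma_1$ and $S_2$ for $\Gamma_2$, consisting of $k_1$ and $k_2$ paths, respectively. Then $S_1 \cup S_2$ is a Hamiltonian set of polygonal paths for $\Gamma_1 \circ \Gamma_2$. Hence

$$\mathrm{An}(\Gamma_1 \circ \Gamma_2) = k \leq k_1 + k_2 = \mathrm{An}(\Gamma_1) + \mathrm{An}(\Gamma_2).$$

Let $S$ be a Hamiltonian set of polygonal paths for $\Gamma_1 \circ \Gamma_2$ with a minimal number of components (i.e., $S$ consists of $k$ distinct paths). As $\Gamma_1 \circ \Gamma_2$ is obtained by identifying one endpoint of $\Gamma_1$ and one endpoint of $\Gamma_2$, $S$ contains at most one path that includes vertices from both $\Gamma_1$ and $\Gamma_2$. If $S$ does not contain a path that includes vertices from both $\Gamma_1$ and $\Gamma_2$, then there are at least $k_1$ paths of $S$ that contain all vertices of $\Gamma_1 \subset \Gamma_1 \circ \Gamma_2$, and at least $k_2$ paths for $\Gamma_2 \subset \Gamma_1 \circ \Gamma_2$, so that $k \geq k_1 + k_2$, and we obtain $k = k_1 + k_2$. Suppose $S$ contains a path $\gamma$ that includes vertices from both $\Gamma_1$ and $\Gamma_2$. Let $\gamma = \gamma_1 \gamma_2$ where $\gamma_i$ contains vertices from $\Gamma_i$ only, and let $S_i \subset S$ be the paths that visit vertices from $\Gamma_i$ only ($i = 1, 2$). Each $S_i$ contains at least $k_i - 1$ components since $S_i \cup \{\gamma_i\}$ is a Hamiltonian set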


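of polygonal paths for $\Gamma_i$. Hence we obtain $k \geq (k_1 - 1) + (k_2 - 1) + 1$. Therefore in this case, $k = k_1 + k_2 - 1$ as well. $\Box$

Example 5.4.5 The assembly graph $\Gamma_2$ depicted in Figure 5.6 is reducible. Namely, $\Gamma_2 = \Gamma_1 \circ \Gamma_1$. By Lemma 5.4.4, $\mathrm{An}(\Gamma_2) = \mathrm{An}(\Gamma_1 \circ \Gamma_1) = \mathrm{An}(\Gamma_1) + \mathrm{An}(\Gamma_1) = 2$, or $\mathrm{An}(\Gamma_2) = \mathrm{An}(\Gamma_1 \circ \Gamma_1) = \mathrm{An}(\Gamma_1) + \mathrm{An}(\Gamma_1) - 1 = 1$. By inspection, as in the proof of Lemma 5.4.4, we have $\mathrm{An}(\Gamma_2) = 2$.

Remark 5.4.6 We note that $\mathrm{An}(\Gamma_1 \circ \Gamma_2) \neq \mathrm{An}(\Gamma_2 \circ \Gamma_1)$ in general. For example, take $\Gamma_1$ and $\Gamma_2$ as in Figure 5.7; then $\mathrm{An}(\Gamma_1 \circ \Gamma_2) = 2$ and $\mathrm{An}(\Gamma_2 \circ \Gamma_1) = 1$.

Figure 5.7: Assembly graphs $\Gamma_1$ and $\Gamma_2$ such that $\mathrm{An}(\Gamma_1 \circ \Gamma_2) \neq \mathrm{An}(\Gamma_2 \circ \Gamma_1)$.

Corollary 5.4.7 $|\mathrm{An}(\Gamma_1 \circ \Gamma_2) - \mathrm{An}(\Gamma_2 \circ \Gamma_1)| \leq 1$.

Proof. By Lemma 5.4.4, each of $\mathrm{An}(\Gamma_1 \circ \Gamma_2)$ and $\mathrm{An}(\Gamma_2 \circ \Gamma_1)$ is equal to $\mathrm{An}(\Gamma_1) + \mathrm{An}(\Gamma_2)$ or $\mathrm{An}(\Gamma_1) + \mathrm{An}(\Gamma_2) - 1$, hence $|\mathrm{An}(\Gamma_1 \circ \Gamma_2) - \mathrm{An}(\Gamma_2 \circ \Gamma_1)| \leq 1$. $\Box$

Corollary 5.4.8 If one of $\Gamma_1$ or $\Gamma_2$ is not realizable, then both $\Gamma_1 \circ \Gamma_2$ and $\Gamma_2 \circ \Gamma_1$ are not realizable.

Proof. Without loss of generality assume that $\Gamma_2$ is not realizable. Then $\mathrm{An}(\Gamma_2) \geq 2$. By Lemma 5.4.4, each of $\mathrm{An}(\Gamma_1 \circ \Gamma_2)$ and $\mathrm{An}(\Gamma_2 \circ \Gamma_1)$ is equal to $\mathrm{An}(\Gamma_1) + \mathrm{An}(\Gamma_2)$ or $\mathrm{An}(\Gamma_1) + \mathrm{An}(\Gamma_2) - 1$, hence $\mathrm{An}(\Gamma_1 \circ \Gamma_2) \geq 2$ and $\mathrm{An}(\Gamma_2 \circ \Gamma_1) \geq 2$. Therefore, both $\Gamma_1 \circ \Gamma_2$ and $\Gamma_2 \circ \Gamma_1$ are not realizable. $\Box$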


Proposition 5.4.9 For any positive integer $n$, there exists an assembly graph $\Gamma$ such that $\mathrm{An}(\Gamma) = n$.

Proof. Let $n$ be a positive integer. Let $\Gamma_1$ be the assembly graph with two endpoints that is depicted in Figure 5.6. Consider the $n$-fold composition $\Gamma_n = \Gamma_1 \circ \cdots \circ \Gamma_1$ of $\Gamma_1$, shown in Figure 5.8. We have $\mathrm{An}(\Gamma_1) = 1$ and $\mathrm{An}(\Gamma_2) = 2$, where $\Gamma_2 = \Gamma_1 \circ \Gamma_1$ as shown in Figure 5.6. Then by induction on $n$ and using a similar argument as in the proof of Lemma 5.4.4, it follows that $\mathrm{An}(\Gamma_n) = n$. $\Box$

Figure 5.8: Assembly graph with assembly number $n$ ($n$ copies).

5.5 Words and Assembly Graphs

In this section we consider formal words and their connection to assembly graphs. In particular, we establish a convention of representing simple assembly graphs by words, motivated by the Gauss code representation of (virtual) knot diagrams. For a virtual knot diagram with labeled classical crossings, a Gauss code is a sequence of crossing labels that indicates a walk along the diagram from a given starting point and returning to that point (see [31]). Each symbol is repeated exactly twice in a Gauss code, since the walk contains two encounters of each crossing. Similarly, if we choose a base point in a simple assembly graph $\Gamma$, then traveling from the base point along the Eulerian path and returning to the base point one can obtain a sequence of vertex labels. We refer to such a sequence as a word that corresponds to $\Gamma$. Each symbol that appears in a word that corresponds to some simple assembly graph appears exactly twice, so we define double occurrence words.


Definition 5.5.1 A word $w$ over an alphabet $\Sigma$ is called a double occurrence word if every symbol that appears in $w$ appears exactly twice.

For a word $w = w_1 w_2 \cdots w_n$, we continue to use the notation $w_{cyc}$ denoting $(w_1, w_2, \ldots, w_n)_{cyc}$, and $w_{rev}$ for $\{w, w^R\}$ where $w^R = w_n \cdots w_2 w_1$. We say that $v$ is reverse equivalent to $w$ ($v \sim_{rev} w$) if $v_{rev} = w_{rev}$, and $v$ is cyclic equivalent to $w$ ($v \sim_{cyc} w$) if $v_{cyc} = w_{cyc}$. In the rest of the exposition we always work with an equivalence class representative of either $\sim_{rev}$ or $\sim_{cyc}$.

Lemma 5.5.2 (a) There is a one-to-one correspondence between the isomorphism classes of simple assembly graphs with two endpoints and the reverse equivalence classes of double occurrence words. (b) There is a one-to-one correspondence between the isomorphism classes of simple assembly graphs with no endpoints and the cyclic equivalence classes of double occurrence words.

The above Lemma 5.5.2 is well known in knot theory as a method of representing knot diagrams by Gauss codes. Here we briefly describe the bijective correspondence.

Let $\Gamma$ be a simple assembly graph with two endpoints, and let $\gamma = (v_0, e_1, v_1, e_2, \ldots, e_n, v_n)$ be a transverse Eulerian path in $\Gamma$. Then the corresponding double occurrence word is defined to be $w = v_1 \cdots v_{n-1}$, where the vertices are regarded as symbols. In the case of a simple assembly graph with no endpoints and with a transverse Eulerian path $\gamma = (v_0, e_1, \ldots, e_n)$, the corresponding double occurrence word is $v_0 v_1 \cdots v_{n-1}$. Conversely, for a given double occurrence word $a_1 \cdots a_n$ over some alphabet, we can construct a simple assembly graph by connecting vertices labeled by $a_1, \ldots, a_n$ consecutively from $a_1$ to $a_2$, $a_2$ to $a_3$, and so on, in a transverse manner, i.e., at every vertex the outgoing edge is not a neighbor of the incoming edge. Similarly one can obtain an assembly graph with no endpoints by adding an edge from $a_n$ to $a_1$.
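The two equivalences can be explored computationally. The sketch below is only an illustration under stated assumptions: since Lemma 5.5.2 concerns isomorphism classes of graphs, the code also identifies words that differ by a renaming of symbols (an assumption about how the classes are counted), and for the cyclic case it uses rotations only. The helper names `relabel`, `reverse_class` and `cyclic_class` are hypothetical.

```python
from itertools import permutations


def relabel(word):
    """Rename symbols by order of first appearance, e.g. 'baab' -> (0, 1, 1, 0)."""
    names = {}
    return tuple(names.setdefault(symbol, len(names)) for symbol in word)


def reverse_class(word):
    """Canonical representative of a word up to reversal and renaming of symbols."""
    word = tuple(word)
    return min(relabel(word), relabel(word[::-1]))


def cyclic_class(word):
    """Canonical representative of a word up to rotation and renaming of symbols."""
    word = tuple(word)
    return min(relabel(word[i:] + word[:i]) for i in range(len(word)))


# All double occurrence words on the two symbols a and b.
words = set(permutations("aabb"))
print(len({reverse_class(w) for w in words}))  # 3 classes: aabb, abab, abba
print(len({cyclic_class(w) for w in words}))   # 2 classes: aabb, abab
```

Choosing the lexicographically smallest relabeled form as the representative keeps the comparison of two words down to a single tuple equality test.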


For a double occurrence word $w$, the corresponding assembly graph is denoted by $\Gamma_w$. Whether $\Gamma_w$ is an assembly graph with two endpoints or with no endpoints will be stated within the context.

Example 5.5.3 There are three reverse equivalence classes of double occurrence words in the alphabet $\Sigma = \{a, b\}$ that correspond to three equivalence classes of simple assembly graphs with two endpoints, represented by the double occurrence words $aabb$, $abab$ and $abba$. There are only two cyclic equivalence classes of double occurrence words in the alphabet $\Sigma = \{a, b\}$ that correspond to two equivalence classes of simple assembly graphs with no endpoints. They are represented by the words $aabb$ and $abab$.

Definition 5.5.4 If a double occurrence word $w$ can be written as a product $w = uv$ of two non-empty double occurrence words $u, v$, then $w$ is called reducible; otherwise it is called irreducible.

Note that if $w$ corresponds to an assembly graph $\Gamma$ with no endpoints, then every $v \in w_{cyc}$ is also a representation of $\Gamma$ and therefore the notion of "reducible" words and/or graphs does not apply. For example, $abba \sim_{cyc} bbaa$.

In the rest of the section, we use double occurrence words to describe and show some properties of the assembly graphs and the assembly numbers.

We prove Proposition 5.5.5, which is an extension of Proposition 5.4.9 using the double occurrence word representation of assembly graphs.

Proposition 5.5.5 For any positive integer $n$, there exists
(i) a reducible assembly graph $\Gamma$ such that $\mathrm{An}(\Gamma) = n$,
(ii) an irreducible assembly graph $\Gamma$ such that $\mathrm{An}(\Gamma) = n$, and
(iii) an assembly graph $\Gamma$ with no endpoints such that $\mathrm{An}(\Gamma) = n$.

Proof. (i) Let $n$ be a positive integer. Let $\Gamma_u$ be the assembly graph with two endpoints that corresponds to the double occurrence word $u = v_1 v_2 v_2 v_1 v_3 v_3$ (this is the graph $\Gamma_1$ depicted in Figure 5.6). Consider the $n$-fold composition


$\Gamma'_n = \Gamma_u \circ \cdots \circ \Gamma_u$ of $\Gamma_u$. Let $w_n$ be the double occurrence word that corresponds to $\Gamma'_n$, and let $\Gamma_n$ be the simple assembly graph with two endpoints that corresponds to the word $v_0 v_0 w_n$. Then by induction on $n$, using an argument similar to the one in the proof of Lemma 5.4.4, it follows that $\mathrm{An}(\Gamma_n) = n + 1$.

(ii) To prove the proposition in the irreducible case, observe that it is sufficient to "find" $\Gamma_n$ described above embedded in an irreducible graph. Consider the double occurrence word $zuz$ where $u$ is the double occurrence word that corresponds to $\Gamma_n$. Let $\Gamma_{zuz}$ be the assembly graph that corresponds to $zuz$. Clearly, $\Gamma_{zuz}$ is irreducible, and $\mathrm{An}(\Gamma_{zuz}) = \mathrm{An}(\Gamma_n)$, as the Hamiltonian set for $\Gamma_{zuz}$ can be obtained from the Hamiltonian set for $\Gamma_n$ by adding the vertex $z$ to the appropriate path in the Hamiltonian set for $\Gamma_n$.

(iii) The case of an assembly graph with no endpoints follows directly, by observing that the graph $\Gamma_{n+1}$ with the two endpoints identified has assembly number $n$. $\Box$

Definition 5.5.6 For a positive integer $n$, we define the minimal realization number for $n$ to be $R_{\min}(n) = \min\{|\Gamma| : \mathrm{An}(\Gamma) = n\}$, where $|\Gamma|$ is the number of 4-valent vertices in $\Gamma$. A graph $\Gamma$ with $\mathrm{An}(\Gamma) = n$ such that $R_{\min}(n) = |\Gamma|$ is called a realization of $R_{\min}(n)$.

Basically, the minimal realization number for $n$ is the "size" of the smallest graph with assembly number $n$.

Example 5.5.7 Through case by case inspection, we find that $\mathrm{An}(\Gamma) = 1$ for all assembly graphs $\Gamma$ with $|\Gamma| \le 3$, and the graph $\Gamma_2$ in Figure 5.6 has $\mathrm{An}(\Gamma_2) = 2$ and $|\Gamma_2| = 4$, so we have $R_{\min}(1) = 1$ and $R_{\min}(2) = 4$.

Next, we derive assembly graphs from a given one by vertex deletion, as described in Definition 5.5.8. We use this definition in the proof of Proposition 5.5.9.

Definition 5.5.8 Let $\Gamma_1 = (V_1, E_1)$ be an assembly graph. We define the assembly graph $\Gamma_2$ obtained from $\Gamma_1$ by deleting a vertex $v \in V_1$ as follows. Let $\gamma = (v_0, e_1, v_1, e_2, \ldots, e_n, v_n)$ be a transverse Eulerian path in $\Gamma_1$. Define $\Gamma_2 =$


$(V_2, E_2)$ by setting $V_2 = V_1 \setminus \{v\}$, and $E_2 = (E_1 \setminus \{e_i, e_k, e_{i+1}, e_{k+1}\}) \cup \{l_i, l_k\}$ where $v_i = v_k = v$ ($i + 1 \ne k$), $l_i$ is a new edge from $v_{i-1}$ to $v_{i+1}$, and $l_k$ is a new edge from $v_{k-1}$ to $v_{k+1}$. The order of the new edges $l_i$ and $l_k$ at the vertices $v_{i-1}, v_{i+1}$ and $v_{k-1}, v_{k+1}$ is inherited from the edges $e_i, e_{i+1}$ and $e_k, e_{k+1}$ respectively, such that locally the rigid vertices $v_{i-1}$, $v_{i+1}$, $v_{k-1}$, and $v_{k+1}$ remain the same. In case $i + 1 = k$, only one new edge connecting $v_{i-1}$ with $v_{i+2} = v_{k+1}$ is added. The new edge $l_i$ is denoted by $l_i = \widehat{e_i v_i e_{i+1}}$. See Figure 5.9.

It is clear that $\Gamma_2$ does not depend on the choice of $\gamma$.

Figure 5.9: Removing a vertex $v = v_i = v_k$, the case when $i + 1 \ne k$.

Proposition 5.5.9 The following properties hold for $R_{\min}$:
(i) For every positive integer $n$, $R_{\min}(n)$

containing $v$ and parts of two neighboring edges in $\Gamma_1$, a Hamiltonian set of paths with $m + 1$ components for $\Gamma_1$ is created. But, $m + 1$

2 = e 03 v 3 r 3 aresuchthat e 04 isapropersegmentof e 4 withboundaryvertex v 4 ,and e 03 isapropersegmentof e 3 withboundaryvertex v 3 .When r and r 0 areseparate components, 0 = nf r;r 0 g[f 1 ; 2 ; 3 g with 1 = e 04 v 4 r 4 2 = r 1 v 1 e 1 ve 2 v 2 r 2 and 3 = e 03 v 3 r 3 .Because 1 isarealizationof R min ( n +1),itisnotpossibleto have m

to visit all vertices in $\Gamma_w$ before it exits through vertex 7. If a Hamiltonian path starts in $\Gamma_w$, then first it has to visit every vertex in $\Gamma_w$ and then it exits $\Gamma_w$ through vertex 7. Since $\Gamma_u$ is not realizable, the rest of the path cannot visit every vertex in $\{1, \ldots, 7\}$ and therefore cannot visit every vertex in $\Gamma_v$. If the path starts outside of $\Gamma_w$, then in order to visit the vertices of $\Gamma_w$ the path must pass through vertex 7. However, since $\Gamma_u$ is not realizable, the path cannot visit every vertex from $\{1, \ldots, 7\}$ before visiting a vertex in $\Gamma_w$. Hence for any realizable $w$, there exists an unrealizable irreducible $v$ such that $w$ is contained in $v$ and $|v| - |w| < 8$. $\Box$

Figure 5.10: Unrealizable graphs.

Corollary 5.5.11 There exists a constant $N$ such that for any $w$ with $\mathrm{An}(\Gamma_w) = k$ for some positive integer $k$, there exists an irreducible double occurrence word $v$ with $w$ contained in $v$, such that $\mathrm{An}(\Gamma_v) = k + 1$, and $|v| - |w|$

(i) For every realizable assembly graph $\Gamma$, there is an unrealizable word within a radius of 8 from $\Gamma$. (ii) For every assembly graph $\Gamma$, there is an unrealizable graph within radius 8 from $\Gamma$. (iii) For every realizable graph $\Gamma$, there is an assembly graph with assembly number $k$ within radius $3 + 5(k - 1)$ from $\Gamma$.

Proposition 5.5.13 For any unrealizable double occurrence word $v$ with $|v| = m$ there exist a constant $N(m)$ and a realizable double occurrence word $w$ such that $v$ is contained in $w$, and $|w| - |v| \le N(m)$.

Proof. Let $v$ be an unrealizable double occurrence word, and let $\Gamma_v$ be the assembly graph that corresponds to $v$. If $\mathrm{An}(\Gamma_v) = k$, then there is a Hamiltonian set of polygonal paths $S = \{\gamma_1, \gamma_2, \ldots, \gamma_k\}$ for $\Gamma_v$. Let $v_{i1}$ and $v_{i2}$ be the initial and terminal vertices visited by the path $\gamma_i$ for $i \in \{1, 2, \ldots, k\}$. Construct $w$ from $v$ as follows: add a symbol $p_i$ after the first appearance of $v_{i1}$ and before the first appearance of $v_{(i+1)2}$ in $v$ for $i \in \{1, 2, \ldots, k - 1\}$, such that $p_i \ne p_j$ when $i \ne j$ and the $p_i$'s are different from any symbol in $v$. By construction $v$ is contained in $w$. Note that in $\Gamma_w$, for each $i$, the path $v_{i2}\, p_i\, v_{(i+1)2}$ is polygonal, and therefore the path $\gamma_1, v_{12}, p_1, v_{21}, \gamma_2, v_{22}, p_2, v_{31}, \gamma_3, \ldots, v_{2(k-1)}, p_{k-1}, v_{1k}, \gamma_k$ is a Hamiltonian polygonal path in $\Gamma_w$. So $w$ is realizable. Note that there are $m = k - 1$ new symbols added in $w$, hence we can take $N(m) = k + 1$. $\Box$

5.6 Smoothings of Assembly Graphs

In this section we define smoothing of a 4-valent vertex in a directed assembly graph to model the process of DNA recombination. The 4-valent vertices in an assembly graph represent a pair of aligned pointer sequences, prepared for recombination. The recombination as a result of splicing is represented as a smoothing of the vertex. If the pointer sequences are aligned in parallel, then the smoothing of the vertex follows the predetermined direction of the graph, which we call parallel smoothing. If the alignment is antiparallel, then the smoothing of the vertex is performed opposite the direction of the graph, called non-parallel smoothing. A


smoothing of a 4-valent vertex can be seen as a removal of the vertex and its neighborhood, followed by attaching two parallel arcs as shown in Figure 5.11. If the smoothing is orientation preserving then it is a parallel smoothing (or $p$-smoothing, as in Figure 5.11 left). Otherwise, the smoothing is a non-parallel smoothing (or $n$-smoothing, as in Figure 5.11 right). Although we start with a directed simple assembly graph, if $n$-smoothings are performed at some of the vertices, there is no direction defined for the resulting graph.

Figure 5.11: Two types of smoothings, parallel ($p$-) smoothing (left) and non-parallel ($n$-) smoothing (right).

One can see smoothing at a vertex $v$ as a splitting of the vertex $v$ into two 2-valent vertices, such that the pairs of edges incident to the same vertex after the smoothing are neighbors at $v$.

As mentioned earlier, a polygonal path can predetermine the type of a smoothing at a vertex. Therefore, we define a smoothing of a vertex with respect to a polygonal path. Let $\Gamma$ be an assembly graph and $\gamma$ a polygonal path in $\Gamma$.

Figure 5.12: Smoothing at a vertex $v$ with respect to a polygonal path $\gamma$.

Definition 5.6.1 A smoothing of a 4-valent vertex $v$ in $\Gamma$ with respect to $\gamma$ is a smoothing such that the neighboring edges traversed by $\gamma$ at $v$ remain connected after the smoothing (see Figure 5.12).

A smoothing of $\Gamma$ with respect to a Hamiltonian set of polygonal paths $\gamma = \{\gamma_1, \ldots, \gamma_k\}$ is the graph, denoted $\tilde{\Gamma}_{\gamma}$, obtained after smoothing of every 4-valent vertex $v$ with respect to the path from $\gamma$ that visits $v$.


Note that the smoothing $\tilde{\Gamma}_{\gamma}$ consists of arcs and closed curves (topologically, circles).

Figure 5.13: Smoothing the assembly graph $\Gamma$ (left) with respect to the Hamiltonian polygonal path $\gamma$. The result is the smoothed graph $\tilde{\Gamma}_{\gamma}$ (right).

Example 5.6.2 The assembly graph $\Gamma$ given in Figure 5.13 has a Hamiltonian polygonal path $\gamma$. After smoothing each vertex in $\Gamma$ with respect to $\gamma$ we obtain the assembly graph $\tilde{\Gamma}_{\gamma}$ given in Figure 5.13 (right). The graph $\tilde{\Gamma}_{\gamma}$ has two components: one circular and one linear component. The Hamiltonian polygonal path $\gamma$ remains connected after smoothing and it belongs to the linear component.

Remark 5.6.3 Let $\Gamma$ be an assembly graph, $\gamma = \{\gamma_1, \ldots, \gamma_k\}$ be a Hamiltonian set of polygonal paths, and $\tilde{\Gamma}_{\gamma}$ be the smoothing of $\Gamma$ with respect to $\gamma$. Since for each $i = 1, \ldots, k$ the neighboring edges of $\gamma_i$ remain connected after smoothing, a path corresponding to $\gamma_i$ is naturally defined in $\tilde{\Gamma}_{\gamma}$, as depicted by a thick dotted line in the upper arc in Figure 5.12 to the right. We denote these paths using the same notation $\{\gamma_1, \ldots, \gamma_k\}$ whenever it is clear from the context. Hence, after smoothing, the new set of paths $\gamma = \{\gamma_1, \ldots, \gamma_k\}$ is a subset of $\tilde{\Gamma}_{\gamma}$. Furthermore, on each arc or circle in $\tilde{\Gamma}_{\gamma}$, the edges and vertices (their "traces," or "scars") can be regarded as kept intact, as depicted in Figure 5.12 on the right. We also continue to use the same symbols for these vertices and edges when regarded as subsets of arcs and circles in $\tilde{\Gamma}_{\gamma}$ and call them vertices and edges inherited from $\Gamma$.

We view the assembly graph as a spatial representation of a DNA molecule


during recombination events. Namely, an assembly graph is a 2-dimensional projection of a 3-dimensional DNA structure. Therefore, assembly graphs need not be planar, and their edges might intersect. These intersections are depicted with over- and under-information as in knot diagrams (see, for example, [31]) and they are not vertices of the graph. The way the crossing information is depicted depends on the way the molecule is situated in space.

Figure 5.14: Assembly graph that corresponds to MDS$_2$ IES$_1$ MDS$_4$ IES$_2$ MDS$_3$ IES$_3$ MDS$_1$.

Example 5.6.4 Consider the assembly graph $\Gamma$ that is a physical representation of the possible "micronuclear gene structure" given by MDS$_2$ IES$_1$ MDS$_4$ IES$_2$ MDS$_3$ IES$_3$ MDS$_1$. The graph has three vertices, labeled by 2, 3 and 4 (one for each pair of pointers), and it corresponds to the double occurrence word 234342. Furthermore, $\Gamma$ is not planar, since it cannot be drawn without an edge crossing, see Figure 5.14.

Figure 5.15: Two Hamiltonian polygonal paths resulting in distinct smoothed embeddings.

We observe that smoothings generated by different sets of polygonal paths


might lead to distinct spatial embeddings of the resulting graph. Some of the resulting structures might be knotted or linked.

Example 5.6.5 There exists an embedding of a simple assembly graph $\Gamma$ in $\mathbb{R}^3$ and two Hamiltonian polygonal paths $\gamma_1$ and $\gamma_2$, such that the corresponding smoothings $\tilde{\Gamma}_{\gamma_1}$ and $\tilde{\Gamma}_{\gamma_2}$ are distinct, see Figure 5.15. Note also that this provides an example of an assembly graph with two non-equivalent Hamiltonian polygonal paths.

If the recombined DNA molecules are knotted or linked, then they might not be functional. Therefore, we study the embeddings of an assembly graph that result in an unlink after smoothing.

Namely, we show that every assembly graph $\Gamma$, given with a Hamiltonian set of polygonal paths $\gamma$, can be embedded in $\mathbb{R}^3$ such that $\tilde{\Gamma}_{\gamma}$ is unlinked. This implies that for every set of recombinations of a given DNA sequence, the molecule can be positioned in space such that after all recombinations are performed (some or all simultaneously), the resulting sequence encoding a functional gene can "easily" be identified and separated from the other pieces.

Proposition 5.6.6 For any assembly graph $\Gamma$ and a Hamiltonian set of polygonal paths $\gamma = \{\gamma_1, \ldots, \gamma_k\}$, there is an embedding of $\Gamma$ in $\mathbb{R}^3$ such that $\tilde{\Gamma}_{\gamma}$ is unlinked.

Proof. Consider a generic map from $\Gamma$ to the plane $\mathbb{R}^2$. The diagram, that is, the image of $\Gamma$ under the generic map, might contain, aside from the 4-valent vertices, vertices that result from transverse double points. We need to determine the crossing information for each such transverse intersection, so that the result can be regarded as a diagram of an embedding of $\Gamma$ in $\mathbb{R}^3$. To the image of $\Gamma$ in $\mathbb{R}^2$ corresponds an image of $\tilde{\Gamma}_{\gamma}$ in $\mathbb{R}^2$ such that the 4-valent vertices of $\Gamma$ are smoothed and the transverse double points from the image of $\Gamma$ coincide with the transverse double points of $\tilde{\Gamma}_{\gamma}$. Denote the (circular) components of $\tilde{\Gamma}_{\gamma}$ by $c_1, c_2, \ldots, c_\ell$. Since smoothings are performed locally at each 4-valent vertex, $\{c_1, c_2, \ldots, c_\ell\}$ can be regarded as immersed (closed) plane curves. Choose base points $b_1, \ldots, b_\ell$


on $c_1, \ldots, c_\ell$, respectively. Each component $c_i$ is traced in one direction, starting from the point $b_i$. The crossing information at transverse double points is given in descending order: the arc visited first is the over-arc, and the arc visited the second time is the under-arc. If a crossing is between two components $c_i$ and $c_j$ for $i$

to the assembly number of the assembly graph.

Figure 5.16: (A) Proper coloring of a vertex and smoothing corresponding to the proper coloring (Kauffman [32]). (B) An example of a proper coloring for an assembly graph "representing" the trefoil and the resulting components after the smoothing.

Definition 5.6.7 Let $\mathcal{C}$ be the set of proper colorings for an assembly graph $\Gamma$ with one or more transverse components and no endpoints. Given a coloring $\xi \in \mathcal{C}$, let $\xi_0$ and $\xi_1$ denote the number of components colored 0, respectively 1, after the smoothing corresponding to $\xi$. The minimal number of components of smoothed circles with one color is defined to be $K_{\min} = \min_{\xi \in \mathcal{C}} \{\xi_0, \xi_1\}$.

Lemma 5.6.8 For any assembly graph $\Gamma$ with no endpoints, $\mathrm{An}(\Gamma) \le K_{\min}(\Gamma)$.

Other than this inequality, however, these numbers are independent in the following sense.

Proposition 5.6.9 For any positive integers $m$ and $n$ with $m \le n$, there is an assembly graph $\Gamma$ with no endpoints such that $\mathrm{An}(\Gamma) = m$ and $K_{\min}(\Gamma) = n$.

These results show that the smoothings through Hamiltonian sets of polygonal paths and the smoothings by proper colorings define different operations on the assembly graphs. For more details, see [4].
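The double occurrence word 234342 of Example 5.6.4 can be recomputed from the order of the MDSs alone, since MDS $i$ is flanked by the pointers $i$ and $i + 1$, and the labels 1 and $k + 1$ are not pointers. The sketch below assumes a hypothetical encoding in which a gene is a list of signed integers, with $+i$ standing for MDS $i$ and $-i$ for an inverted MDS $i$; the function name `pointer_word` is likewise an assumption made for this illustration.

```python
def pointer_word(mds_order):
    """Read off the pointer sequence of a scrambled gene.

    `mds_order` lists the MDSs in micronuclear order as signed integers:
    +i is MDS i, -i is MDS i in inverted orientation (hypothetical encoding).
    """
    k = len(mds_order)
    word = []
    for mds in mds_order:
        i = abs(mds)
        # An MDS i carries pointer i on one side and pointer i + 1 on the other;
        # an inverted MDS presents them in the opposite order.
        flanks = [i, i + 1] if mds > 0 else [i + 1, i]
        # The labels 1 and k + 1 mark the ends of the gene and are not pointers.
        word.extend(p for p in flanks if p not in (1, k + 1))
    return word


# MDS2 IES1 MDS4 IES2 MDS3 IES3 MDS1 from Example 5.6.4.
print(pointer_word([2, 4, 3, 1]))  # [2, 3, 4, 3, 4, 2]
```

The IESs do not enter the computation because they carry no pointers; only the order and orientation of the MDSs matter for the resulting word.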


6 Simultaneous DNA rearrangements through Assembly Graphs

In the process of gene rearrangement a number of pointers might recombine at the same time. Therefore, in this chapter we study simultaneous smoothing of sets of vertices in an assembly graph. In particular, in our assembly graph model a set of simultaneous recombinations translates into a set of simultaneous smoothings. In Section 6.1 we show that the gene assembly might be theoretically considered as a single step process. Namely, we prove that the smoothing of all vertices at once with respect to a Hamiltonian polygonal path results in an assembly graph that corresponds to the expected set of DNA molecules after complete gene rearrangement. Considering the gene assembly as a single step is not realistic, since it has been observed that some pairs of pointers recombine before others [39]. Thus, we introduce the notion of partial smoothings of assembly graphs in Section 6.2. Our assumption is that all of the macronuclear gene segments (MDSs) stay in a single molecule during the entire process of rearrangement. Therefore, we introduce and characterize the subsets of vertices whose simultaneous smoothing keeps all of the gene segments on a single DNA molecule, and we call such sets successful. Furthermore, in Section 6.3 we define a smoothing strategy in an assembly graph as a sequence of successful sets, which corresponds to a successive DNA recombination strategy. We study and characterize the "successful" strategies, i.e., the strategies that result in a complete correct DNA assembly.

6.1 Simultaneous Smoothings

Following the example in Section 5.1, any MDS-IES micronuclear gene structure can be modeled by a simple assembly graph. The vertices of the assembly graph


represent the pointer alignment and the edges of the assembly graph correspond to MDSs or IESs.

As numerous pointers can align at once, a few, or even all, of the recombination events may occur simultaneously. In this section, we consider simultaneous smoothing of all 4-valent vertices in an assembly graph, which models recombination of all pairs of pointers of the micronuclear gene at once.

Every micronuclear gene $G$ can be formally represented as a string of the following form:

(*) $G = I_0 N_{i_1} I_1 N_{i_2} \cdots I_{k-1} N_{i_k} I_k$

for some positive integer $k$ and $i_1, \ldots, i_k \in \{1, 2, \ldots, k\}$ (with $i_j \ne i_s$ if $j \ne s$), where $N_{i_j}$ is a three symbol word in the set $\{(i_j)\, M_{i_j}\, (i_j + 1),\ \overline{(i_j)\, M_{i_j}\, (i_j + 1)}\}$ for every $j \in \{1, 2, \ldots, k\}$.

Note that such a representation of $G$ completely captures its MDS-IES structure, as $I_1, I_2, \ldots, I_k$ represent the IESs and $N_{i_j}$ for $1 \le j \le k$ represent the MDSs. Each $i_j \ne 1$ and $i_j \ne k + 1$ denotes a pointer sequence and we refer to these symbols as pointers. Let $w_G = w_1 w_2 \cdots w_{2k-2}$ be the scattered subsequence of $G$ that contains the pointers only. Note that $w_G$ is a double occurrence sequence.

To each micronuclear gene we associate a simple assembly graph $\Gamma_G$ with a base point and no endpoints, such that $|\Gamma_G| = k - 1$. Place $k - 1$ vertices on the plane and label them by the integers $i_j$, for every $j \in \{1, 2, \ldots, k\}$ such that $i_j \ne 1$ and $i_j \ne k + 1$, representing pointers. Connect the vertices consecutively from the vertex with label $w_1$ to $w_2$, $w_2$ to $w_3$, and so on, in a transverse manner, i.e., at every vertex the outgoing edge is not a neighbor of the incoming edge. The edges of $\Gamma_G$ are labeled by either $I_j$ or $M_j$ ($\overline{M}_j$). Since $\Gamma_G$ is a simple assembly graph, it has a unique transverse Eulerian path. If one travels $\Gamma_G$ along its transverse Eulerian path following its orientation, starting and ending at the base point, one first encounters the edge labeled by $I_0$, then a vertex labeled by $w_1$, followed by the edge labeled by $M_{i_1}$ (or $\overline{M}_{i_1}$), and so on, until the edge labeled by $I_k$ is traversed, finishing back at the base point. The construction of an assembly graph $\Gamma_G$ from a micronuclear gene $G$ is discussed in detail in [2]. An example of such construction


is given in Section 5.1.

Given a polygonal path $\gamma$ in $\Gamma_G$, the smoothing of $\Gamma_G$ with respect to $\gamma$ is defined in Section 5.6 and the resulting diagram (with no vertices) is denoted by $\tilde{\Gamma}_{G,\gamma}$. Each component of $\tilde{\Gamma}_{G,\gamma}$ is labeled with a label that is inherited from $\Gamma_G$. The label of a component $C$ of $\tilde{\Gamma}_{G,\gamma}$ is denoted by $\lambda(C)$.

Theorem 6.1.1 For every $G$ of the form (*), there is a Hamiltonian polygonal path $\gamma$ in $\Gamma_G$, yielding $\tilde{\Gamma}_{G,\gamma}$ with a component $C$ such that $\lambda(C) = J M_1 M_2 \cdots M_k J'$ where $J$ and $J'$ are words over the alphabet $\{I_0, I_1, \ldots, I_k\}$.

Proof: Suppose that $G$ is given in the form (*). Then $w_G = p_1 p_2 \cdots p_n$ where $n = 2k - 2$, which implies that $\Gamma_G$ has exactly $(k - 1)$ 4-valent vertices, each labeled by $p$ for $p \in \{2, 3, \ldots, k\}$. Let $\gamma = (M_1, 2, M_2, 3, \ldots, M_{k-1}, k, M_k)$ be an alternating sequence of vertices and edges in $\Gamma_G$. Note that, if a directed edge $(i + 1, i)$ is labeled by $\overline{M}_i$ in $\Gamma_G$, then we write $M_i$ to denote the same edge $(i, i + 1)$, but with the opposite direction. Then by the construction of $\Gamma_G$, $\gamma$ is a Hamiltonian polygonal path, since the edges $M_{i-1}$ and $M_i$ are non-neighboring edges incident to vertex $i$ for each $i \in \{2, 3, \ldots, k\}$.

Next, observe that a vertex labeled $i$ can have four different ways of occurrence in $G$:

1. $I_m\, (i - 1)\, M_{i-1}\, (i)\, I_{m+1}$ and $I_s\, (i)\, M_i\, (i + 1)\, I_{s+1}$ are substrings of $G$, for some $m, s \in \{0, 1, \ldots, k - 1\}$.
2. $I_m\, (i - 1)\, M_{i-1}\, (i)\, I_{m+1}$ and $I_s\, (i + 1)\, \overline{M}_i\, (i)\, I_{s+1}$ are substrings of $G$, for some $m, s \in \{0, 1, \ldots, k - 1\}$.
3. $I_m\, (i)\, \overline{M}_{i-1}\, (i - 1)\, I_{m+1}$ and $I_s\, (i)\, M_i\, (i + 1)\, I_{s+1}$ are substrings of $G$, for some $m, s \in \{0, 1, \ldots, k - 1\}$.
4. $I_m\, (i)\, \overline{M}_{i-1}\, (i - 1)\, I_{m+1}$ and $I_s\, (i + 1)\, \overline{M}_i\, (i)\, I_{s+1}$ are substrings of $G$, for some $m, s \in \{0, 1, \ldots, k - 1\}$.

These four cases for the vertex $i$ in $\Gamma_G$, together with its incoming and outgoing edges and their labels, are shown in Figure 6.1. When the vertex $i$ is smoothed


according to the polygonal path $\gamma$, an edge with label containing $M_{i-1} M_i$ as a substring is obtained (see Figure 6.1).

Figure 6.1: Four distinct occurrences of pointer $i$ in $G$ and their corresponding smoothings.

As the smoothing of $\Gamma_G$ requires a corresponding smoothing at each pointer, a similar observation holds for each pointer. Namely, splicing at every $i$ gives an edge with a label that contains $M_{i-1} M_i$ as a substring, for each $i = 2, \ldots, k$. But, since there is only one occurrence of $M_i$ in $G$ for each $i \in \{1, 2, \ldots, k\}$ as (a substring of) a label of an edge, and the smoothing process adds no labels and only removes the pointer labels, in $\tilde{\Gamma}_{G,\gamma}$ there must be at most one appearance of $M_i$ as a label as well. This means that the parts $M_i$ of the edges labeled $M_{i-1} M_i$ and $M_i M_{i+1}$ obtained after smoothing the pointers $i$ and $i + 1$ coincide, and $M_{i-1} M_i M_{i+1}$ is a substring of the label of the same component in $\tilde{\Gamma}_{G,\gamma}$. Hence, inductively, there must be a component in $\tilde{\Gamma}_{G,\gamma}$ with label containing $M_1 M_2 \cdots M_k$ as a substring. $\Box$

Theorem 6.1.1 shows that each macronuclear gene $G$ can be modeled by a unique Hamiltonian polygonal path $\gamma$ in the assembly graph $\Gamma_G$. Furthermore, the simultaneous smoothing with respect to $\gamma$ always leads to one component having all MDSs in the correct macronuclear order.

However, simultaneous rearrangement at all pointer sequences may not be


realistic. Probably some rearrangements occur simultaneously, or within a time period short enough to be considered simultaneous. Identifying the sets of pointers that could undergo simultaneous rearrangement, as well as their possible order of appearance, is the topic of Section 6.2.

6.2 Partial Smoothings

As mentioned earlier, a few pointer recombination events may occur simultaneously, or within a time frame short enough to be considered simultaneous. We assume that the recombinations that occur simultaneously do not disturb the natural MDS order in the macronuclear gene. In our assembly graph model a set of simultaneous recombinations translates into a set of simultaneous smoothings, and the consistent order of the MDSs during the recombination process becomes a requirement that the corresponding polygonal path remains connected after these smoothings. In this section we characterize the sets of vertices in a simple assembly graph that allow simultaneous smoothings.

Let $\Gamma$ be a simple assembly graph, and let $\tilde{\Gamma}_{\gamma}$ be the smoothed graph of $\Gamma$ with respect to a Hamiltonian polygonal path $\gamma$. Note that $\tilde{\Gamma}_{\gamma}$ does not contain any 4-valent vertices, and might have two vertices of degree 1 equivalent to the endpoints of $\Gamma$. The graph $\tilde{\Gamma}_{\gamma}$ can have more than one component. If $\Gamma$ has endpoints, then one of the components in $\tilde{\Gamma}_{\gamma}$ is an arc component and the rest are closed curves (topologically circles).

Definition 6.2.1 Let $\Gamma$ be a realizable assembly graph (i.e., $\mathrm{An}(\Gamma) = 1$), and $S$ a subset of vertices in $\Gamma$. An $S$-partial smoothing of $\Gamma$ with respect to a Hamiltonian polygonal path $\gamma$ is an assembly graph with the set of 4-valent vertices $V(\Gamma) \setminus S$, denoted by $\tilde{\Gamma}_{(\gamma,S)}$, obtained by smoothing all vertices in $S$ with respect to $\gamma$.

Definition 6.2.2 Let $\Gamma$ be a realizable assembly graph and $\gamma$ a polygonal Hamiltonian path. A subset $S \subseteq V(\Gamma)$ is called linear with respect to $\gamma$ if the $S$-partial smoothing of $\Gamma$ with respect to $\gamma$ is a simple assembly graph.


Note that $\emptyset \subseteq V(\Gamma)$ is linear with respect to any polygonal path $\gamma$ iff $\Gamma$ is a simple assembly graph. Let $S \subseteq V(\Gamma)$. Recall the process of deleting a vertex in $\Gamma$ (Definition 5.5.8). We denote with $\Gamma_S$ the assembly graph obtained from $\Gamma$ by deleting all of the vertices from $V(\Gamma) \setminus S$. Note that the process of deletion of a vertex is local, i.e., $\Gamma$ is changed in a local neighborhood of the deleted vertex without affecting the structural rigidity of the other vertices. Hence if a set $T$ of vertices is deleted from $\Gamma$, then this can be done by fixing an order of the vertices in $T$, and then removing the vertices from $T$ one by one according to the order. Since the removal of one vertex does not affect the local structure of the others, the resulting graph is invariant with respect to the choice of the order of vertices in $T$.

Remark 6.2.3 If $\Gamma$ is a simple assembly graph, with or without endpoints, removal of all vertices in $\Gamma$ results in a single component diagram (an arc if $\Gamma$ has endpoints, or a circle if $\Gamma$ has no endpoints). This follows because the transverse Eulerian path in $\Gamma$ determines the single component.

We can characterize linear subsets as follows.

Proposition 6.2.4 A subset $S \subseteq V(\Gamma)$ is linear with respect to a polygonal Hamiltonian path $\gamma$ iff $\Gamma_S$ has a polygonal path $\gamma'$ that is Eulerian (i.e., visiting every edge exactly once), and at every vertex in $S$, the path $\gamma'$ either coincides with $\gamma$, or is disjoint from $\gamma$ (i.e., it traces the other two edges incident to the vertex).

Proof. Let $\Gamma$ be a simple assembly graph with or without endpoints having a Hamiltonian polygonal path $\gamma$. Assume that the set of vertices $S$ is linear. Denote with $S^c$ the set $V(\Gamma) \setminus S$. By performing smoothings at the vertices in $S$ with respect to $\gamma$, we obtain the simple assembly graph $\tilde{\Gamma}_{(\gamma,S)}$ with set of vertices $S^c$. Now the removal of the vertices from $S^c$ means removal of all of the vertices from the simple assembly graph $\tilde{\Gamma}_{(\gamma,S)}$. The removal of each vertex is done such that the new edges are denoted with the pair of edges that is removed,


i.e., if a vertex $v_i$ is removed and the incident edges $e_i$ and $e_{i+1}$ are substituted with a new edge $l_i$, we write $l_i = (e_i, v_i, e_{i+1})$ (see Figure 5.9). The resulting graph $(\tilde{\Gamma}_{(\gamma,S)})_{S^c} = \Gamma'$, by Remark 6.2.3, is a single component diagram (an arc or a circle). The name of this component is the sequence of edges and vertices determining the required Eulerian path in $\Gamma_S$. It is obtained by tracing the single component of $\Gamma'$ (either from one endpoint to the other, or from a chosen vertex to a return to that vertex). In this path every pair of edges incident to a vertex in $S$ must either coincide with $\gamma$ or be disjoint from $\gamma$, since $\Gamma'$ was obtained after smoothing all of the vertices in $S$ with respect to $\gamma$.

Conversely, let $\gamma'$ be an Eulerian polygonal path in $\Gamma_S$ that agrees with $\gamma$ or follows the other two edges at the vertices of $S$. Such a path can be considered as a path in $\Gamma$, simply by replacing each edge $l_i$ in $\gamma'$ by $e_i, v_i, e_{i+1}$, where $l_i$ is obtained by connecting $e_{i+1}$ and $e_i$ after removal of $v_i$ in $\Gamma_S$ (see Figure 5.9). Call this path $\gamma''$. As a path in $\Gamma$, $\gamma''$ is Eulerian since each edge of $\Gamma$ appears exactly once as part of an edge in $\Gamma_S$, and $\gamma'$ is Eulerian. At a vertex in $S$, $\gamma''$ either coincides with $\gamma$, or follows the other two edges, and at a vertex not in $S$, it coincides with the transverse path. Next consider $\tilde{\Gamma}_{(\gamma,S)}$. If $v \in S$, then $v$ is smoothed in $\tilde{\Gamma}_{(\gamma,S)}$ as depicted in Figure 5.12. Hence $\gamma''$ can be considered as a path in $\tilde{\Gamma}_{(\gamma,S)}$. This can be done if $\gamma''$ either coincides with $\gamma$ (as depicted in Figure 5.12 to the right by a dotted arc), or if $\gamma''$ follows the other two edges complementing $\gamma$ (in which case $\gamma''$ takes the arc opposite to the dotted arc in Figure 5.12). Let $\gamma'''$ be the path in $\tilde{\Gamma}_{(\gamma,S)}$ thus obtained. Since $\gamma''$ coincides with the transverse path in $\Gamma$ at vertices not in $S$, $\gamma'''$ is a transverse path in $\tilde{\Gamma}_{(\gamma,S)}$. Furthermore, $\gamma'''$ is Eulerian, since $\gamma''$ is Eulerian. Hence $\gamma'''$ is a transverse Eulerian path in $\tilde{\Gamma}_{(\gamma,S)}$, which implies that $\tilde{\Gamma}_{(\gamma,S)}$ is a simple assembly graph. Therefore, $S$ is linear. $\Box$

Definition 6.2.5 Let $\Gamma$ be a realizable assembly graph, and $\gamma$ be a Hamiltonian polygonal path. A subset $S \subseteq V(\Gamma)$ is called successful with respect to $\gamma$ if the $S$-partial smoothing of $\Gamma$ with respect to $\gamma$ is an assembly graph that has a transverse arc component containing $\gamma$. (Here, we regard $\gamma$ as remaining intact after the smoothing, see Remark 5.6.3.)


A successful $S$-smoothing for a set $S \subseteq V(\Gamma)$ can be seen as a simultaneous recombination of a set of pointers $S$. A successful $S$-smoothing models recombinations which keep the MDSs (i.e., the edges from the polygonal path $\gamma$) that are expected to be part of an assembled gene on a single molecule. Hence, we regard smoothings of a successful subset $S$ of $V(\Gamma)$ as a possible single step in the process of assembly of a functional MAC gene, where the recombinations determined by the pointers in $S$ occur simultaneously. By Theorem 6.1.1, for any simple assembly graph the set of all vertices, $S = V(\Gamma)$, is successful.

Lemma 6.2.6 Let $\Gamma$ be a simple, realizable assembly graph and $\gamma$ be a Hamiltonian polygonal path. Then every linear subset $S$ of $V(\Gamma)$ is successful with respect to $\gamma$.

Proof. Let $S$ be a linear subset of $V(\Gamma)$. Then $\tilde{\Gamma}_{(\gamma,S)}$ is a simple assembly graph. Hence $\tilde{\Gamma}_{(\gamma,S)}$ itself is a transverse arc component of $\tilde{\Gamma}_{(\gamma,S)}$ that contains $\gamma$. Therefore, $S$ is successful with respect to $\gamma$. $\Box$

Figure 6.2: An assembly graph modeling the actin I gene of S. nova [2]. A Hamiltonian path in this assembly graph is indicated with a thickened line.

Example 6.2.7 Let $\Gamma$ be the assembly graph depicted in Figure 6.2, where a polygonal Hamiltonian path $\gamma$ is also indicated by a thick line. Consider three sets of vertices $S' = \{2, 3\}$, $S'' = \{2, 3, 4\}$ and $S''' = \{4, 8\}$.

The assembly graphs $\tilde{\Gamma}_{(\gamma,S')}$, $\tilde{\Gamma}_{(\gamma,S'')}$ and $\tilde{\Gamma}_{(\gamma,S''')}$ depicted in Figure 6.3(A), Figure 6.3(B) and Figure 6.3(C), respectively, are obtained from $\Gamma$ by applying $S'$-, $S''$- and $S'''$-partial smoothings with respect to $\gamma$. We observe that $S'$ is


linear, since $\tilde{\Gamma}_{(\gamma,S')}$ is a simple assembly graph with a Hamiltonian polygonal path indicated with a thick red line. Therefore $S'$ is also successful.

Figure 6.3: Three choices of partial smoothings. (A) Example of a linear set: $\{2, 3\}$-smoothing of the graph from Figure 6.2 results in a single transverse component containing $\gamma$. (B) Example of a successful but not linear set: $\{2, 3, 4\}$-smoothing of the graph from Figure 6.2 results in two transverse components, one of which contains $\gamma$. (C) Example of a non-linear non-successful set: $\{4, 8\}$-smoothing of the graph from Figure 6.2 results in three transverse components, two of which contain parts of $\gamma$.

On the other hand, $S''$ is successful, since $\tilde{\Gamma}_{(\gamma,S'')}$ has a transverse path (i.e., component) that contains $\gamma$. However, $S''$ is not linear, because $\tilde{\Gamma}_{(\gamma,S'')}$ has two transverse components: one containing $\gamma$ and the other being a separate cyclic component. The set $S'''$ is neither linear nor successful, since $\tilde{\Gamma}_{(\gamma,S''')}$ has three transverse components: one indicated with a dotted line, another indicated with a solid line, each containing parts of $\gamma$, and a third being a small cyclic part which does not contain any part of $\gamma$. Since there are two components that contain edges from $\gamma$, $S'''$ is neither linear nor successful.

Remark 6.2.8 The union, intersection, complement, subset or superset of successful sets are not necessarily successful sets. For example, let $\Gamma$ be the assembly graph and $\gamma$ a polygonal Hamiltonian path in $\Gamma$ as depicted in Figure 6.2. Then the set $\{5, 6, 7\}$ is successful, but $\{5\}$ and $\{5, 6, 7, 8\}$ are not. Also $\{6, 7\}$ and $\{5, 7\}$ are successful, but $\{5, 6, 7\} \setminus \{6, 7\}$ and $\{6, 7\} \cap \{5, 7\}$ are not.

Definition 6.2.9 Let $\Gamma$ be a simple realizable assembly graph, and let $\gamma$ be a Hamiltonian polygonal path $(e_1 \setminus v_0), v_1, e_2, \ldots, v_{m-1}, e_m, v_m, (e_{m+1} \setminus v_{m+1})$ in $\Gamma$. A Hamiltonian set of polygonal paths $\{\gamma_1, \gamma_2, \ldots, \gamma_k\}$ is called complementary to $\gamma$ if no edges from $\gamma$ except (possibly) when $e_1 \ne e_{m+1}$ $(e_1 \setminus v_1)$ and $(e_{m+1} \setminus v_m)$


appear in the paths $\gamma_1, \gamma_2, \ldots, \gamma_k$.

A Hamiltonian set of polygonal paths complementary to $\gamma$ is called minimal, and is denoted $\gamma^c$, if its cardinality is minimal among all complementary Hamiltonian sets of polygonal paths.

One can deduce from the definition that all paths in $\gamma^c$ have maximal length such that $\gamma^c$ is complementary to $\gamma$.

Figure 6.4: Complementary paths for the Hamiltonian path depicted in Figure 6.2.

Example 6.2.10 A minimal complementary Hamiltonian set of polygonal paths $\gamma^c = \{\gamma_1, \gamma_2, \gamma_3\}$ for the simple assembly graph $\Gamma$ and the polygonal Hamiltonian path $\gamma$ is depicted in Figure 6.4.

For the rest of the section, only simple assembly graphs with no endpoints are considered. There is no loss of generality with such an approach, since each simple assembly graph with two endpoints can be transformed into a simple assembly graph with no endpoints by connecting the endpoints. In Lemma 6.2.11, some properties of complementary Hamiltonian sets of polygonal paths are examined.

Lemma 6.2.11 Let $\Gamma$ be a simple assembly graph with no endpoints, and let

$\gamma = (e_1 \setminus v_0), v_1, e_2, \ldots, v_{m-1}, e_m, v_m, (e_{m+1} \setminus v_{m+1})$

be a Hamiltonian polygonal path in $\Gamma$ and $\gamma^c = \{\gamma_1, \gamma_2, \ldots, \gamma_k\}$ be a minimal Hamiltonian set of polygonal paths complementary to $\gamma$. The following holds.

PAGE 113

(i) The elements of $r^c$ are pairwise edge disjoint, i.e., no edge of $\Gamma$ appears in more than one path from $r^c$.
(ii) For each edge $e \in E(\Gamma)$, either $e$ appears in $r$ or there is some $i$ such that $e$ appears in $r_i \in r^c$.
(iii) Each $r_i \in r^c$ that does not contain edges from $r$ starts and ends at the same edge.
(iv) If $r_i \in r^c$ contains an edge from $r$, then it contains both $e_1$ and $e_{m+1}$.
(v) The set of paths $r^c$ is unique.

Proof. (i) Let $r_i$ and $r_j$ be two distinct polygonal paths in $r^c$ containing a common nonempty part of an edge $e$. Let $v_1$ and $v_2$ be the endpoints of $e$ (possibly $v_1 = v_2$). Since $r^c$ is a Hamiltonian set, neither $v_1$ nor $v_2$ appears in both $r_i$ and $r_j$. Hence there are three possibilities: (1) each of $r_i$, $r_j$ contains exactly one of $v_1$ or $v_2$; (2) both $v_1$ and $v_2$ appear in one of the paths $r_i$ or $r_j$; (3) one of $v_1$ or $v_2$ does not appear in $\{r_i, r_j\}$. We show that none of these cases holds.

If case (1) happens, then, assuming $v_1$ appears in $r_i$, $(e \setminus v_2)$ is an initial or terminal edge of $r_i$ and $(e \setminus v_1)$ is an initial or terminal edge of $r_j$. Therefore, the path $r_{ij} = r_i, v_1, e, v_2, r_j$ is a polygonal path in $\Gamma$, and $(r^c \setminus \{r_i, r_j\}) \cup \{r_{ij}\}$ is a Hamiltonian set of polygonal paths complementary to $r$. This contradicts the minimality of $r^c$. Hence, case (1) is not possible.

(2) Consider the case when both $v_1$ and $v_2$ appear in $r_i$. That implies that neither $v_1$ nor $v_2$ appears in $r_j$, that both $(e \setminus v_1)$ and $(e \setminus v_2)$ are end edges of $r_j$, and that $e \setminus \{v_1, v_2\}$ is an edge in $r_j$. Since $r_j$ is connected, it must consist only of a part of $e$ without the endpoints. But then $r_j$ is not a polygonal path and can be removed from $r^c$, which violates the minimality condition of $r^c$.

For case (3), assume, without loss of generality, that $v_1$ does not appear in $\{r_i, r_j\}$. Then $(e \setminus v_1)$ is an initial or terminal edge of both $r_i$ and $r_j$. If $v_2$ appears in $r_i$, then $(e \setminus v_2)$ is an initial or terminal edge of $r_j$ and therefore $r_j = (e \setminus v_1), (e \setminus v_2)$; otherwise, $v_2$ is in $r_j$ but not in $r_i$, and hence $r_i = (e \setminus v_1), (e \setminus v_2)$. In either case the resulting path is not a polygonal path, and therefore case (3) is not possible either. Hence $r_i$ and $r_j$ are edge disjoint.

(ii) Let $e \in E(\Gamma)$. Note that both $r$ and $r^c$ are Hamiltonian, so every vertex $v$ is in both $r$ and $r_i$ for some $r_i \in r^c$. Therefore each of $r$ and $r_i$ contains a pair of neighboring edges incident to $v$. So, if $e$ is a loop at a vertex $v$, two pairs of neighboring edges at $v$ must contain $e$. Suppose $e$ is not a loop and $e$ does not appear in $r$. Let $v$ and $v'$ be the incident vertices of $e$. If $v$ (or $v'$) is not $v_0$ or $v_{m+1}$, then the four edges incident to $v$ (or $v'$) must belong to two complementary pairs of neighboring edges (one pair belonging to $r$ and the other to $r_i$ for some $r_i \in r^c$). Hence $e$ must be in $r_i$. The same holds if $e_1 = e_{m+1}$ is incident to $v$. An edge incident to $v$ belongs to both $r$ and $r^c$ only if it is $e_1$ or $e_{m+1}$ when $e_1 \neq e_{m+1}$. If $v = v_0$ (or $v = v_{m+1}$), the Hamiltonian path $r$ contains two edges incident to $v$ that are distinct from $e_1$ (respectively, $e_{m+1}$), hence one of the paths in $r^c$ must visit $v$ through $e_1$ and $e$ (respectively, $e_{m+1}$ and $e$).

(iii) Let $r_i \in r^c$ be a polygonal path that does not contain edges from $r$. Suppose that $r_i$ starts with the edge $e$ and ends with the edge $e'$, and assume that $e \neq e'$. Let $v$ and $v'$ be the endpoints of $e$ such that $r_i$ starts with $(e \setminus v)$. Note that if $v = v'$ then $e$ is a loop, i.e., $e$ is a neighbor of itself and $r_i$ must start and end at $e$. If $v$ appears in $r_i$, then $(e \setminus v'), v$ is a subsequence of $r_i$, implying that $r_i$ starts and ends at $e$, so $r_i$ is circular. This cannot happen if $e \neq e'$. Hence, $v$ does not appear in $r_i$. Then there is some $j \neq i$ such that $v$ appears in $r_j$. Therefore $(e \setminus v'), v, e''$ is a subsequence of $r_j$, where $e''$ is a neighbor of $e$ with respect to $v$ that is not in $r$. But that contradicts (i), since in that case $e$ would be in both $r_i$ and $r_j$. Hence, $e = e'$.

(iv) Note that if $e_1 = e_{m+1}$ then $r_i \in r^c$ cannot contain any edge from $r$, by definition of $r^c$; so if $r_i \in r^c$ is a polygonal path that contains an edge from $r$, then $e_1 \neq e_{m+1}$. By the definition of $r^c$, $r_i$ contains $e_1$ or $e_{m+1}$. Suppose that $r_i$ starts with $(e_1 \setminus v_1)$, i.e., $r_i = (e_1 \setminus v_1), v'_2, \ldots, v'_k, (e \setminus v)$. First note that $e_1 \neq e$, since otherwise $v = v_0$ and $r_i$ contains $(e_1 \setminus v_0)$, which contradicts the definition of $r^c$. If $v'_k = v$ then $e$ is a loop. The Hamiltonian path $r$ visits $v$ either by the two edges incident to $v$ that are distinct from $e$, or it contains $e$ and one edge distinct from $e$. In the former case $r_i = (e \setminus v), v, (e \setminus v)$ and cannot start with $(e_1 \setminus v_1)$. So it must
be that $r$ contains $e$, in which case it contains $(e \setminus v)$, i.e., $(e \setminus v) = (e_{m+1} \setminus v_{m+1})$. If $v'_k \neq v$, consider the other three incident edges of $v$, say $e', f, f'$. Assume $e$ is a neighbor to both $e'$ and $f$ with respect to $v$. Since $v$ is not visited by $r_i$, there must be $r_j \in r^c$ that contains $v$. By (i), $r_j$ cannot contain $e$, therefore it contains either $f, f'$ or $e', f'$. In both cases the Hamiltonian path $r$ has to visit $v$ through the edges $e, e'$ or $e, f$, respectively. Since $e$ is in both $r$ and $r_i$, by the definition of $r^c$ it must be the end of $r$, i.e., $v = v_m$ and $v'_k = v_{m+1}$.

(v) First, note that there is no element of $r^c$ that contains edges $e_1$ or $e_{m+1}$ from $r$ if $e_1 = e_{m+1}$. If $e_1 \neq e_{m+1}$, then by (iv) there is exactly one element of $r^c$ that contains both $e_1$ and $e_{m+1}$. Let $r^c = \{r_1, \ldots, r_k\}$ and $\hat{r}^c = \{r'_1, \ldots, r'_{k'}\}$ be two distinct minimal Hamiltonian sets of polygonal paths complementary to $r$. From the above argument, both $r^c$ and $\hat{r}^c$ contain either one or no element that has edges from $r$. Now we show that the paths that do not contain edges from $r$ are the same in both $r^c$ and $\hat{r}^c$. Let
$$r_i = (e^i_1 \setminus v^i_0), v^i_1, e^i_2, \ldots, v^i_{s-1}, e^i_s, v^i_s, (e^i_{s+1} \setminus v^i_{s+1})$$
be an element of $r^c$ that does not contain edges from $r$. By (iii) we have that $e^i_1 = e^i_{s+1}$, i.e., $r_i$ is a cycle and $v^i_0 = v^i_s$, $v^i_1 = v^i_{s+1}$. There is some $j \in \{1, \ldots, k'\}$ such that $r'_j \in \hat{r}^c$ is a polygonal path that visits the vertex $v^i_1$. Then $e^i_2$ appears in $r'_j$. If $r'_j$ does not visit the vertex $v^i_2$, then there is $r'_l \in \hat{r}^c$ ($l \neq j$) that visits $v^i_2$. But then $e^i_2$ would have to appear in $r'_l$ too, which contradicts (i). Hence, $r'_j$ must visit the vertex $v^i_2$. Using the same arguments, one can show that $r'_j$ contains the vertices $v^i_3, \ldots, v^i_{s+1}$ and the edges $e^i_3, \ldots, e^i_{s+1}$. Therefore we obtain $r_i = r'_j$ as cyclic sequences, since $r_i$ is a cycle.

By the symmetry of the arguments, the above shows that there is a one-to-one correspondence between the elements of $r^c$ and $\hat{r}^c$ that do not contain the edges $e_1$ and $e_{m+1}$. Further, $r^c$ has an element $r_1$ that contains the edges $e_1$ and $e_{m+1}$ iff there is an element $r'_1$ in $\hat{r}^c$ that contains both $e_1$ and $e_{m+1}$. Moreover, it must be that $r_1 = r'_1$, since both $r_1$ and $r'_1$ contain the edges and vertices not visited by $r$ and by the rest of the paths in $r^c$ and $\hat{r}^c$. This proves that $r^c = \hat{r}^c$, i.e., $r^c$ is unique for a given $\Gamma$ and a given Hamiltonian polygonal path $r$ in $\Gamma$. $\square$

Lemma 6.2.12 Let $\Gamma$ be a simple assembly graph with no endpoints, and let $r$ be
a Hamiltonian polygonal path, with $r^c$ its minimal complementary Hamiltonian set of polygonal paths. Then the number of connected components of $\tilde{\Gamma}_r$ is $|r^c|$ if $r$ is a cycle, and $|r^c| + 1$ otherwise.

Proof. Let $\Gamma$ be a simple assembly graph with a Hamiltonian polygonal path
$$r = (e_1 \setminus v_0), v_1, e_2, \ldots, v_{m-1}, e_m, v_m, (e_{m+1} \setminus v_{m+1}),$$
and let $r^c = \{r_1, \ldots, r_s\}$ be a minimal complementary Hamiltonian set of polygonal paths to $r$. Let $\mathcal{C} = \{C_1, \ldots, C_p\}$ be the set of components of $\tilde{\Gamma}_r$. As mentioned in Remark 5.6.3, each component in $\mathcal{C}$ contains edges and vertices inherited from $\Gamma$. Denote by $E(C_i)$ the set of edges inherited from $\Gamma$ that are contained in $C_i$, and let $E' = \{E(C_1), \ldots, E(C_p)\}$. Note that $E(C_i)$ is nonempty for each $i \in \{1, 2, \ldots, p\}$, and for each edge $e \in E(\Gamma)$ there is exactly one $i \in \{1, 2, \ldots, p\}$ such that $e \in E(C_i)$. Hence $E'$ is a partition of $E(\Gamma)$.

First, assume that there is a path in $r^c$ that contains edges from $r$. Then by Lemma 6.2.11 (iv), such a path is unique in $r^c$ and contains both $e_1$ and $e_{m+1}$. Without loss of generality, let $r_1 \in r^c$ be such a path. Define
$$G_1 = \{e \in E(\Gamma) \mid \text{the edge } e \text{ intersects with } r \text{ or } r_1\}$$
and
$$G_i = \{e \in E(\Gamma) \mid \text{the edge } e \text{ intersects with } r_i\} \quad \text{for } i = 2, \ldots, s.$$
Note that if an edge $e \in E(\Gamma)$ intersects with $r_i$, then it is contained in $r_i$ or intersects it at one of the ends of $r_i$. By (i) and (ii) from Lemma 6.2.11, $\mathcal{G} = \{G_1, \ldots, G_s\}$ is a partition of $E(\Gamma)$.

Next we show that $\mathcal{G} = E'$. Let
$$r_i = (e^i_1 \setminus v^i_0), v^i_1, e^i_2, \ldots, v^i_{m_i-1}, e^i_{m_i}, v^i_{m_i}, (e^i_{m_i+1} \setminus v^i_{m_i+1}) \quad \text{for } i \in \{1, 2, \ldots, s\}.$$
By the definition of smoothing, all edges from $r$ appear in one component of $\mathcal{C}$, and therefore in one of the sets in $E'$, say $E(C_1)$. Since $e^1_1 = e_{m+1} \in E(C_1)$, by the definition of vertex smoothing we have $e^1_2 \in E(C_1)$. Then $e^1_2 \in E(C_1)$ implies
$e^1_3 \in E(C_1)$, and inductively one obtains that every edge from $r_1$ is in $E(C_1)$. For any fixed $i \in \{2, \ldots, s\}$, by Lemma 6.2.11 (iii) we have that $e^i_1 = e^i_{m_i+1}$, i.e., $r_i$ is cyclic. If $C'$ is a cyclic component in $\mathcal{C}$ with $e^i_1 \in E(C')$, then by the definition of smoothing the edges $e^i_j$ belong to the same partition set $E(C')$ from $E'$ for all $j \in \{2, \ldots, m_i + 1\}$. Hence, for each $G_i \in \mathcal{G}$ there is $E(C_j) \in E'$ such that $G_i \subseteq E(C_j)$. As both $\mathcal{G}$ and $E'$ are partitions of $E(\Gamma)$, $\mathcal{G} = E'$. The proof is similar if we assume that there is no element in $r^c$ that contains edges from $r$. In that case, consider $\mathcal{G} = \{G_i \mid i \in \{0, 1, 2, \ldots, s\}\}$, where $G_0 = \{e \in E(\Gamma) \mid \text{the edge } e \text{ intersects with } r\}$ and $G_i = \{e \in E(\Gamma) \mid \text{the edge } e \text{ intersects with } r_i\}$ for $i \in \{1, 2, \ldots, s\}$. $\square$

Let $\Gamma$ be a simple assembly graph with no endpoints, $r$ be a Hamiltonian polygonal path, and $r^c = \{r_1, r_2, \ldots, r_k\}$ be a minimal Hamiltonian set of polygonal paths complementary to $r$. Denote by $r^c_0$ the subset of $r^c$ that consists of the paths that are edge disjoint from $r$.

Let $S \subseteq V(\Gamma)$. For $r_i \in r^c_0$, let $C_i$ be the component of $\tilde{\Gamma}_{(r,S)}$ that contains edges inherited from $r_i$. When $S = V(\Gamma)$, such a component is connected in $\tilde{\Gamma}_{(r,V(\Gamma))}$, by the proof of Lemma 6.2.12.

Lemma 6.2.13 Let $r_i \in r^c_0$ and $S_i = \{v \mid v \in V(\Gamma),\ v \text{ appears in } r_i\}$. Then, for any successful set $S \subseteq V(\Gamma)$, $C_i$ is a connected component of $\tilde{\Gamma}_{(r,S)}$ if and only if $S_i \subseteq S$.

Proof. First, assume that $C_i$ is a connected component of $\tilde{\Gamma}_{(r,S)}$. This implies that each edge of $r_i$ appears in $C_i$. That is possible only if every vertex of $r_i$ is smoothed. Therefore, $S_i \subseteq S$.

Conversely, let $S_i \subseteq S$. Since $r_i$ has no edges from $r$, by Lemma 6.2.11 (iii) it is cyclic. Each vertex that belongs to $r_i$ is smoothed. By smoothing the vertices from $r_i$ (i.e., $S_i$), one obtains a connected component that contains each edge from $r_i$, as explained in the proof of Lemma 6.2.12. By definition that component is $C_i$, so $C_i$ is a component of $\tilde{\Gamma}_{(r,S)}$. $\square$

For the rest of the section, we call $C_i$ the component of $\tilde{\Gamma}_{(r,S)}$ that corresponds to $S_i$ if they are related as in Lemma 6.2.13. Keeping the same definitions of
$\Gamma$, $r$, $r^c$, $S_i$, and $C_i$, we characterize the successful sets as follows.

Proposition 6.2.14 A subset $S \subseteq V(\Gamma)$ is successful iff there is $I \subseteq \{1, \ldots, k\}$ such that $S$ can be written as $S_I \cup S'$, where $S_I = \bigcup_{i \in I} S_i$ and $S'$ is linear in $\tilde{\Gamma}_{(r,S_I)} \setminus \left(\bigcup_{i \in I} C_i\right)$.

Proof. Let $S \subseteq V(\Gamma)$ be a successful set. If $S$ is linear in $\Gamma$, then $\tilde{\Gamma}_r$ is a single component and we can take $I = \emptyset$. So, $S = \emptyset \cup S$. If $S$ is not linear, then $\tilde{\Gamma}_{(r,S)}$ has more than one connected component. Let $\mathcal{C} = \{C_1, \ldots, C_k\}$ be the set of all connected components of $\tilde{\Gamma}_{(r,S)}$. One of them contains $r$, since $S$ is successful. Let $I = \{i \mid C_i \text{ is a component in } \tilde{\Gamma}_{(r,S)} \text{ that does not contain } r\}$. If $C'$ is a component of $\tilde{\Gamma}_{(r,S)}$ that does not contain $r$, then the corresponding vertex set $S' = \{v \mid v \in V(\Gamma),\ v \text{ appears in } r' \in r^c_0\}$ is a subset of $S$, by Lemma 6.2.13. Hence, $S_i \subseteq S$ for any $S_i$ that corresponds to some connected component $C_i$ such that $i \in I$. Therefore, $S_I \subseteq S$. Furthermore, because $S$ is successful, $\tilde{\Gamma}_{(r,S)} \setminus \bigcup_{i \in I} C_i$ has one transverse arc component that contains $r$. This implies that $S' = S \setminus S_I$ must be linear for $\tilde{\Gamma}_{(r,S_I)} \setminus \bigcup_{i \in I} C_i$. We have $S = S_I \cup S'$.

Conversely, let $S = S_I \cup S'$ for some $S_I = \bigcup_{i \in I} S_i$, $I \subseteq \{1, \ldots, k\}$. Note that $\tilde{\Gamma}_{(r,S_I)}$ contains the connected component $C_i$ that corresponds to $S_i$ for each $i \in I$, by Lemma 6.2.13. Since none of those components contains $r$, $\tilde{\Gamma}_{(r,S_I)} \setminus \bigcup_{i \in I} C_i$ is an assembly graph that contains $r$. Since $S'$ is linear in $\tilde{\Gamma}_{(r,S_I)} \setminus \bigcup_{i \in I} C_i$, $\tilde{\Gamma}_{(r,S_I \cup S')} \setminus \bigcup_{i \in I} C_i$ is a simple assembly graph that contains $r$. Hence, $\tilde{\Gamma}_{(r,S)}$ has a transverse arc component that contains $r$, or, in other words, $S$ is a successful set. $\square$

Example 6.2.15 For the simple assembly graph $\Gamma$ and the polygonal Hamiltonian path $r$ given in Example 6.2.10 (see Figure 6.4), $r^c_0 = \{r_2, r_3\}$. Furthermore, $S_2 = \{4\}$, $S_3 = \{5,6,7\}$, and let $C_2$ and $C_3$ be the components of $\tilde{\Gamma}_r$ that correspond to $S_2$ and $S_3$, respectively. Then the set $\{2,3,4\}$ is successful by Proposition 6.2.14, since $\{2,3,4\} = S_2 \cup \{2,3\}$ and $\{2,3\}$ is linear in $\tilde{\Gamma}_{(r,S_2)} \setminus C_2$ (see Figure 6.3(B)). The set $\{4,8\}$ is not successful by Example 6.2.7 (see Figure 6.3(C)). It can be written as $S_2 \cup \{8\}$, but $\{8\}$ is not linear in $\tilde{\Gamma}_{(r,S_2)} \setminus C_2$. The set $\{4,5,6,7\}$
can be written as $S_2 \cup S_3 \cup \emptyset$. It is clear from Figure 6.5(A) that $\emptyset$ is linear in $\tilde{\Gamma}_{(r,S_2 \cup S_3)} \setminus (C_2 \cup C_3)$. Hence $\{4,5,6,7\}$ is a successful set. On the other hand, the set $\{4,5,6,7,8\}$ is not successful, since $\{8\}$ is not linear in $\tilde{\Gamma}_{(r,S_2 \cup S_3)} \setminus (C_2 \cup C_3)$ (see Figure 6.5(B)).

Figure 6.5: Successful and unsuccessful smoothings. (A) Successful smoothing of $\Gamma$ in Example 6.2.10 relative to the set $S = \{4,5,6,7\} = \{4\} \cup \{5,6,7\} = S_2 \cup S_3$. The resulting graph contains $r$ in a single component. (B) Unsuccessful smoothing of $\Gamma$ in Example 6.2.10 relative to the set $S = \{4,5,6,7,8\} = \{4\} \cup \{5,6,7\} \cup \{8\} = S_2 \cup S_3 \cup \{8\}$. The set $\{8\}$ is not linear in $\tilde{\Gamma}_{(r,S_2 \cup S_3)} \setminus (C_2 \cup C_3)$, since the resulting graph contains $r$ on two distinct components, indicated by full and dotted lines.

6.3 Smoothing Strategies

The successful sets studied in Section 6.2 are defined such that the smoothing of the vertices of a successful set keeps the Hamiltonian polygonal path in the assembly graph connected. Since a Hamiltonian polygonal path models a single macronuclear gene, a successful set resembles a set of pointers whose simultaneous recombination keeps a given gene intact. In particular, the simultaneous rearrangement of pointers from a successful set does not disperse the MDSs on different molecules. Therefore, a successful set can be considered as a single step in the process of macronuclear gene assembly.

In this section we use the successful sets to define smoothing strategies for a given assembly graph. The smoothing strategies define possible rearrangement strategies for a given micronuclear gene.

Definition 6.3.1 Let $\Gamma$ be a simple assembly graph and $r$ a Hamiltonian polygonal
path in $\Gamma$. An ordered partition $\mathcal{S} = \{S_1, \ldots, S_k\}$ of $V(\Gamma)$ is called a smoothing strategy for $\Gamma$ with respect to $r$. Denote $S'_1 = \emptyset$ and $S'_i = \bigcup_{j=1}^{i-1} S_j$. We say that $\mathcal{S}$ is a successful smoothing strategy for $\Gamma$ with respect to $r$ if $S_i$ is successful in $\tilde{\Gamma}_{(r,S'_i)}$ for every $i = 1, \ldots, k$.

Figure 6.6: (A) An example of an assembly graph and the corresponding Hamiltonian polygonal path. (B) $S_1$-smoothing of the graph in (A), for the successful set $S_1 = \{2,3\}$. (C) $S_2$-smoothing of the graph in (A), for the set $S_2 = \{5,6\}$. $S_2$ is not successful and therefore cannot be the first smoothing set in a smoothing strategy.

Example 6.3.2 For the simple assembly graph $\Gamma$ and the polygonal Hamiltonian path $r$ given in Figure 6.6(A), consider the sets $S_1 = \{2,3\}$ and $S_2 = \{5,6\}$. The smoothing $\tilde{\Gamma}_{(r,S_1)}$ is depicted in Figure 6.6(B), and since $r$ belongs to a single transverse component of $\tilde{\Gamma}_{(r,S_1)}$, $S_1$ is successful. The set $S_2 = \{5,6\}$ is not successful, as seen in $\tilde{\Gamma}_{(r,S_2)}$ depicted in Figure 6.6(C), since portions of $r$ belong to two transverse components. Furthermore, $S_1 \cup S_2 = V(\Gamma)$. Therefore, the sequence $(S_1, S_2)$ is a successful smoothing strategy, but $(S_2, S_1)$ is not a successful smoothing strategy for $\Gamma$ with respect to $r$.

Lemma 6.3.3 Let $\Gamma$ be a simple assembly graph, $r$ be a Hamiltonian polygonal path in $\Gamma$, and let $S \subseteq S' \subseteq V(\Gamma)$. If both $S$ and $S'$ are successful with respect to $r$, then $S_1 = S' \setminus S$ is successful for $\tilde{\Gamma}_{(r,S)}$.

Proof. Let $S \subseteq S' \subseteq V(\Gamma)$, and suppose that both $S$ and $S'$ are successful. Then $\tilde{\Gamma}_{(r,S)}$ is an assembly graph that has a transverse arc component containing $r$. The application of $S_1$-partial smoothing on $\tilde{\Gamma}_{(r,S)}$ results in $\tilde{\Gamma}_{(r,S')}$. However,
$\tilde{\Gamma}_{(r,S')}$ is an assembly graph that has a transverse component containing $r$, since $S'$ is successful in $\Gamma$. Therefore, $S_1 = S' \setminus S$ is successful for $\tilde{\Gamma}_{(r,S)}$. $\square$

Proposition 6.3.4 There is a one-to-one correspondence between the set of successful smoothing strategies and the set of all nested sequences $P_1 \subset P_2 \subset \cdots \subset P_k = V(\Gamma)$, where $P_i$ is successful for $\Gamma$ for every $i = 1, \ldots, k$.

Proof. Let $\{S_1, \ldots, S_k\}$ be a successful smoothing strategy for $\Gamma$ with respect to $r$. Then $S_i$ is successful in $\tilde{\Gamma}_{(r,S'_i)}$, where $S'_i = \bigcup_{h=1}^{i-1} S_h$. Hence, $P_i = S_i \cup S'_i$ is successful in $\Gamma$. Therefore, $P_1 \subset P_2 \subset \cdots \subset P_k$ is a nested sequence of successful sets in $\Gamma$.

Conversely, let $P_1 \subset P_2 \subset \cdots \subset P_k = V(\Gamma)$ be a nested sequence where $P_i$ is successful for $\Gamma$ for every $i = 1, \ldots, k$. Define $S_i = P_i \setminus P_{i-1}$ (with $P_0 = \emptyset$). Since $P_{i-1} \subset P_i$ and both $P_i$ and $P_{i-1}$ are successful sets in $\Gamma$, by Lemma 6.3.3, $P_i \setminus P_{i-1}$ is successful in $\tilde{\Gamma}_{(r,P_{i-1})}$. On the other hand, $P_{i-1} = \bigcup_{h=1}^{i-1} S_h = S'_i$. Therefore, $S_i = P_i \setminus P_{i-1}$ is a successful set in $\tilde{\Gamma}_{(r,S'_i)}$. The ordered set $\{S_1, \ldots, S_k\}$ consists of pairwise disjoint subsets of $V(\Gamma)$ whose union is $V(\Gamma)$, and each $S_i$ is successful in $\tilde{\Gamma}_{(r,S'_i)}$. Hence $S_1, \ldots, S_k$ is a successful smoothing strategy for $r$. $\square$

Example 6.3.5 Let $\Gamma$ be the assembly graph given in Figure 6.7(A) and let $r$ be the Hamiltonian polygonal path depicted by the thick red line. $S_1$-smoothing of $\Gamma$, for the successful set $S_1 = \{2,3,4\}$ with respect to $r$, results in an assembly graph $\Gamma_1$, given in Figure 6.7(B). The set $S_2 = \{5,6,7\}$ is successful with respect to $r$ in $\Gamma_1$. The assembly graph $\Gamma_2$ obtained by $S_2$-partial smoothing of $\Gamma_1$ is illustrated in Figure 6.7(C). Applying $S_3$-smoothing on $\Gamma_2$ for the successful set $S_3 = \{8,9\}$ results in the diagram depicted in Figure 6.7(D). Since $S_1 \cup S_2 \cup S_3 = V(\Gamma)$, $\{S_1, S_2, S_3\}$ is a successful smoothing strategy for $\Gamma$.

Figure 6.7: (A) An example of an assembly graph and the corresponding Hamiltonian polygonal path. (B) $S_1$-smoothing of the graph in (A), for the successful set $S_1 = \{2,3,4\}$. (C) $S_2$-smoothing of the graph in (B), for the successful set $S_2 = \{5,6,7\}$. (D) $S_3$-smoothing of the graph in (C), for the successful set $S_3 = \{8,9\}$.
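
The correspondence in Proposition 6.3.4 is purely set-theoretic, so it can be sketched in a few lines of Python. The function names `strategy_to_nested` and `nested_to_strategy` are hypothetical, and the sketch only converts between the two representations. It does not test whether a set is successful, since that would require the assembly graph and the path $r$.

```python
from itertools import accumulate


def strategy_to_nested(strategy):
    """Ordered partition (S_1, ..., S_k) -> nested sequence (P_1, ..., P_k),
    where P_i is the union of S_1, ..., S_i."""
    blocks = (frozenset(s) for s in strategy)
    return list(accumulate(blocks, lambda p, s: p | s))


def nested_to_strategy(nested):
    """Nested sequence P_1 < P_2 < ... < P_k -> ordered partition,
    using S_i = P_i minus P_(i-1) with P_0 taken to be the empty set."""
    previous = frozenset()
    strategy = []
    for p in nested:
        p = frozenset(p)
        if not previous < p:  # strict inclusion is required at every step
            raise ValueError("sequence is not strictly nested")
        strategy.append(p - previous)
        previous = p
    return strategy


# Vertex sets taken from Example 6.3.5.
example_strategy = [{2, 3, 4}, {5, 6, 7}, {8, 9}]
example_nested = strategy_to_nested(example_strategy)
```

Applying `nested_to_strategy` to `example_nested` returns the original three blocks, which illustrates that the two maps are mutually inverse on this example.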

7 Assembly Words

In this chapter we investigate the order of recombination events of DNA that take place during the macronuclear development in ciliates. The recombination processes happen in a certain succession, possibly with some pointer recombinations performed at the same time, but others in a prescribed order. We define a partial order on DNA molecules in such a way that two molecules are related by this order when one is produced by recombination from the other. We apply our model to experimental data that gives the possible intermediate molecules in the macronuclear gene assembly. The model provides a method for deriving all possible pathways of gene rearrangements. Namely, given an MDS-IES structure of a micronuclear gene and possible intermediate molecules, our goal is to theoretically characterize all possible assembly strategies. For that purpose, we introduce assembly words as sets of linear and circular words and define a partial order on such assembly words that represents a gene assembly strategy.

7.1 Definitions and Notations

Let $A_k = \{M_1, \ldots, M_k, I_1, \ldots, I_{(k+1)}\}$ and $\overline{A}_k = \{\overline{M}_1, \ldots, \overline{M}_k, \overline{I}_1, \ldots, \overline{I}_{(k+1)}\}$ be alphabets (sets of symbols) for some integer $k \geq 1$, and let $\mathcal{A}_k = A_k \cup \overline{A}_k$. We consider two types of words over $\mathcal{A}_k$: linear words, which correspond to the regular definition of words over an alphabet, and circular words, defined in Section 3.1.2. As in the intermolecular and the intramolecular model, the symbol $M_i \in A_k$ corresponds to MDS$_i$ for $i \in \{1, 2, \ldots, k\}$ and each symbol $I_j$ corresponds to IES$_j$ for $j \in \{1, 2, \ldots, k+1\}$. The symbols in $\overline{A}_k$ refer to the corresponding inverted MDSs and IESs.
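
As a small illustration of these alphabets, the sketch below builds $A_k$ and $A_k \cup \overline{A}_k$ as lists of strings. The encoding is an assumption made only for this example: a symbol is written as `"M3"` or `"I2"`, and its inverted counterpart is marked with a leading minus sign, as in `"-M3"`. The function names are hypothetical.

```python
def alphabet(k):
    """Unbarred alphabet A_k: M_1..M_k followed by I_1..I_(k+1)."""
    mds = [f"M{i}" for i in range(1, k + 1)]
    ies = [f"I{j}" for j in range(1, k + 2)]
    return mds + ies


def bar(symbol):
    """Switch a symbol between its plain and inverted form ('M3' <-> '-M3')."""
    return symbol[1:] if symbol.startswith("-") else "-" + symbol


def full_alphabet(k):
    """Union of A_k and its barred copy, i.e. all 2(2k + 1) symbols."""
    base = alphabet(k)
    return base + [bar(s) for s in base]
```

For $k = 4$ this yields 9 unbarred symbols and 18 symbols in total, and applying `bar` twice returns the original symbol.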

Definition 7.1.1 An assembly word over an alphabet $\mathcal{A}_k$ is a set
$$W = \{w_0, [w_1], [w_2], \ldots, [w_n]\}$$
which consists of one linear word $w_0$ and $n$ circular words $[w_1], [w_2], \ldots, [w_n]$ over $\mathcal{A}_k$, such that for each symbol $\alpha \in A_k$, either $\alpha$ or $\overline{\alpha}$ appears exactly once in $W$.

We say that two assembly words are equal if they are equal as sets. We denote by $\mathbb{A}_k$ the set of all assembly words over the alphabet $\mathcal{A}_k$.

Example 7.1.2 For example, let $A_4 = \{M_1, M_2, M_3, M_4, I_1, I_2, I_3\}$ and $\mathcal{A}_4 = A_4 \cup \overline{A}_4$, and let $w_0 = M_3 M_4 I_2 M_2$, $[w_1] = [I_1]$, $[w_2] = [M_1 I_1]$, $[w_3] = [M_1 I_3]$ be words over $\mathcal{A}_4$. Then $W = \{w_0, [w_1], [w_3]\}$ is an assembly word, but $W' = \{w_0, [w_1], [w_2]\}$ is not, since $I_3$ does not appear in $W'$. Also, $W'' = \{w_0, [w_1], [w_2], [w_3]\}$ is not an assembly word, since $I_1$ appears twice in $W''$.

Definition 7.1.3 Each word over $\mathcal{A}_k$ of the form $M_i M_{(i+1)} \cdots M_{(i+j)}$ or $\overline{M}_{(i+j)} \cdots \overline{M}_i$, for some $i, j \in \{1, 2, \ldots, k\}$, is called an assembled segment. For an assembled segment $s$ and an assembly word $W$, we write $s \sqsubset W$ if there is an element $w \in W$ such that $s$ is a substring of $w$.

We denote by $S_W$ the set of all assembled segments in the assembly word $W$. An assembled segment $s \sqsubset W$ is maximal if, for every assembled segment $s' \sqsubset W$, $s$ being a subword of $s'$ implies $s' = s$.

Definition 7.1.4 Let $W$ be an assembly word. We define the set $S^{\max}_W = \{s \sqsubset W \mid s \text{ is a maximal assembled segment}\}$. The degree of the assembly word $W$ is defined as
$$\sum_{s \in S^{\max}_W} (|s| - 1).$$
The degree of an assembly word $W$ is denoted by $\mathrm{dg}(W)$.

The degree of an assembly word $W$ counts the pairs $M_i M_{(i+1)}$ or $\overline{M}_{(i+1)} \overline{M}_i$ that appear as subwords in the elements of $W$.

Example 7.1.5 Consider the assembly word $W = \{w_0, [w_1], [w_2], [w_3]\}$, where $w_0 = I_4 M_3 M_4 I_2 \overline{M}_2$, $[w_1] = [I_5 M_5 M_6 I_3]$, $[w_2] = [I_6 \overline{M}_1 I_1]$, $[w_3] = [I_7]$. Then $S^{\max}_W = \{M_3 M_4,\ M_5 M_6,\ \overline{M}_2,\ \overline{M}_1\}$ and therefore the degree of $W$ is $(2-1) + (2-1) + (1-1) + (1-1) = 2$.

7.2 Rewriting rules

Definition 7.2.1 We define three types of transformations (rewriting rules) of assembly words: deletion, insertion and inversion.

Deletion. Let $W = \{w_0, [w_1], \ldots, [w_n]\}$ be an assembly word over the alphabet $\mathcal{A}_k$. Suppose $w_0$ is written as $w_0 = a_1 \cdots a_i\, w_{n+1}\, a_{i+1} \cdots a_s$ and let $w'_0 = a_1 a_2 \cdots a_s$. Then we say that the assembly word
$$W' = \{w'_0, [w_1], \ldots, [w_n], [w_{n+1}]\}$$
is obtained from $W$ by deletion.

Insertion. Let $W = \{w_0, [w_1], \ldots, [w_n]\}$ be an assembly word over the alphabet $\mathcal{A}_k$ and let $w_0 = b_1 \cdots b_s$. Then we say that the assembly word
$$W' = \{w'_0, [w_1], \ldots, [w_{j-1}], [w_{j+1}], \ldots, [w_n]\}$$
is obtained from $W$ by insertion, where $w'_0 = b_1 \cdots b_i\, w_j\, b_{i+1} \cdots b_s$ ($j \leq n$, $i \leq s$).

Inversion. Let $W = \{w_0, [w_1], \ldots, [w_n]\}$ be an assembly word. Suppose $w_0$ is written as $w_0 = v_1 v_2 v_3$, where $v_1$, $v_2$ and $v_3$ are words over $\mathcal{A}_k$, and let $w'_0 = v_1 \overline{v_2} v_3$. Then we say that the assembly word
$$W' = \{w'_0, [w_1], \ldots, [w_n]\}$$
is obtained from $W$ by inversion.

Note that the deletion and insertion operations defined on assembly words are similar to the operations op1 and op1R, respectively, from the intermolecular model given in Section 3.2. They differ in their domains (alphabets) and in the applicability requirements. For example, the operation op1 is applicable to a string $u'$ over the alphabet $\Sigma = \{A, C, G, T\}$ if $u'$ contains a repeated subword, i.e., $u' = uxwxv$ for some $u, x, w, v \in \Sigma^*$, and op1$(uxwxv) = \{uxv, [xw]\}$. On the other hand, if the singleton set $\{u'\}$ is an assembly word, the deletion operation is applicable to $\{u'\}$ and the result is not uniquely determined. Namely, if $u' = uxv = wyz$, then both of the assembly words $\{uv, [x]\}$ and $\{wz, [y]\}$ are obtained by deletion from $\{u'\}$. Similarly, the insertion operation is applicable to any assembly word, while op1R is applicable to a set consisting of a linear and a circular word such that both of them contain the same subword.

Lemma 7.2.2 (i) The word $W'$ obtained from an assembly word $W$ by deletion, insertion, or inversion is indeed an assembly word.
(ii) If $W'$ is obtained from $W$ by deletion (insertion and inversion, respectively), then $W$ can be obtained from $W'$ by insertion (deletion and inversion, respectively).

Proof. (i) Let $W = \{w_0, [w_1], \ldots, [w_n]\}$ be an assembly word over the alphabet $\mathcal{A}_k$ and let $W'$ be a set of words obtained from $W$ by deletion, insertion or inversion. Then $W'$ contains one linear word and $n+1$, $n-1$ or $n$ circular words over $\mathcal{A}_k$. If $W'$ is obtained from $W$ by deletion (insertion), then by the definition of the deletion (insertion) operation the number of symbols in $W$ and $W'$ is the same, and every symbol from $\mathcal{A}_k$ that appears in $W$ appears in $W'$. Therefore, one of $M_i$ or $\overline{M}_i$ appears exactly once in $W'$ for every $i \in \{1, 2, \ldots, k\}$ and one of $I_j$ or $\overline{I}_j$ occurs exactly once in $W'$ for each $j \in \{1, 2, \ldots, k+1\}$. Hence, $W'$ is an assembly word.

If $W'$ is obtained from $W$ by inversion, then $W' = \{w'_0, [w_1], \ldots, [w_n]\}$, where $w'_0 = v_1 \overline{v_2} v_3$ and $w_0 = v_1 v_2 v_3$. Each symbol $\alpha_i$ that appears in $v_2$ is replaced by $\overline{\alpha}_i$ in $W'$, while the rest of the symbols are unchanged. Therefore, one of $M_i$ or $\overline{M}_i$
appears exactly once in $W'$ for every $i \in \{1, 2, \ldots, k\}$ and one of $I_j$ or $\overline{I}_j$ occurs exactly once in $W'$ for each $j \in \{1, 2, \ldots, k+1\}$. Hence, $W'$ is an assembly word.

(ii) Let $W = \{w_0, [w_1], \ldots, [w_n]\}$ be an assembly word over the alphabet $\mathcal{A}_k$. Suppose $w_0$ is written as $w_0 = a_1 \cdots a_i\, w_{n+1}\, a_{i+1} \cdots a_s$ and let $w'_0 = a_1 a_2 \cdots a_s$. Then the word $W' = \{w'_0, [w_1], \ldots, [w_n], [w_{n+1}]\}$ is obtained from $W$ by deletion. Since $w'_0 = a_1 a_2 \cdots a_s$ and $w_0 = a_1 \cdots a_i\, w_j\, a_{i+1} \cdots a_s$ for $j = n+1$, the assembly word $W = \{w_0, [w_1], \ldots, [w_n]\}$ is obtained from $W'$ by insertion. One may similarly show that if $W'$ is obtained by insertion from $W$, then $W$ is obtained from $W'$ by deletion.

Next, suppose that $w_0$ is written as $w_0 = v_1 v_2 v_3$, where $v_1$, $v_2$ and $v_3$ are words over $\mathcal{A}_k$, and let $w'_0 = v_1 \overline{v_2} v_3$. Then the word $W' = \{w'_0, [w_1], \ldots, [w_n]\}$ is obtained from $W$ by inversion. Furthermore, $w_0 = v_1 \overline{\overline{v_2}} v_3$, where $\overline{\overline{v_2}} = v_2$. Hence $W$ is also obtained from $W'$ by inversion. $\square$

Remark 7.2.3 The operations insertion and deletion are inverse to each other.

Example 7.2.4 Let $W = \{M_3 I_1 M_4 I_2 \overline{M}_2 I_3 M_1\}$ be an assembly word over the alphabet $\mathcal{A}_4$. The word $W' = \{M_3 M_4 I_2 \overline{M}_2 I_3 M_1, [I_1]\}$ is an assembly word that is obtained from $W$ by deletion, and the word $W'' = \{M_3 I_1 M_4 I_2 \overline{M}_2\, \overline{M}_1 \overline{I}_3\}$ is obtained from $W$ by inversion. Also note that the word $W$ can be obtained from $W'$ by insertion.

7.3 Ordering of Assembly Words

Definition 7.3.1 Given an alphabet $\mathcal{A}_k$, we define a binary relation "$\prec$" on the set of assembly words $\mathbb{A}_k$ over $\mathcal{A}_k$ as follows: $W \prec W'$ if there is a sequence $W = W_0, W_1, \ldots, W_h = W'$ of assembly words such that for each $i$ ($i = 1, \ldots, h$), $W_i$ is obtained from $W_{i-1}$ by a single operation of deletion, insertion or inversion, $\mathrm{dg}(W_{i-1}) \leq \mathrm{dg}(W_i)$, and $\mathrm{dg}(W) < \mathrm{dg}(W')$.

For instance, $W \prec W'$ for $W$ and $W'$ defined in Example 7.2.4.

Proposition 7.3.2 The relation $\prec$ is a strict partial order.

Proof. The irreflexivity and antisymmetry of $\prec$ follow from the irreflexivity and antisymmetry of the relation $<$ defined for the degree of the assembly words. In other words, $<$ is a strict partial order on non-negative integers. If $W \prec W'$ and $W' \prec W''$, then it is clear that $\mathrm{dg}(W) < \mathrm{dg}(W'')$. Let $\sigma_1 \sigma_2 \cdots \sigma_m$ and $\tau_1 \tau_2 \cdots \tau_n$ be sequences of deletions, insertions and inversions that transform $W$ into $W'$ and $W'$ into $W''$, respectively. Then the sequence $\sigma_1 \cdots \sigma_m \tau_1 \cdots \tau_n$ transforms $W$ into $W''$. Therefore, $W \prec W''$, i.e., $\prec$ is transitive. Since $\prec$ is irreflexive, antisymmetric and transitive, it is a strict partial order. $\square$

Note that, for a given alphabet $\mathcal{A}_k$, the poset $(\mathbb{A}_k, \prec)$ is bounded, with assembly words of degree $0$ as minimal elements and assembly words of degree $k-1$ as maximal elements.

Example 7.3.3 Let $W = \{M_3 I_1 M_4 I_2 \overline{M}_2 I_3 M_1\}$ be an assembly word over $\mathcal{A}_4$. Then
$$W \to \{M_3 M_4 I_2 \overline{M}_2 I_3 M_1, [I_1]\} \to \{M_3 M_4 I_2 \overline{M}_2\, \overline{M}_1 \overline{I}_3, [I_1]\} \to \{\overline{I}_2 \overline{M}_4 \overline{M}_3 \overline{M}_2 \overline{M}_1 \overline{I}_3, [I_1]\}.$$
Also,
$$W \to \{M_3 I_1 M_4 I_2 \overline{M}_2\, \overline{M}_1 \overline{I}_3\} \to \{M_3 M_4 I_2 \overline{M}_2\, \overline{M}_1 \overline{I}_3, [I_1]\}.$$
Note that $W$ is a minimal word and $W' = \{\overline{I}_2 \overline{M}_4 \overline{M}_3 \overline{M}_2 \overline{M}_1 \overline{I}_3, [I_1]\}$ is a maximal word.

Definition 7.3.4 Let $(\mathbb{A}_k, \prec)$ be the poset over an alphabet $\mathcal{A}_k$. An assembly strategy is a linearly ordered subset of $(\mathbb{A}_k, \prec)$ that contains a maximal and a minimal element.

In Example 7.3.3, the first subset is an assembly strategy, but the second is not.

7.4 Assembly Graphs and Assembly Words

Both smoothing strategies, introduced in Section 6.3, and assembly strategies, introduced in Section 7.3, model the micronuclear gene rearrangements. In this section we study how assembly words and assembly strategies are related to assembly graphs and smoothing strategies.
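
Before relating assembly words to graphs, the degree and the rewriting rules of Sections 7.1 to 7.3 can be made concrete with a short Python sketch that replays the first chain of Example 7.3.3. Everything about the encoding is an assumption for illustration: symbols are strings such as `"M3"`, an inverted symbol carries a leading minus sign, inverting a word reverses it and inverts every symbol, and $M_2$ is taken as inverted in the starting word (which is what a final word of degree $k - 1 = 3$ requires). Circular words are stored as plain tuples without normalising rotations, and the function names are hypothetical.

```python
def bar(symbol):
    """Switch a symbol between plain and inverted form ('M3' <-> '-M3')."""
    return symbol[1:] if symbol.startswith("-") else "-" + symbol


def invert(word):
    """Inverted word: reverse the order and invert every symbol."""
    return tuple(bar(s) for s in reversed(word))


def is_joined(a, b):
    """True if 'a b' is M_i M_(i+1) or inverted M_(i+1) followed by inverted M_i."""
    if a.startswith("M") and b.startswith("M"):
        return int(b[1:]) == int(a[1:]) + 1
    if a.startswith("-M") and b.startswith("-M"):
        return int(a[2:]) == int(b[2:]) + 1
    return False


def degree(assembly_word):
    """Count joined MDS pairs in the linear word and in each circular word."""
    linear, circles = assembly_word
    pairs = list(zip(linear, linear[1:]))
    for c in circles:
        if len(c) > 1:  # include the wrap-around pair of a circular word
            pairs += list(zip(c, c[1:] + c[:1]))
    return sum(is_joined(a, b) for a, b in pairs)


def deletion(assembly_word, i, j):
    """Excise linear[i:j] from the linear word as a new circular word."""
    linear, circles = assembly_word
    return linear[:i] + linear[j:], circles + (linear[i:j],)


def inversion(assembly_word, i, j):
    """Replace linear[i:j] by its inverted word."""
    linear, circles = assembly_word
    return linear[:i] + invert(linear[i:j]) + linear[j:], circles


w0 = (("M3", "I1", "M4", "I2", "-M2", "I3", "M1"), ())
w1 = deletion(w0, 1, 2)    # remove I1 as a circular word
w2 = inversion(w1, 4, 6)   # invert the factor I3 M1
w3 = inversion(w2, 0, 3)   # invert the factor M3 M4 I2
degrees = [degree(w) for w in (w0, w1, w2, w3)]
```

Under these assumptions the degrees along the chain increase strictly from 0 to 3, so the four words form a linearly ordered subset containing a minimal and a maximal element, in line with Definition 7.3.4.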

Let $\Gamma$ be a directed assembly graph and let $\{C_1, \ldots, C_n\}$ be the set of all transverse components in $\Gamma$. Each component $C_i$, for $i \in \{1, 2, \ldots, n\}$, is uniquely determined by a transverse path $r_i$ that contains every edge of $C_i$ exactly once. Let $r_i = (v^i_0, e^i_1, v^i_1, e^i_2, \ldots, e^i_{n_i}, v^i_{n_i})$ if $C_i$ is a component with endpoints $v^i_0$ and $v^i_{n_i}$, or $r_i = (v^i_0, e^i_1, v^i_1, e^i_2, \ldots, e^i_{n_i})$ if $C_i$ has no endpoints, where $v^i_0$ is a 4-valent vertex and $e^i_{n_i} \in E(v^i_0)$. We assign a linear word $w_{r_i} = e^i_1 e^i_2 \cdots e^i_{n_i}$ to $r_i$ if $C_i$ is a component with two endpoints, or a circular word $[w_{r_i}] = [e^i_1 e^i_2 \cdots e^i_{n_i}]$ if $C_i$ has no endpoints. We refer to the set of words $W_\Gamma = \{w_{r_i} \mid i \in \{1, 2, \ldots, n\}\}$ as the phrase of $\Gamma$.

Let $\Gamma$ be a directed assembly graph that contains exactly one transverse component with two basepoints and $n$ components with no endpoints. Then the phrase $W_\Gamma$ of $\Gamma$ consists of one linear word and $n$ circular words. Denote by $\mathcal{W}_1$ the set of all phrases that contain exactly one linear word. The rewriting operations insertion, deletion and inversion defined on assembly words can also be defined on phrases from $\mathcal{W}_1$.

Proposition 7.4.1 Let $\Gamma$ be a directed assembly graph that contains exactly one transverse component $C$ with two endpoints. If $\tilde{\Gamma}_{\{v\}}$ is the assembly graph obtained from $\Gamma$ by a smoothing of a vertex $v \in V(C)$, then $W_{\tilde{\Gamma}_{\{v\}}}$ is obtained from $W_\Gamma$ by at most two applications of deletion, insertion or inversion.

Proof. Let $\Gamma$ be a directed assembly graph. Let $\{C, C_1, \ldots, C_s\}$ be the set of all transverse components in $\Gamma$ such that $C$ is the only transverse component with two endpoints $i$ and $t$. Then $W_\Gamma = \{w_r, [w_{r_1}], \ldots, [w_{r_s}]\}$ is the phrase of $\Gamma$. Let $v$ be a vertex in $C$. There are two possibilities:

(i) The vertex $v$ belongs only to $C$.
(ii) There is a component $C_i$ with no endpoints such that $v$ belongs to both $C$ and $C_i$.

These situations are depicted in Figure 7.1 (A) and (B), respectively. In the first case the rigid vertex is $(v, (e_k, e_m, e_{k+1}, e_{m+1}))$ for some natural numbers $1 \leq k \leq m \leq n$ such that $r = (i, e_1, v_1, e_2, \ldots, e_k, v, e_{k+1}, \ldots, e_m, v, e_{m+1}, \ldots, e_n, t)$ is
the transverse path that corresponds to $C$. By smoothing of $v$, only the component $C$ is changed in $\Gamma$, as depicted in Figure 7.2. If the smoothing is parallel, the transverse path of $C$ becomes
$$r = (i, e_1, v_1, e_2, \ldots, e_k, v, \overline{e}_m, \ldots, \overline{e}_{k+1}, v, e_{m+1}, \ldots, e_n, t)$$
and therefore the phrase of $\tilde{\Gamma}_{\{v\}}$ is
$$W_{\tilde{\Gamma}_{\{v\}}} = \{e_1 e_2 \cdots e_k\, \overline{e}_m \cdots \overline{e}_{k+1}\, e_{m+1} \cdots e_n,\ [w_{r_1}], \ldots, [w_{r_s}]\}.$$
The phrase $W_{\tilde{\Gamma}_{\{v\}}}$ is obtained from $W_\Gamma$ by inversion. By nonparallel smoothing of $v$, the transverse component $C$ is divided into two transverse components in $\tilde{\Gamma}_{\{v\}}$; one of them is linear, with transverse path $(i, e_1, v_1, e_2, \ldots, e_k, v, e_{m+1}, \ldots, e_n, t)$, and the other one is circular, with transverse path $(v, e_{k+1}, \ldots, e_m, v)$. Therefore the phrase of $\tilde{\Gamma}_{\{v\}}$ is
$$W_{\tilde{\Gamma}_{\{v\}}} = \{e_1 e_2 \cdots e_k\, e_{m+1} \cdots e_n,\ [e_{k+1} \cdots e_m],\ [w_{r_1}], \ldots, [w_{r_s}]\}.$$
The phrase $W_{\tilde{\Gamma}_{\{v\}}}$ is obtained from $W_\Gamma$ by deletion.

Figure 7.1: (A) The rigid vertex $v$ belongs to a single transverse component $C$. (B) The rigid vertex $v$ belongs to two transverse components $C$ and $C_1$.

In case (ii) the rigid vertex $(v, (e_k, e^1_m, e_{k+1}, e^1_{m+1}))$ belongs to two transverse components, $C$ and, say, $C_1$ (see Figure 7.1(B)). The transverse components $C$ and $C_1$ are determined by the transverse paths
$$r = (i, e_1, v_1, e_2, \ldots, e_k, v, e_{k+1}, \ldots, e_n, t)$$
and
$$r_1 = (v, e^1_{m+1}, \ldots, e^1_m, v),$$
respectively. If $\tilde{\Gamma}_{\{v\}}$ is obtained by parallel smoothing of $v$, then the transverse components of $\tilde{\Gamma}_{\{v\}}$ are $C_2, \ldots, C_s$ and a component determined by the transverse path
$$(i, e_1, v_1, e_2, \ldots, e_k, v, \overline{e}^1_m, \ldots, \overline{e}^1_{m+1}, v, e_{k+1}, \ldots, e_n, t).$$
The phrase of $\tilde{\Gamma}_{\{v\}}$ is
$$W_{\tilde{\Gamma}_{\{v\}}} = \{e_1 e_2 \cdots e_k\, \overline{e}^1_m \cdots \overline{e}^1_{m+1}\, e_{k+1} \cdots e_n,\ [w_{r_2}], \ldots, [w_{r_s}]\}.$$

Figure 7.2: Parallel smoothing of vertex $v$ (left) and nonparallel smoothing of vertex $v$ (right).

Itisclearthat W ~ f v g isobtainedfrom W byinsertionandinversion. If ~ f v g isobtainedfrombynonparallelsmoothingofvertex v ,thenone cansimilarlyshowthat W ~ f v g canbeobtainedfrom W byapplicationofsingle insertionoperation. Ineachcase W ~ f v g canbeobtainedfrom W byapplicationofinsertion,deletionorinsertionoperation. 2 Notethatthephraseofeachdirectedsimplerealizableasse mblygraphcanbe viewedasanassemblyword.Remark7.4.2 Letbeadirectedsimplerealizableassemblygraphwithtwo basepoints i and t .Thenmodelssomemicronucleargene G .Let G beformally representedasastringofthefollowingform: (*) G = I 1 N i 1 I 1 N i 2 I k N i k I k +1 forsomepositiveinteger k;i 1 ;:::;i k 2f 1 ; 2 ;:::;k g (for i j 6 = i s if j 6 = s ),where N i j isathreesymbolwordintheset f ( i j ) M i j ( i j +1) ; ( i j ) M i j ( i j +1) g forevery j 2f 1 ; 2 ;:::;k g Thencanbeconstructedusingtheconstructiondescribedi nSection6.1, withonlyonedierence;insteadofaddinganedgefromverte x i k backtothe basepoint i ,weaddanedgefromthevertex i k tothebasepoint t .Theedgesof arelabeledbyeither I j or M j ( or M j ).Let w G = I 0 w i 1 I 1 w i 2 I k 1 w i k I k ,where w i j = M i j if N i j =( i j ) M i j ( i j +1)and w i j = M i j if N i j = ( i j ) M i j ( i j +1).Then thesingletonset f w G g isthephraseof.Inaddition, w G isawordoveralphabet A k ,suchthatforeachsymbol 2 A k ,either or appearsexactlyoncein w G Therefore,theset f w G g isanassemblyword. Therefore,foreachsimplerealizableassemblygraph,the reisanassembly word w thatcorrespondsto.If r =( v 0 ;e 1 ;v 1 ;e 2 ;:::;e n ;v n )istheEulerian transversepathofthenbysuitablelabelingoftheedgesof w = f e 1 e n g canbeviewedasanassemblywordthatcorrespondsto.Proposition7.4.3 Let beadirectedsimplerealizableassemblygraphwithtwo endpointsandaHamiltonianpolygonalpath r .Ifthenestedsequence ; = S 0 120

$S_1 \subset S_2 \subset \cdots \subset S_k = V(\Gamma)$ is a smoothing strategy for $\Gamma$ with respect to $r$, then the sequence $W_\Gamma, W_{\tilde\Gamma_{(r,S_1)}}, W_{\tilde\Gamma_{(r,S_2)}}, \ldots, W_{\tilde\Gamma_{(r,S_k)}}$ is an assembly strategy.

Proof: Let $\Gamma$ be a directed simple realizable assembly graph with two endpoints. Then, by Remark 7.4.2, the edges of $\Gamma$ can be labeled by either $I_j$ or $M_j$ ($\overline{M_j}$), such that the phrase $W_\Gamma$ of $\Gamma$ is an assembly word over the alphabet $A_k$. By Proposition 7.4.1, each of $W_{\tilde\Gamma_{(r,S_i)}}$ can be obtained from $W_{\tilde\Gamma_{(r,S_{i-1})}}$ for $i \in \{1, 2, \ldots, k\}$ by applying a sequence of insertion, deletion and inversion operations. Thus, by Lemma 7.2.2, each of the words $W_{\tilde\Gamma_{(r,S_2)}}, \ldots, W_{\tilde\Gamma_{(r,S_k)}}$ is an assembly word.

For each vertex $v$ in $\Gamma$ there are two neighboring edges incident to $v$ labeled by $M_i$ and $M_{i-1}$ for some $i \in \{1, 2, \ldots, k\}$. Since $\Gamma$ is smoothed with respect to the Hamiltonian polygonal path $r$, smoothing the vertex $v$ results in an edge with label $M_{i-1} M_i$.

Therefore, $\mathrm{dg}(W_{\tilde\Gamma_{(r,S_{i-1})}}) < \mathrm{dg}(W_{\tilde\Gamma_{(r,S_i)}})$ for every $i \in \{1, 2, \ldots, k\}$. Hence, the sequence $W_\Gamma, W_{\tilde\Gamma_{(r,S_1)}}, W_{\tilde\Gamma_{(r,S_2)}}, \ldots, W_{\tilde\Gamma_{(r,S_k)}}$ is an assembly strategy. $\Box$

7.5 Application of Assembly Words to Experimental Data

In [39], the gene assembly of the O. trifallax actin I gene was discussed. The micronuclear actin I gene is schematically represented in Figure 7.3. Partially assembled molecules were detected and are presented as possible intermediates. Based on these results, we propose possible pathways for detangling the O. trifallax actin I gene. In Figure 7.4 those pathways are described in terms of assembly graphs and smoothings.

Figure 7.3: Schematic representation of the actin I micronuclear gene in O. trifallax.

The assembly graph representation of actin I is given in Figure 7.4(A). Note

that the path $r: M_1 M_2 \cdots M_9 M_{10}$ is a Hamiltonian polygonal path. The graphs (A), (B), (C), (F) determine one pathway, and (A), (B), (D), (E) another. The first case defines the smoothing strategy that smooths, in order, the vertex sets $\{3\}$, $\{7\}$, $\{4\}$, $\{5, 6, 8, 9\}$, and the latter one the vertex sets $\{3\}$, $\{4, 5\}$, $\{6, 7, 8, 9\}$. Neither smoothing strategy is successful.

Figure 7.4: Two smoothing strategies ((A)(B)(D)(E)(F) and (A)(B)(C)(F)) as models of two different pathways for actin I macronuclear gene assembly.

Next, we explain the connection between the possible intermediates and the proposed pathways for assembly of actin I in O. trifallax.

1. The first step would be an alignment of the pair of pointers that occur at the end of MDS 2 and at the beginning of MDS 3, followed by their recombination (see Figure 7.5(A)). The resulting molecule is an intermediate that corresponds

to the one in Figure 7.5(B). In terms of assembly graphs, this step is $\{3\}$-smoothing with respect to $r$.

2. The alignment of pointer 7 is shown in Figure 7.5(C). After recombination, MDS 6 and MDS 7 are spliced in the correct order, while we assume that MDS 5 is excised as a part of a circular molecule. The resulting intermediate corresponds to the one given in Figure 7.5(D), which does not contain MDS 5, and there is no information about the position of MDS 5 at this point. This step would be $\{7\}$-smoothing with respect to $r$.

3. In this step IES 1 is excised, and MDS 3 and MDS 4 are joined through recombination of pointer 4. The resulting molecule corresponds to the intermediate shown in Figure 7.5(F).

4. The next step involves an intermolecular recombination. Namely, the excised circular molecule that contains MDS 5 contains both pointers 5 and 6. Pointer 5 would align with pointer 5, which is at the end of MDS 4, and pointer 6 would align with pointer 6, which is at the beginning of MDS 6. At the same time the pairs of pointers 8 and 9 align. All of them recombine, resulting in one excised circular molecule composed of IESs only and an intermediate given in Figure 7.5(H). In terms of assembly graphs, this step is $\{5, 6, 8, 9\}$-smoothing.

The second pathway for assembly of actin I in O. trifallax may be described through the following steps.

1. The first step is exactly the same as in the previously discussed pathway. The intermediate is in Figure 7.6(B).

2. The pairs of pointers 4 (at the end of MDS 3 and the beginning of MDS 4) and 5 (at the end of MDS 4 and the beginning of MDS 5) will align and recombine, which is depicted in Figure 7.6(C). The outcome is two circular molecules and an intermediate molecule, whose MDS-IES structure is shown in Figure 7.6(D). One of the circular molecules contains MDS 6 and the other one is the excised IES 1. In terms of assembly graphs this step is $\{4, 5\}$-smoothing.

3. In the next step, an intermolecular recombination is performed between the circular molecule that contains MDS 6 and the intermediate molecule (see Figure 7.6(E)). They recombine through pointers 6 and 7. Pointers 8 and 9 recombine as well. The resulting intermediate is given in Figure 7.6(F). This step can be modeled by $\{6, 7, 8, 9\}$-smoothing.

In the list of possible intermediates, none contains MDS 10, so we cannot track its position and cannot determine when pointer 10 is being recombined.

The last step in the assembly for both pathways is expected to be recombination of pointer 2. With that, MDS 1 would be inverted and correctly joined with MDS 2. In both pathways, at a certain point, a circular molecule containing an MDS was excised. This suggests that the motivation for the notion of successful smoothing strategies, namely the assumption that each intermediate molecule contains all MDSs, may not apply in general, i.e., some assembly strategies may be intermolecular.

We also use assembly strategies to prove that the two rearrangement pathways for descrambling the O. trifallax actin I gene described above are theoretically all the possible strategies. The set of possible intermediate molecules observed in [39] is given below.

$a = M_1 I_8 M_2 M_3 I_1 M_4 I_2 M_6 I_3 M_5 I_4 M_7 I_5 M_9,$
$b = M_1 I_8 M_2 M_3 I_1 M_4 I_2 M_6 M_7 I_5 M_9,$
$c = M_1 I_8 M_2 M_3 M_4 M_5 M_6 M_7 M_8 M_9,$
$d = M_8 I_9 M_1 I_8 M_2 M_3 M_4 M_5 I_4 M_7 I_5 M_9,$
$e = M_8 I_9 M_1 I_8 M_2 M_3 M_4 I_2 M_6 M_7 I_5 M_9,$
$f = M_8 I_9 M_1 I_8 M_2 M_3 I_1 M_4 I_2 M_6 I_3 M_5 I_4 M_7 I_5 M_9.$

Based on this data, we construct assembly words that model the micronuclear,

macronuclear and intermediate molecules. Let

$w_0 = M_3 I_1 M_4 I_2 M_6 I_3 M_5 I_4 M_7 I_5 M_9 I_6 M_{10} I_7 M_2 I_8 M_1 I_9 M_8,$
$A = \{ M_8 I_9 \, a \, I_6 M_{10} I_7 \},$
$B = \{ M_8 I_9 \, b \, I_6 M_{10} I_7, [I_3 M_5 I_4] \},$
$C = \{ I_5 I_9 \, c \, I_6 M_{10} I_7, [I_1], [I_3 I_2 I_4] \},$
$D = \{ d \, I_6 M_{10} I_7, [I_2 M_6 I_3], [I_1] \},$
$E = \{ e \, I_6 M_{10} I_7, [I_3 M_5 I_4], [I_1] \},$
$F = \{ f \, I_6 M_{10} I_7 \},$
$w_t = \{ I_5 I_9 I_8 M_1 M_2 M_3 M_4 M_5 M_6 M_7 M_8 M_9 M_{10} I_7, [I_3 I_2 I_4], [I_1], [I_6] \}$

be assembly words over the alphabet $A_{10}$. Then $w_0$ models the scrambled micronuclear actin I gene, $w_t$ models the assembled macronuclear actin I gene, and the assembly words $A, B, C, D, E, F$ model the intermediate molecules $a, b, c, d, e, f$, respectively. We construct the assembly words $A, B, C, D, E, F$ so that $a, b, c, d, e, f$, respectively, are factors of the respective linear element. Note that $w_0$ is minimal and $w_t$ is maximal with respect to the partial order $\to$.

Proposition 7.5.1 The assembly words $A$ through $F$ satisfy the following properties.

(1) $w_0 \to v$ for $v \in \{A, B, C, D, E, F\}$.
(2) $v \to w_t$ for $v \in \{A, B, C, D, E, F\}$.
(3) There are two assembly strategies that have minimal word $w_0$, maximal word $w_t$, and consist of assembly words from $\{A, B, C, D, E, F\}$.

Proof. (1) We show that $w_0 \to A$ by constructing a sequence of assembly words such that each word in the sequence has degree greater than or equal to that of the preceding word and is obtained by insertion, deletion or inversion from its predecessor. Let

$w_0 = \{ M_3 I_1 M_4 I_2 M_6 I_3 M_5 I_4 M_7 I_5 M_9 I_6 M_{10} I_7 M_2 I_8 M_1 I_9 M_8 \},$
$w_1 = \{ M_3 I_1 M_4 I_2 M_6 I_3 M_5 I_4 M_7 I_5 M_9 I_6 M_{10} I_7, [M_2 I_8 M_1 I_9 M_8] \},$

$w_2 = \{ M_2 I_8 M_1 I_9 M_8 M_3 I_1 M_4 I_2 M_6 I_3 M_5 I_4 M_7 I_5 M_9 I_6 M_{10} I_7 \},$
$w_3 = \{ M_8 I_9 M_1 I_8 M_2 M_3 I_1 M_4 I_2 M_6 I_3 M_5 I_4 M_7 I_5 M_9 I_6 M_{10} I_7 \}.$

For each $i \in \{1, 2, 3\}$ the assembly word $w_i$ is obtained from $w_{i-1}$ by a single insertion, deletion or inversion operation. Furthermore, $\mathrm{dg}(w_{i-1}) \le \mathrm{dg}(w_i)$, $\mathrm{dg}(w_0) < \mathrm{dg}(w_3)$ and $A = w_3$. Therefore, $w_0 \to A$.

Similarly, one can show that $w_0 \to v$ for $v \in \{A, B, C, D, E, F\}$.

(2) We only prove that $C \to w_t$, since the rest of the cases can be proven similarly. Let

$v_0 = \{ I_5 I_9 M_1 I_8 M_2 M_3 M_4 M_5 M_6 M_7 M_8 M_9 I_6 M_{10} I_7, [I_1], [I_3 I_2 I_4] \},$
$v_1 = \{ I_5 I_9 I_8 M_1 M_2 M_3 M_4 M_5 M_6 M_7 M_8 M_9 I_6 M_{10} I_7, [I_1], [I_3 I_2 I_4] \},$
$v_2 = \{ I_5 I_9 I_8 M_1 M_2 M_3 M_4 M_5 M_6 M_7 M_8 M_9 M_{10} I_7, [I_1], [I_3 I_2 I_4], [I_6] \}.$

The assembly word $v_1$ is obtained from $v_0 = C$ by inversion and $v_2 = w_t$ is obtained from $v_1$ by deletion. In addition, $\mathrm{dg}(C) < \mathrm{dg}(v_1) < \mathrm{dg}(w_t)$. Therefore, $C \to w_t$.

(3) The assembly words $F$ and $A$ are the same. Hence, we consider only $A$. The degrees of the assembly words $w_0$, $w_t$ and $A$ through $E$ are:

$\mathrm{dg}(w_0) = 0$, $\mathrm{dg}(A) = 1$, $\mathrm{dg}(B) = 2$, $\mathrm{dg}(C) = 7$, $\mathrm{dg}(D) = 3$, $\mathrm{dg}(E) = 3$, $\mathrm{dg}(w_t) = 9$.

According to the degrees of the assembly words, there are two sequences that are possible candidates for assembly strategies. They are:

$w_0 \to A \to B \to D \to C \to w_t,$
$w_0 \to A \to B \to E \to C \to w_t.$

By (1) and (2), we have $w_0 \to A$ and $C \to w_t$. One can easily check that

$A \to B$, $B \to E$ and $B \to C$. On the other hand, $B \not\to D$. To show $B \not\to D$, assume the contrary. If $B \to D$ then there is a sequence of assembly words $B = W_0, W_1, \ldots, W_h = D$ such that for each $i$ ($i = 1, \ldots, h$), $W_i$ is obtained from $W_{i-1}$ by a single operation of deletion, insertion or inversion, and $\mathrm{dg}(W_{i-1}) \le \mathrm{dg}(W_i)$. Since the assembled segment $M_6 M_7 \sqsubset B$ and $M_6 M_7 \not\sqsubset D$, there is some $j \in \{2, 3, \ldots, h\}$ such that $M_6 M_7 \sqsubset W_{j-1}$ and $M_6 M_7 \not\sqsubset W_j$. The assembly word $W_j$ is obtained from $W_{j-1}$ by a single deletion, inversion or insertion operation, so by the structure of $B$ and $D$ that is only possible if $\mathrm{dg}(W_{j-1}) > \mathrm{dg}(W_j)$. This is a contradiction and thus $B \not\to D$.

Hence the following two assembly strategies are the only assembly strategies that have minimal word $w_0$, maximal word $w_t$, and consist of assembly words from $\{A, B, C, D, E, F\}$:

$w_0 \to A \to B \to D \to C \to w_t,$
$w_0 \to A \to B \to E \to C \to w_t.$ $\Box$
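The degree values and the factor argument used in the proof of Proposition 7.5.1 can be checked mechanically. The following is only a minimal sketch under stated assumptions: the degree dg is defined earlier in the dissertation, not in this section, and here it is assumed to count adjacent pairs $M_i M_{i+1}$ in the linear element of an assembly word; inversion bars are ignored, and only the linear elements are considered, since none of the circular elements listed above contains two consecutive MDS symbols. The names `LINEAR`, `degree` and `is_factor` are hypothetical helpers introduced for illustration.

```python
# Linear elements of the assembly words of Section 7.5, as space-separated
# symbols. Inversion bars are ignored in this sketch (an assumption).
LINEAR = {
    "w0": "M3 I1 M4 I2 M6 I3 M5 I4 M7 I5 M9 I6 M10 I7 M2 I8 M1 I9 M8",
    "A": "M8 I9 M1 I8 M2 M3 I1 M4 I2 M6 I3 M5 I4 M7 I5 M9 I6 M10 I7",
    "B": "M8 I9 M1 I8 M2 M3 I1 M4 I2 M6 M7 I5 M9 I6 M10 I7",
    "C": "I5 I9 M1 I8 M2 M3 M4 M5 M6 M7 M8 M9 I6 M10 I7",
    "D": "M8 I9 M1 I8 M2 M3 M4 M5 I4 M7 I5 M9 I6 M10 I7",
    "E": "M8 I9 M1 I8 M2 M3 M4 I2 M6 M7 I5 M9 I6 M10 I7",
    "wt": "I5 I9 I8 M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 I7",
}


def degree(word: str) -> int:
    """Count adjacent pairs M_i M_{i+1} (assumed reading of dg)."""
    symbols = word.split()
    count = 0
    for left, right in zip(symbols, symbols[1:]):
        both_mds = left[0] == "M" and right[0] == "M"
        if both_mds and int(right[1:]) == int(left[1:]) + 1:
            count += 1
    return count


def is_factor(segment: str, word: str) -> bool:
    """True if the symbols of `segment` occur contiguously in `word`."""
    seg, syms = segment.split(), word.split()
    return any(
        syms[i:i + len(seg)] == seg
        for i in range(len(syms) - len(seg) + 1)
    )


if __name__ == "__main__":
    for name, word in LINEAR.items():
        print(name, degree(word))
    print("M6 M7 in B:", is_factor("M6 M7", LINEAR["B"]))
    print("M6 M7 in D:", is_factor("M6 M7", LINEAR["D"]))
```

Under these assumptions the sketch reproduces the degrees 0, 1, 2, 7, 3, 3, 9 stated in part (3) and confirms that the assembled segment $M_6 M_7$ is a factor of $B$ but not of $D$.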

Figure 7.5: First proposed pathway for unscrambling the actin I gene in O. trifallax, based on the possible intermediates observed in [39]. Panels (A) through (H).

Figure 7.6: Second proposed pathway for unscrambling the actin I gene in O. trifallax, based on the possible intermediates observed in [39]. Panels (A) through (F).
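The two smoothing strategies of this section are stated as the vertex sets smoothed at each step, whereas Proposition 7.4.3 works with a nested sequence of vertex sets. The following minimal sketch converts one form into the other and lists the vertices that remain unsmoothed after each step. It assumes that the vertices of the actin I assembly graph are labeled by the pointers 2 through 10; the names `POINTERS`, `nested_sets` and `remaining_vertices` are hypothetical and introduced only for illustration.

```python
from itertools import accumulate

# Assumed vertex labels of the actin I assembly graph: pointers 2..10.
POINTERS = set(range(2, 11))


def nested_sets(steps):
    """Cumulative unions of the vertex sets smoothed at each step."""
    return list(accumulate((set(s) for s in steps), lambda a, b: a | b))


def remaining_vertices(steps):
    """Sorted vertices not yet smoothed after each step."""
    return [sorted(POINTERS - done) for done in nested_sets(steps)]


# Steps of the two pathways as given in the text.
first = [{3}, {7}, {4}, {5, 6, 8, 9}]
second = [{3}, {4, 5}, {6, 7, 8, 9}]

if __name__ == "__main__":
    for name, steps in (("first", first), ("second", second)):
        print(name, remaining_vertices(steps))
```

In both cases the last set of remaining vertices is {2, 10}, which agrees with the remarks above that pointer 10 cannot be tracked from the observed intermediates and that recombination of pointer 2 is expected to be the last step.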

8 Conclusion

In this dissertation, we presented several mathematical approaches to study gene rearrangement in ciliates. We applied various techniques from language theory, graph theory and knot theory which may lead to a better understanding of these bio-processes. Our mathematical models also offer motivations for new biological experiments and new mathematical research directions.

Although motivated by and applied to gene rearrangements in ciliates, the explicit models for RNA-guided DNA recombination discussed in Chapter 4 serve as possible models for homologous DNA recombination in general. The assumption that RNA molecules are involved in guiding recombinations has been justified in [42]. However, additional details need to be further tested experimentally, such as when and where exactly the cuts of the strands occur in the branch migrations and what type of enzymes are involved.

The models for RNA-guided assembly proposed in [2] are the background for the new combinatorial model for gene assembly based on spatial graphs that was presented in Chapter 5. We studied graphs with rigid vertices, called assembly graphs, and introduced a number of new mathematically relevant notions, properties and characteristics of assembly graphs such as assembly numbers, polygonal paths and successful smoothings.

A polygonal path is defined as a path which makes a "90° turn" at each vertex, and it was introduced as a model for a single MAC gene in the assembly graph. A set of polygonal paths whose union contains each vertex of the assembly graph is called a Hamiltonian set. The assembly number is the cardinality of the minimal Hamiltonian set in an assembly graph, which can model the minimal number of

genes that can result from the DNA rearrangement.

We proved that there is an assembly graph with assembly number $n$ for every natural number $n$. An assembly graph with assembly number one represents a single scrambled gene. Hence, combinatorially describing the graphs with assembly number one will be considered in future research. This problem can be extended to a quest of characterizing the assembly graphs with assembly number $n$, i.e., the assembly graphs that model multiple scrambled genes.

In this study we mainly concentrate on the properties of simple assembly graphs, which are assembly graphs composed of a single transverse component. A simple assembly graph models a single micronuclear DNA molecule. Since there are experimental results showing that some genes can appear on more than one locus in the micronucleus [34], extending this model to include multicomponent (nonsimple) assembly graphs would be a natural generalization.

In addition, we considered the relationship between the number of vertices in the assembly graph and its assembly number. We defined the minimum realization number of $n$, $R_{\min}(n)$, to be the number of vertices of the minimal assembly graph with assembly number $n$. Some bounds for this number were determined. For example, we proved that $R_{\min}(n) \le 3n - 2$ and $R_{\min}(n)$

be knotted or linked. We showed that there is always an embedding of an assembly graph with a fixed Hamiltonian set such that the smoothed graph with respect to the Hamiltonian set is an unknot/unlink. However, the experimental results show that there are excised circular molecules after the recombination which are removed, so a natural question for further analysis would be whether an assembly graph can be embedded in $\mathbb{R}^3$ such that the smoothed graph $\tilde\Gamma_r$ is unlinked for any Hamiltonian set of paths $r$.

In Chapter 6, successful sets of vertices are proposed for studying the rearrangement strategies. The smoothing of all vertices from a successful set in an assembly graph with Hamiltonian polygonal path $r$ results in an assembly graph that contains $r$ in a single transverse component. Under the assumption that the MDSs are not dispersed on different molecules during the gene rearrangement, the successful sets model simultaneous recombination of a set of pointers. This is a motivation for an experiment that will check our assumption that the MDSs are part of a single molecule throughout the whole gene rearrangement. We characterize the successful smoothings through minimal complementary Hamiltonian sets. We plan to incorporate a characterization of successful smoothings to find an efficient algorithm for determining the successful sets.

The study of the assembly words, introduced in Chapter 7, is in the beginning stage and further results are expected. Using the model of assembly words and the partial order on the set of assembly words, one can characterize all possible pathways that may appear during the unscrambling of a gene. Additionally, predicting the intermediate molecules in the DNA rearrangement, theoretically identifying the intermediate steps and determining all assembly strategies will be considered in future research.

The models for gene rearrangement in ciliates proposed in this dissertation apply tools from different branches of mathematics such as language theory, graph theory and knot theory. The notions of assembly graph, polygonal path, assembly number and assembly word introduced here present a view of well known mathematical objects from a distinctive perspective. In addition, our mathematical

models may motivate new biological experiments and new mathematical research.

References

[1] B. Alberts, D. Bray, K. Hopkin, A. Johnson, J. Lewis, M. Raff, P. Walter, K. Roberts, Essential Cell Biology, Garland Science (2004).

[2] A. Angeleska, N. Jonoska, M. Saito, L.F. Landweber, RNA-guided DNA assembly, J. of Theoretical Biology 248:4 (2007) 706-720.

[3] A. Angeleska, N. Jonoska, M. Saito, L.F. Landweber, Strategies for RNA-guided DNA Recombination, Algorithmic Bioprocesses (Joost N. Kok et al. eds.) (2009), in press.

[4] A. Angeleska, N. Jonoska, M. Saito, DNA recombination through assembly graphs, Discrete Applied Mathematics, in press.

[5] D.H. Ardell, C.A. Lozupone, L.F. Landweber, Polymorphism, recombination and alternative unscrambling in the DNA polymerase alpha gene of the ciliate Stylonychia lemnae, Genetics 165 (4) (2003) 1761-1777.

[6] V. Bafna, P.A. Pevzner, Genome rearrangements and sorting by reversals, SIAM J. on Computing (1993) 148-157.

[7] V. Bafna, P.A. Pevzner, Sorting by transpositions, Proc. of 6th ACM-SIAM Symp. on Discrete Algorithms (1995).

[8] G.T. Bignell, T. Santarius, J.C. Pole, A.P. Butler, J. Perry, E. Pleasance, C. Greenman, A. Menzies, S. Taylor, S. Edkins, P. Campbell, M. Quail, B. Plumb, L. Matthews, K. McLay, P.A. Edwards, J. Rogers, R. Wooster, P.A. Futreal, M.R. Stratton, Architectures of somatic genomic rearrangement in human cancer amplicons at sequence-level resolution, Genome Res. 17:9 (2007) 1296-1303.

[9] R.J. Brooker, Genetics Analysis and Principles, McGraw-Hill (2005).

[10] P.J. Campbell, P.J. Stephens, E.D. Pleasance, S. O'Meara, H. Li, T. Santarius, L.A. Stebbings, C. Leroy, S. Edkins, C. Hardy, J.W. Teague, A. Menzies, I. Goodhead, D.J. Turner, C.M. Clee, M.A. Quail, A. Cox, C. Brown, R. Durbin, M.E. Hurles, P.A.W. Edwards, G.R. Bignell, M.R. Stratton, P.A. Futreal, Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing, Nature Genetics 40 (2008) 722-729.

[11] W-J. Chang, S. Kuo, L.F. Landweber, A new scrambled gene in the ciliate Uroleptus, Gene 368 (2006) 72-77.

[12] A.R.O. Cavalcanti, L.F. Landweber, Insights into a biological computer: detangling scrambled genes in ciliates, Nanotechnology: Science and Computation (J. Chen, N. Jonoska, G. Rozenberg eds.) Springer (2006) 349-360.

[13] M. Daley, I. McQuillan, N.A. Stover, L.F. Landweber, A simple topological mechanism for gene descrambling in stichotrichous ciliates, submitted in J. of Theoretical Biology (2009).

[14] M. Daley, O. Ibarra, L. Kari, Closure properties and decision questions of some 22 language classes under ciliate bio-operations, Theoret. Comput. Sci. 306 (2003) 19-38.

[15] R. Daniel, IES excision in Sterkiella histriomuscorum via a hypothetical nuclease mechanism, senior thesis (2006) Princeton University.

[16] J. Dassow, M. Holzer, Language families defined by a ciliate bio-operation: hierarchies and decidability problems, Int. J. Found. Comput. Sci. 16(4) (2005) 645-662.

[17] C. Dennis, A. Fedorov, E. Kas, L. Salome, M. Grigoriev, RuvAB-directed branch migration of individual Holliday junctions is impeded by sequence heterology, The EMBO Journal 23 (2004) 2413-2422.

[18] Th. Dobzhansky and A.H. Sturtevant, Inversions in the chromosomes of Drosophila pseudoobscura, Genetics 23(1) (1937).

[19] A. Ehrenfeucht, T. Harju, I. Petre, D.M. Prescott, G. Rozenberg, Computing in Living Cells, Springer (2005).

[20] A. Ehrenfeucht, T. Harju, G. Rozenberg, Gene assembly through cyclic graph decomposition, Theoretical Computer Science 281 (2002) 325-349.

[21] O. Garnier, V. Serrano, S. Duharcourt, E. Meyer, RNA mediated programming of developmental genome rearrangements in Paramecium tetraurelia, Mol. Cell Biol. 24 (2004) 7370-7379.

[22] A. Ehrenfeucht, D.M. Prescott, G. Rozenberg, Computational aspects of gene (un)scrambling in ciliates. In: L.F. Landweber, E. Winfree (eds.) Evolution as Computation, Springer, Berlin Heidelberg New York (2001) 216-256.

[23] A. Ehrenfeucht, I. Petre, D.M. Prescott, G. Rozenberg, Universal and simple operations for gene assembly in ciliates. In: V. Mitrana, C. Martin-Vide (eds.) Words, Sequences, Languages: Where Computer Science, Biology and Linguistics Come Across, Kluwer Academic, Dordrecht (2001) 329-342.

[24] T. Harju, I. Petre, G. Rozenberg, Two models for gene assembly in ciliates, TUCS Technical Reports 604 (2004).

[25] T. Head, Formal language theory and DNA: an analysis of the generative capacity of specific recombinant behaviors, Bull. Math. Biology 49 (1987) 737-759.

[26] J.E. Hopcroft, J.D. Ullman, Introduction to automata theory, languages, and computation, Addison-Wesley, 1979.

[27] A.R. Jones, The Ciliates, St. Martin's Press (1974).

[28] N. Jonoska, M. Saito, Algebraic and topological models for DNA recombinant processes, Developments in Language Theory (C.S. Calude, E. Calude, M.J. Dinneen eds.) Springer LNCS 3340 (2004) 49-62.

[29] S.A. Juranek, S. Rupprecht, J. Postberg, H.J. Lipps, snRNA and Heterochromatin Formation Are Involved in DNA Excision during Macronuclear Development in Stichotrichous Ciliates, Eukaryot. Cell 4 (2005) 1934-1941.

[30] L. Kari, L.F. Landweber, Computational power of gene rearrangement, DNA Based Computers V (E. Winfree, D.K. Gifford eds.) AMS (1999) 207-216.

[31] L.H. Kauffman, Virtual Knot Theory, European J. Combinatorics 20 (1999) 663-690.

[32] L.H. Kauffman, A self-linking invariant of virtual knots, Fund. Math. 184 (2004) 135-158.

[33] L.A. Klobutcher, D.M. Prescott, The special case of the Hypotrichs, The molecular biology of ciliated protozoa, eds. J.G. Gall, Academic Press (1986) 111-154.

[34] S. Kuo, W-J. Chang, L.F. Landweber, Complex germline architecture: two genes intertwined on two loci, Mol. Biol. Evol. 23 (1) (2006) 4-6.

[35] L.F. Landweber, L. Kari, The evolution of cellular computing: nature's solution to a computational problem, BioSystems 52 (1999) 3-13.

[36] L.F. Landweber, L. Kari, Universal molecular computation in ciliates. In: L.F. Landweber, E. Winfree (eds.) Evolution as Computation, Springer, Berlin Heidelberg New York (2002) 257-274.

[37] L.F. Landweber, T-C. Kuo, E.A. Curtis, Evolution and assembly of an extremely scrambled gene, PNAS 97 (2000) 3298-3303.

[38] A. Miyake, Fertilization and sexuality in ciliates, Ciliates Cells and Organisms, eds. K. Hausman, P.C. Bradbury (1996) 244-283.

[39] M. Mollenbeck, Y. Zhou, A.R.O. Cavalcanti, F. Jonsson, W.-J. Chang, S. Juranek, T.G. Doak, G. Rozenberg, H.J. Lipps, L.F. Landweber, The pathway for detangling a scrambled gene, PLoS ONE 3(6): e2330 (2008).

[40] K. Mochizuki, N.A. Fine, T. Fujisawa, M.A. Gorovsky, Analysis of a piwi-related gene implicates small RNAs in genome rearrangement in Tetrahymena, Cell 110 (2002) 689-699.

[41] K. Murasugi, Knot theory and its applications, Birkhäuser (1996).

[42] M. Nowacki, V. Vijayan, Y. Zhou, T. Doak, E. Swart, L.F. Landweber, RNA-template guided DNA recombination: epigenetic reprogramming of a genome rearrangement pathway, Nature 451 (2008) 153-158.

[43] J.J. Paulin, Morphology and Cytology of ciliates, Ciliates Cells and Organisms, eds. K. Hausman, P.C. Bradbury (1996) 1-41.

[44] D. Prescott, The evolutionary scrambling and developmental unscrambling of germline genes in hypotrichous ciliates, Nucleic Acid Research 27 no. 5 (1999).

[45] D.M. Prescott, The DNA of Ciliated Protozoa, Microbiological Reviews 58 no. 2 (1994) 233-267.

[46] D.M. Prescott, A. Ehrenfeucht, G. Rozenberg, Molecular operations for DNA processing in hypotrichous ciliates, Europ. J. Protistology 37 (2001) 241-260.

[47] D.M. Prescott, A. Ehrenfeucht, G. Rozenberg, Template-guided recombination for IES elimination and unscrambling of genes in stichotrichous ciliates, J. of Theoretical Biology 222 (2003) 323-330.

[48] D.M. Prescott, A.F. Greslin, Scrambled actin I gene in the micronucleus of Oxytricha nova, Dev Genet. 13 (1) (1992) 66-74.

[49] D.M. Prescott, A. Ehrenfeucht, G. Rozenberg, Template-guided recombination for IES elimination and unscrambling of genes in stichotrichous ciliates, J. of Theoretical Biology 222 (2003) 323-330.

[50] I.B. Raikov, Nuclei of ciliates, Ciliates Cells and Organisms, eds. K. Hausman, P.C. Bradbury (1996) 221-239.

[51] D.W. Sumners, Lifting the curtain: using topology to probe the hidden action of enzymes, Notices of AMS 42 (5) (1995) 528-537.

[52] S.L. Tausta, L.A. Klobutcher, Detection of circular forms of eliminated DNA during macronuclear development in E. crassus, Cell 59 (6) (1989) 1019-1026.

[53] J. Wen, C. Maercker, H.J. Lipps, Sequential excision of internal eliminated DNA sequences in the differentiating macronucleus in the hypotrichous ciliate Stylonychia lemnae, Nucleic Acids Res. 24(22) (1996) 4415-4419.

[54] H. Yan, X. Zhang, Z. Shen, N.C. Seeman, A robust DNA mechanical device controlled by hybridization topology, Nature 415 (2002) 62-65.

About the Author

Angela Angeleska was born October 29, 1978 in Ohrid, Macedonia. She received her Bachelor of Science in Mathematics from the University of St. Cyril and Methodius, Skopje, Macedonia in 2002. She spent a semester working on an undergraduate research project at the University of Novi Sad, Serbia in 2002/2003. From 2003 to 2009 she has been a teaching assistant at the University of South Florida (USF), where she taught various courses. She received her M.A. in mathematics from the Department of Mathematics and Statistics at USF in 2005. She entered the PhD program in mathematics at the same department. Her doctoral advisors are Dr. Jonoska and Dr. Saito. Angela's research interest is applied language theory, graph theory and knot theory in modeling bio-molecular processes.