Citation

## Material Information

Title:
Algorithms for simple stochastic games
Creator:
Valkanova, Elena
Place of Publication:
[Tampa, Fla]
Publisher:
University of South Florida
Publication Date:
Language:
English

## Subjects

Subjects / Keywords:
Game theory
Optimal strategies
Algorithms
Computational complexity
Computational equilibrium
Dissertations, Academic -- Computer Science -- Masters -- USF ( lcsh )
Genre:
non-fiction ( marcgt )

## Notes

Summary:
ABSTRACT: A simple stochastic game (SSG) is a game defined on a directed multigraph and played between players MAX and MIN. Both players have control over disjoint subsets of vertices: player MAX controls a subset $V_{\mathrm{MAX}}$ and player MIN controls a subset $V_{\mathrm{MIN}}$ of vertices. The remaining vertices fall into either $V_{\mathrm{AVE}}$, a subset of vertices that support stochastic transitions, or SINK, a subset of vertices that have zero outdegree and are associated with a payoff in the range $[0, 1]$. The game starts by placing a token on a designated start vertex. The token is moved from its current vertex position to a neighboring one according to certain rules. A fixed strategy $\sigma$ of player MAX determines where to place the token when the token is at a vertex of $V_{\mathrm{MAX}}$. Likewise, a fixed strategy $\tau$ of player MIN determines where to place the token when the token is at a vertex of $V_{\mathrm{MIN}}$. When the token is at a vertex of $V_{\mathrm{AVE}}$, the token is moved to a neighbor chosen uniformly at random. The game stops when the token arrives on a SINK vertex; at this point, player MAX gets the payoff associated with the SINK vertex. A fundamental question related to SSGs is the SSG value problem: Given an SSG $G$, is there a strategy of player MAX that gives him an expected payoff of at least 1/2 regardless of the strategy of player MIN? This problem is among the rare natural combinatorial problems that belong to the class NP ∩ coNP but for which there is no known polynomial-time algorithm. In this thesis, we survey known algorithms for the SSG value problem and categorize them into four groups of algorithms: iterative approximation, strategy improvement, mathematical programming, and randomized algorithms. We obtain two new algorithmic results. Our first result is an improved worst-case upper bound on the number of iterations required by the Hoffman-Karp strategy improvement algorithm.
Our second result is a randomized Las Vegas strategy improvement algorithm whose expected running time is $O(2^{0.78n})$.
Thesis:
Thesis (M.S.C.S.)--University of South Florida, 2009.
Bibliography:
Includes bibliographical references.
System Details:
Mode of access: World Wide Web.
System Details:
System requirements: World Wide Web browser and PDF reader.
General Note:
Title from PDF of title page.
General Note:
Document formatted into pages; contains 42 pages.
Statement of Responsibility:
by Elena Valkanova.

## Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
002064141 ( ALEPH )
567641276 ( OCLC )
E14-SFE0003070 ( USFLDC DOI )
e14.3070 ( USFLDC Handle )

## USFLDC Membership

Aggregations:
USF Electronic Theses and Dissertations

## Postcard Information

Format:
Book

Full Text


Algorithms for Simple Stochastic Games

by

Elena Valkanova

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science
Department of Computer Science and Engineering
College of Engineering
University of South Florida

Major Professor: Rahul Tripathi, Ph.D.
Nagarajan Ranganathan, Ph.D.
Sudeep Sarkar, Ph.D.

Date of Approval: May 29, 2009

Keywords: game theory, optimal strategies, algorithms, computational complexity, computational equilibrium

© Copyright 2009, Elena Valkanova


DEDICATION

To my parents


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS
ABSTRACT
CHAPTER 1 INTRODUCTION
1.1 Motivation
1.2 Our Contribution
1.3 Organization
CHAPTER 2 SIMPLE STOCHASTIC GAMES
2.1 Background
2.2 Notations
2.3 Definitions and Preliminaries
2.4 Related Models
2.4.1 Parity Games, Mean Payoff Games, and Discounted Payoff Games
2.4.2 Markov Decision Processes (MDPs)
2.4.3 Stochastic Games (also called "Competitive Markov Decision Processes")
CHAPTER 3 ALGORITHMS FOR SIMPLE STOCHASTIC GAMES
3.1 Iterative Approximation Algorithms
3.1.1 An Algorithm by Somla
3.1.2 An Algorithm by Shapley
3.1.3 The "Converge from Below" Algorithm by Condon
3.2 Strategy Improvement Algorithms
3.2.1 An Algorithm by Hoffman and Karp
3.3 Mathematical Programming Algorithms
3.3.1 A Quadratic Programming Algorithm by Condon
3.3.2 Linear Programming Algorithms
3.4 Randomized Algorithms
3.4.1 A Randomized Variant of the Hoffman-Karp Algorithm by Condon
3.4.2 A Subexponential Randomized Algorithm by Ludwig
CHAPTER 4 NEW RESULTS
4.1 Preliminaries
4.2 An Improved Analysis of the Hoffman-Karp Algorithm
4.3 A New Randomized Algorithm
CHAPTER 5 RELATED WORK, CONCLUSION, AND OPEN PROBLEMS
REFERENCES

LIST OF TABLES

Table 2.1 Notations for Simple Stochastic Games
Table 5.1 Summary of Algorithms for Simple Stochastic Games

LIST OF FIGURES

Figure 2.1 A simple stochastic game $G$ with 10 vertices (source: Condon [Con92])

LIST OF ALGORITHMS

Algorithm 1: An Algorithm by Somla [Som05]
Algorithm 2: An Algorithm by Shapley [Sha53]
Algorithm 3: The "Converge From Below" Algorithm by Condon [Con93]
Algorithm 4: An Algorithm by Hoffman and Karp [HK66]
Algorithm 5: A Quadratic Programming Algorithm by Condon [Con93]
Algorithm 6: An LP Algorithm for SSGs with Only AVE and MAX Vertices [Der70]
Algorithm 7: An LP Algorithm for SSGs with Only AVE and MIN Vertices [Con92]
Algorithm 8: An LP Algorithm for SSGs with Only MAX and MIN Vertices [Con92]
Algorithm 9: A Randomized Algorithm by Condon [Con93]
Algorithm 10: A Subexponential Randomized Algorithm by Ludwig [Lud95]
Algorithm 11: Our Randomized Algorithm


ALGORITHMS FOR SIMPLE STOCHASTIC GAMES

Elena Valkanova

ABSTRACT

A simple stochastic game (SSG) is a game defined on a directed multigraph and played between players MAX and MIN. Both players have control over disjoint subsets of vertices: player MAX controls a subset $V_{\mathrm{MAX}}$ and player MIN controls a subset $V_{\mathrm{MIN}}$ of vertices. The remaining vertices fall into either $V_{\mathrm{AVE}}$, a subset of vertices that support stochastic transitions, or SINK, a subset of vertices that have zero outdegree and are associated with a payoff in the range $[0, 1]$. The game starts by placing a token on a designated start vertex. The token is moved from its current vertex position to a neighboring one according to certain rules. A fixed strategy $\sigma$ of player MAX determines where to place the token when the token is at a vertex of $V_{\mathrm{MAX}}$. Likewise, a fixed strategy $\tau$ of player MIN determines where to place the token when the token is at a vertex of $V_{\mathrm{MIN}}$. When the token is at a vertex of $V_{\mathrm{AVE}}$, the token is moved to a neighbor chosen uniformly at random. The game stops when the token arrives on a SINK vertex; at this point, player MAX gets the payoff associated with the SINK vertex.

A fundamental question related to SSGs is the SSG value problem: Given an SSG $G$, is there a strategy of player MAX that gives him an expected payoff of at least $1/2$ regardless of the strategy of player MIN? This problem is among the rare natural combinatorial problems that belong to the class NP ∩ coNP but for which there is no known polynomial-time algorithm. In this thesis, we survey known algorithms for the SSG value problem and categorize them into four groups of algorithms: iterative approximation, strategy improvement, mathematical programming, and randomized algorithms. We obtain two new algorithmic results. Our first result is an improved worst-case upper bound on the number of iterations required by the Hoffman-Karp strategy improvement algorithm. Our second result is a randomized Las Vegas strategy improvement algorithm whose expected running time is $O(2^{0.78n})$.


CHAPTER 1 INTRODUCTION

1.1 Motivation

Game theory is a branch of applied mathematics that is used in economics, biology, engineering, and computer science. Game theory captures behavior in strategic situations in which several players must make individual choices that potentially affect the interests of other players. There are different types of games, where the initial conditions or assumptions may vary based on the different final objectives. In many games, a central solution concept is that of an equilibrium, commonly known as a Nash equilibrium, in which no player can obtain a better payoff by unilaterally changing his strategy. The outcomes (i.e., payoffs) in this case are stable in the sense that none of the players would want to deviate from the fixed strategy yielding the equilibrium. A payoff is a number, also called utility, that reflects the desirability of an outcome to a player and incorporates the player's attitude towards risk. Two types of game representation are known: the standard (matrix) form and the compact form. In the standard form, all possible strategies and preferences of all players are explicitly listed. This form is very useful if there are only two players and the players have only a few strategies. In many games, however, there are many players (e.g., many traffic streams, or many ISPs controlling such streams), and so the explicit representation is exponential in the natural size of the game. In routing games, for example, the strategy space of each player consists of all possible paths from source to destination in the network, which is exponentially large in the natural size of the game.

The application of game theory in economics was first covered in the 1944 book "Theory of Games and Economic Behavior" by John von Neumann and Oskar Morgenstern. Game theory has been used to analyze a wide array of economic phenomena: auctions, bargaining, duopolies, fair division, oligopolies, social network formation, and voting systems. The solution concepts are defined in terms of rationality. Two types of games are used in economics: cooperative and non-cooperative games. In non-cooperative games, each player uses a strategy that represents a best response to the other players' strategies. In cooperative games, a group of players coordinate their actions.

Game theory provides a model for interactive computations in multi-agent systems in computer science and logic. In particular, techniques of game theory are applicable to the problem of constructing reliable computer systems. Each such game is played on a finite automaton, and each state in the automaton is owned by one of the players. The player owning the state with the token can move the token along any of the outgoing edges to a next state, and then the next turn starts. In general, plays are infinite, and the number of players and their objectives may vary with the application. Many computer systems today are characterized by autonomous agents with varied interests, and game theory appears to be a natural tool for both designing and analyzing the interactions among such agents. Consequently, there has been much recent interest in applying game theory to systems problems (see [AKP+02, SS95]). One systems problem of recent interest is improving the routing paths used by Internet Service Providers (ISPs) by designing mechanisms that enable ISP coordination. The solution to this problem involves interaction between autonomous entities and the application of game-theoretic approaches.

Yao's principle [Yao77] is a game-theoretic technique for proving lower bounds on the computational complexity of randomized algorithms, and especially of online algorithms. This principle states that to obtain a lower bound on the performance of randomized algorithms, it suffices to determine an appropriate distribution of difficult inputs and to prove that no deterministic algorithm can perform well against that distribution. The theoretical basis of this principle is the minimax theorem for two-person zero-sum games, which is a fundamental result in game theory.

Many problems in artificial intelligence, networking, cryptography, computational complexity theory, and computer-aided verification can be reduced to a two-player game with specific winning conditions. The two-player stochastic game model was first introduced by Shapley [Sha53], and a simple stochastic game (SSG) is a restriction of the general stochastic game. SSGs have applications in reactive systems and in synthesizing controllers. An SSG is a game defined on a directed multigraph and has two players, MAX and MIN. In a computer system, the choices of the MIN player for his strategy correspond to the actions available to the software driver, and the choices of the MAX player correspond to the nondeterministic behavior of the environment. In this context, the optimization problem is to find an optimal strategy for the MIN player in the SSG that minimizes the probability of reaching an error state. In simple stochastic games, rather than looking for a winning strategy, the goal is to find an optimal strategy, that is, a strategy which guarantees the best expected payoff for a player. The decision problem for SSGs is to determine whether the MAX player wins with probability greater than $1/2$ when both players use their optimal strategies.

In this thesis, we focus on the algorithmic part of game theory. The field of algorithmic game theory combines the computer science concepts of complexity and algorithm design with game theory. The emergence of the Internet has motivated the development of algorithms for finding equilibria in games, markets, computational auctions, peer-to-peer systems, and security and information markets. This thesis studies algorithms for SSGs.

1.2 Our Contribution

We present known algorithms for solving SSGs and for finding their optimal strategies. We survey known algorithms and categorize them into four groups: iterative approximation algorithms, strategy improvement algorithms, mathematical programming algorithms, and randomized algorithms. We introduce the basic definitions and concepts required for the analysis of these algorithms. We formalize the notion of optimal strategies of players in SSGs, and characterize the running time of the algorithms by their iteration complexity (i.e., the number of iterations, each of which performs some fixed algorithm-specific polynomial-time computation). We obtain two new algorithmic results. Our first result is an improved worst-case upper bound on the number of iterations required by the Hoffman-Karp strategy improvement algorithm. Our second result is a randomized Las Vegas strategy improvement algorithm whose expected running time is $O(2^{0.78n})$.


1.3 Organization

The thesis is organized as follows. In Chapter 1, we list some applications of game theory and present examples. In Chapter 2, we define the SSG value problem, describe some fundamental properties of SSGs, and briefly introduce related models such as parity games, mean payoff games, Markov decision processes, and stochastic games. We survey known algorithms for the SSG value problem in Chapter 3. In Chapter 4, we describe our new algorithmic results on this problem. The results of this chapter were obtained jointly with my advisor R. Tripathi. Finally, we mention some future directions of work in Chapter 5.


CHAPTER 2 SIMPLE STOCHASTIC GAMES

2.1 Background

A simple stochastic game (SSG) $G$ is a two-player game defined on a directed multigraph $G = (V, E)$. The vertex set $V$ is partitioned into disjoint subsets $V_{\mathrm{MAX}}$, $V_{\mathrm{MIN}}$, $V_{\mathrm{AVE}}$, and SINK. There are only two vertices in the subset SINK, labeled 0-sink and 1-sink. All vertices of $G$, except those of SINK, have exactly two outgoing edges. The SINK vertices have only incoming edges and no outgoing edges. One vertex of $G$ is designated as the start vertex, labeled start-vertex. The game $G$ is played by two players, MAX and MIN. Before the start of $G$, the players are required to choose a strategy for playing the game, and both players adhere to their respective strategies throughout the game. A strategy $\sigma$ for player MAX is a mapping from $V_{\mathrm{MAX}}$ to $V$ such that for each $v \in V_{\mathrm{MAX}}$, $(v, \sigma(v)) \in E(G)$. Similarly, a strategy $\tau$ for player MIN is a mapping from $V_{\mathrm{MIN}}$ to $V$ such that for every $v \in V_{\mathrm{MIN}}$, $(v, \tau(v)) \in E(G)$.

The game $G$ is played as follows. A token is placed on the start vertex. At each step of the game, the token is moved from its current vertex position $v$ to a neighboring one according to the following rules:

If the current vertex $v$ belongs to $V_{\mathrm{MAX}}$, then the MAX player takes a turn and moves the token from $v$ to $\sigma(v)$.
If the current vertex $v$ belongs to $V_{\mathrm{MIN}}$, then the MIN player takes a turn and moves the token from $v$ to $\tau(v)$.
If the current vertex $v$ belongs to $V_{\mathrm{AVE}}$, then neither player takes a turn. Instead, the token is moved from $v$ to a neighbor chosen uniformly at random.
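The rules of play above can be simulated directly. The following is a minimal Python sketch, not code from the thesis, assuming a hypothetical dictionary encoding: `kind` maps each vertex to "MAX", "MIN", "AVE", or "SINK", `succ` maps each non-sink vertex to its two out-neighbors, and `payoff` maps each sink to its payoff (0 for the 0-sink, 1 for the 1-sink). The function names are ours. The estimate it returns is a Monte Carlo approximation of the expected payoff for fixed strategies $\sigma$ and $\tau$, not an exact value.

```python
import random

# Hypothetical SSG encoding (illustration only, not from the thesis):
#   kind[v]   -> "MAX", "MIN", "AVE", or "SINK"
#   succ[v]   -> (y, z), the two out-neighbors of a non-sink vertex v
#   payoff[s] -> payoff of sink s (0 for the 0-sink, 1 for the 1-sink)

def check_strategy(kind, succ, strategy, owner):
    """Raise ValueError unless every `owner` vertex v is mapped to a successor of v."""
    for v, k in kind.items():
        if k == owner and strategy.get(v) not in succ[v]:
            raise ValueError(f"invalid choice at {v!r}: {strategy.get(v)!r}")

def play(kind, succ, payoff, start, sigma, tau, rng, max_steps=100_000):
    """Play one game from `start` and return the payoff that MAX receives."""
    v = start
    for _ in range(max_steps):
        k = kind[v]
        if k == "SINK":
            return payoff[v]
        if k == "MAX":
            v = sigma[v]             # MAX moves the token to sigma(v)
        elif k == "MIN":
            v = tau[v]               # MIN moves the token to tau(v)
        else:
            v = rng.choice(succ[v])  # AVE: neighbor chosen uniformly at random
    raise RuntimeError("no sink reached; the game may not be stopping")

def estimate_value(kind, succ, payoff, start, sigma, tau, trials=20_000, seed=1):
    """Monte Carlo estimate of the expected payoff from `start` under sigma and tau."""
    check_strategy(kind, succ, sigma, "MAX")
    check_strategy(kind, succ, tau, "MIN")
    rng = random.Random(seed)
    total = sum(play(kind, succ, payoff, start, sigma, tau, rng) for _ in range(trials))
    return total / trials

# Tiny example game: s is a MAX vertex, a is an AVE vertex, m is a MIN vertex.
kind = {"s": "MAX", "a": "AVE", "m": "MIN", "0": "SINK", "1": "SINK"}
succ = {"s": ("a", "m"), "a": ("0", "1"), "m": ("a", "0")}
payoff = {"0": 0, "1": 1}
print(estimate_value(kind, succ, payoff, "s", {"s": "a"}, {"m": "0"}))  # close to 0.5
```

Here MAX sends the token to the AVE vertex, which reaches the 1-sink with probability $1/2$; if MAX instead moves to `m`, MIN sends the token straight to the 0-sink and the payoff is always 0.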


Figure 2.1: A simple stochastic game $G$ with 10 vertices (source: Condon [Con92]).

Winning conditions: If the current vertex $v$ belongs to SINK, then the game stops. If player MIN reaches the 0-sink (i.e., $v$ is the 0-sink), then he wins the game. Otherwise, player MAX wins the game.

The objective of each player is to maximize his or her chances of winning the game. Thus, MAX would like to choose a strategy $\sigma$ that gives the maximum chance of the token reaching the 1-sink, no matter what strategy MIN chooses. On the other hand, MIN would like to choose a strategy $\tau$ that, irrespective of the strategy chosen by MAX, gives the maximum chance of the token reaching the 0-sink.

There is also a more general version of SSGs studied in the literature [Fil81, Som05]. In this generalized version, the game is defined on a directed multigraph $G$ that has a set SINK of sink vertices. A payoff $p(s) \in [0, 1]$ is associated with each sink vertex $s$ of $G$. The payoff $p(s)$ of a sink vertex is a rational number of size polynomial in the number of vertices of $G$. The remaining vertices of $G$ are partitioned into $V_{\mathrm{MAX}}$, $V_{\mathrm{MIN}}$, and $V_{\mathrm{AVE}}$, as in the original version. If a play reaches a sink $s$ of $G$, then the play stops and player MAX wins a payoff $p(s)$ from player MIN. The rules of playing the game at positions other than the sink vertices are the same as before. The objective of player MAX is to maximize his expected payoff, and that of player MIN is to minimize this amount.

Condon [Con92] showed that this more general version of SSGs can be transformed into the originally defined SSGs. Henceforth, this thesis will consider only the more general SSGs.

2.2 Notations

The notations used in this thesis are summarized in Table 2.1.

Table 2.1: Notations for Simple Stochastic Games

| Notation | Meaning |
|---|---|
| $G$ | graph defining a simple stochastic game |
| $V$ | set of vertices of graph $G = (V, E)$ |
| $E$ | set of edges of graph $G = (V, E)$ |
| $\lvert A \rvert$ | cardinality of set $A$ |
| $[n]$ | set $\{1, 2, \ldots, n\}$ |
| MAX | player 1 |
| MIN | player 2 |
| SINK | vertices that have zero outdegree |
| start-vertex | the start vertex of $G$ |
| $V_{\mathrm{MAX}}$ | set of all MAX vertices |
| $V_{\mathrm{MIN}}$ | set of all MIN vertices |
| $V_{\mathrm{AVE}}$ | set of all AVE vertices |
| $p(x)$ | payoff associated with a SINK vertex $x$ |
| $\sigma$ | strategy of MAX player |
| $\tau$ | strategy of MIN player |
| $v_{\sigma,\tau}$ | expected payoff corresponding to strategies $\sigma$ and $\tau$ |
| $v_{\mathrm{opt}}$ | optimal value vector |
| $F_{\sigma,\tau}$ | operator with respect to strategies $\sigma$ and $\tau$ |
| $F_G$ | operator with respect to the game $G$ |

2.3 Definitions and Preliminaries

Definition 2.1 [strategies] A strategy $\sigma$ for player MAX is a mapping from $V_{\mathrm{MAX}}$ to $V$ such that for each $v \in V_{\mathrm{MAX}}$, $(v, \sigma(v)) \in E(G)$. Similarly, a strategy $\tau$ for player MIN is a mapping from $V_{\mathrm{MIN}}$ to $V$ such that for every $v \in V_{\mathrm{MIN}}$, $(v, \tau(v)) \in E(G)$.


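Definition 2.2 [stopping games] A simple stochastic game $G$ is a stopping game if for any position $x$ and for all strategies $\sigma$ and $\tau$ of the two players, any play of $G(x)$ using strategies $\sigma$ and $\tau$ ends at a sink node with probability one.

Definition 2.3 [expected payoffs] Let $G$ be a simple stochastic game with players MAX and MIN. Let $\sigma$ and $\tau$ denote their respective strategies. Let $q_{\sigma,\tau}(x, s)$ denote the probability that a play of $G(x)$, using strategies $\sigma$ and $\tau$, ends in a node $s \in \mathrm{SINK}$. The expected payoff vector $v_{\sigma,\tau}$ of $G$, corresponding to $\sigma$ and $\tau$, is a vector of values $v_{\sigma,\tau}(x) \in [0, 1]$, one for each game position $x$ of $G$, such that:

$$v_{\sigma,\tau}(x) = \sum_{s \in \mathrm{SINK}} q_{\sigma,\tau}(x, s)\, p(s).$$

Definition 2.4 Let $G = (V, E)$ be a simple stochastic game with players MAX and MIN. Let $\sigma$ and $\tau$ denote their respective strategies. Corresponding to $\sigma$ and $\tau$, the operator $F_{\sigma,\tau} : [0,1]^{|V|} \to [0,1]^{|V|}$ is defined as follows: for every $v \in [0,1]^{|V|}$, $F_{\sigma,\tau}(v) = w$ such that for every $x \in V$,

$$w(x) = \begin{cases} v(\sigma(x)) & \text{if } x \in V_{\mathrm{MAX}} \\ v(\tau(x)) & \text{if } x \in V_{\mathrm{MIN}} \\ \tfrac{1}{2} v(y) + \tfrac{1}{2} v(z) & \text{if } x \in V_{\mathrm{AVE}} \text{ and } (x, y), (x, z) \in E \\ p(x) & \text{if } x \in \mathrm{SINK} \end{cases}$$

Proposition 2.5 The vector $v_{\sigma,\tau}$ of expected payoffs is the unique fixed point of the operator $F_{\sigma,\tau}$. That is, $F_{\sigma,\tau}(v_{\sigma,\tau}) = v_{\sigma,\tau}$.

Definition 2.6 Let $G = (V, E)$ be a simple stochastic game with players MAX and MIN. The operator $F_G : [0,1]^{|V|} \to [0,1]^{|V|}$ is defined as follows: for every $v \in [0,1]^{|V|}$, $F_G(v) = w$ such that for every $x \in V$ with $(x, y), (x, z) \in E$,

$$w(x) = \begin{cases} \max\{v(y), v(z)\} & \text{if } x \in V_{\mathrm{MAX}} \\ \min\{v(y), v(z)\} & \text{if } x \in V_{\mathrm{MIN}} \\ \tfrac{1}{2} v(y) + \tfrac{1}{2} v(z) & \text{if } x \in V_{\mathrm{AVE}} \\ p(x) & \text{if } x \in \mathrm{SINK} \end{cases}$$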
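Proposition 2.5 suggests a direct way to compute $v_{\sigma,\tau}$ for fixed strategies: write the fixed-point equation $F_{\sigma,\tau}(v) = v$ as a linear system and solve it. The Python sketch below is an illustration under stated assumptions, not the thesis's own code. It assumes a hypothetical dictionary encoding of the game (`kind`, `succ`, `payoff`, as in the comments), uses exact rational arithmetic via `fractions.Fraction` with plain Gauss-Jordan elimination, and also applies the operator $F_G$ of Definition 2.6. The function names are ours. For a stopping game the system is nonsingular; for a non-stopping game the elimination may find no pivot and raise `StopIteration`.

```python
from fractions import Fraction

# Hypothetical encoding: kind[v] in {"MAX","MIN","AVE","SINK"}, succ[v] = (y, z),
# payoff[s] = payoff of sink s. Strategies are dicts vertex -> chosen successor.

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A must be nonsingular)."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # With exact arithmetic any nonzero pivot is fine.
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def strategy_values(kind, succ, payoff, sigma, tau):
    """v_{sigma,tau} as the unique solution of F_{sigma,tau}(v) = v (stopping games)."""
    verts = sorted(kind)
    idx = {v: i for i, v in enumerate(verts)}
    n = len(verts)
    A = [[Fraction(0)] * n for _ in range(n)]
    b = [Fraction(0)] * n
    for x in verts:
        i = idx[x]
        A[i][i] += 1
        if kind[x] == "SINK":
            b[i] = Fraction(payoff[x])       # v(x) = p(x)
        elif kind[x] == "MAX":
            A[i][idx[sigma[x]]] -= 1         # v(x) = v(sigma(x))
        elif kind[x] == "MIN":
            A[i][idx[tau[x]]] -= 1           # v(x) = v(tau(x))
        else:
            y, z = succ[x]                   # v(x) = v(y)/2 + v(z)/2
            A[i][idx[y]] -= Fraction(1, 2)
            A[i][idx[z]] -= Fraction(1, 2)
    return dict(zip(verts, solve(A, b)))

def apply_F_G(kind, succ, payoff, v):
    """One application of the operator F_G (Definition 2.6)."""
    w = {}
    for x, k in kind.items():
        if k == "SINK":
            w[x] = Fraction(payoff[x])
            continue
        y, z = succ[x]
        if k == "MAX":
            w[x] = max(v[y], v[z])
        elif k == "MIN":
            w[x] = min(v[y], v[z])
        else:
            w[x] = (v[y] + v[z]) / 2
    return w

kind = {"s": "MAX", "a": "AVE", "m": "MIN", "0": "SINK", "1": "SINK"}
succ = {"s": ("a", "m"), "a": ("0", "1"), "m": ("a", "0")}
payoff = {"0": 0, "1": 1}
v = strategy_values(kind, succ, payoff, {"s": "a"}, {"m": "0"})
print(v["s"])                                 # 1/2
print(apply_F_G(kind, succ, payoff, v) == v)  # True: v is also a fixed point of F_G
```

Since this $v_{\sigma,\tau}$ is also a fixed point of $F_G$, Theorem 2.8 identifies it as the optimal value vector of this small stopping game.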


Definition 2.7 [optimal strategies] Let $G = (V, E)$ be a simple stochastic game with players MAX and MIN. The strategies $\sigma^\star$ and $\tau^\star$ are optimal at a position $x$ if for any strategy $\sigma$ of MAX and for any strategy $\tau$ of MIN, it holds that

$$v_{\sigma,\tau^\star}(x) \le v_{\sigma^\star,\tau^\star}(x) \le v_{\sigma^\star,\tau}(x).$$

If $\sigma^\star$ and $\tau^\star$ exist, then $v_{\mathrm{opt}}(x) =_{\mathrm{df}} v_{\sigma^\star,\tau^\star}(x)$ is called an optimal value of the game $G(x)$. The vector $v_{\mathrm{opt}}$, whose value at any position $x$ is $v_{\mathrm{opt}}(x)$, is said to be an optimal value vector of $G$. The strategies $\sigma^\star$ and $\tau^\star$ are called optimal for $G$ if they are optimal at every position $x$ of $G$.

Shapley [Sha53] showed that there always exists a pair of optimal strategies $\sigma^\star$ and $\tau^\star$ for a stopping simple stochastic game. Moreover, any pair of optimal strategies for this game yields the same optimal value vector. Henceforth, we refer to an optimal value vector of a stopping simple stochastic game as the optimal value vector of the game.

Theorem 2.8 (see [Som05]) Let $G$ be a stopping simple stochastic game. Then, there is a unique fixed point $v^\star$ of the operator $F_G$, i.e., $F_G(v^\star) = v^\star$. Moreover, $v^\star(x)$ is the optimal value of $G(x)$ for all $x \in V$.

Definition 2.9 [greedy strategies] Let $G = (V, E)$ be a simple stochastic game with players MAX and MIN. Let $v : V \to \mathbb{R}$ be a value vector for $G$. A strategy $\sigma$ of the MAX player is said to be $v$-greedy at $x \in V_{\mathrm{MAX}}$ if $v(\sigma(x)) = \max\{v(y), v(z)\}$, where $(x, y), (x, z) \in E$. Similarly, a strategy $\tau$ of the MIN player is said to be $v$-greedy at $x \in V_{\mathrm{MIN}}$ if $v(\tau(x)) = \min\{v(y), v(z)\}$, where $(x, y), (x, z) \in E$. In both cases, if there is a tie, i.e., $v(y) = v(z)$, then it is required that for $x \in V_{\mathrm{MAX}}$, $v(\sigma(x))$ equals $\max\{v(y), v(z)\}$, and for $x \in V_{\mathrm{MIN}}$, $v(\tau(x))$ equals $\min\{v(y), v(z)\}$. For any player $P \in \{\mathrm{MAX}, \mathrm{MIN}\}$, a strategy for $P$ is said to be $v$-greedy if it is $v$-greedy at every $x \in V_P$.

Proposition 2.10 (see [Som05]) Let $G$ be a stopping simple stochastic game. Let $v_{\mathrm{opt}}$ be the optimal value vector of $G$. Then the following statements are equivalent: (a) strategies $\sigma$ and $\tau$ are optimal, (b) $v_{\sigma,\tau} = v_{\mathrm{opt}}$, (c) strategies $\sigma$ and $\tau$ are $v_{\sigma,\tau}$-greedy, and (d) strategies $\sigma$ and $\tau$ are $v_{\mathrm{opt}}$-greedy.

Howard [How60] showed that in any stopping SSG $G$, for every strategy $\sigma$ of player MAX there is a strategy $\tau$ of player MIN that is optimal w.r.t. $\sigma$, in the sense that for every $x \in V_{\mathrm{MIN}}$ with neighbors $y$ and $z$, $v_{\sigma,\tau}(x)$ is equal to the minimum of $v_{\sigma,\tau}(y)$ and $v_{\sigma,\tau}(z)$. Henceforward, we call $\tau$ an optimal strategy of player MIN with respect to a strategy $\sigma$ of player MAX. In the same way, we can define $\sigma$ as an optimal strategy of player MAX with respect to a strategy $\tau$ of player MIN. The strategies $\sigma$ and $\tau$ are the best response strategies of players MAX and MIN, respectively. The formal definition is as follows:

Definition 2.11 [best response strategies] Let $G$ be a simple stochastic game with players MAX and MIN. A strategy $\sigma$ of MAX is said to be optimal with respect to a strategy $\tau$ of MIN if for all $x \in V_{\mathrm{MAX}}$ with child $y$, $v_{\sigma,\tau}(x) \ge v_{\sigma,\tau}(y)$. Similarly, a strategy $\tau$ of MIN is said to be optimal with respect to a strategy $\sigma$ of MAX if for all $x \in V_{\mathrm{MIN}}$ with child $y$, $v_{\sigma,\tau}(x) \le v_{\sigma,\tau}(y)$.

Definition 2.12 [switchable nodes] Let $G = (V, E)$ be a simple stochastic game with players MAX and MIN. Let $v : V \to \mathbb{R}$ be a value vector for $G$. Let $x \in V_{\mathrm{MAX}} \cup V_{\mathrm{MIN}}$ have children $y$ and $z$. The node $x$ is said to be $v$-switchable if $x \in V_{\mathrm{MAX}}$ and $v(x) < \max\{v(y), v(z)\}$, or if $x \in V_{\mathrm{MIN}}$ and $v(x) > \min\{v(y), v(z)\}$.

Definition 2.13 [stable nodes] Let $G = (V, E)$ be a simple stochastic game with players MAX and MIN. Let $v : V \to \mathbb{R}$ be a value vector for $G$. Let $x \in V_{\mathrm{MAX}} \cup V_{\mathrm{MIN}} \cup V_{\mathrm{AVE}}$ have children $y$ and $z$. The node $x$ is said to be $v$-stable if the following holds: if $x \in V_{\mathrm{MAX}}$, then $v(x) = \max\{v(y), v(z)\}$; if $x \in V_{\mathrm{MIN}}$, then $v(x) = \min\{v(y), v(z)\}$; and if $x \in V_{\mathrm{AVE}}$, then $v(x) = \tfrac{1}{2} v(y) + \tfrac{1}{2} v(z)$. The vector $v$ is said to be stable if for all $x \in V$, $x$ is $v$-stable. Otherwise, we say that $v$ is not stable.

SSGs were studied by Condon [Con92, Con93], motivated by the complexity-theoretic analysis of randomized space-bounded alternating Turing machines. Condon [Con92] showed that the SSG value problem, defined below, is in NP ∩ coNP. This problem is a rare combinatorial problem that belongs to NP ∩ coNP but is not known to be in P.
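Greediness, switchability, and stability (Definitions 2.9, 2.12, and 2.13) are local checks on a value vector, so each one becomes a short loop. The Python sketch below is an illustration only. It assumes a hypothetical dictionary encoding in which `kind` maps vertices to "MAX", "MIN", "AVE", or "SINK" and `succ` maps each non-sink vertex to its two children. The function names are ours. `greedy_strategies` breaks ties by choosing the first listed child, which is an assumption of this sketch and not the tie rule of Definition 2.9. Stability is checked only at non-sink vertices, because Definition 2.13 states it for vertices with children.

```python
def greedy_strategies(kind, succ, v):
    """Return (sigma, tau) that are v-greedy; ties go to the first listed child."""
    sigma, tau = {}, {}
    for x, k in kind.items():
        if k == "MAX":
            y, z = succ[x]
            sigma[x] = y if v[y] >= v[z] else z
        elif k == "MIN":
            y, z = succ[x]
            tau[x] = y if v[y] <= v[z] else z
    return sigma, tau

def switchable_nodes(kind, succ, v):
    """Sorted list of the v-switchable vertices of V_MAX and V_MIN (Definition 2.12)."""
    out = []
    for x, k in kind.items():
        if k == "MAX" and v[x] < max(v[c] for c in succ[x]):
            out.append(x)
        elif k == "MIN" and v[x] > min(v[c] for c in succ[x]):
            out.append(x)
    return sorted(out)

def is_stable(kind, succ, v):
    """True if every non-sink vertex is v-stable (Definition 2.13)."""
    for x, k in kind.items():
        if k == "SINK":
            continue
        y, z = succ[x]
        target = {"MAX": max(v[y], v[z]),
                  "MIN": min(v[y], v[z]),
                  "AVE": (v[y] + v[z]) / 2}[k]
        if v[x] != target:
            return False
    return True

kind = {"s": "MAX", "a": "AVE", "m": "MIN", "0": "SINK", "1": "SINK"}
succ = {"s": ("a", "m"), "a": ("0", "1"), "m": ("a", "0")}
# Value vector of the pair sigma(s) = m, tau(m) = 0.
v = {"s": 0.0, "a": 0.5, "m": 0.0, "0": 0.0, "1": 1.0}
print(switchable_nodes(kind, succ, v))   # ['s']
print(greedy_strategies(kind, succ, v))  # ({'s': 'a'}, {'m': '0'})
print(is_stable(kind, succ, v))          # False
```

In this example the MAX vertex `s` is $v$-switchable because its other child has a higher value, so the vector is not stable and, by Proposition 2.10, the pair behind it is not optimal.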


Definition 2.14 [The SSG Value Problem] The value of an SSG $G$ is defined to be

$$\max_{\sigma} \min_{\tau} v_{\sigma,\tau}(\text{start-vertex}).$$

The SSG value problem is defined as follows:

SSG-VAL: Given an SSG $G$, is the value of $G$ greater than $\tfrac{1}{2}$?

The next lemma states that there is a polynomial-time procedure that transforms an SSG $G$ to a stopping SSG $G'$ such that the value of $G'$ is greater than $1/2$ if and only if the value of $G$ is greater than $1/2$. Thus, an SSG $G$ belongs to the problem SSG-VAL if and only if $G'$ belongs to SSG-VAL, where $G'$ is the output of the procedure mentioned in Lemma 2.15. Henceforth, whenever we say that $G$ is an SSG, we implicitly assume that $G$ is a stopping SSG.

Lemma 2.15 [Con92] There is a polynomial-time procedure that transforms an SSG $G$ to a stopping SSG $G'$ such that $G'$ has the same number of MAX and MIN vertices as $G$, and the value of $G'$ is greater than $1/2$ if and only if the value of $G$ is greater than $1/2$.

Lemma 2.16 states that a pair of optimal strategies $\langle \sigma^\star, \tau^\star \rangle$ of a stopping SSG $G$ is sufficient to solve the SSG value problem on $G$. It also implies that all pairs of optimal strategies $\langle \sigma, \tau \rangle$ yield the same value vector $v_{\sigma,\tau}$.

Lemma 2.16 [Con92] For a stopping SSG $G = (V, E)$, let $\langle \sigma^\star, \tau^\star \rangle$ be a pair of optimal strategies of players MAX and MIN. Then, for all $x \in V$,

$$v_{\sigma^\star,\tau^\star}(x) = \max_{\sigma} \min_{\tau} v_{\sigma,\tau}(x).$$

Lemma 2.17 implies that a pair of strategies $\langle \sigma, \tau \rangle$ that achieves the value of an SSG $G$ on a start vertex $x$ is, in fact, an optimal pair of strategies at position $x$ of $G$.

Lemma 2.17 [Con92] Let $G = (V, E)$ be a stopping SSG. Then, the following holds for any vertex $x \in V$:

$$\min_{\tau} \max_{\sigma} v_{\sigma,\tau}(x) = \max_{\sigma} \min_{\tau} v_{\sigma,\tau}(x).$$
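For very small games, Definition 2.14 and Lemma 2.17 can be checked by brute force: enumerate every choice of out-edge at every MAX vertex and every MIN vertex, evaluate $v_{\sigma,\tau}$ at the start vertex for each pair, and compare max-min with min-max. The Python sketch below does this under stated assumptions: it uses a hypothetical dictionary encoding (`kind`, `succ`, `payoff`), the function names are ours, and it approximates $v_{\sigma,\tau}$ by iterating $F_{\sigma,\tau}$ from the vector that is 0 on non-sink vertices until successive iterates agree within a tolerance, a simplification suitable only for stopping games. Its running time is exponential in the number of MAX and MIN vertices, so it illustrates the definitions rather than offering a practical algorithm.

```python
from itertools import product

# Hypothetical encoding: kind[v] in {"MAX","MIN","AVE","SINK"}, succ[v] = (y, z),
# payoff[s] = payoff of sink s.

def strategy_values(kind, succ, payoff, sigma, tau, tol=1e-12, max_iter=100_000):
    """Approximate v_{sigma,tau} by iterating F_{sigma,tau} (stopping games only)."""
    v = {x: float(payoff[x]) if k == "SINK" else 0.0 for x, k in kind.items()}
    for _ in range(max_iter):
        w = {}
        for x, k in kind.items():
            if k == "SINK":
                w[x] = float(payoff[x])
            elif k == "MAX":
                w[x] = v[sigma[x]]
            elif k == "MIN":
                w[x] = v[tau[x]]
            else:
                y, z = succ[x]
                w[x] = 0.5 * v[y] + 0.5 * v[z]
        if max(abs(w[x] - v[x]) for x in kind) < tol:
            return w
        v = w
    raise RuntimeError("no convergence; the game may not be stopping")

def strategies(kind, succ, owner):
    """All strategies of `owner` ("MAX" or "MIN"): one out-edge per owned vertex."""
    verts = sorted(x for x, k in kind.items() if k == owner)
    for choice in product(*(succ[x] for x in verts)):
        yield dict(zip(verts, choice))

def maxmin_minmax(kind, succ, payoff, start):
    """Return (max_sigma min_tau, min_tau max_sigma) of v_{sigma,tau}(start)."""
    sigmas = list(strategies(kind, succ, "MAX"))
    taus = list(strategies(kind, succ, "MIN"))
    val = [[strategy_values(kind, succ, payoff, s, t)[start] for t in taus]
           for s in sigmas]
    maxmin = max(min(row) for row in val)
    minmax = min(max(val[i][j] for i in range(len(sigmas))) for j in range(len(taus)))
    return maxmin, minmax

kind = {"s": "MAX", "a": "AVE", "m": "MIN", "0": "SINK", "1": "SINK"}
succ = {"s": ("a", "m"), "a": ("0", "1"), "m": ("a", "0")}
payoff = {"0": 0, "1": 1}
value, other = maxmin_minmax(kind, succ, payoff, "s")
print(value, other)   # 0.5 0.5
print(value > 0.5)    # False
```

The two numbers agree, as Lemma 2.17 predicts, and because SSG-VAL asks for a value strictly greater than $1/2$, this particular game is a "no" instance.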


2.4 Related Models

Many variants of SSGs, such as parity games, mean payoff games, and discounted payoff games, have been extensively studied in the literature [HK66, Der70, FV97]. SSGs are a restriction of stochastic games, and stochastic games are a generalization of Markov decision processes. We next briefly introduce all these models.

2.4.1 Parity Games, Mean Payoff Games, and Discounted Payoff Games

Parity games (PGs), mean payoff games (MPGs), and discounted payoff games (DPGs) are non-cooperative two-person games played on a directed graph in which each vertex has at least one outgoing edge. In a parity game, the directed graph has two types of vertices, $V_{\mathrm{MAX}}$ and $V_{\mathrm{MIN}}$. Each vertex $v$ has a positive integer color $p(v) \in \mathbb{N}$ and has at least one outgoing edge. As in an SSG, the game is played between two players, MAX and MIN, and begins when a token is placed on the start vertex. Depending on the type of the vertex where the token currently lies, the players alternately move the token along one of its outgoing edges and construct an infinite path $v_0, v_1, v_2, \ldots$, called a play. Player MAX wins if the largest color $p(v_i)$ among all vertices $v_i$ occurring infinitely often in the play is odd, and player MIN wins if this color is even. Mean payoff games (MPGs) [EM79, GKK88] are similar to PGs, but they have integer-weighted edges instead of colored vertices. In MPGs, the first player tries to maximize the average edge weight in the limit, whereas the second player tries to minimize this value. MPGs can be used to design and analyze algorithms for job scheduling, finite-window online string matching, and selection with limited storage. In discounted payoff games, we are additionally given rational discount factors. It is known that PGs can be reduced to MPGs (see [GW02]) and that MPGs and DPGs can be reduced to SSGs [ZP96]. The decision problems corresponding to PGs, MPGs, and DPGs are also known to be in NP ∩ coNP.

2.4.2 Markov Decision Processes (MDPs)

A Markov decision process is a single-agent controlled stochastic system, which is observed at discrete time points and is described by a set of states $\mathcal{S}$, a set of actions $\mathcal{A}$, a set of observations $\mathcal{O}$, a reward function $r$, a state transition function $p$, an observation function $o$, and an initial state $s_0$. At every state $s$, the controller (or agent) of the process makes an observation $o(s)$ and then chooses an action $a \in \mathcal{A}$ depending on the observation made. The choice of $a \in \mathcal{A}$ in a state $s$ results in an immediate reward $r(s, a)$ and is accompanied by a probabilistic transition to a new state $s' \in \mathcal{S}$. Depending on the type of observation made in every state $s$, a Markov decision process is called a fully-observable Markov decision process (MDP), an unobservable Markov decision process (UMDP), or a partially-observable Markov decision process (POMDP).

Definition 2.18 (see [MGLA00, FV97]) A partially-observable Markov decision process is a tuple $M = (\mathcal{S}, s_0, \mathcal{A}, \mathcal{O}, p, o, r)$, where
$\mathcal{S}$, $\mathcal{A}$, and $\mathcal{O}$ are finite sets of states, actions, and observations, respectively,
$s_0 \in \mathcal{S}$ is the initial state,
$p : \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to [0, 1]$ is the state transition function, i.e., $p(s, a, s')$ is the probability of moving to state $s' \in \mathcal{S}$ upon taking action $a \in \mathcal{A}$ in state $s \in \mathcal{S}$,
$o : \mathcal{S} \to \mathcal{O}$ is the observation function, i.e., $o(s)$ is the observation made in state $s \in \mathcal{S}$, and
$r : \mathcal{S} \times \mathcal{A} \to \mathbb{Z}$ is the reward function, i.e., $r(s, a)$ is the reward gained upon taking action $a \in \mathcal{A}$ in state $s \in \mathcal{S}$.

If a POMDP is defined in such a way that the states and the observations coincide, i.e., $\mathcal{S} = \mathcal{O}$, and $o$ is the identity function, then the POMDP is called fully-observable and is denoted by MDP. In the other extreme case, when the set of observations is a singleton, the POMDP is called unobservable and is denoted by UMDP.

Finding an optimal policy in MDPs is a problem of optimization theory and is solvable in polynomial time using linear programming [d'E63, Kha79, Kar84]. Markov decision processes (MDPs) are widely used for modeling sequential decision-making problems that arise in engineering and the social sciences.
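Definition 2.18 and the two special cases that follow it can be encoded almost literally. The Python sketch below is an illustration only: the class name `POMDP`, its field names, and the `check` and `kind` methods are our own, and the transition, observation, and reward functions are passed as ordinary Python callables. `kind` reports "MDP" when the observations coincide with the states and $o$ is the identity, "UMDP" when there is a single observation, and "POMDP" otherwise.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class POMDP:
    states: frozenset        # S
    s0: object               # initial state s_0
    actions: frozenset       # A
    observations: frozenset  # O
    p: Callable              # p(s, a, s2): probability of moving from s to s2 under a
    o: Callable              # o(s): observation made in state s
    r: Callable              # r(s, a): integer reward for taking a in state s

    def check(self, eps=1e-9):
        """Sanity checks: s0 is a state, o maps into O, p(s, a, .) is a distribution."""
        assert self.s0 in self.states
        for s in self.states:
            assert self.o(s) in self.observations
            for a in self.actions:
                total = sum(self.p(s, a, t) for t in self.states)
                assert abs(total - 1.0) < eps, (s, a, total)

    def kind(self):
        if self.observations == self.states and all(self.o(s) == s for s in self.states):
            return "MDP"   # fully observable: o is the identity on S = O
        if len(self.observations) == 1:
            return "UMDP"  # unobservable: a single observation
        return "POMDP"

# Two states; "stay" keeps the current state, "move" switches it; moving earns 1.
S = frozenset({0, 1})
A = frozenset({"stay", "move"})

def p(s, a, t):
    return 1.0 if (t == s) == (a == "stay") else 0.0

def r(s, a):
    return 1 if a == "move" else 0

mdp = POMDP(S, 0, A, S, p, lambda s: s, r)
umdp = POMDP(S, 0, A, frozenset({"*"}), p, lambda s: "*", r)
mdp.check()
umdp.check()
print(mdp.kind(), umdp.kind())  # MDP UMDP
```

The same transition and reward functions give both an MDP and a UMDP; only the observation function distinguishes them, which is exactly the distinction drawn in the text.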


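2.4.3 Stochastic Games (also called "Competitive Markov Decision Processes")

Stochastic games, also called competitive Markov decision processes, are multi-agent generalizations of Markov decision processes. In stochastic games, the state transition function depends jointly on the actions of all players; the rewards are also determined by the joint actions of the players. A formal definition of two-person stochastic games is as follows.

Definition 2.19 (see [FV97]) A two-person stochastic game is a tuple $G = (\mathcal{S}, s_0, \mathcal{A}_1, \mathcal{A}_2, p, r_1, r_2)$, where
$\mathcal{S}$, $\mathcal{A}_1$, and $\mathcal{A}_2$ are finite sets of states, actions of player 1, and actions of player 2, respectively,
$s_0 \in \mathcal{S}$ is the initial state,
$p : \mathcal{S} \times \mathcal{A}_1 \times \mathcal{A}_2 \times \mathcal{S} \to [0, 1]$ is the state transition function, i.e., $p(s, a_1, a_2, s')$ is the probability of moving to state $s'$ upon action $a_1 \in \mathcal{A}_1$ by player 1 and action $a_2 \in \mathcal{A}_2$ by player 2 in state $s \in \mathcal{S}$,
$r_1 : \mathcal{S} \times \mathcal{A}_1 \times \mathcal{A}_2 \to \mathbb{Z}$ is the reward function of player 1, i.e., $r_1(s, a_1, a_2)$ is the reward gained by player 1 when player 1 takes action $a_1 \in \mathcal{A}_1$ and player 2 takes action $a_2 \in \mathcal{A}_2$ in state $s \in \mathcal{S}$, and
$r_2 : \mathcal{S} \times \mathcal{A}_1 \times \mathcal{A}_2 \to \mathbb{Z}$ is the reward function of player 2, i.e., $r_2(s, a_1, a_2)$ is the reward gained by player 2 when player 1 takes action $a_1 \in \mathcal{A}_1$ and player 2 takes action $a_2 \in \mathcal{A}_2$ in state $s \in \mathcal{S}$.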
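What distinguishes Definition 2.19 from Definition 2.18 is that both the transition and the rewards depend on the joint action $(a_1, a_2)$. The short Python sketch below simulates one stage of a two-person stochastic game under that definition. The function name `stage`, the use of Python callables for $p$, $r_1$, and $r_2$, and the example game are hypothetical illustrations, not taken from [FV97].

```python
import random

def stage(states, s, a1, a2, p, r1, r2, rng):
    """One stage in state s under joint action (a1, a2): return (next_state, r1, r2)."""
    weights = [p(s, a1, a2, t) for t in states]  # distribution over next states
    nxt = rng.choices(states, weights=weights, k=1)[0]
    return nxt, r1(s, a1, a2), r2(s, a1, a2)

# Hypothetical example: the game moves to state "B" exactly when the actions differ,
# and player 2's reward is the negative of player 1's (a zero-sum reward structure).
states = ["A", "B"]

def p(s, a1, a2, t):
    target = "B" if a1 != a2 else "A"
    return 1.0 if t == target else 0.0

def r1(s, a1, a2):
    return 1 if a1 == a2 else -1

def r2(s, a1, a2):
    return -r1(s, a1, a2)

rng = random.Random(0)
print(stage(states, "A", "heads", "tails", p, r1, r2, rng))  # ('B', -1, 1)
```

Neither player's action alone determines the outcome in this example; an SSG, by contrast, restricts the model so that at each vertex at most one player makes a choice.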

## CHAPTER 3: ALGORITHMS FOR SIMPLE STOCHASTIC GAMES

Many algorithms have been proposed for solving simple stochastic games. In this chapter, we introduce four main methods (approaches) used in the design of these algorithms. These methods are: the iterative approximation method, the strategy improvement method, the mathematical programming method, and randomized algorithms. There is no strict differentiation between these four types, and some algorithms involve more than one method to obtain an optimal pair of strategies.

We will restrict our attention to stopping SSGs as discussed in Chapter 2, Section 2.3 (see Lemma 2.15).

### 3.1 Iterative Approximation Algorithms

In an iterative approximation algorithm for a stopping simple stochastic game $G$, the algorithm begins with an initial value vector $v_1 \in [0,1]^{|V|}$, which is updated from one iteration to the next. At the end of all iterations, the algorithm returns the optimal value vector $v_{\mathrm{opt}}$ and the optimal strategies $\sigma^*, \tau^*$ for $G$.

#### 3.1.1 An Algorithm by Somla

Somla [Som05] proposed two iterative approximation algorithms for simple stochastic games; we describe one of them below.

The algorithm by Somla begins with $v_1 = F_G^{|V|}$. At the start of the $i$-th iteration, there is a current value vector $v_i \in [0,1]^{|V|}$. This value vector is updated to a new one, $\tilde v \in [0,1]^{|V|}$, by solving a system of linear constraints. The vector $\tilde v$ is then transformed into a vector $v_{i+1} = F_G(\tilde v)$ for the next iteration. The algorithm terminates when the current value vector $v_i$ leads to $v_i$-greedy strategies $\langle \sigma_i, \tau_i \rangle$ that are optimal for $G$. The requirement for
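optimality of $\langle \sigma_i, \tau_i \rangle$ is given by Proposition 2.10: strategies $\langle \sigma_i, \tau_i \rangle$ are optimal for $G$ if and only if they are $v_{\sigma_i,\tau_i}$-greedy, i.e., $F_G(v_{\sigma_i,\tau_i}) = v_{\sigma_i,\tau_i}$. At termination, the algorithm outputs the optimal strategies $\langle \sigma_i, \tau_i \rangle$ and the optimal value vector $v_{\sigma_i,\tau_i}$.

Algorithm 1: An Algorithm by Somla [Som05]

Input: Simple stochastic game $G$

Output: An optimal pair of strategies $\langle \sigma, \tau \rangle$ and the optimal value vector $v_{\mathrm{opt}}$

1. Start with $v_1 \leftarrow F_G^{|V|}$ and $i \leftarrow 1$.
2. Find $v_i$-greedy strategies $\langle \sigma_i, \tau_i \rangle$.
3. Stop if $\langle \sigma_i, \tau_i \rangle$ are optimal, and return the optimal strategies $\langle \sigma_i, \tau_i \rangle$ and the optimal value vector $v_i$, which also equals $v_{\sigma_i,\tau_i}$.
4. Find a valuation $\tilde v$ that maximizes $\sum_x \tilde v(x)$ and satisfies the linear constraints:
   - (a) $v_i \preceq \tilde v$,
   - (b) strategies $\langle \sigma_i, \tau_i \rangle$ are $\tilde v$-greedy,
   - (c) $\tilde v \preceq F_{\sigma_i,\tau_i}(\tilde v)$.
5. Set $v_{i+1} \leftarrow F_G(\tilde v)$ and $i \leftarrow i + 1$, and repeat from Step 2.

Define a partial order $\preceq$ on value vectors $V \to [0,1]$ by $v \preceq v'$ if and only if $v(x) \le v'(x)$ for all $x \in V$. Let $W \subseteq [0,1]^{|V|}$ be the region defined by

$$W = \{ v \in [0,1]^{|V|} \mid v \preceq F_G(v) \}.$$

It can be shown that the optimal expected payoff $v_{\mathrm{opt}}$ is the maximal point of $W$ under $\preceq$. However, $W$ is described by local optimality equations, which are nonlinear. The algorithm uses the crucial idea of partitioning $W$ into subregions $W_{\sigma,\tau}$ in which the equations are linear. For given strategies $\sigma$ and $\tau$ of the respective players, the subregion $W_{\sigma,\tau}$ is defined as follows:

$$W_{\sigma,\tau} = \{ v \in W \mid \langle \sigma, \tau \rangle \text{ are } v\text{-greedy and } v \preceq F_{\sigma,\tau}(v) \}.$$

The algorithm iterates from one subregion to another. In each iteration, a new subregion is visited and a maximal element $\tilde v$ of the current subregion is determined by linear programming. The subregions are visited in monotonically increasing order, with respect to $\preceq$,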
of their maximal elements. Eventually, the subregion containing the optimal expected payoff $v_{\mathrm{opt}}$ is visited in some iteration. At this iteration, the algorithm terminates because of the maximality of $v_{\mathrm{opt}}$.

The algorithm finds an optimal pair of strategies after at most an exponential number of iterations, since the number of different subregions equals the number of different pairs of strategies. One drawback of this algorithm is that the same subregion may be traversed several times during the search for an optimal pair of strategies.

#### 3.1.2 An Algorithm by Shapley

Algorithm 2: An Algorithm by Shapley [Sha53]

Input: Simple stochastic game $G$

Output: An optimal pair of strategies $\langle \sigma, \tau \rangle$ and the optimal value vector $v_{\mathrm{opt}}$

1. Start with a value vector $v$ initialized as follows: for every $x \in V$,
   $$v(x) = \begin{cases} 1 & \text{if } x \in V_{\mathrm{MAX}},\\ 0 & \text{if } x \in V_{\mathrm{MIN}},\\ \tfrac12 v(y) + \tfrac12 v(z) & \text{if } x \in V_{\mathrm{AVE}} \text{ and } (x,y),(x,z) \in E,\\ p(x) & \text{if } x \in \mathrm{SINK}. \end{cases}$$
2. While $F_G(v) \ne v$:
   - let $v'$ be defined by
     $$v'(x) = \begin{cases} \max\{v(y), v(z)\} & \text{if } x \in V_{\mathrm{MAX}} \text{ and } (x,y),(x,z) \in E,\\ \min\{v(y), v(z)\} & \text{if } x \in V_{\mathrm{MIN}} \text{ and } (x,y),(x,z) \in E,\\ \tfrac12 v(y) + \tfrac12 v(z) & \text{if } x \in V_{\mathrm{AVE}} \text{ and } (x,y),(x,z) \in E,\\ v(x) & \text{if } x \in \mathrm{SINK}; \end{cases}$$
   - set $v \leftarrow v'$.
3. Output $v$ and $v$-greedy strategies $\langle \sigma, \tau \rangle$.

The algorithm uses the iterative approximation method. It starts with an initial value vector $v$ in which all MIN vertices have value 0, all MAX vertices have value 1, and all AVE vertices are stable. It iteratively updates $v$ so that after each iteration, the vector $v$ gets closer to the optimal value vector $v_{\mathrm{opt}}$. In each iteration step, the value $v(x)$ of each node $x$ is updated from the values of its children using the operator $F_G$, until $F_G(v) = v$, where $F_G$ is defined in Definition 2.6. Condon [Con93] showed that in the worst case this algorithm requires exponentially many iterations in $n$, where $n$ is the number of nodes in the graph.
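As an illustration only (our own sketch, not code from the thesis), the loop of Algorithm 2 can be written in Python as below. The encoding is an assumption made for this example: `kind[x]` is one of `"MAX"`, `"MIN"`, `"AVE"`, `"SINK"`, `children[x]` is the pair $(y, z)$ of out-neighbors of a non-sink vertex, and `payoff[x]` is $p(x)$. Two simplifications are flagged in the code: AVE vertices start at 0.5 instead of being made stable, and the exact test $F_G(v) \ne v$ is replaced by a floating-point tolerance.

```python
# Sketch of Shapley-style value iteration (Algorithm 2) for an SSG.
# ASSUMED encoding (ours): kind[x] in {"MAX", "MIN", "AVE", "SINK"},
# children[x] = (y, z) for non-sink x, payoff[x] = p(x) for sinks.


def apply_F(v, kind, children, payoff):
    """One application of the operator F_G; reads only the old vector v."""
    new_v = {}
    for x, k in kind.items():
        if k == "SINK":
            new_v[x] = payoff[x]
            continue
        y, z = children[x]
        if k == "MAX":
            new_v[x] = max(v[y], v[z])
        elif k == "MIN":
            new_v[x] = min(v[y], v[z])
        else:  # "AVE": uniform average of the two children
            new_v[x] = 0.5 * v[y] + 0.5 * v[z]
    return new_v


def shapley_iteration(kind, children, payoff, tol=1e-12, max_iter=1_000_000):
    """Iterate v <- F_G(v) until no entry changes by more than tol."""
    # SIMPLIFICATION: AVE vertices start at 0.5 instead of being made stable.
    start = {"MAX": 1.0, "MIN": 0.0, "AVE": 0.5}
    v = {x: payoff[x] if k == "SINK" else start[k] for x, k in kind.items()}
    for _ in range(max_iter):
        new_v = apply_F(v, kind, children, payoff)
        if max(abs(new_v[x] - v[x]) for x in v) <= tol:
            return new_v
        v = new_v
    return v


def greedy_strategies(v, kind, children):
    """v-greedy strategies: MAX picks a larger child, MIN a smaller one."""
    sigma, tau = {}, {}
    for x, k in kind.items():
        if k == "MAX":
            y, z = children[x]
            sigma[x] = y if v[y] >= v[z] else z
        elif k == "MIN":
            y, z = children[x]
            tau[x] = y if v[z] >= v[y] else z
    return sigma, tau


# Tiny example game: MAX vertex m, AVE vertex a, MIN vertex n, sinks s0, s1.
kind = {"m": "MAX", "a": "AVE", "n": "MIN", "s0": "SINK", "s1": "SINK"}
children = {"m": ("s0", "a"), "a": ("s1", "n"), "n": ("s1", "s0")}
payoff = {"s0": 0.0, "s1": 1.0}
v = shapley_iteration(kind, children, payoff)
sigma, tau = greedy_strategies(v, kind, children)
```

Because `apply_F` reads only the previous vector, each pass is one application of $F_G$, matching the definition of $v'$ in Algorithm 2; the sketch makes no attempt to speed up convergence.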

#### 3.1.3 The "Converge from Below" Algorithm by Condon

Algorithm 3: The "Converge from Below" Algorithm by Condon [Con93]

Input: Simple stochastic game $G$

Output: An optimal pair of strategies $\langle \sigma, \tau \rangle$ and the optimal value vector $v_{\mathrm{opt}}$

1. Start with a value vector $v$ in which all nodes $x \in V_{\mathrm{MIN}}$ have value $v(x) = 0$ and all nodes $x \in V_{\mathrm{MAX}} \cup V_{\mathrm{AVE}}$ are stable. To find this $v$, use linear programming to solve: minimize $\sum_{x \in V} v'(x)$ subject to
   - $v'(x) \ge v'(y)$ if $x \in V_{\mathrm{MAX}}$ and $(x,y) \in E$,
   - $v'(x) = 0$ if $x \in V_{\mathrm{MIN}}$,
   - $v'(x) = \tfrac12 v'(y) + \tfrac12 v'(z)$ if $x \in V_{\mathrm{AVE}}$ and $(x,y),(x,z) \in E$,
   - $v'(x) = p(x)$ if $x \in \mathrm{SINK}$.
2. While $F_G(v) \ne v$:
   - (a) Let $v'$ be the solution to the following linear program, LP($v$): maximize $\sum_{x \in V} v'(x)$ subject to
     - $v'(x) \le v'(y)$ if $x \in V_{\mathrm{MIN}}$ and $(x,y) \in E$,
     - $v'(x) = \tfrac12 v'(y) + \tfrac12 v'(z)$ if $x \in V_{\mathrm{AVE}}$ and $(x,y),(x,z) \in E$,
     - $v'(x) = v'(y)$ if $x \in V_{\mathrm{MAX}}$, $(x,y),(x,z) \in E$, and $v(y) > v(z)$,
     - $v'(x) = \tfrac12 v'(y) + \tfrac12 v'(z)$ if $x \in V_{\mathrm{MAX}}$, $(x,y),(x,z) \in E$, and $v(y) = v(z)$,
     - $v'(x) = p(x)$ if $x \in \mathrm{SINK}$.
   - (b) Let $v$ be the value vector in which all nodes $x \in V_{\mathrm{MIN}}$ have value $v(x) = v'(x)$ and all nodes $x \in V_{\mathrm{MAX}} \cup V_{\mathrm{AVE}}$ are stable. This $v$ can be found using linear programming as in Step 1.
3. Output $v$ and $v$-greedy strategies $\langle \sigma, \tau \rangle$.

The algorithm uses linear programming to output the optimal value vector $v_{\mathrm{opt}}$ for game $G$. It starts with an initial value vector $v$ in which all MIN vertices have value 0 and all vertices in $V_{\mathrm{MAX}} \cup V_{\mathrm{AVE}}$ are $v$-stable. It iteratively invokes Steps 2(a) and 2(b) until all vertices are stabilized. At this point, the algorithm has reached the optimal value vector.

### 3.2 Strategy Improvement Algorithms

The strategy improvement method for solving an SSG was first introduced by Hoffman and Karp [HK66]. Algorithms based on this method start with an initial pair of strategies for players MAX and MIN and iteratively improve it until an optimal pair of strategies is reached. In
each iteration, the strategy of one of the players is improved by switching the nodes at which the optimal choice is not achieved. The main idea is that local optimization of a strategy eventually leads to global optimization over the game.

#### 3.2.1 An Algorithm by Hoffman and Karp

Algorithm 4: An Algorithm by Hoffman and Karp [HK66]

Input: Simple stochastic game $G$

Output: An optimal pair of strategies $\langle \sigma, \tau \rangle$ and the optimal value vector $v_{\mathrm{opt}}$

1. Let $\sigma$ and $\tau$ be arbitrary strategies for players MAX and MIN, respectively.
2. While $F_G(v_{\sigma,\tau}) \ne v_{\sigma,\tau}$:
   - (a) let $\sigma'$ be obtained from $\sigma$ by switching all $v_{\sigma,\tau}$-switchable MAX vertices;
   - (b) let $\tau'$ be an optimal strategy of player MIN w.r.t. $\sigma'$;
   - (c) set $\sigma \leftarrow \sigma'$ and $\tau \leftarrow \tau'$.
3. Output $\langle \sigma, \tau \rangle$ and the optimal value vector $v_{\sigma,\tau}$.

Melekopoglou and Condon [MC90] showed that many variants of the Hoffman-Karp algorithm, in which only one MAX vertex is switched instead of all switchable MAX vertices, require exponentially many iterations in the worst case. In Chapter 4, we show that the Hoffman-Karp algorithm requires at most $O(2^n/n)$ iterations in the worst case. This is the best known worst-case upper bound on the iteration complexity of this algorithm.

### 3.3 Mathematical Programming Algorithms

The SSG value problem can be reduced to mathematical programming problems (e.g., to a quadratic programming problem with a non-convex objective function), which are generally NP-hard. Some such algorithms are included in the survey by Filar and Schultz [FS86].

#### 3.3.1 A Quadratic Programming Algorithm by Condon

In this algorithm, a non-convex quadratic program with linear constraints is formulated from a given SSG $G$. The following theorem by Condon [Con93] ensures that a locally optimal solution to this quadratic program yields the optimal value vector of $G$.

Algorithm 5: A Quadratic Programming Algorithm by Condon [Con93]

Input: Simple stochastic game $G$

Output: An optimal pair of strategies $\langle \sigma, \tau \rangle$ and the optimal value vector $v_{\mathrm{opt}}$

1. Define the following quadratic program:
   $$\text{minimize} \sum_{\substack{x \in V_{\mathrm{MAX}} \cup V_{\mathrm{MIN}} \\ \text{with children } y \text{ and } z}} \bigl(v(x) - v(y)\bigr)\bigl(v(x) - v(z)\bigr)$$
   subject to the constraints
   - $v(x) \ge v(y)$ if $x \in V_{\mathrm{MAX}}$ with child $y$,
   - $v(x) \le v(y)$ if $x \in V_{\mathrm{MIN}}$ with child $y$,
   - $v(x) = \tfrac12 v(y) + \tfrac12 v(z)$ if $x \in V_{\mathrm{AVE}}$ with children $y$ and $z$,
   - $v(x) = p(x)$ if $x \in \mathrm{SINK}$.
2. Find a locally optimal solution $v$ to this quadratic program. Output $v$ and $v$-greedy strategies $\langle \sigma, \tau \rangle$.

Theorem 3.1 [Con93] The objective function value is zero if and only if the value vector $v$ is the optimal value vector of the game. Moreover, zero is the only locally optimal value of the objective function in the feasible region.

Algorithm 5 finds a locally optimal solution $v$ and outputs $v$-greedy strategies $\langle \sigma, \tau \rangle$. It is open whether a locally optimal solution to the quadratic program used in this algorithm can be computed in polynomial time.

#### 3.3.2 Linear Programming Algorithms

There are special cases in which the SSG value problem can be solved in polynomial time. For instance, when the graph $G = (V,E)$ defining an SSG consists of only two types of vertices (AVE and MAX vertices, AVE and MIN vertices, or MAX and MIN vertices), there is a polynomial-time algorithm.

Theorem 3.2 The SSG value problem restricted to SSGs with (1) only AVE and MAX vertices, (2) only AVE and MIN vertices, or (3) only MAX and MIN vertices can be solved in polynomial time.

Algorithm 6: An LP Algorithm for SSGs with Only AVE and MAX Vertices [Der70]

Input: A simple stochastic game $G = (V,E)$ with only AVE and MAX vertices

Output: The optimal value vector $v_{\mathrm{opt}}$

1. Solve the following linear program: minimize $\sum_{x \in V} v(x)$ subject to the constraints
   - $v(x) \ge v(y)$ if $x \in V_{\mathrm{MAX}}$ and $(x,y) \in E$,
   - $v(x) \ge \tfrac12 v(y) + \tfrac12 v(z)$ if $x \in V_{\mathrm{AVE}}$ and $(x,y),(x,z) \in E$,
   - $v(x) = p(x)$ for $x \in \mathrm{SINK}$,
   - $v(x) \ge 0$ for $x \in V$.
2. Output the optimal solution $v$ of this linear program as the optimal value vector.

Algorithm 7: An LP Algorithm for SSGs with Only AVE and MIN Vertices [Con92]

Input: A simple stochastic game $G = (V,E)$ with only AVE and MIN vertices

Output: The optimal value vector $v_{\mathrm{opt}}$

1. Solve the following linear program: maximize $\sum_{x \in V} v(x)$ subject to the constraints
   - $v(x) \le v(y)$ if $x \in V_{\mathrm{MIN}}$ and $(x,y) \in E$,
   - $v(x) \le \tfrac12 v(y) + \tfrac12 v(z)$ if $x \in V_{\mathrm{AVE}}$ and $(x,y),(x,z) \in E$,
   - $v(x) = p(x)$ for $x \in \mathrm{SINK}$,
   - $v(x) \ge 0$ for $x \in V$.
2. Output the optimal solution $v$ of this linear program as the optimal value vector.

Algorithm 8: An LP Algorithm for SSGs with Only MAX and MIN Vertices [Con92]

Input: A simple stochastic game with only MAX and MIN vertices

Output: An optimal pair of strategies $\langle \sigma, \tau \rangle$ and the optimal value vector $v_{\mathrm{opt}}$

1. Set $D \leftarrow \emptyset$ and $U \leftarrow V \setminus D$.
2. For all vertices $x \in \mathrm{SINK}$: set $v(x) \leftarrow p(x)$ and move $x$ from $U$ to $D$.
3. Repeat the following for all vertices $x \in U$, until no vertices are moved from $U$ to $D$ in the loop:
   - if $x \in V_{\mathrm{MAX}}$ has a 1-valued child in $D$, then move $x$ to $D$ and set $v(x) \leftarrow 1$;
   - if $x \in V_{\mathrm{MAX}}$ has two 0-valued children in $D$, then move $x$ to $D$ and set $v(x) \leftarrow 0$;
   - if $x \in V_{\mathrm{MIN}}$ has a 0-valued child in $D$, then move $x$ to $D$ and set $v(x) \leftarrow 0$;
   - if $x \in V_{\mathrm{MIN}}$ has two 1-valued children in $D$, then move $x$ to $D$ and set $v(x) \leftarrow 1$.
4. For all vertices $x \in U$: set $v(x) \leftarrow 0$.
5. Output $v$ and $v$-greedy strategies $\langle \sigma, \tau \rangle$.
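The bookkeeping in Algorithm 8 can be sketched in Python as follows. This is our illustration, not the thesis's code: it uses an assumed dictionary encoding (`kind`, `children`, `payoff`), the helper name `solve_max_min` is hypothetical, and it assumes, as the 0-/1-valued tests of Algorithm 8 do, that every sink payoff is 0 or 1.

```python
# Sketch of Algorithm 8 for SSGs with only MAX, MIN and SINK vertices.
# ASSUMED encoding (ours): kind[x] in {"MAX", "MIN", "SINK"},
# children[x] = (y, z) for non-sinks, payoff[x] in {0, 1} for sinks.


def solve_max_min(kind, children, payoff):
    value = {}
    decided = set()                         # the set D
    for x, k in kind.items():
        if k == "SINK":
            value[x] = payoff[x]
            decided.add(x)
    undecided = set(kind) - decided         # the set U
    moved = True
    while moved:                            # repeat ... until nothing moves
        moved = False
        for x in sorted(undecided):         # snapshot of U for this pass
            known = [value[c] for c in children[x] if c in decided]
            both_known = len(known) == 2
            if kind[x] == "MAX":
                if 1 in known:
                    new = 1
                elif both_known and known == [0, 0]:
                    new = 0
                else:
                    continue
            else:  # "MIN"
                if 0 in known:
                    new = 0
                elif both_known and known == [1, 1]:
                    new = 1
                else:
                    continue
            value[x] = new                  # move x from U to D
            decided.add(x)
            undecided.discard(x)
            moved = True
    for x in undecided:                     # vertices never decided get 0
        value[x] = 0
    # v-greedy strategies: MAX picks a larger child, MIN a smaller one.
    sigma = {x: max(children[x], key=lambda c: value[c])
             for x in kind if kind[x] == "MAX"}
    tau = {x: min(children[x], key=lambda c: value[c])
           for x in kind if kind[x] == "MIN"}
    return value, sigma, tau


kind = {"a": "MAX", "b": "MIN", "c": "MAX", "d": "MIN",
        "s0": "SINK", "s1": "SINK"}
children = {"a": ("b", "s0"), "b": ("s1", "s1"),
            "c": ("d", "s0"), "d": ("c", "s1")}
payoff = {"s0": 0, "s1": 1}
value, sigma, tau = solve_max_min(kind, children, payoff)
```

In the example, vertices `c` and `d` form a cycle that never reaches a sink, so they remain in $U$ and receive value 0 in the final loop; this part of the example game is not stopping and is included only to exercise that loop.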

Proof: The proof of case (1) is due to Derman [Der70]. Consider an SSG $G = (V,E)$ which is an input to Algorithm 6. Since $G$ is a stopping SSG, by Lemma 2.16 there is a unique optimal value vector of $G$. We claim that if $v_{LP}$ is an optimal solution to the LP defined in Algorithm 6, then $v_{LP}$ must be the unique optimal value vector of $G$.

Assume to the contrary that $v_{LP}$, which is an optimal solution to the LP, is not an optimal value vector of $G$. Then at least one of the following two cases arises: (i) there is a vertex $x \in V_{\mathrm{MAX}}$ with neighbors $y, z$ in $G$ such that $v_{LP}(x) > \max\{v_{LP}(y), v_{LP}(z)\}$, or (ii) there is a vertex $x \in V_{\mathrm{AVE}}$ with neighbors $y, z$ in $G$ such that $v_{LP}(x) > \tfrac12 (v_{LP}(y) + v_{LP}(z))$. In the first case, it is easy to see that the value vector $v'$, defined for every $u \in V$ by

$$v'(u) = \begin{cases} v_{LP}(u) & \text{if } u \ne x,\\ \max\{v_{LP}(y), v_{LP}(z)\} & \text{if } u = x \text{ and } (x,y),(x,z) \in E, \end{cases}$$

is a feasible solution of the LP with an improved objective function value: $\sum_{u \in V} v_{LP}(u) > \sum_{u \in V} v'(u)$. This leads to a contradiction. In the second case, it is easy to see that the value vector $v'$, defined for every $u \in V$ by

$$v'(u) = \begin{cases} v_{LP}(u) & \text{if } u \ne x,\\ \tfrac12 (v_{LP}(y) + v_{LP}(z)) & \text{if } u = x \text{ and } (x,y),(x,z) \in E, \end{cases}$$

is a feasible solution of the LP with an improved objective function value: $\sum_{u \in V} v_{LP}(u) > \sum_{u \in V} v'(u)$. This also leads to a contradiction. Hence, $v_{LP}$ must be the optimal value vector of $G$.

In a similar way, we can prove case (2), i.e., that the solution output by Algorithm 7 is the unique optimal value vector of a game $G$ with only AVE and MIN vertices.

For the proof of case (3), consider a stopping SSG $G$ that has only MAX and MIN vertices. In this case, we claim that Algorithm 8 outputs an optimal pair of strategies $\langle \sigma, \tau \rangle$ and the unique optimal value vector $v$ of $G$ in polynomial time. The algorithm maintains two sets of vertices, $D$ and $U$: $D$ is the set of vertices whose values have already been determined, and $U$ is the set of vertices whose values are still undetermined. The algorithm runs in polynomial time since in each iteration except the last one, at least one vertex is moved from $U$ to $D$, so the repeat loop runs at most $|V| + 1$ times.

The correctness of the algorithm requires some analysis whose details we omit in this thesis. For complete details of the proof, we refer the reader to the paper [Con92].

### 3.4 Randomized Algorithms

#### 3.4.1 A Randomized Variant of the Hoffman-Karp Algorithm by Condon

Algorithm 9: A Randomized Algorithm by Condon [Con93]

Input: A stopping simple stochastic game $G$

Output: An optimal pair of strategies $\langle \sigma, \tau \rangle$ and the optimal value vector $v_{\mathrm{opt}}$

1. Let $\sigma$ and $\tau$ be arbitrary strategies for players MAX and MIN, respectively.
2. While $F_G(v_{\sigma,\tau}) \ne v_{\sigma,\tau}$:
   - (a) choose, randomly and uniformly, $2n$ non-empty subsets of the set of $v_{\sigma,\tau}$-switchable MAX vertices, and let the strategies obtained by switching these subsets be $\sigma_1, \ldots, \sigma_{2n}$;
   - (b) let the optimal strategies of player MIN with respect to $\sigma_1, \ldots, \sigma_{2n}$ be $\tau_1, \ldots, \tau_{2n}$, respectively;
   - (c) let $1 \le k \le 2n$ be an index such that $\sum_{x \in V} v_{\sigma_k,\tau_k}(x) \ge \sum_{x \in V} v_{\sigma_l,\tau_l}(x)$ for all $1 \le l \le 2n$;
   - (d) set $\sigma \leftarrow \sigma_k$ and $\tau \leftarrow \tau_k$.
3. Output $\langle \sigma, \tau \rangle$ and the optimal value vector $v_{\sigma,\tau}$.

The output strategies $\langle \sigma, \tau \rangle$ are optimal since, at termination, $F_G(v_{\sigma,\tau}) = v_{\sigma,\tau}$, i.e., $\langle \sigma, \tau \rangle$ are $v_{\sigma,\tau}$-greedy (Proposition 2.10). The algorithm halts since there are only finitely many strategies, and no pair can be repeated at the start of a new iteration. Condon [Con93] proved the correctness of the algorithm. She also showed that the expected number of iterations of this algorithm is at most $2^{n - f(n)} + 2^{o(n)}$ for any function $f(n) = o(n)$, where $n$ is the number of MAX vertices.

#### 3.4.2 A Subexponential Randomized Algorithm by Ludwig

The randomized algorithm for SSGs proposed by Ludwig [Lud95] is subexponential in the number of vertices of the game when the outdegree of the vertices in the game is at most two. The algorithm becomes exponential if a reduction is applied from a game with arbitrary outdegree to a game with outdegree at most two.

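Algorithm 10: A Subexponential Randomized Algorithm by Ludwig [Lud95]

Input: A stopping SSG $G = (V,E)$ and a strategy $\sigma$ of player MAX

Output: An optimal pair of strategies $\langle \sigma, \tau \rangle$ and the optimal value vector $v_{\mathrm{opt}}$

1. Repeat the following until an optimal pair of strategies $\langle \sigma, \tau \rangle$ in $G$ is found:
   - (a) Choose uniformly at random a vertex $s \in V_{\mathrm{MAX}}$.
   - (b) Construct a new game $\tilde G = (\tilde V, \tilde E)$ as follows: $\tilde V \leftarrow V \setminus \{s\}$ and $\tilde E \leftarrow (E \setminus \{(x,y) \in E \mid x = s \text{ or } y = s\}) \cup \{(x,y) \mid (x,s) \in E \text{ and } \sigma(s) = y\}$.
   - (c) Let the set of MAX vertices of $\tilde G$ be $\tilde V_{\mathrm{MAX}}$. Recursively apply the algorithm to the game $\tilde G$ and the strategy $\tilde \sigma : \tilde V_{\mathrm{MAX}} \to \tilde V$ of player MAX to find an optimal strategy $\tilde \sigma'$ of player MAX for the game $\tilde G$. Here we define $\tilde \sigma$ for every $x \in \tilde V_{\mathrm{MAX}}$ as follows: $\tilde \sigma(x) = \sigma(x)$ if $\sigma(x) \ne s$, and $\tilde \sigma(x) = \sigma(s)$ otherwise.
   - (d) Extend $\tilde \sigma'$ to a strategy $\sigma'$ for $G$ by setting $\sigma'(s) = \sigma(s)$.
   - (e) Find an optimal strategy $\tau'$ for player MIN w.r.t. the strategy $\sigma'$ in $G$.
   - (f) If the pair $\langle \sigma', \tau' \rangle$ is optimal, then return $\langle \sigma', \tau' \rangle$ and the optimal value vector $v_{\sigma',\tau'}$.
   - (g) Otherwise, let $\sigma$ be obtained from $\sigma'$ by switching vertex $s$.
2. Output $\langle \sigma, \tau \rangle$ and the optimal value vector $v_{\sigma,\tau}$.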

The algorithm starts with a given strategy $\sigma$ of player MAX and a random choice of a vertex $s$. In each iteration, a new strategy $\sigma'$ is output, which is the best strategy for player MAX with respect to the choice $\sigma(s)$ fixed at vertex $s$. The algorithm requires solving a linear program (LP) of size polynomial in $n$. It is important to note that this LP is solved for each "switch" operation performed by the algorithm, where a "switch" is defined as a change in the strategy of player MAX at a single vertex. Therefore, the running time per switch operation is polynomial in $n$. It can be shown that the expected number of switch operations performed is $2^{O(\sqrt{d})}$, where $d$ is the number of MAX vertices. Hence, the expected running time of this algorithm is $2^{O(\sqrt{d})} \cdot \mathrm{poly}(n)$, which is subexponential in the input size $n$.

Theorem 3.3 [Lud95] The expected running time of Algorithm 10 is $2^{O\left(\sqrt{\min\{|V_{\mathrm{MAX}}|, |V_{\mathrm{MIN}}|\}}\right)} \cdot \mathrm{poly}(n)$.
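Step (b) of Algorithm 10, which removes $s$ and reroutes every edge into $s$ to $\sigma(s)$, is a purely syntactic graph operation. A minimal Python sketch of that step only (not the recursion or the LP-based evaluation), assuming edges are stored as a list of (tail, head) pairs so that parallel edges of the multigraph survive; `bypass_vertex` is a hypothetical helper name:

```python
# Sketch of the construction of G~ in step (b) of Algorithm 10.
# ASSUMPTION (ours): the multigraph is a list of (tail, head) pairs.


def bypass_vertex(vertices, edges, s, sigma_s):
    """Return (V~, E~): V minus s, with each edge (x, s) rerouted to sigma(s)."""
    new_vertices = [x for x in vertices if x != s]
    new_edges = []
    for tail, head in edges:
        if tail == s:
            continue                            # drop the edges leaving s
        if head == s:
            new_edges.append((tail, sigma_s))   # (x, s) becomes (x, sigma(s))
        else:
            new_edges.append((tail, head))
    return new_vertices, new_edges


vertices = ["a", "s", "b", "t0", "t1"]
edges = [("a", "s"), ("a", "t0"), ("s", "t1"), ("s", "b"),
         ("b", "t0"), ("b", "s")]
V2, E2 = bypass_vertex(vertices, edges, "s", "t1")
```

Every remaining vertex keeps its out-degree, which is what lets the recursive call treat $\tilde G$ as an SSG of the same shape with one MAX vertex fewer.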

## CHAPTER 4: NEW RESULTS

In this chapter, we obtain two new algorithmic results. Our first result is an improved worst-case upper bound on the number of iterations required by the Hoffman-Karp strategy improvement algorithm. This result is described in Section 4.2. Our second result is a randomized Las Vegas strategy improvement algorithm whose expected running time is $O(2^{0.78n})$. This result is described in Section 4.3.

All the results of this chapter were obtained jointly with my advisor R. Tripathi. These results appeared in preliminary form in a technical report by V. Kumar and R. Tripathi [KT04].

### 4.1 Preliminaries

Notation 4.1 Let $G$ be a stopping simple stochastic game with players MAX and MIN. For every strategy $\sigma$ of player MAX, we use $\tau_\sigma$ to denote the unique optimal strategy of player MIN w.r.t. strategy $\sigma$. Likewise, for every strategy $\tau$ of player MIN, we use $\sigma_\tau$ to denote the unique optimal strategy of player MAX w.r.t. strategy $\tau$.

Notation 4.2 Let $G$ be a stopping simple stochastic game with players MAX and MIN. For every strategy $\sigma$ of player MAX, we use $S_\sigma$ to denote the set of all $v_{\sigma,\tau_\sigma}$-switchable vertices of $G$. Likewise, for every strategy $\tau$ of player MIN, we use $T_\tau$ to denote the set of all $v_{\sigma_\tau,\tau}$-switchable vertices of $G$.

Definition 4.3 Let $G$ be a stopping simple stochastic game with players MAX and MIN. For every strategy $\sigma$ of player MAX and subset $S \subseteq S_\sigma$, let $\mathrm{switch}(\sigma, S) : V_{\mathrm{MAX}} \to V$ be the strategy of player MAX obtained from $\sigma$ by switching all vertices of $S$ only. The strategy
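$\mathrm{switch}(\sigma, S)$ is defined as follows: for every $x \in V_{\mathrm{MAX}}$ with neighbors $y, z$ such that $y = \sigma(x)$,

$$\mathrm{switch}(\sigma, S)(x) = \begin{cases} y & \text{if } x \notin S,\\ z & \text{if } x \in S. \end{cases}$$

Likewise, for every strategy $\tau$ of player MIN and subset $T \subseteq T_\tau$, $\mathrm{switch}(\tau, T) : V_{\mathrm{MIN}} \to V$ is defined as the strategy of player MIN obtained from $\tau$ by switching all vertices of $T$ only. Its formal definition is as follows: for every $x \in V_{\mathrm{MIN}}$ with neighbors $y, z$ such that $y = \tau(x)$,

$$\mathrm{switch}(\tau, T)(x) = \begin{cases} y & \text{if } x \notin T,\\ z & \text{if } x \in T. \end{cases}$$

Definition 4.4 Let $v_1, v_2$ be value vectors in $[0,1]^n$, for some $n \in \mathbb{N}^+$. We say that

- $v_1 \succeq v_2$ if for each position $x \in [n]$, it holds that $v_1(x) \ge v_2(x)$;
- $v_1 \succ v_2$ if $v_1 \succeq v_2$ and there is some position $x \in [n]$ such that $v_1(x) > v_2(x)$;
- $v_1 = v_2$ if for each position $x \in [n]$, it holds that $v_1(x) = v_2(x)$;
- $v_1$ and $v_2$ are incomparable if there are positions $x, y \in [n]$ such that $v_1(x) > v_2(x)$ and $v_1(y)$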

bounding the number of iterations required by policy iteration algorithms to solve Markov decision processes (MDPs). Simple stochastic games are two-person games over a finite horizon in which the expected payoff depends on the strategies of both players. In contrast, Markov decision processes consist of a single agent whose actions determine the cumulative reward over an infinite horizon. Because of these contrasting features, it is not obvious whether techniques developed for analyzing Markov decision processes can be generalized in a straightforward manner to analyze simple stochastic games. One of our contributions is to demonstrate that simple stochastic games indeed carry some structural properties similar to those of Markov decision processes, which can be harnessed to analyze these games.

The main result of this section is Theorem 4.12, which relies on several lemmas on the properties of simple stochastic games and the Hoffman-Karp algorithm. We first present these lemmas and their proofs, and then use these results in proving the theorem.

Lemma 4.6 Let $G$ be a stopping simple stochastic game with $n$ vertices and with players MAX and MIN. Let $\sigma$ and $\tau$ be strategies of players MAX and MIN, respectively. There is an $n \times n$ matrix $Q$ with entries in $\{0, \tfrac12, 1\}$ and an $n$-vector $b$ with entries in $\{0\} \cup \{p(x) \mid x \in \mathrm{SINK}\}$ such that $v_{\sigma,\tau}$ is the unique solution to the equation $v_{\sigma,\tau} = Q v_{\sigma,\tau} + b$. Moreover, $I - Q$ is invertible, all entries of $(I - Q)^{-1}$ are non-negative, and the entries along its diagonal are strictly positive.

Proof Sketch. The proof of this lemma is a straightforward extension of Lemma 1 in Condon's paper [Con92] to the case of simple stochastic games with payoffs. So, we refer the reader to her paper [Con92] for this proof.

Lemma 4.7 Let $G = (V,E)$ be a stopping simple stochastic game with players MAX and MIN. Let $\sigma$ be a strategy of player MAX such that $S_\sigma$ is nonempty, and let $S$ be any nonempty subset of $S_\sigma$. Let $\sigma'$ denote $\mathrm{switch}(\sigma, S)$, which is a strategy of player MAX. Then, it holds that $v_{\sigma',\tau_{\sigma'}} \succ v_{\sigma,\tau_\sigma}$.

Proof.
As stated in Notation 4.1, let $\tau_\sigma$ (respectively, $\tau_{\sigma'}$) denote the unique optimal strategy of player MIN w.r.t. strategy $\sigma$ (respectively, $\sigma'$) of player MAX. Throughout this proof, we write $\tau$ for $\tau_\sigma$ and $\tau'$ for $\tau_{\sigma'}$ for notational convenience.

The construction of a pair of strategies $\langle \sigma', \tau' \rangle$ from $\langle \sigma, \tau \rangle$ proceeds in two steps: construction of $\langle \sigma', \tau \rangle$ from $\langle \sigma, \tau \rangle$ in the first step, and construction of $\langle \sigma', \tau' \rangle$ from $\langle \sigma', \tau \rangle$ in the second. In the first step, $\sigma'$ is obtained from $\sigma$ by switching all vertices of $S$. In the second step, $\tau'$ is obtained from $\tau$ by switching all vertices of some subset $T \subseteq V_{\mathrm{MIN}}$. Notice that, except for vertices in $S \cup T$, all other vertices $x \in V_{\mathrm{MAX}} \setminus S$ and $u \in V_{\mathrm{MIN}} \setminus T$ have $\sigma(x) = \sigma'(x)$ and $\tau(u) = \tau'(u)$.

From Lemma 4.6, we know that $v_{\sigma,\tau}$ and $v_{\sigma',\tau'}$ are the unique solutions to the equations

$$v_{\sigma,\tau} = Q\, v_{\sigma,\tau} + b \tag{4.1}$$

$$v_{\sigma',\tau'} = Q'\, v_{\sigma',\tau'} + b' \tag{4.2}$$

for some $Q$, $Q'$, $b$, and $b'$. Let $\Delta = v_{\sigma',\tau'} - v_{\sigma,\tau}$. We show that $\Delta \succeq 0$ and that some entry of $\Delta$ is actually $> 0$. By Definition 4.4, this suffices to prove Lemma 4.7.

Subtracting Eq. (4.1) from Eq. (4.2), we get

$$\Delta = (Q' v_{\sigma',\tau'} + b') - (Q v_{\sigma,\tau} + b). \tag{4.3}$$

Adding and subtracting $Q' v_{\sigma,\tau} + b'$, we get

$$\Delta = (Q' v_{\sigma',\tau'} + b') - (Q' v_{\sigma,\tau} + b') + (Q' v_{\sigma,\tau} + b') - (Q v_{\sigma,\tau} + b),$$

or

$$\Delta = Q' \Delta + \delta, \tag{4.4}$$

where $\delta = (Q' v_{\sigma,\tau} + b') - (Q v_{\sigma,\tau} + b)$. From Lemma 4.6, we know that $I - Q'$ is invertible. Hence, the unique solution of Eq. (4.4) is $\Delta = (I - Q')^{-1} \delta$. Lemma 4.6 also implies that $(I - Q')^{-1}$ has only non-negative entries and strictly positive diagonal entries. So it suffices to show that $\delta \succeq 0$ and that some entry of $\delta$ is $> 0$: then $\Delta \succeq 0$, and $\Delta(x) \ge (I - Q')^{-1}_{xx}\, \delta(x) > 0$ for every $x$ with $\delta(x) > 0$.

The vector $\delta$ is the difference of two vectors $A$ and $B$, where $A = Q' v_{\sigma,\tau} + b'$ and $B = Q v_{\sigma,\tau} + b$. Notice that $A$ and $B$ are the vectors obtained by applying $F_{\sigma',\tau'}$ and $F_{\sigma,\tau}$, respectively, to $v_{\sigma,\tau}$, where the operator $F_{\sigma',\tau'}$ (or $F_{\sigma,\tau}$) is defined
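in Definition 2.4. In other words, $A = F_{\sigma',\tau'}(v_{\sigma,\tau})$ and $B = F_{\sigma,\tau}(v_{\sigma,\tau})$. Hence, we have

$$\delta = F_{\sigma',\tau'}(v_{\sigma,\tau}) - F_{\sigma,\tau}(v_{\sigma,\tau}). \tag{4.5}$$

Consider an arbitrary vertex $x \in \mathrm{SINK}$ of $G$. By Definition 2.4, both $A(x)$ and $B(x)$ equal $p(x)$, the payoff associated with $x$. Hence, in this case $\delta(x) = 0$ by Eq. (4.5).

Next, suppose that $x \in S$ is an arbitrary vertex with $(x,y),(x,z) \in E$ such that $y = \sigma(x)$ and $z = \sigma'(x)$. In this case, $A(x)$ equals $v_{\sigma,\tau}(\sigma'(x))$ and $B(x)$ equals $v_{\sigma,\tau}(\sigma(x))$ by Definition 2.4. Therefore, $\delta(x) = v_{\sigma,\tau}(z) - v_{\sigma,\tau}(y)$ by Eq. (4.5). Since $x \in S$, $x$ is a $v_{\sigma,\tau}$-switchable vertex of $V_{\mathrm{MAX}}$, and so by Definition 2.12 it must be the case that $v_{\sigma,\tau}(z) > v_{\sigma,\tau}(y)$. Hence, $\delta(x) > 0$.

Next, suppose that $x \in T$ is an arbitrary vertex with $(x,y),(x,z) \in E$ such that $y = \tau(x)$ and $z = \tau'(x)$. Then $A(x) = v_{\sigma,\tau}(\tau'(x)) = v_{\sigma,\tau}(z)$ and $B(x) = v_{\sigma,\tau}(\tau(x)) = v_{\sigma,\tau}(y)$. Since $\tau$ is an optimal strategy w.r.t. $\sigma$ and since $x \in T \subseteq V_{\mathrm{MIN}}$, $\tau(x)$ must point to the neighbor of $x$ that has the smaller value in $v_{\sigma,\tau}$; that is, $v_{\sigma,\tau}(y) \le v_{\sigma,\tau}(z)$. Hence, $\delta(x) \ge 0$ for every vertex $x \in T$.

Finally, consider any vertex $x \in V \setminus (S \cup T \cup \mathrm{SINK})$. In this case, it is easy to see that, restricted to position $x$, the actions of $F_{\sigma',\tau'}$ and $F_{\sigma,\tau}$ on any value vector $v$ are the same. That is, $A(x) = B(x)$. It follows from Eq. (4.5) that $\delta(x) = 0$ for every such vertex $x$.

Thus, we have shown that $v_{\sigma',\tau'}(x) \ge v_{\sigma,\tau}(x)$ for every $x \in V$, and $v_{\sigma',\tau'}(x) > v_{\sigma,\tau}(x)$ for every $x \in S$, where $S$ is a nonempty subset of $S_\sigma$. This proves that $v_{\sigma',\tau'} \succ v_{\sigma,\tau}$.

Lemma 4.8 Let $G$ be a stopping simple stochastic game with players MAX and MIN. Let $\sigma$ and $\sigma'$ be strategies of player MAX such that $\sigma'$ is obtained from $\sigma$ by switching a single vertex $x \in V_{\mathrm{MAX}}$, i.e., $\sigma' = \mathrm{switch}(\sigma, \{x\})$. Then either $v_{\sigma,\tau_\sigma} \succeq v_{\sigma',\tau_{\sigma'}}$ or $v_{\sigma',\tau_{\sigma'}} \succeq v_{\sigma,\tau_\sigma}$. In other words, $v_{\sigma,\tau_\sigma}$ and $v_{\sigma',\tau_{\sigma'}}$ are not incomparable.

Proof Sketch. The proof of this lemma is almost the same as that of Lemma 4.7, so we omit it.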
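To make Lemma 4.6, Definition 4.4, $\mathrm{switch}(\sigma, S)$, and Lemma 4.7 concrete, here is a small Python sketch of our own (not from the thesis) that runs the Hoffman-Karp loop of Algorithm 4 on a toy game. Assumptions, also noted in the code: the dictionary encoding of the game is ours; $v_{\sigma,\tau}$ is obtained by solving $(I - Q)v = b$ exactly with `fractions.Fraction` and Gauss-Jordan elimination; and $\tau_\sigma$ is found by brute force over all MIN strategies, keeping the one with the smallest total value (in a stopping game this coincides with the pointwise-smallest response). The brute force is exponential and only meant for tiny inputs. The loop stops when no MAX vertex is switchable, which, with $\tau$ a best response, is the same as the test $F_G(v_{\sigma,\tau}) = v_{\sigma,\tau}$.

```python
from fractions import Fraction
from itertools import product

# ASSUMED encoding (ours): kind[x] in {"MAX", "MIN", "AVE", "SINK"},
# children[x] = (y, z) for non-sinks, payoff[x] for sinks.
# sigma / tau map each MAX / MIN vertex to its chosen child.


def evaluate(kind, children, payoff, sigma, tau):
    """Solve (I - Q) v = b exactly (Lemma 4.6) by Gauss-Jordan elimination."""
    verts = sorted(kind)
    idx = {x: i for i, x in enumerate(verts)}
    n = len(verts)
    rows = [[Fraction(0)] * (n + 1) for _ in range(n)]  # augmented [I-Q | b]
    for x in verts:
        i = idx[x]
        rows[i][i] += 1
        if kind[x] == "SINK":
            rows[i][n] = Fraction(payoff[x])
        elif kind[x] == "AVE":
            for c in children[x]:
                rows[i][idx[c]] -= Fraction(1, 2)
        else:
            succ = sigma[x] if kind[x] == "MAX" else tau[x]
            rows[i][idx[succ]] -= 1
    for col in range(n):
        # I - Q is invertible for a stopping game, so a pivot always exists.
        piv = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[piv] = rows[piv], rows[col]
        p = rows[col][col]
        rows[col] = [a / p for a in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[col])]
    return {x: rows[idx[x]][n] for x in verts}


def best_response_min(kind, children, payoff, sigma):
    """tau_sigma by brute force over all MIN strategies (tiny games only)."""
    mins = sorted(x for x in kind if kind[x] == "MIN")
    best = None
    for choice in product(*(children[x] for x in mins)):
        tau = dict(zip(mins, choice))
        v = evaluate(kind, children, payoff, sigma, tau)
        if best is None or sum(best[1].values()) > sum(v.values()):
            best = (tau, v)
    return best


def compare(v1, v2):
    """Definition 4.4: returns '=', '>' (succ), '<' (prec) or 'incomparable'."""
    ge = all(v1[x] >= v2[x] for x in v1)
    le = all(v2[x] >= v1[x] for x in v1)
    if ge and le:
        return "="
    if ge:
        return ">"
    if le:
        return "<"
    return "incomparable"


def switchable(children, sigma, v):
    """S_sigma: MAX vertices whose other child has a strictly larger value."""
    result = []
    for x in sorted(sigma):
        y, z = children[x]
        other = z if sigma[x] == y else y
        if v[other] > v[sigma[x]]:
            result.append(x)
    return result


def hoffman_karp(kind, children, payoff):
    """Algorithm 4: switch every switchable MAX vertex until none is left."""
    sigma = {x: children[x][0] for x in kind if kind[x] == "MAX"}
    while True:
        tau, v = best_response_min(kind, children, payoff, sigma)
        S = switchable(children, sigma, v)
        if not S:
            return sigma, tau, v
        for x in S:  # sigma <- switch(sigma, S)
            y, z = children[x]
            sigma[x] = z if sigma[x] == y else y


kind = {"m": "MAX", "a": "AVE", "n": "MIN", "s0": "SINK", "s1": "SINK"}
children = {"m": ("s0", "a"), "a": ("s1", "n"), "n": ("s1", "s0")}
payoff = {"s0": 0, "s1": 1}
sigma, tau, v = hoffman_karp(kind, children, payoff)
```

On this example, switching `m` from `s0` to `a` raises $v(m)$ from 0 to 1/2 while leaving every other entry unchanged, so `compare` on the value vectors before and after the switch reports the strict improvement $\succ$ that Lemma 4.7 predicts.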

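Lemma 4.9 Let $G$ be a stopping simple stochastic game with players MAX and MIN. If $\sigma$ and $\sigma'$ are two distinct strategies of player MAX that agree on the vertices in $S_\sigma$, i.e., $\sigma(x) = \sigma'(x)$ for all $x \in S_\sigma$, then $v_{\sigma,\tau_\sigma} \succeq v_{\sigma',\tau_{\sigma'}}$.

Proof. Consider a new game $G' = (V', E')$ obtained from $G = (V,E)$ as follows: $V'$ equals $V$, and $E'$ is obtained from $E$ by deleting all edges $(x,z) \in E$ such that $x \in S_\sigma$ and $z \ne \sigma(x)$, and duplicating all edges $(x,y) \in E$ such that $x \in S_\sigma$ and $y = \sigma(x)$. In other words,

$$V' = V, \qquad E' = E - \{(x,z) \mid x \in S_\sigma \text{ and } z \ne \sigma(x)\} + \{(x,y) \mid x \in S_\sigma \text{ and } y = \sigma(x)\}.$$

Here, "+" stands for multiset union and "−" stands for multiset difference. Notice that every MAX strategy in $G'$ is also a MAX strategy in $G$. Also, notice that $E'$ and $E$ have the same edges, except for the edges whose tail vertex is in $S_\sigma$. From these observations, it follows that $G'$ is a stopping simple stochastic game. Also, since $\sigma$ and $\sigma'$ agree on the vertices in $S_\sigma$ and $E'$ includes all edges $(x,y)$ for which $x \in S_\sigma$ and $y = \sigma(x)$, both $\langle \sigma, \tau_\sigma \rangle$ and $\langle \sigma', \tau_{\sigma'} \rangle$ are pairs of player strategies for $G'$. Let $v_{\sigma,\tau_\sigma}[G]$ be the value vector in $G$ and let $v_{\sigma,\tau_\sigma}[G']$ be the value vector in $G'$ corresponding to the same pair of strategies $\langle \sigma, \tau_\sigma \rangle$. In a similar way, we define $v_{\sigma',\tau_{\sigma'}}[G]$ and $v_{\sigma',\tau_{\sigma'}}[G']$ for the pair $\langle \sigma', \tau_{\sigma'} \rangle$. Based on the observations above, $v_{\sigma,\tau_\sigma}[G] = v_{\sigma,\tau_\sigma}[G']$ and $v_{\sigma',\tau_{\sigma'}}[G] = v_{\sigma',\tau_{\sigma'}}[G']$.

Next, notice that every vertex $x \notin S_\sigma$ is $v_{\sigma,\tau_\sigma}[G]$-stable. The equality of $v_{\sigma,\tau_\sigma}[G]$ and $v_{\sigma,\tau_\sigma}[G']$ implies that every $x \notin S_\sigma$ is also $v_{\sigma,\tau_\sigma}[G']$-stable. All edges $(x,y) \in E$ such that $x \in S_\sigma$ and $y = \sigma(x)$ are duplicated in $G'$, so both out-edges of such an $x$ lead to $\sigma(x)$. Hence, every $x \in S_\sigma$ is also $v_{\sigma,\tau_\sigma}[G']$-stable. Therefore, every vertex of $G'$ is $v_{\sigma,\tau_\sigma}[G']$-stable, and so the strategies $\sigma, \tau_\sigma$ are $v_{\sigma,\tau_\sigma}[G']$-greedy by Definition 2.9. Since $G'$ is a stopping SSG, Proposition 2.10 implies that $\sigma$ and $\tau_\sigma$ are optimal strategies for $G'$ and that $v_{\sigma,\tau_\sigma}[G']$ is the unique optimal value vector of $G'$. Thus, $v_{\sigma,\tau_\sigma}[G'] \succeq v_{\sigma',\tau_{\sigma'}}[G']$ by Lemma 2.16. Using the equalities of value vectors obtained above, we get $v_{\sigma,\tau_\sigma}[G] \succeq v_{\sigma',\tau_{\sigma'}}[G]$.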
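The multigraph $G'$ used in this proof can be built mechanically. The sketch below (ours, for illustration) stores $E$ as a `collections.Counter` of (tail, head) pairs, so that `-` and `+` behave as the multiset difference and union of the proof; the argument `switchable` stands for $S_\sigma$, and the function name `restrict_to_sigma` is hypothetical.

```python
from collections import Counter

# ASSUMPTION (ours): E is given as a list of (tail, head) pairs and
# `switchable` plays the role of S_sigma.


def restrict_to_sigma(edges, sigma, switchable):
    """Build E' from E: for x in S_sigma, drop (x, z) with z != sigma(x)
    and add a second copy of each edge (x, sigma(x))."""
    E = Counter(edges)
    removed = Counter({(t, h): c for (t, h), c in E.items()
                       if t in switchable and h != sigma[t]})
    duplicated = Counter({(t, h): c for (t, h), c in E.items()
                          if t in switchable and h == sigma[t]})
    return (E - removed) + duplicated   # multiset difference, then union


edges = [("m", "s0"), ("m", "a"), ("a", "s1"), ("a", "n"),
         ("n", "s1"), ("n", "s0")]
E_prime = restrict_to_sigma(edges, sigma={"m": "s0"}, switchable={"m"})
```

Every vertex keeps out-degree two, and each vertex of $S_\sigma$ ends up with both out-edges pointing to $\sigma(x)$, which is exactly why such a vertex is stable in $G'$.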

Lemma 4.10 Let $G$ be a stopping simple stochastic game. Let $\langle \sigma_i, \tau_i \rangle$ and $\langle \sigma_j, \tau_j \rangle$ be pairs of player strategies during iterations $i$ and $j$, where $i$

Assume to the contrary that for some 1 $|V_{\mathrm{MIN}}|$, then we can repeat the same proof argument with MAX and MIN interchanged.

In the Hoffman-Karp algorithm, since, by Lemma 4.7, the value vectors monotonically increase with the number of iterations, there can be at most $2^n$ iterations, corresponding to the $2^n$ distinct subsets of switchable MAX vertices. We partition the analysis of the number of iterations into two cases: iterations in which $|S_\sigma| \le n/3$ and iterations in which $|S_\sigma| > n/3$. In the first case, using Fact 4.5, the number of such iterations is bounded by $\sum_{k=1}^{n/3} \binom{n}{k} \le 2^{nH(1/3)}$, since no MAX strategy can repeat. In the second case, since $|S_\sigma| > n/3$ for each such iteration, by Lemma 4.11 the Hoffman-Karp algorithm discards at least $n/3$ strategies $\sigma_i$, better than the current strategy $\sigma$, in favor of the superior strategy $\sigma'$ for the next iteration. Thus, the number of iterations in which $|S_\sigma| > n/3$ is bounded by $\frac{2^n}{n/3} = \frac{3 \cdot 2^n}{n}$. It follows that the Hoffman-Karp algorithm requires at most $2^{nH(1/3)} + \frac{3 \cdot 2^n}{n} \le \frac{4 \cdot 2^n}{n}$ iterations in the worst case, where the last inequality holds for sufficiently large $n$ because $H(1/3) \approx 0.918$ is less than 1.
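The arithmetic at the end of this proof is easy to check numerically. The following Python sketch (ours; the function names are hypothetical) computes the first-case count $\sum_{k=1}^{n/3}\binom{n}{k}$ exactly with `math.comb`, compares it with $2^{nH(1/3)}$ from Fact 4.5, and tests the final inequality $2^{nH(1/3)} + 3 \cdot 2^n/n \le 4 \cdot 2^n/n$ for a few values of $n$.

```python
import math


def binary_entropy(alpha):
    """H(alpha) = -alpha log2(alpha) - (1 - alpha) log2(1 - alpha)."""
    if alpha in (0, 1):
        return 0.0
    return -alpha * math.log2(alpha) - (1 - alpha) * math.log2(1 - alpha)


def small_case_count(n):
    """Nonempty subsets of an n-set with at most n/3 elements (first case)."""
    return sum(math.comb(n, k) for k in range(1, n // 3 + 1))


def hk_terms(n):
    """The two terms 2^{nH(1/3)} and 3 * 2^n / n from the proof."""
    return 2 ** (n * binary_entropy(1 / 3)), 3 * 2 ** n / n


for n in (60, 100, 200):
    entropy_term, large_term = hk_terms(n)
    fact_ok = entropy_term >= small_case_count(n)
    final_ok = 4 * 2 ** n / n >= entropy_term + large_term
    print(n, fact_ok, final_ok)
```

Since $H(1/3) \approx 0.918$, the last inequality needs $2^{nH(1/3)} \le 2^n/n$; it fails for $n = 60$ but holds for $n = 100$ and $n = 200$, so it is an asymptotic statement that only takes effect once $n$ is large enough.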

### 4.3 A New Randomized Algorithm

We propose a Las Vegas randomized algorithm for solving simple stochastic games. Our algorithm can be seen as a variation of the Hoffman-Karp algorithm in that, instead of deterministically switching all switchable MAX vertices, our randomized algorithm switches a uniformly random subset of switchable MAX vertices in each iteration. Similar to the results in the previous section, our results in this section are based on an extension of the proof technique of Mansour and Singh [MS99], which they applied to analyze a randomized policy iteration algorithm for Markov decision processes.

Our randomized algorithm is described as Algorithm 11.

Algorithm 11: Our Randomized Algorithm

Input: A stopping simple stochastic game $G$

Output: An optimal pair of strategies $\langle \sigma, \tau \rangle$ and the optimal value vector $v_{\mathrm{opt}}$

1. Let $\sigma, \tau$ be arbitrary strategies of players MAX and MIN, respectively.
2. While $F_G(v_{\sigma,\tau}) \ne v_{\sigma,\tau}$:
   - (a) choose a subset $S$ of the $v_{\sigma,\tau}$-switchable MAX vertices uniformly at random;
   - (b) let $\sigma'$ be obtained from $\sigma$ by switching all vertices of $S$;
   - (c) let $\tau'$ be an optimal strategy of player MIN w.r.t. $\sigma'$;
   - (d) set $\sigma \leftarrow \sigma'$ and $\tau \leftarrow \tau'$.
3. Output $\langle \sigma, \tau \rangle$ and the optimal value vector $v_{\sigma,\tau}$.

Lemma 4.13 For each iteration $i$ in Algorithm 11, let $\langle \sigma_i, \tau_i \rangle$ be the pair of player strategies at the start of this iteration. Let $S_i \subseteq S_{\sigma_i}$ be a subset of $v_{\sigma_i,\tau_i}$-switchable MAX vertices, and let $\sigma' = \mathrm{switch}(\sigma_i, S_i)$ be a strategy of MAX. If $v_{\sigma',\tau_{\sigma'}} \not\succ v_{\sigma_{i+1},\tau_{i+1}}$, then $\sigma' \ne \sigma_j$ for any $j \ge i + 2$.

Proof. By Lemma 4.7, for each iteration $j$, it holds that $v_{\sigma_j,\tau_j} \succ v_{\sigma_{j-1},\tau_{j-1}}$. Assume to the contrary that $\sigma' = \sigma_j$ for some $j \ge i + 2$. Then, by transitivity, we have $v_{\sigma',\tau_{\sigma'}} = v_{\sigma_j,\tau_j} \succ v_{\sigma_{j-1},\tau_{j-1}} \succeq v_{\sigma_{i+1},\tau_{i+1}}$, which contradicts the assumption that $v_{\sigma',\tau_{\sigma'}} \not\succ v_{\sigma_{i+1},\tau_{i+1}}$.

Lemma 4.14 In Algorithm 11, let $\langle \sigma, \tau \rangle$ be a pair of strategies at the start of an iteration, where $S_\sigma$ is nonempty, and let $\langle \sigma', \tau' \rangle$ be the pair of strategies at the end of this iteration.
Then, the expected number of strategy pairs $\langle \sigma_i, \tau_i \rangle$ such that $v_{\sigma_i,\tau_i} \not\succ v_{\sigma',\tau'}$ is at least $2^{|S_\sigma| - 1}$.

Proof. Consider an iteration in which $\langle \sigma, \tau \rangle$ is the pair of player strategies. Let $U$ denote the set of all MAX strategies obtained by switching some subset of $S_\sigma$, the set of $v_{\sigma,\tau}$-switchable MAX vertices. Clearly, $|U| = 2^{|S_\sigma|}$. With each strategy $\pi \in U$ we associate two sets: a set $U^+_\pi$ that contains all strategies $\rho \in U$ such that $v_{\rho,\tau_\rho} \succ v_{\pi,\tau_\pi}$, and a set $U^-_\pi$ that contains all strategies $\rho \in U$ such that $v_{\pi,\tau_\pi} \succ v_{\rho,\tau_\rho}$. Note that for any pair $\pi, \rho \in U$, we have $\rho \in U^+_\pi$ if and only if $\pi \in U^-_\rho$, so the two sums below are equal. Moreover, for each ordered pair $(\pi, \rho)$ at most one of $\rho \in U^+_\pi$ and $\rho \in U^-_\pi$ holds, so together the two sums count at most $|U|^2$ ordered pairs. It follows that

$$\sum_{\pi \in U} |U^+_\pi| = \sum_{\pi \in U} |U^-_\pi| \le \frac{|U|^2}{2}.$$

Thus, for a strategy $\sigma'$ chosen uniformly at random from $U$, the expected number of MAX strategies $\sigma_i \in U$ such that $v_{\sigma_i,\tau_{\sigma_i}} \not\succ v_{\sigma',\tau_{\sigma'}}$ is

$$|U| - \frac{\sum_{\pi \in U} |U^+_\pi|}{|U|} \ge |U| - \frac{|U|}{2} = 2^{|S_\sigma| - 1}.$$

Theorem 4.15 With probability at least $1 - 2^{-2^{O(n)}}$, Algorithm 11 requires at most $O(2^{0.78n})$ iterations in the worst case, where $n = \min\{|V_{\mathrm{MAX}}|, |V_{\mathrm{MIN}}|\}$.

Proof. Let $c \in (0, 1/2)$ be a constant that we will fix later in the proof. As in the proof of Theorem 4.12, we bound the number of iterations in which $\langle \sigma, \tau \rangle$ is the pair of player strategies and $|S_\sigma| \le cn$ by $\sum_{k=0}^{cn} \binom{n}{k}$, which is at most $2^{nH(c)}$ by Fact 4.5.
We next bound the number of iterations in which $\langle \sigma, \tau \rangle$ is the pair of player strategies and $|S_\sigma| > cn$. Let $\langle \sigma, \tau \rangle$ (respectively, $\langle \sigma', \tau' \rangle$) be the pair of player strategies at the start (respectively, end) of one such iteration. By Lemma 4.14, the expected number of strategy pairs $\langle \sigma_i, \tau_i \rangle$ such that $v_{\sigma_i,\tau_i} \not\succ v_{\sigma',\tau'}$ is at least $2^{|S_\sigma| - 1}$. By Lemma 4.13, none of these strategy pairs $\langle \sigma_i, \tau_i \rangle$ is chosen in any future iteration. Therefore, the expected number of strategy pairs $\langle \sigma_i, \tau_i \rangle$ that Algorithm 11 discards in each such iteration is at least $2^{|S_\sigma| - 1} \ge 2^{cn}$. It follows from Markov's inequality that, with probability at least 1/2, Algorithm 11 discards at least $2^{cn - 1}$ strategy pairs in each such iteration.

We say that an iteration in which $|S_\sigma| > cn$ is good if Algorithm 11 discards at least $2^{cn - 1}$ pairs of strategies at the end of it. We know from the above that the probability that an iteration in which $|S_\sigma| > cn$ is good is at least 1/2. By Chernoff bounds, for any $t > 0$, at least 1/4
of the $t$ iterations in which $|S_\sigma| > cn$ will be good with probability at least $1 - \exp(-t/16)$. Thus, with high probability the total number of iterations is at most $2^{nH(c)} + 2^{n(1-c) + 3}$. For $c = 0.227$ and sufficiently large $n$, this number of iterations is bounded by $2^{0.78n}$. Also, when the number of iterations $t$ is $2^{0.78n}$, the probability of success is $1 - 2^{-2^{O(n)}}$.
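The choice $c = 0.227$ balances the two exponents $H(c)$ and $1 - c$ in the bound $2^{nH(c)} + 2^{n(1-c)+3}$. A short Python sketch (ours; function names hypothetical) finds the balance point by bisection on $H(c) + c - 1$, which is increasing on $(0, 1/2)$, and checks the bound against $2^{0.78n}$ for a few values of $n$:

```python
import math


def binary_entropy(alpha):
    """H(alpha) in bits, for 0 < alpha < 1."""
    return -alpha * math.log2(alpha) - (1 - alpha) * math.log2(1 - alpha)


def balance_point(lo=0.01, hi=0.5, steps=60):
    """Bisection for the c in (0, 1/2) with H(c) = 1 - c."""
    def gap(c):
        return binary_entropy(c) + c - 1   # increasing on (0, 1/2)

    for _ in range(steps):
        mid = (lo + hi) / 2
        if gap(mid) >= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def iteration_bound(n, c=0.227):
    """2^{nH(c)} + 2^{n(1-c)+3}, the high-probability bound from the proof."""
    return 2 ** (n * binary_entropy(c)) + 2 ** (n * (1 - c) + 3)


c_star = balance_point()
print(f"balance point c* = {c_star:.4f}, H(0.227) = {binary_entropy(0.227):.4f}")
for n in (200, 500, 1000):
    print(n, 2 ** (0.78 * n) >= iteration_bound(n))
```

The balance point is about $0.2271$, where both exponents are about $0.773$; the slack up to $0.78$ absorbs the constant factor $2^3$, but only once $n$ is in the hundreds (the check fails at $n = 200$ and succeeds at $n = 500$ and $n = 1000$).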

## CHAPTER 5: RELATED WORK, CONCLUSION, AND OPEN PROBLEMS

Parity games are defined in Section 2.4. The algorithmic problem of solving a parity game is:

PARITY-GAME: Given a game $G = (V_{\mathrm{MAX}}, V_{\mathrm{MIN}}, E, p)$ defined on a directed graph $G = (V,E)$ and a start vertex $v_0$, where $(V_{\mathrm{MAX}}, V_{\mathrm{MIN}})$ is a partition of $V(G)$ and $p : V(G) \to \mathbb{N}$ is a color function, determine whether player MAX has a winning strategy in the game if the token is initially placed on vertex $v_0$.

Recall that player MAX wins if the largest vertex color $p(v_i)$ of a vertex $v_i$ occurring infinitely often is odd, i.e., if $\limsup_{i \to \infty} p(v_i)$ is odd, while player MIN wins if it is even. Here $v_0 v_1 v_2 \ldots$ is the infinite path formed by the players. The best known deterministic algorithm for PARITY-GAME, by Jurdziński, Paterson, and Zwick [JPZ08], runs in subexponential time. Faster randomized algorithms running in strongly expected subexponential time for this problem are due to Björklund, Sandberg, and Vorobyov [BSV03] and to Halman [Hal07]. The randomized algorithm in [BSV03] is based on the randomized algorithm of Ludwig [Lud95] for simple stochastic games, which in turn is inspired by the subexponential randomized simplex algorithm for linear programming by Kalai [Kal92]. The randomized algorithm in [Hal07] is based on the known algorithm of Matoušek, Sharir, and Welzl [MSW96] for LP-type problems.

Mean payoff games and discounted payoff games are also defined in Section 2.4. The algorithmic problems for these games are defined similarly to the problem PARITY-GAME. The best known randomized strongly expected subexponential-time algorithm for solving mean payoff games is by Halman [Hal07], which improves upon the randomized algorithm by Björklund, Sandberg, and Vorobyov [BSV07]. The best known algorithm for solving discounted payoff games is the randomized strongly expected subexponential-time algorithm by Halman [Hal07].


The best known algorithms for solving simple stochastic games are by Ludwig [Lud95] and by Halman [Hal07]. Gärtner and Rüst [GR05] showed that finding optimal strategies for the players in a simple stochastic game reduces in polynomial time to the generalized linear complementarity problem (GLCP) with a P-matrix. The hardness of solving simple stochastic games is addressed in [Jub05].

The algorithms surveyed in this thesis are summarized in Table 5.1.

Table 5.1 Summary of Algorithms for Simple Stochastic Games

| Method | Algorithm | Running Time |
|---|---|---|
| Iterative Approximation | by Somla | $2^{O(n)}$ |
| | by Shapley | $2^{O(n)}$ |
| | "Converge from Below" by Condon | $2^{O(n)}$ |
| Strategy Improvement | The Hoffman-Karp algorithm | $2^{O(n)}$ |
| Mathematical Programming | A Quadratic Program | $2^{O(n)}$ |
| | Linear Programs for Restricted SSGs | $n^{O(1)}$ |
| Randomized | A Randomized Variant of the Hoffman-Karp algorithm | $2^{O(n)}$ |
| | by Ludwig | $2^{O(\sqrt{n})}$ |
| | Our Algorithm | $2^{O(n)}$ |

In this thesis, we obtained an improved worst-case upper bound on the number of iterations required by the Hoffman-Karp strategy improvement algorithm. We also presented a randomized Las Vegas strategy improvement algorithm whose expected running time is $O(2^{0.78n})$.

One interesting direction for future work would be to find a super-polynomial lower bound on the number of iterations that the Hoffman-Karp algorithm requires in the worst case. Another would be an experimental comparison of the running times of these algorithms. A polynomial-time algorithm, or even an improved deterministic subexponential-time algorithm, cannot be ruled out for the simple stochastic game value problem; obtaining an algorithm of this sort would be a challenging research direction.


REFERENCES

[AKP+02] A. Akella, R. Karp, C. Papadimitriou, S. Seshan, and S. Shenker. Selfish behavior and stability of the internet: A game-theoretic analysis of TCP. In SIGCOMM '02, 2002.

[BSV03] H. Björklund, S. Sandberg, and S. Vorobyov. A discrete subexponential algorithm for parity games. In Proceedings of the 20th Annual Symposium on Theoretical Aspects of Computer Science, pages 663–674. Springer-Verlag Lecture Notes in Computer Science #2607, 2003.

[BSV07] H. Björklund, S. Sandberg, and S. Vorobyov. A combinatorial strongly subexponential strategy improvement algorithm for mean payoff games. Discrete Applied Mathematics, 155:210–229, 2007.

[Con92] A. Condon. The complexity of stochastic games. Information and Computation, 96:203–224, 1992.

[Con93] A. Condon. On algorithms for simple stochastic games. In J. Cai, editor, Advances in Computational Complexity Theory, volume 13, pages 51–73. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, American Mathematical Society, 1993.

[d'E63] F. d'Epenoux. A probabilistic production and inventory problem. Management Science, 10:98–108, 1963.

[Der70] C. Derman. Finite State Markovian Decision Processes, volume 67 of Mathematics in Science and Engineering. Academic Press, New York, 1970.

[EM79] A. Ehrenfeucht and J. Mycielski. Positional strategies for mean payoff games. International Journal of Game Theory, 8:109–113, 1979.

[Fil81] J. Filar. Ordered field property for stochastic games when the player who controls transitions changes from state to state. Journal of Optimization Theory and Applications, 34:503–515, 1981.

[FS86] J. Filar and T. Schultz. Nonlinear programming and stationary strategies in stochastic games. Mathematical Programming, 35:243–247, 1986.

[FV97] J. Filar and K. Vrieze. Competitive Markov Decision Processes. Springer-Verlag, 1997.

[GKK88] V. Gurvich, A. Karzanov, and L. Khachiyan. Cyclic games and an algorithm to find minimax cycle means in directed graphs. USSR Computational Mathematics and Mathematical Physics, 28:85–91, 1988.


[GR05] B. Gärtner and L. Rüst. Simple stochastic games and P-matrix generalized linear complementarity problems. In Proceedings of the 15th Conference on Fundamentals of Computation Theory, pages 209–220. Springer-Verlag Lecture Notes in Computer Science #3623, 2005.

[GW02] E. Grädel, W. Thomas, and T. Wilke, editors. Automata, Logics, and Infinite Games. Lecture Notes in Computer Science, 2500, 2002.

[Hal07] N. Halman. Simple stochastic games, parity games, mean payoff games and discounted payoff games are all LP-type problems. Algorithmica, 49:37–50, 2007.

[HK66] A. Hoffman and R. Karp. On nonterminating stochastic games. Management Science, 12:359–370, 1966.

[How60] R. Howard. Dynamic Programming and Markov Processes. M.I.T. Press, Cambridge, MA, 1960.

[JPZ08] M. Jurdziński, M. Paterson, and U. Zwick. A deterministic subexponential algorithm for solving parity games. SIAM Journal on Computing, 38:1519–1532, 2008.

[Jub05] B. Juba. On the hardness of simple stochastic games. Master's thesis, Carnegie Mellon University, 2005.

[Juk01] S. Jukna. Extremal Combinatorics. Springer, 2001.

[Kal92] G. Kalai. A subexponential randomized simplex algorithm. In Proceedings of the 24th ACM Symposium on Theory of Computing, pages 475–482. ACM Press, 1992.

[Kar84] N. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4:373–397, 1984.

[Kha79] L. Khachiyan. A polynomial algorithm in linear programming. Soviet Math. Dokl., 20:191–194, 1979.

[KT04] V. Kumar and R. Tripathi. Algorithmic results in simple stochastic games. Technical Report TR855, Department of Computer Science, University of Rochester, November 2004.

[Lud95] W. Ludwig. A subexponential randomized algorithm for the simple stochastic game problem. Information and Computation, 117:151–155, 1995.

[MC90] M. Melekopoglou and A. Condon. On the complexity of the Policy Iteration algorithm for simple stochastic games. Technical Report TR941, Department of Computer Science, University of Wisconsin-Madison, June 1990.

[MGLA00] M. Mundhenk, J. Goldsmith, C. Lusena, and E. Allender. Complexity of finite-horizon Markov decision process problems. Journal of the ACM, 47:681–720, 2000.

[MS99] Y. Mansour and S. Singh. On the complexity of Policy Iteration. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, Stockholm, Sweden, pages 401–408. Morgan Kaufmann, July 1999.


[MSW96] J. Matoušek, M. Sharir, and E. Welzl. A subexponential bound for linear programming. Algorithmica, 16/5:498–516, 1996.

[Sha53] L. Shapley. Stochastic games. In Proceedings of the National Academy of Sciences U.S.A., volume 39, pages 1095–1100, 1953.

[Som05] R. Somla. New algorithms for solving simple stochastic games. Electronic Notes in Theoretical Computer Science, 119:51–65, 2005.

[SS95] S. Shenker and J. Scott. Making greed work in networks: a game-theoretic analysis of switch service disciplines. IEEE/ACM Trans. Netw., 3:819–831, 1995.

[Yao77] A. Yao. Probabilistic computations: Toward a unified measure of complexity. In Proceedings of the 18th IEEE Symposium on Foundations of Computer Science, pages 222–227, 1977.

[ZP96] U. Zwick and M. Paterson. The complexity of mean payoff games on graphs. Theoretical Computer Science, 158:343–359, 1996.
