A GAME THEORETIC FRAMEWORK FOR DYNAMIC TASK SCHEDULING IN DISTRIBUTED HETEROGENEOUS COMPUTING SYSTEMS

by

VASANTH KUMAR RAMESH

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Engineering
Department of Computer Science and Engineering
College of Engineering
University of South Florida

Major Professor: N. Ranganathan, Ph.D.
Srinivas Katkoori, Ph.D.
Soontae Kim, Ph.D.

Date of Approval: April 8, 2005

Keywords: Variable Structure Stochastic Automation, Game Theory, Nash Equilibrium, Auctions, Task Assignment

Copyright 2005, Vasanth Kumar Ramesh

DEDICATION

To my beloved parents, Sri. M. S. Ramesh and Smt. Chandra Ramesh

ACKNOWLEDGEMENTS

Words cannot express my emotions and it was almost impossible for me to express my heartfelt gratitude to Dr. N. Ranganathan. It was just the motivation I needed to break my dry routine. I am truly blessed to have him as a mentor and friend. My mentor has led me to dedicate my efforts to the pursuit of research and his contribution to my work is something that cannot be gauged in any scale. This is the result of many hours of discussions and innumerable telephone calls. His enthusiastic approach to life and studies has given me a new outlook on a research career. I would like to thank Dr. Srinivas Katkoori and Dr. Soontae Kim for their critical remarks and suggestions to improve the work. I am very grateful to Prof. Waran, who has always been motivating me through his golden words of advice. He has always boosted my confidence during hard times. I express my heartfelt gratitude to my mother, my first Guru, for her love, affection and constant encouragement and care throughout.
I would like to thank my younger brother, Prem, for his love and concern. Finally, I would like to thank Pradeep, Mali and Naren for their timely help.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF ALGORITHMS
ABSTRACT
CHAPTER 1 INTRODUCTION
    1.1 Introduction
    1.2 Taxonomy of Heterogeneous Computing Systems
        1.2.1 System Heterogeneous Computing
        1.2.2 Network Heterogeneous Computing
    1.3 Issues in Heterogeneous Computing Systems
        1.3.1 Algorithm Design
        1.3.2 Codetype Profiling and Analytical Benchmarking
        1.3.3 Partition
        1.3.4 Task Mapping
        1.3.5 Network Requirement
        1.3.6 Programming Environment
        1.3.7 Performance Evaluation
    1.4 Motivation
    1.5 Outline of the Thesis
CHAPTER 2 RELATED WORK
    2.1 Introduction
    2.2 Graph Theoretic Algorithms
    2.3 Non Iterative Heuristic Techniques
    2.4 Optimal Selection Theory
    2.5 Simulated Annealing
    2.6 Tabu Search
    2.7 Genetic Algorithms
    2.8 Genetic Simulated Annealing Algorithms
    2.9 Learning Automata Algorithms
    2.10 List Scheduling Algorithms
CHAPTER 3 PRELIMINARIES OF GAME THEORY
    3.1 Introduction to Game Theory
    3.2 Basic Definitions
        3.2.1 Games in Extensive Form
        3.2.2 Games in Normal Form
        3.2.3 Games in Characteristic Function Form
        3.2.4 Types of Games
    3.3 Equilibrium in Games
    3.4 Auction Theory
        3.4.1 Types of Auctions
        3.4.2 Game Theory as an Optimization Tool for Task Scheduling
CHAPTER 4 DYNAMIC SCHEDULING USING GAME THEORY
    4.1 Problem Statement
    4.2 Elements of the HC System Model
        4.2.1 Cost Metric
    4.3 Dynamic Task Scheduling as a First Price Sealed Bid Auction
    4.4 Dynamic Scheduling Algorithm
        4.4.1 The Conservative Strategy
        4.4.2 The Aggressive Strategy
        4.4.3 Calculation of the Payoff
        4.4.4 Dynamic Task Mapping based on Nash Equilibrium
        4.4.5 Analysis of the Algorithm
    4.5 Simulation Results and Discussion
CHAPTER 5 DYNAMIC SCHEDULING USING HEURISTICS
    5.1 Motivation
    5.2 Heuristic Based Model
    5.3 Selection Heuristics for the Proposed Scheduler
        5.3.1 Selection Heuristic H1
        5.3.2 Selection Heuristic H2
        5.3.3 Selection Heuristic H3
        5.3.4 Selection Heuristic H4
        5.3.5 Selection Heuristic H5 and H6
    5.4 Heuristic based Dynamic Scheduling Algorithm
    5.5 Simulation Results and Discussion
CHAPTER 6 CONCLUSIONS
REFERENCES

LIST OF TABLES

Table 3.1 Normal Form Representation of Rock-Paper-Scissors Game
Table 4.1 Data used for Generating Random Graphs
Table 5.1 Data used for Evaluating [...] parameter
Table 5.2 Data used for Evaluating Heuristics

LIST OF FIGURES

Figure 1.1 Task Profiling Example Illustrating the Advantage of HC Systems [1]
Figure 1.2 Classification of Heterogeneous Computing Systems [2]
Figure 1.3 Structure of a Distributed Memory SIMD [2]
Figure 1.4 Structure of a Distributed Memory MIMD [2]
Figure 1.5 Structure of NHC System [2]
Figure 1.6 Conceptual Model of the Assignment of Subtasks to Machines in a HC System Environment (adapted from [1])
Figure 2.1 Taxonomy of Task Assignment Algorithms [3]
Figure 3.1 An Example of a Game Tree
Figure 3.2 Extensive Form Representation of Rock-Paper-Scissors Game
Figure 3.3 Extensive Form Representation when Players Move Simultaneously
Figure 3.4 Classification of Games
Figure 4.1 The TFG of Gaussian Elimination Algorithm [4]
Figure 4.2 The TFG of Fast Fourier Transformation Algorithm [4]
Figure 4.3 An Application Task Flow Graph with [...] and [...]
Figure 4.4 An Example of Processor Graph with [...] and [...]
Figure 4.5 Framework of the Proposed Dynamic Scheduler
Figure 4.6 Freeze-frame Illustrating the System State when [...] and [...] are being Dynamically Scheduled using the Proposed Game Theoretic Scheduler
Figure 4.7 Low Communication Complexity, [...]
Figure 4.8 Medium Communication Complexity, [...]
Figure 4.9 High Communication Complexity, [...]
Figure 4.10 Low Communication Complexity
Figure 4.11 Medium Communication Complexity
Figure 4.12 High Communication Complexity
Figure 4.13 Low Communication Complexity, [...]
Figure 4.14 Medium Communication Complexity, [...]
Figure 4.15 High Communication Complexity, [...]
Figure 4.16 Low Communication Complexity, [...]
Figure 4.17 Medium Communication Complexity, [...]
Figure 4.18 High Communication Complexity, [...]
Figure 5.1 Framework of the Proposed Hybrid Scheduler
Figure 5.2 The TFG of Figure 4.3 after Application of Selection Heuristic H1
Figure 5.3 The TFG of Figure 4.3 after Application of Selection Heuristic H2
Figure 5.4 The TFG of Figure 4.3 after Application of Selection Heuristic H3
Figure 5.5 The TFG of Figure 4.3 after Application of Selection Heuristic H4
Figure 5.6 The TFG of Figure 4.3 after Application of Selection Heuristic H5 with criteria of partition being node weight
Figure 5.7 The TFG of Figure 4.3 after Application of Selection Heuristic H6 with criteria of partition being edge weight
Figure 5.8 Freeze-frame Illustrating the System State when [...] and [...] are being Dynamically Scheduled using the Proposed Heuristics on the Game Theoretic Scheduler
Figure 5.9 Mix vs. Completion Time for [...] with Low Dependency
Figure 5.10 Mix vs. Completion Time for [...] with Medium Dependency
Figure 5.11 Mix vs. Completion Time for [...] with High Dependency
Figure 5.12 Size vs. Completion Time for [...] and [...] with Low Data Dependency
Figure 5.13 Size vs. Completion Time for [...] and [...] with Low Data Dependency
Figure 5.14 Size vs. Completion Time for [...] and [...] with Low Data Dependency
Figure 5.15 Size vs. Completion Time for [...] and [...] with Medium Data Dependency
Figure 5.16 Size vs. Completion Time for [...] and [...] with Medium Data Dependency
Figure 5.17 Size vs. Completion Time for [...] and [...] with Medium Data Dependency
Figure 5.18 Size vs. Completion Time for [...] and [...] with High Data Dependency
Figure 5.19 Size vs. Completion Time for [...] and [...] with High Data Dependency
Figure 5.20 Size vs. Completion Time for [...] and [...] with High Data Dependency

LIST OF ALGORITHMS

1 Dynamic Scheduler based on First Price Sealed Bid Auction
2 Window()
3 BestCandidates(List)
4 Simulation Procedure
5 Pseudocode for Translation of Selection Heuristic H1
6 Pseudocode for Translation of Selection Heuristic H2
7 Pseudocode for Translation of Selection Heuristic H3
8 Pseudocode for Translation of Selection Heuristic H4
9 Pseudocode for Translation of Selection Heuristics H5 and H6
10 Heuristic Based Dynamic Scheduling Algorithm
11 Modified Window()

A GAME THEORETIC FRAMEWORK FOR DYNAMIC TASK SCHEDULING IN DISTRIBUTED HETEROGENEOUS COMPUTING SYSTEMS

Vasanth Kumar Ramesh

ABSTRACT

Heterogeneous Computing (HC) systems achieve high performance by networking together computing resources of diverse nature. The issues of task assignment and scheduling are critical in the design and performance of such systems. In this thesis, an auction based game theoretic framework is developed for dynamic task scheduling in HC systems. Based on the proposed game theoretic model, a new dynamic scheduling algorithm is developed that uses auction based strategies. The dynamic scheduling algorithm yields schedules with shorter completion times than static schedulers while incurring higher scheduling overhead. Thus, a second scheduling algorithm is proposed which uses an initial schedule generated with a learning automaton based algorithm, and then heuristics are used to identify windows of tasks within the application that can be rescheduled dynamically during run time.
The algorithm yields significantly better completion times compared to static scheduling while incurring less overhead than a purely dynamic scheduler. Several different heuristics are investigated and compared in terms of how they impact the overall scheduler performance. Experimental results indicate that the proposed algorithms perform significantly better than previous algorithms reported in the literature.

CHAPTER 1 INTRODUCTION

In this chapter, the field of heterogeneous computing (HC) is introduced, and its various classifications, advantages and disadvantages are discussed. Several issues related to HC systems are addressed and the motivation for task assignment and scheduling in HC systems is discussed. An introduction to task assignment and scheduling, which is the focus of this thesis, is also presented.

1.1 Introduction

Scientists and engineers have always striven to build machines for high performance. Technology growth has led to the development of processors with hundreds of millions of devices within a die. Traditionally, computer architects have focused on developing homogeneous computing models to deliver superior performance. Their goal was to develop a single architecture that could satisfy the requirements of a wide range of applications. Modern-day applications have almost saturated the capabilities of such architectures. We have now come to believe that single-architecture computers are no longer suitable for high performance. Studies have shown that, most of the time, the processor executes code for which it is poorly suited [1]. One way to overcome this problem is to build a system with several types of architectures and then attempt to match the requirements of the application to the most suitable processor. Hence, the focus is shifting towards heterogeneous computing systems.
In [5], HC is defined as "tuned use of diverse processing hardware to meet distinct computational needs." Another popular definition for an HC system is given in [6] as "the well orchestrated and coordinated effective use of a suite of diverse high-performance machines to provide super-speed processing for computationally demanding tasks with diverse computing needs." From these definitions it is clear that the key feature of an HC system is the diversity of the processor architectures. High performance is achieved by exploiting this key feature. Figure 1.1 illustrates the concept of heterogeneous computing.

Figure 1.1. Task Profiling Example Illustrating the Advantage of HC Systems [1] (panels: Execution on a Vector Supercomputer, twice as fast as baseline; Execution on a Heterogeneous Suite, 20 times as fast as baseline; code types: Special Purpose, Vector, MIMD, SIMD, Dataflow)

It is a known fact that achieving usable, as opposed to peak, performance is a grand challenge problem [7]. Many High-Performance Computing (HPC) systems achieve only a fraction of their peak performance. Another feature that attracts scientists to use HC systems for supercomputing is their economic viability. Instead of replacing systems with high-cost supercomputers, HC offers a structured methodology to integrate existing systems for high performance computing. Distributed Computing and Network Computing have networks that vary in topology and bandwidth, and hence heterogeneity in these systems is partly justified. However, these systems are homogeneous with respect to the computational facilities and lack the full spectrum of an HC system. The fundamental heterogeneous processing heuristic is: First evaluate suitability of tasks to processor types, then load-balance among selected machines for the final assignment.
[5]

1.2 Taxonomy of Heterogeneous Computing Systems

Heterogeneous computing is broadly classified into two categories, namely:
1. System Heterogeneous Computing
2. Network Heterogeneous Computing

Figure 1.2 illustrates the classification of heterogeneous computing systems. They are discussed briefly in the following sections.

Figure 1.2. Classification of Heterogeneous Computing Systems [2] (Heterogeneous Computing divides into System Heterogeneous Computing, with Mixed Mode and Multi Mode, and Network Heterogeneous Computing, with Mixed Machine and Multi Machine)

1.2.1 System Heterogeneous Computing

System Heterogeneous Computing (SHC) consists of a single supercomputer that can be configured to execute tasks in different modes such as SIMD and MIMD. Figures 1.3 and 1.4 illustrate a distributed memory SIMD and a distributed memory MIMD, respectively. SHC can be further classified into mixed mode and multi mode systems.

Mixed Mode: In mixed mode operation, the processing elements switch between different modes of operation, but at any given time the system can execute in only one of the modes. The PASM [8] system is an example of a mixed mode SHC system.

Multi Mode: In multi mode operation, the processing elements can execute in different modes at the same time. The Image Understanding Architecture [9] system is a multilevel system where each level comprises processing elements configured in different modes. The tasks are mapped onto the different layers and they can coexist.

Figure 1.3. Structure of a Distributed Memory SIMD [2] (processing elements PE 0 to PE n-1, each with a processor and a memory, connected by an interconnection network and driven by a control unit)

Figure 1.4. Structure of a Distributed Memory MIMD [2] (processing elements PE 0 to PE n-1, each with a processor and a memory, connected by an interconnection network)

1.2.2 Network Heterogeneous Computing

The Network Heterogeneous Computing (NHC) system consists of autonomous computers that are connected in a network. These computers have the ability to execute tasks concurrently. NHC systems are further classified as mixed machine and multi machine systems. Multi machine NHC systems are studied as a special case of mixed machine NHC systems. There are three layers in NHC, namely: Network Layer, Communication Layer and Processing Layer.

Network Layer: This is the lowest layer. It handles all the issues related to connecting two computers, such as which routing path to take, which routers to use, which protocols to follow and other issues related to computer networks.

Communication Layer: This is the intermediate layer and provides mechanisms for computers to communicate with each other. This layer also provides utilities and tools that enable the user to view a group of computers as a single virtual machine. Some examples of these tools include Parallel Virtual Machine (PVM) [10], Message Passing Interface (MPI) [11], Linda [12], p4 [13], Mentat [14], HeNCE [15, 6] and the Java programming language. These tools run as system daemons and provide service to any request from the application process.

Processing Layer: This layer provides tools and techniques that ensure efficient execution of the applications. This layer controls the performance of the heterogeneous system. Examples of some services provided in this layer include task decomposition, task mapping and load balancing.

1.3 Issues in Heterogeneous Computing Systems

There are various issues to be addressed in a heterogeneous computing system.
These issues are enumerated below:
- Algorithm design
- Codetype profiling and analytical benchmarking
- Code partitioning
- Task mapping
- Network requirement
- Programming environment
- Performance evaluation

Figure 1.5. Structure of NHC System [2] (labels: Wide Area Network; Local Area Networks 1 to 5; Machines 1 to 5; Hops 1 to 5; Network Layer, Communication Layer, Processing Layer)

A brief overview of these issues is presented next. A sequence of steps involved [16] is depicted in Figure 1.6.

Figure 1.6. Conceptual Model of the Assignment of Subtasks to Machines in a HC System Environment (adapted from [1]) (stages: applications and machines in the heterogeneous suite; generation of parameters that are represented as general characteristics of computational requirements and of machine capabilities; task profiling for a given application and analytical benchmarking for the machines in the suite; specific characteristics of each subtask, of the machines and of inter-machine communication overhead; matching and scheduling of subtasks to machines based on a cost metric, using the current loading status of machines and network; assignment of subtasks to machines; execution of the application on the heterogeneous suite; legend: information, action)

1.3.1 Algorithm Design

Many parameters must be taken into account in order to design an efficient algorithm for a heterogeneous system. They are:
- The various alternatives to solve the problem must be explored. The application must be analyzed and its inherent heterogeneity must be exploited by the algorithm.
- The number of machines in the suite and their computational abilities, instruction sets and architectural features must be taken into account.
- The cost of communicating through the network is an important factor that should not be overlooked when designing an algorithm for an HC system.

1.3.2 Codetype Profiling and Analytical Benchmarking

Code type profiling, or task profiling, is defined as a method to quantify the types of computations that are present in an application [17]. Analytical benchmarking is defined as the procedure that provides a measure of how well the available machines in the HC suite perform on the given set of code types [17].

1.3.3 Partition

Partitioning an application in an HC system involves solving two subproblems: Parallelism Detection and Clustering. Parallelism Detection involves determining the kind of parallelism that is present in the given application. Clustering combines several operations in the application into a single module.

1.3.4 Task Mapping

Task mapping involves the twin processes of task assignment and task scheduling. Task assignment is also known as task matching. Task assignment determines which machine is best suited to execute a given task. Task scheduling determines when to execute the task so that a performance cost metric of the system is optimized. Task mapping algorithms can be broadly classified into two categories: static and dynamic.
A detailed survey of these techniques is presented in Chapter 2.

1.3.5 Network Requirement

The interconnection network impacts the performance of an HC system. The communication requirements of an HC system are of the order of gigabits per second, whereas modern-day LANs can provide only a few tens of megabits per second. Hence LANs are unsuitable for HC. Special high-speed networks with fibre optic cables and powerful network interface processors are required for building an HC system network. A few examples of HC network systems are Nectar [18] and HiPPI [19].

1.3.6 Programming Environment

Programming environments should provide a layer of abstraction between the user and the machines. This layer should make provision for a parallel, portable, machine-independent programming language. It should provide cross-parallel compilers and cross debuggers that support a wide range of architectures. Linda [12] and mpC [20] are some examples of programming environments for HC systems.

1.3.7 Performance Evaluation

Performance evaluation tools monitor the performance of the HC system. They summarize the runtime behavior of the application and its resource usage, and determine the cause of a bottleneck. These tools collect, interpret and evaluate the information from various applications, operating systems, the network and the hardware in the system.

1.4 Motivation

A static scheduler performs task assignment and scheduling using estimated values. The exact execution times and communication times are not known when static scheduling is performed. The estimated values may differ from the actual values for reasons such as congestion in the network, errors in estimation techniques and other fluctuations.
However, if the information is available in advance, the static scheduler easily outperforms a dynamic scheduler. Since a dynamic scheduler uses the most recent system information, it is immune to changes in the network properties. Game theory is known to identify equilibrium points in decision making problems where the participating agents have conflicts of interest. It is also well known that the equilibrium point is a social equilibrium. That is, all the agents are satisfied with their decisions at that point. In HC systems, deciding to which machine a subtask is to be assigned so that the entire application benefits is a critical issue. This motivates us to apply game theory to the problem of task scheduling in HC systems.

1.5 Outline of the Thesis

The outline of the thesis is as follows. Chapter 2 discusses the related work in the area of task assignment and scheduling in an HC system. Chapter 3 gives a brief introduction to the theory of games and auctions. A new framework for dynamic task scheduling using auctions and games is described in Chapter 4. Chapter 5 describes the six heuristics that are used to reduce the overhead of the dynamic scheduler. Concluding remarks are presented in Chapter 6.

CHAPTER 2 RELATED WORK

The performance of any HC system depends heavily on how the application is mapped (matched and scheduled) to the system. Therefore, it is important for the scheduling algorithms to be fast, efficient, scalable and able to exploit any unique features of the system or the application to their advantage. Various algorithms, ranging from simple techniques such as greedy scheduling heuristics to complex graph partitioning and genetic algorithms, have been proposed in the literature. In this chapter, a detailed survey of a few mapping techniques for heterogeneous computing systems is presented.
2.1 Introduction

An application program is a sequence of coded procedures and functions held together by some glue code. These procedures and functions can collectively be called the tasks of the application. Some of these tasks may depend on other tasks and therefore have to wait for their inputs before they can start execution, whereas other tasks, which do not depend on any other task, may execute without waiting. The data structure that best captures this behaviour of an application is the Task Flow Graph (TFG). A TFG $G = (V, E)$ is a directed acyclic graph (DAG) in which every node $v_i \in V$ represents a task in the application and every edge $(v_i, v_j) \in E$ represents the dependency of task $v_j$ on task $v_i$. Similarly, the suite of heterogeneous processors available in the network can also be viewed as a processor graph, $PG = (P, L)$, where there is a node $p_i \in P$ corresponding to every machine in the network and an edge $(p_i, p_j) \in L$ constitutes a communication link from $p_i$ to $p_j$ in PG.

Task matching is the process of selecting on which processor a particular task should run such that certain optimization criteria are satisfied. Task scheduling follows task matching and involves determining when that task should run on the processor. It has been shown that the mapping problem is NP-complete once the number of processors in the system exceeds 2 [21]. That is, no polynomial-time algorithm is known to solve the task mapping problem. Therefore, heuristic algorithms are needed to solve the mapping problem.

Scheduling algorithms have been classified based on the techniques they use to solve the task mapping problem: graph theoretic techniques, list schedulers, optimal selection theory and learning automata, to name a few. In the following sections, a brief overview of these techniques is presented. Figure 2.1 illustrates the taxonomy of task assignment algorithms.
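As an illustration of the TFG described in this section, the following minimal Python sketch stores a small task flow graph as an edge list and derives one execution order that respects every dependency. The task names, the edge list and the helper `topological_order` are hypothetical and chosen only for this example; they are not taken from the thesis.

```python
from collections import deque

# Hypothetical TFG: an edge (u, v) means task v depends on task u.
tasks = ["t0", "t1", "t2", "t3"]
tfg_edges = [("t0", "t1"), ("t0", "t2"), ("t1", "t3"), ("t2", "t3")]


def topological_order(tasks, edges):
    """Return one dependency-respecting order of a DAG (Kahn's algorithm)."""
    successors = {t: [] for t in tasks}
    indegree = {t: 0 for t in tasks}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    # Tasks with no unfinished predecessors may execute without waiting.
    ready = deque(t for t in tasks if indegree[t] == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for s in successors[t]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    if len(order) != len(tasks):
        raise ValueError("graph has a cycle, so it is not a valid TFG")
    return order


order = topological_order(tasks, tfg_edges)
print(order)
```

A processor graph can be held in the same way, with machines as nodes and communication links as edges.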
Figure 2.1. Taxonomy of Task Assignment Algorithms [3] (node labels: Static Task Assignment; Optimal; Sub-Optimal; Restricted; Non-Restricted; Approximate; Heuristics; Graph Theory; Mathematical Programming; Randomized Optimization; Task Clustering; Greedy; State Space Search; Genetic Algorithms; Simulated Annealing; Mean Field Annealing; Variable State Stochastic Automation)

2.2 Graph Theoretic Algorithms

Since applications and networks are represented as graphs, it is intuitive to use graph theoretic techniques to solve the scheduling problem. A popular graph based scheduling algorithm is proposed in [21]. It is based on the max-flow/min-cut heuristic. However, this algorithm can produce optimal solutions only for a two-processor system. This work is further extended in [22] to generate suboptimal solutions for a general HC system. The algorithm, called A, uses a heuristic that combines a recursive invocation of the max-flow/min-cut algorithm with a greedy type algorithm. It consists of three parts:
1. Grab: For an $n$-processor graph, for each processor $p_i$ a new two-processor system is constructed with $p_i$ and $\bar{p}_i$, where $\bar{p}_i$ represents the other processors of the system. The max-flow/min-cut algorithm as outlined in [22] is then used to determine the tasks that can be assigned to processor $p_i$. This step is repeated for all the processors in the system to generate a partial mapping solution.
2. Lump: In this step, all the tasks that remain unmapped in the grab step are mapped to one processor.
3. Greedy: For those tasks that are unmapped in the lump step, tasks with high communication costs between them are identified and grouped into clusters. All the tasks in the same cluster are then mapped to the processor that can complete their execution at the earliest.
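The two-processor case that underlies the max-flow/min-cut approach can be sketched as follows. This is a minimal illustration of the classical construction, not the implementation of [21] or [22]: each task is joined to processor node P1 with a capacity equal to its cost on P2, to P2 with a capacity equal to its cost on P1, and to each communicating task with a capacity equal to the communication cost, so that the weight of a cut equals the total cost of the corresponding assignment. The function name, the node names and the example costs are assumptions, and the maximum flow is found with a plain breadth-first augmenting-path search.

```python
from collections import deque


def two_processor_assignment(exec_cost, comm_cost):
    """Minimum-cost assignment of tasks to two processors via min-cut.

    exec_cost[t] = (cost of t on P1, cost of t on P2)
    comm_cost[(a, b)] = cost paid only if a and b run on different processors
    """
    src, sink = "P1", "P2"  # hypothetical node names for the two processors
    cap = {src: {}, sink: {}}
    for t in exec_cost:
        cap[t] = {}

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # residual (reverse) edge

    for t, (on_p1, on_p2) in exec_cost.items():
        add_edge(src, t, on_p2)   # cut when t ends up on the P2 side
        add_edge(t, sink, on_p1)  # cut when t stays on the P1 side
    for (a, b), c in comm_cost.items():
        add_edge(a, b, c)         # undirected edge: capacity both ways
        add_edge(b, a, c)

    def reachable():
        """BFS over edges with remaining capacity; returns parent pointers."""
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    total = 0
    while True:
        parent = reachable()
        if sink not in parent:
            break
        # Collect the augmenting path, then push its bottleneck capacity.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= push
            cap[w][u] += push
        total += push

    # Tasks still reachable from P1 in the residual graph are assigned to P1.
    assignment = {t: (1 if t in parent else 2) for t in exec_cost}
    return assignment, total


# Hypothetical costs for three tasks.
exec_cost = {"a": (4, 10), "b": (6, 3), "c": (5, 5)}
comm_cost = {("a", "b"): 2, ("b", "c"): 4}
assignment, total = two_processor_assignment(exec_cost, comm_cost)
print(assignment, total)
```

In this example the cheapest of the eight possible assignments places task a on the first processor and tasks b and c on the second, which is the cut the sketch returns.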
The A* Algorithm

A scheduling algorithm based on the A* search technique from the field of artificial intelligence is proposed in [3]. The A* algorithm is used to search efficiently in the search space, in this case a tree. It searches from a node in the tree known as the start node. The intermediate nodes represent partial solutions and the leaf nodes represent complete solutions, or goals. Associated with each node is a cost which is computed by a cost function $f$. The nodes are ordered for search according to this cost, that is, the node with the minimum cost is searched first. Essentially, the A* algorithm is a best-first search algorithm. The value of $f$ for each node $n$ is computed as:

$f(n) = g(n) + h(n)$    (2.1)

where $g(n)$ is the cost of the search path from the start node to the current node $n$, and $h(n)$ is a lower bound estimate of the path cost from node $n$ to the goal node. Expansion of a node generates all of its successors and computes the cost $f$ for each of them. The algorithm maintains a sorted list of nodes and always selects the node with the best cost from this list. The efficiency of this algorithm depends on the accuracy of the prediction of the values for $h(n)$. This is a major drawback of this approach. Moreover, special tree-searching and pruning techniques are required when the number of tasks and machines increases.

Another graph theoretic algorithm, called the cluster-M mapping algorithm, is proposed in [23] for mapping tasks with nonuniform computation and communication weights. Two clustering algorithms are proposed, which are used to obtain a multilayer clustered graph of tasks (Specgraph) and of machines (Repgraph). The cluster-M algorithm is used to map the nodes of the Specgraph to nodes of the Repgraph.

A scheduling algorithm based on the minimum spanning tree of a graph is presented in [24]. The technique exploits the data distribution properties of the application.
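The A* cost function $f(n) = g(n) + h(n)$ described earlier in this section can be made concrete with a small sketch. The code below is an illustrative best-first search over partial task assignments and is not the algorithm of [3]: it assumes that $g$ is the execution plus communication cost of the tasks assigned so far, and that $h$ is the sum of the cheapest execution costs of the unassigned tasks, which never overestimates the remaining cost. The function name and the example cost values are hypothetical.

```python
import heapq
from itertools import count


def a_star_assign(exec_cost, comm_cost):
    """Best-first search for a minimum-cost task assignment.

    exec_cost[i][m]: cost of task i on machine m.
    comm_cost[(i, j)] with i < j: cost paid if tasks i and j are split.
    """
    n = len(exec_cost)
    # h[k]: lower bound on the cost of tasks k..n-1 (cheapest machine each,
    # communication ignored), so h never overestimates.
    h = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        h[i] = h[i + 1] + min(exec_cost[i])

    tie = count()  # tie-breaker so the heap never compares tuples of machines
    frontier = [(h[0], next(tie), 0, ())]  # (f, tie, g, partial assignment)
    while frontier:
        f, _, g, partial = heapq.heappop(frontier)
        k = len(partial)
        if k == n:
            return list(partial), g  # first goal popped has minimum cost
        # Expand: try every machine for the next unassigned task k.
        for m in range(len(exec_cost[k])):
            step = exec_cost[k][m]
            for j, mj in enumerate(partial):
                if mj != m:
                    step += comm_cost.get((j, k), 0)
            g2 = g + step
            heapq.heappush(
                frontier, (g2 + h[k + 1], next(tie), g2, partial + (m,))
            )
    return None, float("inf")


# Hypothetical costs: three tasks, two machines.
exec_cost = [[4, 10], [6, 3], [5, 5]]
comm_cost = {(0, 1): 2, (1, 2): 4}
best_assignment, best_cost = a_star_assign(exec_cost, comm_cost)
print(best_assignment, best_cost)
```

The quality of the lower bound determines how many nodes are expanded, which reflects the dependence on $h(n)$ noted in the text.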
Two kinds of data distribution are considered in this approach: data reuse and multiple data copies. Data reuse refers to the condition in which two or more subtasks located at the same processor need the same data item from a subtask at another processor. When the tasks reside on different processors and require the same data from a subtask residing on another processor, the condition is referred to as multiple data copies. The algorithm first constructs a TFG from the input application. Then Prim's minimum spanning tree algorithm [25] is used to construct the minimum spanning tree (MST) of the TFG. The order in which the vertices of the TFG are added to the MST corresponds to the execution order of the subtasks in the application.

Two algorithms are proposed in [26], namely HP Greedy and OLROG. The HP Greedy algorithm can be outlined as follows:

1. Partition the TFG into independent subgraphs.
2. For each of these subgraphs, beginning from the top of the graph, sort the tasks based on the weights of the vertices.
3. Begin with the heaviest node in the subgraph and assign it to the processor that provides the best expected execution time for that task.
4. Remove the processor to which the task was assigned and continue the previous step. If the processor list becomes empty, reset it to include all the processors.

The One Level Reach-Out Greedy (OLROG) algorithm differs from HP Greedy by taking into account the waiting time when choosing the processor.

2.3 Non-Iterative Heuristic Techniques

The non-iterative heuristics that have been proposed tend to exploit certain characteristics of the system to provide an optimal solution. Opportunistic Load Balancing (OLB) assigns each task, in arbitrary order, to the next available machine, regardless of the task's expected execution time on the machine [27, 28, 29].
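The OLB rule just described can be illustrated with a minimal Python sketch. The function name `olb`, the matrix `etc` (expected execution time of each task on each machine) and the tie-breaking rule (lowest machine index) are assumptions made for this illustration; they are not taken from [27, 28, 29].

```python
def olb(tasks, num_machines, etc):
    """Opportunistic Load Balancing (illustrative sketch).

    tasks: task indices in the (arbitrary) order in which they are mapped.
    etc[t][m]: hypothetical expected execution time of task t on machine m.
    Returns (mapping, makespan).
    """
    ready = [0.0] * num_machines  # time at which each machine next becomes free
    mapping = {}
    for t in tasks:
        # Pick the machine that becomes available first; the task's
        # execution time on that machine plays no part in the choice.
        m = min(range(num_machines), key=lambda j: ready[j])
        mapping[t] = m
        ready[m] += etc[t][m]
    return mapping, max(ready)
```

Because the execution time is only used to advance the machine's ready time, a task may land on a machine where it runs slowly, which is the weakness the later heuristics try to address.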
User Defined Assignment (UDA) assigns each task, in arbitrary order, to the machine with the best expected execution time for that task, regardless of the machine's availability [27]. The algorithm is also referred to as Limited Best Assignment (LBA) [27, 28].

Fast greedy assigns each task, in arbitrary order, to the machine with the minimum completion time for the task [27].

The Min-min heuristic begins with the set $U$ of all unmapped tasks. Then the set of minimum completion times, $M = \{\min_j ct(t_i, m_j) : t_i \in U\}$, is found, where $ct(t_i, m_j)$ denotes the completion time of task $t_i$ on machine $m_j$. Next, the task with the overall minimum completion time from $M$ is selected and assigned to the corresponding machine; hence the heuristic is named Min-min. Lastly, the newly mapped task is removed from $U$, and the process repeats until all tasks are mapped (i.e., $U$ is empty) [27, 28, 29]. Intuitively, Min-min attempts to map as many tasks as possible to their first choice of machine, on the basis of completion time, under the assumption that this will result in a shorter makespan.

The Max-min heuristic is very similar to Min-min. It also begins with the set $U$ of all unmapped tasks. Then the set of minimum completion times, $M = \{\min_j ct(t_i, m_j) : t_i \in U\}$, is found. Next, the task with the overall maximum completion time from $M$ is selected and assigned to the corresponding machine; hence the name Max-min. Lastly, the newly mapped task is removed from $U$, and the process repeats until all the tasks are mapped (i.e., $U$ is empty) [27, 28, 29]. The motivation for this heuristic is to minimize the penalties incurred by delaying the scheduling of long-running tasks. The assumption here is that with Max-min the tasks with shorter execution times can be mixed with tasks with longer execution times and evenly distributed among the machines, resulting in better machine utilization and a better makespan.

The Segmented min-min algorithm [30] is an extension of the Min-min heuristic discussed earlier. The algorithm sorts the tasks according to their expected completion times.
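Since Segmented min-min reuses Min-min as its building block, a minimal Python sketch of the Min-min and Max-min selection loops described above may help. The function name `min_min`, the `pick_max` flag, the matrix `etc` and the tie-breaking rules (lowest machine index, lowest task index) are assumptions made for illustration and are not taken from [27, 28, 29]; completion time is modeled as the machine's ready time plus the task's execution time.

```python
def min_min(etc, pick_max=False):
    """Min-min mapping, or Max-min when pick_max is True (illustrative sketch).

    etc[i][j]: hypothetical expected execution time of task i on machine j.
    Returns (mapping, makespan).
    """
    num_machines = len(etc[0])
    ready = [0.0] * num_machines        # machine availability times
    unmapped = set(range(len(etc)))     # the set U of unmapped tasks
    mapping = {}
    while unmapped:
        # For every unmapped task, find its minimum completion time
        # and the machine that achieves it (the set M in the text).
        best = {}
        for t in unmapped:
            m = min(range(num_machines), key=lambda j: ready[j] + etc[t][j])
            best[t] = (ready[m] + etc[t][m], m)
        # Min-min takes the smallest of these minima, Max-min the largest.
        choose = max if pick_max else min
        t = choose(sorted(best), key=lambda k: best[k][0])
        completion, m = best[t]
        mapping[t] = m
        ready[m] = completion
        unmapped.remove(t)
    return mapping, max(ready)
```

Running both variants on the same matrix and keeping the mapping with the smaller makespan gives a direct way to compare the two selection rules on a given workload.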
The tasks can be sorted into an ordered list by the average, the minimum, or the maximum expected completion time. Then the task list is partitioned into segments of equal size. The segment of larger tasks is scheduled first and the segment of smaller tasks last. For each segment, Min-min is applied to assign tasks to machines. The algorithm is described as follows:

1. Compute the sorting key for each task:
   POLICY 1 (Smm-avg): Compute the average value of each row in the expected completion time matrix.
   POLICY 2 (Smm-min): Compute the minimum value of each row in the expected completion time matrix.
   POLICY 3 (Smm-max): Compute the maximum value of each row in the expected completion time matrix.
2. Sort the tasks into a task list in decreasing order of their keys.
3. Partition the tasks evenly into segments.
4. Schedule each segment in order by applying Min-min.

The Greedy heuristic is literally a combination of the Min-min and Max-min heuristics. It applies both Min-min and Max-min and uses the better solution [27, 28].

The heuristics discussed in this section are based on the expected completion time of a task on a machine. A major drawback is that they fail to consider the communication bandwidth in the network.

2.4 Optimal Selection Theory

Optimal Selection Theory (OST) is based on a mathematical programming formulation for generating optimal scheduling schemes [1, 17]. In the OST model, the application is decomposed into a set of non-overlapping code segments that are totally ordered in time. Each code segment is further partitioned into code blocks, which are executed concurrently on various machines of the same type. Since the code segments are non-overlapping and totally ordered, the completion time of the application equals the sum of the execution times of all the code segments. The OST model makes two assumptions:

1. linear speedup when a code segment runs on multiple copies of a machine type
2.
there is a sufficient number of machines of each type available

Integer linear programming techniques [1] can be used to minimize the execution time. The OST model does not consider the communication constraints in the network. Moreover, the second assumption often does not hold in practical situations.

Augmented Optimal Selection Theory (AOST) [31] augments the OST framework by incorporating the performance of the code segments. It overcomes the drawbacks of OST by:

1. considering the execution time of code segments on all machines
2. allowing nonuniform decomposition of code segments
3. allowing for a limited number of processors

These changes render AOST more practical than OST.

Heterogeneous Optimal Selection Theory (HOST) [32] is an extension of the AOST framework. It incorporates the effects of various local mapping techniques and allows for the concurrent execution of mutually independent code segments on different machine types. Here, the tasks of the application are divided into subtasks. Each subtask consists of a set of code segments which are executed serially. The rest of the formulation of HOST is similar to AOST.

Generalized Optimal Selection Theory (GOST) [33] extends the selection theory formulations by including tasks modeled by general dependency graphs. The basic code element is a process that corresponds to a code block or a nondecomposable code segment. Therefore, an application consists of a number of processes that can be modeled as a dependency graph. The model assumes that there are a number of different types of machines and an unlimited number of machines of each type. The objective of the formulation is to generate an optimal mapping such that each node in the dependency graph is assigned to a machine of a particular type. Polynomial time heuristics can then be used to minimize the completion time of the application.
2.5 Simulated Annealing

Simulated annealing is an iterative algorithm that is used to solve many NP-hard and NP-complete problems. Because of its conceptual simplicity and versatility, it is used as a tool in a wide area of engineering applications, including mapping and scheduling. The algorithm is derived from a process used in metallurgy to make alloys. The core of the algorithm is the energy function. The key control parameter in the function is the temperature variable. Initially, when the temperature is high, the algorithm accepts poor (uphill) moves. It does so to avoid being stuck at local minima. But as the temperature decreases in the following iterations, the probability of accepting bad solutions decreases. A general simulated annealing algorithm can be outlined as follows:

1. Get a random initial solution $S$.
2. Set the initial temperature $T$.
3. While stop criteria not met do:
   (a) Perform the following steps $L$ times:
       i. Let $S'$ be a random neighbor of $S$.
       ii. Let $\Delta = cost(S') - cost(S)$.
       iii. If $\Delta \le 0$: Set $S = S'$ [downhill move].
       iv. If $\Delta > 0$: Set $S = S'$ with probability $e^{-\Delta/T}$ [uphill move].
   (b) Reduce the temperature.
4. Return the best $S$ visited.

Since the efficiency of the algorithm depends on the choice of initial solution and initial temperature, care must be taken in fixing these values. Many algorithms for mapping tasks in a heterogeneous system are presented in [34, 35, 36].

2.6 Tabu Search

Tabu search is an exhaustive state space search algorithm [37, 38]. The algorithm keeps track of the solutions visited so that it does not visit them again in further searches. The algorithm utilizes a control parameter $t$, known as the length of the tabu list. During each iteration, the algorithm exhaustively searches the neighborhood of the current solution for solutions not visited within the last $t$ iterations. The current solution is then replaced with the neighboring solution with the best cost. The generalized search algorithm can be outlined as:

1. Get a random solution $S$.
2.
While stop condition not met:
   (a) Let $S'$ be a neighboring solution of $S$ with the best $\Delta = cost(S') - cost(S)$ that was not visited in the last $t$ iterations.
   (b) Set $S = S'$.
3. Return the best $S$ visited.

2.7 Genetic Algorithms

Genetic algorithms (GA) [39, 40] are generally used for solving problems with a huge search space. In general, the organization of a GA can be outlined as follows:

1. Generate initial population
2. Evaluate the fitness of each chromosome in the population
3. While stopping criteria not met do:
   (a) Selection
   (b) Crossover
   (c) Mutation
   (d) Evaluation

The initial population is generated using a uniform random generator or by using a simple heuristic mapping algorithm. Then each chromosome in the population is evaluated for fitness. The fitness function reflects a certain system characteristic, such as total scheduling cost or total execution time. Following this step, the selection process is performed. The aim of the selection process is to quickly prune poor solutions from the population and to promote good solutions into the pool of chromosomes. There are several selection algorithms available in the literature, for example roulette wheel [41], fitness ranking [42], tournament [43] and stochastic methods [44]. The objective of the crossover step is to allow mixing of fit chromosomes from the population to produce super-fit chromosomes. There are several algorithms to achieve the crossover step [45]. The amount of crossover is controlled by a control variable known as the probability of crossover. The mutation step is used to avoid getting stuck at local minima by introducing previously discarded bad chromosomes into the population. After mutation, the population is again evaluated for fitness. These four steps are repeated until the stopping criteria are met.

In [46], a hybrid GA based algorithm for task scheduling in a multiprocessor system is presented. Maheswaran et al.
[47] have compared eleven heuristics for task mapping, including greedy, min-min, tabu, simulated annealing and genetic algorithms, and have observed that genetic algorithms produce the best solution.

2.8 Genetic Simulated Annealing Algorithms

The Genetic Simulated Annealing (GSA) heuristic is a combination of the GA and SA techniques [48, 49]. In general, GSA follows procedures similar to the GA outlined earlier. GSA operates on a population generated by simple heuristics. It performs similar mutation and crossover operations. However, for the selection process, GSA uses the SA cooling schedule and system temperature, and a simplified SA decision process for accepting or rejecting new chromosomes. GSA uses elitism to guarantee that the best solutions always remain in the population.

In [47], the initial population of 200 chromosomes is generated using the min-min heuristic. The initial temperature is set to the average makespan of the initial population and is reduced in every iteration. After crossover and/or mutation, each new chromosome is compared with the corresponding original chromosome. The new chromosome is accepted if the new makespan is less than the sum of the old makespan and the system temperature. That is, if

new makespan < (old makespan + temperature)

is true, the new chromosome becomes part of the population. Otherwise, the original chromosome survives to the next iteration. Therefore, as the system temperature decreases, it becomes more difficult for poorer solutions to be accepted. Since GSA uses a probabilistic procedure during the selection process, it accepts poor quality intermediate solutions. These poor solutions sometimes do not lead to better final solutions.

2.9 Learning Automata Algorithms

In [2], the author proposes a scheduling algorithm based on learning automata.
The model is based on a P-model variable structure stochastic automaton (VSSA) for optimizing a single cost metric and an S-model VSSA for optimizing multiple cost metrics. The VSSA initiates a random action, to which the system reacts with a stochastically related response. The VSSA observes this response from the system and reevaluates the action probabilities using a reinforcement scheme. It performs these operations iteratively to improve the performance of the automaton. Essentially, the model learns from the response of the system and adapts itself to choose the best action, which optimizes a cost function. Three new cost metrics that can be used to model the cost function are presented in [50].

The VSSA is represented as a 3-tuple consisting of the output of the automaton, the input to the automaton, and the reinforcement algorithm. The model maintains a separate VSSA for each task. An action corresponds to the process of assigning a task to a processor. Since a task can be assigned to only a single machine, the action set contains one action for each machine in the system. The general structure of the learning automata can be outlined as follows:

1. Learning Algorithm:
   (a) While stop criteria not met do:
       i. Generate a solution
       ii. Compute the cost of the generated solution
       iii. If the cost is better than that of the previous solution:
            A. Set the response as favourable
       iv. else:
            A. Set the response as unfavourable
       v. Translate the response with a heuristic
       vi. Update the action probabilities

The crucial steps of the algorithm are steps 1(a)v and 1(a)vi. The input to the automaton is determined from the response of the environment. Six heuristics are proposed in [2] to translate this response. Another important issue is to determine the function that updates the action probabilities. This is done in step 1(a)vi. The updating function comprises two functions, namely the reward function and the penalty function.
These functions control the speed of convergence of the algorithm and also the quality of the solution. The details of extending this algorithm to handle multiple cost metrics are discussed in [51]. A P-model VSSA can be used to optimize the weighted average of all the cost functions. Another, more complex approach is to use an S-model VSSA that optimizes the individual cost functions; the responses for the costs are then combined to decide the value of the input to the automaton. Studies in [2] show that this latter approach produces better solutions than the weighted average approach.

2.10 List Scheduling Algorithms

The general list scheduling algorithm can be outlined as follows:

1. While there are tasks to be scheduled do:
   (a) Maintain a list of tasks, sorted by their priority
   (b) Task selection
   (c) Processor selection

List scheduling algorithms can be classified into two categories: static [52, 53, 4, 54] and dynamic [55, 56, 57] algorithms. In static list scheduling algorithms, the tasks are scheduled in the order of their previously computed priorities. A task is usually scheduled on a processor that gives the earliest start time for the given task. Thus, during each scheduling step, first the task is selected and then its destination processor is selected.

Fast Critical Path (FCP) [54] reduces complexity by restricting the choice of destination processor from all the processors to only two: the task's enabling processor and the processor which becomes idle the earliest.

The Heterogeneous Earliest Finish Time (HEFT) [4] algorithm is a DAG scheduling algorithm that supports a bounded number of heterogeneous processing elements (PEs). To set the priority of a task, the HEFT algorithm uses the upward rank value, which is defined as the length of the longest path from the task to the exit node. The rank of a node is determined based on its computation and communication costs.
The task list is generated by sorting the nodes in decreasing order of rank values. The algorithm uses the earliest finish time (EFT) to select the processor for each task. The running time of HEFT is $O(v^2 \times p)$, where $v$ is the number of tasks and $p$ is the number of processors in the system.

The Critical-Path-on-a-Processor (CPOP) [4] algorithm is another heuristic for scheduling tasks on a bounded number of heterogeneous processors. The critical path is defined as the longest path from the source node to the exit node. All the nodes on the critical path are critical path nodes. The algorithm evaluates the ranks based on communication and computation costs. The critical path nodes are determined in the next step. The algorithm then identifies the critical path processor (the processor that minimizes the length of the critical path). CPOP uses the ranks to assign node priority. The processor selection phase has two options:

1. If the current node is on the critical path, it is assigned to the critical path processor (CPP).
2. Otherwise, it is assigned to a processor that minimizes the execution completion time.

The Dynamic-Level Scheduling (DLS) [58] algorithm assigns priorities by using an attribute called Dynamic Level (DL). In contrast to mean values, median values are used to compute the static upward rank, and for the earliest start time computation the non-insertion method is used. At each step, the algorithm selects the (ready node, available processor) pair that maximizes the DL value. For heterogeneous environments, an additional term is added to the DL computation. The value of this term for a task on a processor is the difference between the task's median execution time on all processors and its execution time on the current processor.

The Levelized-Min Time (LMT) [59] algorithm is a two-phase algorithm. The first phase orders the tasks based on their precedence constraints, i.e., level by level.
This phase groups the tasks that can be executed in parallel. The second phase is a greedy method that assigns each task (level by level) to the fastest available processor as far as possible. A task in a lower level has higher priority for scheduling than a task in a higher level; within the same level, the task with the highest computation cost has the highest priority. If the number of tasks in a level is greater than the number of processors, the fine-grain tasks are merged into a coarse-grain task until the number of tasks equals the number of processors. Then the tasks are sorted in reverse order based on average computation time. Beginning from the largest task, each task is assigned to a processor:

1. that minimizes the sum of the communication costs with tasks in the previous layers
2. that does not have any scheduled task at the same level

The Mapping Heuristic (MH) [60] uses static upward ranks to assign priorities to the nodes. A ready node list is kept sorted in decreasing order of priorities. With a non-insertion based method, the processor that provides the minimum earliest finish time of a task is selected to run the task. After a task is scheduled, the immediate successors of the task are inserted into the list. These steps are repeated until all nodes are scheduled.

In dynamic scheduling algorithms, the tasks do not have precomputed priorities. At each scheduling step, each ready task is tentatively scheduled on each processor and the best (task, processor) pair is selected. Thus, at each step, the task and the destination processor are selected at the same time. In Fast Load Balancing (FLB) [57], at each iteration of the algorithm, the ready task that can start the earliest is scheduled on the processor on which that start time is achieved.
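The dynamic selection step described above can be sketched in Python as follows. This is only an illustration in the spirit of the earliest-start rule, not the FLB algorithm of [57]: the function name, the `preds` and `exec_time` structures, the tie-breaking order and the omission of communication costs are all assumptions made here, and the task graph is assumed to be acyclic.

```python
def dynamic_list_schedule(preds, exec_time, num_procs):
    """Dynamic list scheduling by earliest start time (illustrative sketch).

    preds[t]: set of predecessor tasks of t (hypothetical input format).
    exec_time[t][p]: hypothetical run time of task t on processor p.
    Communication costs are ignored. Returns {task: (proc, start, finish)}.
    """
    proc_free = [0.0] * num_procs   # time at which each processor is idle
    schedule = {}
    remaining = set(preds)
    while remaining:
        # A task is ready once all of its predecessors have been scheduled.
        ready = [t for t in sorted(remaining)
                 if all(q in schedule for q in preds[t])]
        best = None
        # Tentatively place every ready task on every processor and keep
        # the (task, processor) pair with the earliest start time.
        for t in ready:
            data_ready = max((schedule[q][2] for q in preds[t]), default=0.0)
            for p in range(num_procs):
                start = max(proc_free[p], data_ready)
                if best is None or start < best[0]:
                    best = (start, t, p)
        start, t, p = best
        finish = start + exec_time[t][p]
        schedule[t] = (p, start, finish)
        proc_free[p] = finish
        remaining.remove(t)
    return schedule
```

Note that the task and its processor are chosen together in one step, in contrast to the static algorithms above, where the task order is fixed before any processor is considered.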
The hybrid remapper [61] is a dynamic scheduling heuristic based on a centralized policy, used to improve the solution obtained by a static scheduler. The hybrid algorithm works in two phases. The first phase of the algorithm executes prior to application execution. The set of subtasks is partitioned into blocks such that the subtasks in a block do not have any data dependencies among them. However, the order among the blocks is determined by the data dependencies that are present among the subtasks of the entire application. The second phase of the hybrid remapper, executed during application run time, involves remapping the subtasks. The remapping of a subtask is performed in an overlapped fashion with the execution of other subtasks. As the execution of the application proceeds, run-time values for some subtask completion times and machine availability times can be obtained. The hybrid remapper attempts to improve the initial matching and scheduling by using the run-time information that becomes available during application execution together with the information that was obtained prior to the execution of the application. The mapping decisions are based on a mixture of run-time and expected values.

Both static and dynamic approaches to list scheduling have their advantages and disadvantages in terms of the quality of the schedules they produce. Static schedulers are more suited to communication-intensive and irregular problems, where selecting important tasks first is crucial. Dynamic schedulers are more suitable for computation-intensive applications with a high degree of parallelism, because these algorithms focus on obtaining good processor utilization.

CHAPTER 3
PRELIMINARIES OF GAME THEORY

3.1 Introduction to Game Theory

Game theory is properly a branch of mathematics. It is used to analyze the behaviour of economic agents who have conflicting interests.
The scope of game theory is varied, ranging from analyzing the behaviour of players of a simple Rock-Paper-Scissors game to nations devising military strategy. Game theory, like any other theory in mathematics, consists of some axioms and involves proving certain other assertions and theorems to be true, assuming that the axioms are true. The theory also contains certain terms that are to be defined precisely in terms of primitive terms. These primitive terms are simply accepted as understood and the axioms are assumed to be true. The principles and basic building blocks of game theory were proposed by von Neumann in 1928 [62] and by Nash [63]. Some important definitions of primitive terms are presented in the following sections.

3.2 Basic Definitions

The theory of games is studied at three levels of abstraction. They are:

1. Theory of games in extensive form
2. Theory of games in normal form
3. Theory of games in characteristic function form

The fundamental structure behind all three abstractions is the game tree. To represent a game as a game tree, the following details must be specified:

1. The set of players who play the game
2. A set of alternatives or moves available to the player during his turn
3. A specification of the information set for each player
4. A termination condition
5. A set of payoffs for each player for each outcome of a game

Players are the entities that compete with each other in the game. They form the nodes in the game tree. The player who plays first is at the root of the tree. The list of moves available to the player at that instant of the game forms the branches of the tree. The nodes to which these first-order branches point are the possible situations that can result from the choices of the first player. Second and higher order branches, representing the choices open to the player who is to play next, issue from these nodes.
This branching process continues until a situation defined by the termination condition as the outcome of the game is reached. Figure 3.1 shows an example of a game tree.

[Figure 3.1. An Example of a Game Tree]

If a player knows exactly the choices made by the other players who have already moved, then he knows to which branch point or node the game has progressed in the game tree. Such games are known as games of perfect information. But if that is not the case, the player can only know the set of nodes to which the game has progressed.
This set of nodes is the information set of the player. The gain obtained by the player by making some choices to arrive at an outcome is known as the payoff to the player.

The most important concept emerging from the analysis of the game tree is that of strategy. A strategy is defined as follows:

"A strategy is essentially a statement made by a player specifying which of the alternatives he will choose if he finds himself in any of the information sets which are associated with his moves" [64]

A strategy involves foreseeing all the possible situations that may arise in the course of a game. It is shown in game theory that once a strategy is chosen by each player, the outcome of the game is thereby determined.

3.2.1 Games in Extensive Form

The extensive form representation of a game is basically the game tree itself, which captures all of the possible decisions of the game beginning from the root, which is the first move of the first player. The terminal nodes specify the payoffs to each player. This is the lowest level representation of the game, where all the internal structures of the strategies of the players are visible. Figure 3.2 shows the extensive form representation of a simple Rock-Paper-Scissors game played by two players.

Rock-Paper-Scissors is a game played between two players, Player1 and Player2. The move set of each player consists of three moves, representing rock, paper and scissors. It is assumed that Player2 plays after Player1 and has perfect information about the moves of Player1. Figure 3.2 corresponds to this case. Suppose instead that Player1 and Player2 move simultaneously. Then Player2 has incomplete information about the exact move of Player1.
That is, Player2 only knows the subset of nodes to which the game has progressed. The extensive form representation of this case is shown in Figure 3.3.

[Figure 3.2. Extensive Form Representation of Rock-Paper-Scissors Game]

[Figure 3.3. Extensive Form Representation when Players Move Simultaneously]

3.2.2 Games in Normal Form

Normal form is another popular form of representation of games. The normal form representation hides the internal details of the move set and deals only with the relation between a strategy of a player and its payoff. It is considered the second level of abstraction in analyzing games. Normal Form Games (NFG) are usually represented by multidimensional matrices. The normal form representation of the Rock-Paper-Scissors game is shown in Table 3.1. There are two players in the game, therefore the game matrix has two dimensions, one for each player. A player has three moves, that is, Rock, Paper or Scissors. Hence there are three rows and three columns in the matrix.

Table 3.1. Normal Form Representation of Rock-Paper-Scissors Game
(rows: Player1's move; columns: Player2's move; entries: payoff of Player1, payoff of Player2)

            Rock       Paper      Scissors
Rock        (0, 0)     (-1, 1)    (1, -1)
Paper       (1, -1)    (0, 0)     (-1, 1)
Scissors    (-1, 1)    (1, -1)    (0, 0)

The payoffs of the players when Player1 chooses Rock and Player2 chooses Scissors are stored in the cell at row Rock and column Scissors, which is (1, -1) in our case.

3.2.3 Games in Characteristic Function Form

This is the third layer of abstraction used to represent games.
Characteristic function g ames are normally used to study about the cooperation among players and formation of coalitions. The readers are referred to [64] for further details on this class of g ames. 3.2.4 T ypes of Games Games are classied into tw o major cate gories, namely: cooper ative g ames and non cooper ative g ames. Cooper ative Game: A g ame in which the participants agree to a set of rules and use them to deduce their strate gies is a cooperati v e g ame. Non Cooper ative Game: A g ame in which such decisions cannot be made by the participants is a non cooperati v e g ame. [63, 65 ] In addition to classifying g ames as cooperati v e and non cooperati v e, the y are also classied as zer osum g ames and non zer osum g ames. The classication of g ames is represented in gure 3.4. Zer oSum game: In this g ame, the sum of the payof fs of all the players in the g ame is zero. The e xample in table 3.1 is an e xample of a zerosum g ame. In these g ames, whate v er one player wins, the other players lose. 32 PAGE 44 G a m e s C o o p e r a t i v e g a m e s N o n c o o p e r a t i v e g a m e s Z e r o s u m g a m e s N o n z e r o s u m g a m e s Figure 3.4. Classication of Games Non zer osum game: In this g ame, the sum of the payof fs is nonzero. Therefore, more than one player can be declared winner of the g ame. 3.3 Equilibrium in Games In 1950, Nash proposed a landmark paper about equilibrium in g ames [65].An equilibrium represents a solution of the g ame that is strate gically most suited for all the players in the g ame. There can be more than one equilibrium point for a g ame. Nash Equilibrium: If there is a set of strate gies with the property that no player can benet by changing his strate gy while others k eep their strate gies unchanged, then the set of strate gies and the corresponding payof f constitute Nash Equilibrium. F ormally: Let the payof fs ofplayers be and the set of possible actions be common kno wledge to all the players. 
Let $a_{-i}$ denote the actions of all the players besides player $i$. Then a Nash Equilibrium is an array of actions $a^*$ such that $u_i(a_i^*, a_{-i}^*) \geq u_i(a_i, a_{-i}^*)$ for all players $i$ and all actions $a_i$, where $u_i$ is the payoff of player $i$ [65]. Several algorithms have been proposed to find the Nash equilibrium of games. Readers are referred to [66] for further details.

3.4 Auction Theory

Auction theory can be described as the study of buying and selling objects. Some basic terms commonly used in this theory are explained next.

Bid is the highest amount a bidder is willing to pay for an object on sale.
Value can be described as the metric that a bidder uses to evaluate his bid. If the value of a bidder is known to other bidders, then the auction is known as a public auction. In private auctions, bidders do not have any information about the values of other bidders.
Sale price is the actual amount to be paid by the winning bidder.
Gain or payoff is the difference between the bid and the sale price.
Winner's curse befalls a bidder whose sale price exceeds the value of the object on sale.
Bidding strategies are the guidelines followed by a bidder to fix a bid price from his value.

3.4.1 Types of Auctions

Based on the policy used for evaluating and accepting bids from bidders, auctions are classified as first-price auctions and second-price auctions. If the bidders are ignorant of the bids of other bidders, then such auctions are known as sealed-bid auctions. In a first-price auction, the winner pays an amount that directly corresponds to his value, whereas in a second-price auction the winning bidder pays an amount that corresponds to the value of the second highest bidder. In both cases the winner of the auction is the bidder with the highest bid. Based on the number of objects on sale, auctions are classified as single-object auctions and multi-object auctions. Multi-object auctions are further studied as auctions of identical objects and auctions of dissimilar objects.
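The difference between the two pricing policies can be illustrated with a small sketch. It is only an illustration under stated assumptions: the function name `sealed_bid_auction`, the bidder names and the tie-breaking rule are hypothetical and do not come from the thesis.

```python
from typing import Dict, Tuple


def sealed_bid_auction(bids: Dict[str, float], second_price: bool = False) -> Tuple[str, float]:
    """Return (winner, sale_price) for a single-object sealed-bid auction.

    Hypothetical helper for illustration; ties go to the alphabetically
    first bidder so that the result is deterministic.
    """
    if not bids:
        raise ValueError("at least one bid is required")
    # Rank bidders from highest to lowest bid.
    ranked = sorted(bids.items(), key=lambda item: (-item[1], item[0]))
    winner, highest_bid = ranked[0]
    if second_price and len(ranked) > 1:
        sale_price = ranked[1][1]  # winner pays the second-highest bid
    else:
        sale_price = highest_bid   # winner pays its own bid
    return winner, sale_price


bids = {"A": 70.0, "B": 55.0, "C": 90.0}
first_price_result = sealed_bid_auction(bids)
second_price_result = sealed_bid_auction(bids, second_price=True)
print(first_price_result, second_price_result)
```

In both variants the highest bidder wins; only the sale price changes.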
3.4.2 Game Theory as an Optimization Tool for Task Scheduling

In a system with a number of autonomous components, such as the processors in a heterogeneous computing system, centralized optimization provides an opportunity for efficient optimization, but the coordination and the transfer of information among components are costly and often infeasible. Hence it is important to develop decentralized optimization schemes which permit the individual components to take control of the actions that contribute towards the optimization of a global performance criterion. The motivation for using game theory for scheduling tasks is driven by the fact that decentralized optimization in game theory is achieved by the agents (players of the game) acting selfishly.

CHAPTER 4
DYNAMIC SCHEDULING USING GAME THEORY

A typical HC environment would consist of a heterogeneous suite of machines such as SIMD, MIMD, Dataflow etc., interconnected by a high-speed network. The applications are executed by matching the various computational requirements of the application with the capabilities of the machines. In order to make HC viable, several issues need to be addressed. A brief introduction to these issues was presented in the introduction chapter. Task assignment and task scheduling are considered the most crucial amongst these issues. Collectively they are known as task mapping. This work presents a new mapping algorithm based on auctions modeled as games.

The chapter is organized as follows. First the problem statement is presented. This is followed by a brief description of the HC model and the dynamic scheduling technique. Finally the results of comparing the proposed scheduler with other schedulers are presented.

4.1 Problem Statement

For the following discussions, it is assumed that the application has been partitioned and profiled.
Also, the machines in the HC network have been benchmarked. The application is represented as a Task Flow Graph (TFG). Each node in the graph corresponds to a subtask of the application. The edges between the nodes correspond to data dependencies between them. A subtask is a code segment in the application that cannot be further partitioned, and it has to be executed as a single unit on a machine. The HC suite is represented as a Processor Graph (PG) where the nodes of the graph represent the machines and the edges represent the interconnections between them. The edges in both graphs are weighted. The weight of an edge in the TFG represents the number of data units being transferred, whereas the weight of an edge in the PG represents the time taken to transfer a single data unit.

The task assignment problem involves determining the machines on which the various subtasks of the application need to execute in order to minimize a certain system cost metric. The cost metric could be the total completion time, the load on the machines, or any other characteristic that is unique to the application.

Whenever an application is submitted for execution, the dynamic scheduler is invoked. The dynamic scheduler runs in parallel with the tasks throughout the makespan of the application. From the current system state, the subset of subtasks that are ready to be scheduled and the subset of machines that are available are determined. Then a game theoretic auction model is constructed. Subtasks are then assigned based on the outcome of the auction. The system state is updated after a successful partial assignment.

4.2 Elements of the HC System Model

The TFG is constructed by associating every subtask with a node in the graph. For example, the TFG generated for the Gaussian elimination algorithm for a matrix of size [...] is shown in Figure 4.1.
Another example, a TFG generated for the Fast Fourier Transform (FFT), is shown in Figure 4.2. If $S$ represents the set of subtasks, then $S = \{s_1, s_2, \ldots, s_n\}$ is the set of nodes of a graph with $n$ nodes. The data dependencies between the subtasks are represented by a directed edge between the pair of nodes involved. The direction of the edge indicates the direction of data flow. Each edge is assigned a weight that corresponds to the number of data units being transferred. It is assumed that these edges do not form any cycles.

[Figure 4.1. The TFG of Gaussian Elimination Algorithm [4]]

Let $E_T$ represent the set of edges of the TFG:

$E_T = \{ (s_i, s_j) \mid s_i, s_j \in S \text{ and } s_j \text{ depends on } s_i \}$

Let $W_T$ denote the edge weights. It is defined as

$W_T(s_i, s_j) = \begin{cases} \text{number of data units exchanged between } s_i \text{ and } s_j & \text{if } (s_i, s_j) \in E_T \\ 0 & \text{otherwise} \end{cases}$

Hence, TFG $= (S, E_T, W_T)$.

[Figure 4.2. The TFG of Fast Fourier Transformation Algorithm [4]]

Figure 4.3 illustrates an application program that has been 'atomized' into ten subtasks. The amount of computation required for each subtask is represented within parentheses. The number of clock cycles required to completely execute the subtask on a baseline machine may be used as a yardstick to quantify the computational requirement of a subtask. For example, the most demanding subtask in Figure 4.3, subtask 4, requires 367 clock cycles on a baseline machine, whereas subtask 9 requires only 26 clock cycles on the same baseline machine. The edges, as mentioned before, represent the data dependency among subtasks. The dependency of a subtask is quantified as the number of (kilo)bytes of data required by the subtask before it can begin to execute. For example, a subtask in Figure 4.3 with three incoming edges depends on its three parent subtasks, and it cannot begin to execute before these tasks complete their execution. The edge weights give the number of KB of data that the subtask requires from each parent.
Also, a subtask is a dependency of each of its child subtasks; that is, after completing its execution, it must supply each child with the number of KB of data given by the weight of the connecting edge.

[Figure 4.3. An Application Task Flow Graph. Subtasks with computation requirements in parentheses: 1 (327), 2 (51), 3 (51), 4 (367), 5 (73), 6 (182), 7 (235), 8 (211), 9 (26), 10 (188). Edge weights: 30, 49, 41, 13, 26, 43, 5, 42, 24, 35.]

The PG is constructed in a similar fashion. Each machine in the system is associated with a node in the PG. If $M$ denotes the set of machines, then $M = \{m_1, m_2, \ldots, m_k\}$ denotes the set of nodes in the PG with $k$ nodes. Since the interconnection topology is assumed to be known, an edge is associated with every pair of connected machines. Let $E_P$ represent the set of edges in the PG:

$E_P = \{ (m_i, m_j) \mid m_i, m_j \in M \text{ and } m_i \text{ is connected to } m_j \}$

The weights assigned to these edges, $W_P(m_i, m_j)$, are defined as:

$W_P(m_i, m_j)$ = cost of communicating a data unit between $m_i$ and $m_j$ if $(m_i, m_j) \in E_P$; [...] otherwise

Thus PG $= (M, E_P, W_P)$. Figure 4.4 shows a sample processor graph.

[Figure 4.4. An Example of Processor Graph. Nodes: 1 (372), 2 (893), 3 (618), joined by weighted edges.]

The solution space for the task assignment and scheduling problem can now be characterized by a mapping of the nodes in the TFG to those of the PG. If $t$ denotes a point in time during the makespan of the problem, then $\pi_0 : S \rightarrow M$ represents the initial mapping produced by a static scheduler and $\pi_t : S_t \rightarrow M_t$ denotes the partial mapping produced by the dynamic scheduler at that point of time, where $S_t \subseteq S$ is the subset of subtasks being considered for mapping at time $t$ and $M_t \subseteq M$ is the subset of machines that are available during that time.

4.2.1 Cost Metric

The objective of task mapping in an HC system is to improve the performance of the system for a given application. The cost metric in the system characterizes the quality of the solution. Improvement in performance could be minimization of the total completion time, minimization of the load on the maximum loaded processor, or a specific characteristic of an application. The cost metric should reflect the chosen performance criterion of the system.
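The model elements of this section (a weighted TFG, a weighted PG and a mapping between them) can be sketched in a few lines. The sketch below is an assumption-laden illustration, not the evaluation used in the thesis: it treats total completion time as the makespan of a simple schedule in which a subtask starts once its machine is free and all parent data has arrived, and it assumes that transfers between subtasks on the same machine are free. The names `completion_time`, `exec_time`, `data_units`, `unit_cost` and `mapping` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def completion_time(exec_time, data_units, unit_cost, mapping):
    """Makespan of a mapping under the simplifying assumptions stated above.

    exec_time[s][m]    : execution time of subtask s on machine m
    data_units[(p, s)] : data units sent along TFG edge p -> s
    unit_cost[(a, b)]  : time to move one data unit from machine a to b (PG edge weight)
    mapping[s]         : machine chosen for subtask s
    """
    parents = {s: set() for s in exec_time}
    for (p, s) in data_units:
        parents[s].add(p)

    finish = {}        # finish time of each subtask
    machine_free = {}  # time at which each machine becomes idle
    for s in TopologicalSorter(parents).static_order():  # parents come first
        m = mapping[s]
        ready = 0.0
        for p in parents[s]:
            pm = mapping[p]
            # Assumption: no transfer cost when parent and child share a machine.
            transfer = 0.0 if pm == m else data_units[(p, s)] * unit_cost[(pm, m)]
            ready = max(ready, finish[p] + transfer)
        start = max(ready, machine_free.get(m, 0.0))
        finish[s] = start + exec_time[s][m]
        machine_free[m] = finish[s]
    return max(finish.values())


exec_time = {"a": {"m0": 5, "m1": 8}, "b": {"m0": 6, "m1": 3}, "c": {"m0": 4, "m1": 7}}
data_units = {("a", "c"): 10, ("b", "c"): 4}
unit_cost = {("m0", "m1"): 0.5, ("m1", "m0"): 0.5}

split_mapping_time = completion_time(exec_time, data_units, unit_cost,
                                     {"a": "m0", "b": "m1", "c": "m0"})
single_machine_time = completion_time(exec_time, data_units, unit_cost,
                                      {"a": "m0", "b": "m0", "c": "m0"})
print(split_mapping_time, single_machine_time)
```

Comparing the two mappings shows how spreading independent subtasks over machines can shorten the completion time even when data must be transferred.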
In [2], a detailed description of the various cost metrics is presented. The cost metric used in this work is the total completion time of the application. The system can be easily modified to study other performance criteria.

The cost functions are constructed from matrices whose values are obtained from benchmarking and profiling techniques. Actual execution times are used by the dynamic scheduler whenever available. For other situations, the expected execution times are used to create the cost matrix.

Let the execution time matrix be denoted by $X$, where

$X(i, j)$ = execution time of subtask $s_i$ on machine $m_j$

The number of data units exchanged between every pair of connected subtasks is contained in $D$, where $D(i, j)$ is the weight of the TFG edge from $s_i$ to $s_j$. The cost of communicating a single data unit between a pair of connected machines is stored in $C$, where $C(k, l)$ is the weight of the PG edge between $m_k$ and $m_l$. Let comp($t$) denote the total computation time for a particular assignment and comm($t$) represent the total communication time at a particular time instant $t$. Hence:

$\mathrm{comp}(t) = \sum_{i} X(i, \pi(i))$

$\mathrm{comm}(t) = \sum_{i} \sum_{j} D(i, j) \, C(\pi(i), \pi(j))$

The cost metric completion time would then be

$\mathrm{comp}(t) + \mathrm{comm}(t)$

Also it is to be noted that $\pi(i)$, the index of the machine to which $s_i$ is mapped, refers to the dynamically assigned machine if $s_i$ had been dynamically scheduled by time $t$, and to the statically assigned machine otherwise.

4.3 Dynamic Task Scheduling as a First Price Sealed Bid Auction

A group of interested and competitive buyers, a seller and an object of interest constitute an auction. The decision about which buyer gains the object is made by conducting auctions. There are different types of auctions, e.g. First Price auction, Second Price auction, Vickrey auction etc. The auction that is most suited for dynamic scheduling, however, is the First Price Sealed Bid auction. In this auction, the buyers have independent private values of the object. Bids are placed solely based on this private value. All the buyers are rational. The buyers follow certain strategies to generate the bid. The bid of a buyer is known only to the seller and the buyer himself, hence the name sealed bid.
In this work, each buyer is assumed to bid an amount equal to its private value. The seller then decides the winner based on the bids received. The seller sells the object to the winner for a price that is proportional to the bid.

Typically a dynamic scheduler has to map a subset of subtasks, $S_t$, to a subset of machines, $M_t$, i.e. it has to solve a series of partial mapping problems to achieve a global optimum. This partial mapping problem is modeled as a First Price Sealed Bid auction and solved using Game Theory. The subtasks in $S_t$ are modeled as the buyers in the auction competing to buy machines in $M_t$ from the seller. The execution times of subtasks on the various machines in the HC suite are considered as the private values of the subtasks. A bid consists of a list of machines along with the cost associated with executing the subtask on the corresponding machine. Hence, the First Price Sealed Bid auction can be extended to solve the dynamic scheduling problem.

Figure 4.6 is a space-time representation of the system state at the moment a subtask has started execution on one of the machines. The subtasks that are now ready to be scheduled are subtasks $s_i$, $s_j$ and $s_k$. They are the players in the auction. Machines m1, m3 and m4 are the machines to which subtasks $s_i$, $s_j$ and $s_k$ have been assigned, respectively, by the static scheduler. The following sections provide a detailed description of the dynamic scheduling algorithm.

[Figure 4.5. Framework of the Proposed Dynamic Scheduler. Block diagram: the Task Flow Graph (TFG) and Processor Graph (PG) feed the Learning Automata Based Static Scheduler, which produces the Initial Static Mapping; the Proposed Dynamic Scheduler turns it into the Final Mapping.]

4.4 Dynamic Scheduling Algorithm

The pseudocode describing the algorithm is presented in Algorithm 1. A strategy can be described as a guideline used by a buyer to generate bids. Following this, the buyers in the proposed technique have two strategies. They have been named the Conservative and Aggressive Strategies respectively. There is a set of actions or moves associated with each of these strategies. The Nash Equilibrium provides a probability value corresponding to each of these moves. The final mapping is performed by selecting the move corresponding to these probability values.

4.4.1 The Conservative Strategy

This is the first of the two strategies used by the buyers to generate bids. There is only one move corresponding to this strategy, i.e. the machine that was assigned by the static scheduler. If a buyer chose only the conservative strategy, the corresponding task would begin execution on the machine that was assigned by the static scheduler, at the time determined by it. This strategy does not help in searching the solution space. However, this strategy is used for the heuristic approach presented in the next chapter. Also, the move corresponding to this strategy leads to selection of a machine that is idle at that instant. Hence this strategy is included in the pure dynamic approach.

[Figure 4.6. Freeze-frame Illustrating the System State when $s_i$, $s_j$ and $s_k$ are being Dynamically Scheduled using the Proposed Game Theoretic Scheduler. Space-time chart of machines m0 to m4 against time (to infinity), showing Window(si). Legend: si: a subtask in an application; si-1: the slowest parent of si; si+1: the first child of si; sj, sk: tasks that do not depend on si; m0 to m5: the machines in the network. Players: si, sj, sk. conservative(si): m1; aggressive(si): m0, m2, m3, m4. conservative(sj): m3; aggressive(sj): m0, m1, m2, m4. conservative(sk): m4; aggressive(sk): m0, m1, m2, m3.]

4.4.2 The Aggressive Strategy

The second of the two strategies proposed is used to explore the solution space. When this strategy is chosen, the buyer looks for a machine other than the one assigned to it by the static scheduler. The candidate-generation steps of Algorithm 1 achieve this goal.
The following is a step by step description of how a bid corresponding to this strategy is generated.

Firstly, a window of time within which a subtask $s_i$ needs to execute is determined. For a pure dynamic scheduler the lower boundary is set to the current time instant and the upper boundary is set to infinity. Algorithm 2 describes this process.

In the second step, all the machines $m_j$ that are available during the window of subtask $s_i$ are deemed potential candidates. That is, $s_i$ can execute on $m_j$ at this time instant. A list is maintained for each $s_i$. This list forms the preference list of the buyer.

Amongst all the $m_j$ in the list, the machines which provide a completion time earlier than that of the machine assigned by the static scheduler form the list of moves of this strategy.

Algorithm 1 Dynamic Scheduler based on First Price Sealed Bid Auction
1: for $s_i \in S_t$ do
2:   Set conservative[$s_i$] = machine assigned to $s_i$ by the static scheduler
3:   $w$ = Window($s_i$)
4:   for $m_j \in M_t$ do
5:     if $w$ within IdleTime($m_j$) then
6:       Append $m_j$ to the list Candidates[$s_i$].
7:     end if
8:   end for
9: end for
10: Set aggressive[] = BestCandidates(Candidates[])
11: P = Payoffs($S_t$, conservative, aggressive)
12: $\sigma$ = NashEquilibrium($S_t$, conservative, aggressive, P)
13: Schedule($\sigma$)

Algorithm 2 Window($s_i$)
1: Set Window[$s_i$].LowerBound = Completion Time of the Slowest Parent
2: Set Window[$s_i$].UpperBound = $\infty$
3: Return Window

A final list is then compiled using the moves possible when using the two strategies. This final list is submitted as the bid of $s_i$.

4.4.3 Calculation of the Payoff

An outcome of the game is defined as a partial mapping obtained when all the buyers follow a move. The payoff for an outcome is determined by the overall completion time of the partial mapping: an outcome with a lower completion time is rewarded with a higher payoff value for the players.

4.4.4 Dynamic Task Mapping based on Nash Equilibrium

The auction is implemented as an n-person game. The buyers, moves and payoffs constitute an n-person game.
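How buyers, moves and payoffs combine into an n-person game can be sketched as follows. This is a simplified illustration under explicit assumptions, not the thesis implementation: the payoff of an outcome is taken to be the negated completion time of the partial mapping, shared by all players; subtasks placed on the same machine are assumed to run back to back; and only pure-strategy equilibria are enumerated, whereas the thesis relies on mixed strategies. All function and variable names are hypothetical.

```python
from itertools import product


def partial_makespan(exec_time, outcome):
    """Finish time of the busiest machine when each subtask runs on its chosen machine."""
    load = {}
    for subtask, machine in outcome.items():
        load[machine] = load.get(machine, 0.0) + exec_time[subtask][machine]
    return max(load.values())


def payoff_table(exec_time, moves):
    """Map every outcome (one move per player) to a payoff shared by all players."""
    players = sorted(moves)
    table = {}
    for combo in product(*(moves[p] for p in players)):
        outcome = dict(zip(players, combo))
        # Assumption: lower completion time => higher payoff.
        table[combo] = -partial_makespan(exec_time, outcome)
    return players, table


def pure_nash(players, moves, table):
    """Outcomes from which no single player gains by deviating alone."""
    equilibria = []
    for combo, value in table.items():
        stable = True
        for i, player in enumerate(players):
            for alternative in moves[player]:
                deviated = combo[:i] + (alternative,) + combo[i + 1:]
                if table[deviated] > value:
                    stable = False
        if stable:
            equilibria.append(combo)
    return equilibria


exec_time = {"s1": {"m0": 4, "m1": 6}, "s2": {"m0": 5, "m1": 3}}
# First entry of each list is the conservative move, the rest are aggressive moves.
moves = {"s1": ["m1", "m0"], "s2": ["m0", "m1"]}

players, table = payoff_table(exec_time, moves)
equilibria = pure_nash(players, moves, table)
print(table)
print(equilibria)
```

In this small example both the static assignment and the swapped assignment are stable, which illustrates that a game can have more than one equilibrium point.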
Nash Equilibrium is then used to solve the game. The solution of the game is expressed as a probability value corresponding to each move for every player. Dynamic task assignment is hence performed by generating a discrete random variable corresponding to the probability values of the moves.

Algorithm 3 BestCandidates(List)
1: for $s_i \in S_t$ do
2:   for $m_j \in$ List do
3:     if Cost($m_j$) < Cost(conservative[$s_i$]) then
4:       Append $m_j$ to the bid list
5:     end if
6:   end for
7: end for
8: Return the list of bids

4.4.5 Analysis of the Algorithm

The worst case complexity of the proposed algorithm depends on the size of the payoff matrix. For a game with $n$ players and $m$ strategies, the payoff matrix has $m^n$ entries. Hence, the complexity of the proposed scheduler is dominated by the algorithm used for finding the Nash Equilibrium. It has been shown that there is a guaranteed equilibrium point if the moves are allowed to be mixed [63]. Because of a guaranteed solution it is unlikely for the problem to be NP-hard. Also, a proof that the complexity of computing a Nash Equilibrium point lies between P and NP can be found in [63]. The proposed work uses Gambit [66], a software tool for computing Nash Equilibrium.

4.5 Simulation Results and Discussion

The pseudocode in Algorithm 4 describes the procedure used to simulate the proposed dynamic scheduling technique. The application task flow graph is divided into different levels such that all the tasks within a level are independent and can execute in parallel. When a task from one level begins its execution, the tasks at the next level are scheduled. The dynamic scheduler is implemented as a batch scheduler that uses a sliding window technique for task scheduling. Though the best optimization can be achieved by including all the subtasks in a level for scheduling, it would not justify the enormous overhead that would be incurred by the scheduler. Hence the sliding window technique is adopted.
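The level-by-level batching described above can be sketched as follows. This is a minimal sketch under assumptions: levels are computed by longest distance from a root, which guarantees that subtasks within a level are independent, and the window size is a free parameter. The names `levelize`, `batches`, `static_start` and `window` are hypothetical.

```python
def levelize(parents):
    """Group subtasks into levels; a subtask sits one level below its deepest parent.

    parents[s] lists the parent subtasks of s; root subtasks have an empty list.
    """
    level = {}

    def depth(s):
        if s not in level:
            level[s] = 1 + max((depth(p) for p in parents[s]), default=-1)
        return level[s]

    for s in parents:
        depth(s)
    levels = [[] for _ in range(max(level.values()) + 1)]
    for s in parents:  # keeps the insertion order of the input
        levels[level[s]].append(s)
    return levels


def batches(tasks, static_start, window):
    """Split one level into batches of at most `window` subtasks,
    ordered by their start times in the static schedule."""
    ordered = sorted(tasks, key=lambda s: static_start[s])
    return [ordered[i:i + window] for i in range(0, len(ordered), window)]


parents = {"a": [], "b": [], "c": ["a"], "d": ["a", "b"], "e": ["c", "d"]}
levels = levelize(parents)

static_start = {"t1": 9, "t2": 1, "t3": 4, "t4": 7, "t5": 2}
level_batches = batches(["t1", "t2", "t3", "t4", "t5"], static_start, window=2)
print(levels)
print(level_batches)
```

Each batch would then be handed to the dynamic scheduler in turn, so that no single auction involves more than `window` players.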
A detailed description of the sliding window technique is presented in [67]. The while loop of Algorithm 4 simulates the sliding window technique.

Algorithm 4 Simulation Procedure
1: Levels = Levelize(T)
2: for List of tasks in each Level do
3:   Arrange the subtasks in the List in nondecreasing order of their static schedule times
4:   while there are unscheduled subtasks in List do
5:     if sizeof(List) $\leq$ WINDOW_THRESHOLD then
6:       DynamicSchedule(List)
7:     else
8:       Remove the first WINDOW_THRESHOLD values from List and append them to BATCH
9:       DynamicSchedule(BATCH)
10:      Clear BATCH
11:    end if
12:  end while
13: end for

In order to show the effectiveness of the proposed dynamic scheduler, it is compared with a state-of-the-art dynamic scheduling algorithm based on a Genetic Algorithm proposed in [68]. The two dynamic schedulers use the solution from the Learning Automata based static scheduler proposed in [2] as an initial solution and then try to improve it. Therefore the static scheduler is also compared with the two schedulers. As yet, a representative set of heterogeneous computing task benchmarks does not exist [69]. Therefore, simulations are performed on randomly generated task flow graphs. The data from Table 4.1 was used to generate the random graphs. As adopted in [2], TFGs were grouped into three major categories depending on the communication complexity. TFGs with a number of edges equal to [...] are classified as TFGs with low communication complexity. Those with values [...] and [...] are classified as TFGs with medium and high communication complexity, respectively. The TFG size was varied from 10 nodes to 100 nodes. The PG size was also varied from 2 machines to a 20 machine system. The graphs were plotted to study the effects of the communication complexity, problem size and network size on the completion time of the application. In order to capture the real-time behaviour, a second set of execution times and communication costs was used by the dynamic algorithm.
These values were randomly generated with the expected execution time/communication cost as the mean. Simulations were performed on 10 instances of graphs for each type of TFG and the averages were used for constructing the graphs.

Table 4.1. Data used for Generating Random Graphs

Number of Tasks                     10, 25, 50, 100
Number of Machines                  2, 5, 10, 20
Number of Edges                     [...]
Execution Matrix Data range         1000
Communication Matrix Data range     4
Data Exchange Matrix Data range     500

From Figures 4.7 to 4.18 we can observe that the proposed approach provides better schedules than the other scheduling algorithms for the major part of the spectrum of TFG classes. There were some classes of TFG, e.g. Figures 4.8, 4.11 and 4.14, for which the Genetic Algorithm produced better results. It is also observed that the proposed scheduling technique was able to consistently improve the static solution by about [...] for all the cases. Also, the proposed game theoretic scheduler provides, on average, an improvement of [...] over the dynamic genetic algorithm.

[Figures 4.7 to 4.18. Problem Size vs. Average Completion Time for the Proposed, Genetic Algorithm and Learning Automata schedulers. x-axis: number of tasks (10, 25, 50, 100); y-axis: completion time x 100. Figures 4.7, 4.10, 4.13 and 4.16: Low Communication Complexity. Figures 4.8, 4.11, 4.14 and 4.17: Medium Communication Complexity. Figures 4.9, 4.12, 4.15 and 4.18: High Communication Complexity.]

CHAPTER 5
DYNAMIC SCHEDULING USING HEURISTICS

5.1 Motivation

In the scheduler proposed in the previous chapter, the dynamic scheduler is invoked for all the subtasks. Every time the dynamic scheduler is invoked it incurs an overhead, because some system resources are dedicated to the dynamic scheduler for performing the scheduling operation. In order to reduce the number of times the dynamic scheduler is invoked, a heuristic based approach to selecting tasks for the dynamic scheduler is investigated.
Consider a hypothetical scenario in which subtask $s_i$ is scheduled to machine $m_1$ and subtasks $s_j$ and $s_k$ to machines $m_2$ and $m_3$ respectively by the static scheduler. Also assume that $s_i$ can execute only on $m_1$ whereas $s_j$ and $s_k$ can run on all three machines. If $s_i$, $s_j$ and $s_k$ fall into the same time window for scheduling, an auction is performed with three buyers. Subtasks $s_j$ and $s_k$ play the game with $m_1$ possibly included in their move sets. At the end of the auction, $s_i$ wins $m_1$, and $s_j$ and $s_k$ have played the game with an extra move that will never benefit them. One approach to reducing the occurrence of such a scenario would be not to consider $s_i$ for dynamic scheduling at all. The auction would have one less player and one less move for each player. This could potentially improve the time taken by the dynamic scheduler to complete a partial mapping, because the complexity of the scheduler depends on the number of players and the number of moves for each player.

Selecting the subtasks that are to be dynamically scheduled such that the overall completion time of the application is reduced is NP-hard. Therefore, heuristics are used to select the subtasks that are not to be scheduled by the dynamic scheduler, i.e. the static schedule is retained for the selected subtasks. Figure 5.1 shows the conceptual block diagram of the proposed heuristic based approach to task scheduling, which uses the game theoretic dynamic scheduler proposed in the previous chapter. The approach is based on the fact that a reduction in the number of subtasks involved in an auction could considerably increase the speed of the dynamic algorithm. The reduction in the number of players involved in an auction also restricts the improvement of the solution quality of the dynamic mapping. That is, the fewer the players involved, the closer the obtained solution will be to the static mapping.
Therefore a trade-off must be struck between the solution quality and the overhead consumed by the dynamic scheduler. This chapter investigates this.

5.2 Heuristic Based Model

The proposed heuristic scheduler comprises two components. The first component is the Learning Automata based static scheduler proposed in [2]. This component is used prior to the execution of the tasks. The second component consists of the dynamic scheduler proposed in the previous chapter. This component executes concurrently with the execution of the subtasks. The proposed dynamic scheduler attempts to improve the initial static mapping. As evident from Figure 5.1, there are two additional steps involved in the heuristic based approach. They are used to identify how many subtasks need to be dynamically scheduled and which subtasks are to be statically scheduled. The former is achieved by defining a parameter and the latter is achieved by using heuristics. Once the subtasks are identified, dynamic scheduling is performed only on those subtasks. The following sections present a detailed discussion on how the process of task selection is performed.

To determine the number of tasks that need to be statically scheduled, i.e. ignored by the dynamic scheduler, we define a parameter called the Mix. The Mix parameter is the ratio of the number of tasks in the application that need to be ignored by the dynamic scheduler to the total number of subtasks. This parameter enables the heuristic scheduler to behave in a wide spectrum ranging from a pure static scheduler with Mix = 1 to a pure dynamic scheduler with Mix = 0.

[Figure 5.1. Framework of the proposed Hybrid Scheduler. Block diagram: the Task Flow Graph (TFG) and Processor Graph (PG) feed the Learning Automata Based Static Scheduler, which produces the Initial Static Mapping; the Mix parameter and the Task Selection Heuristics select the subtasks passed to the Proposed Dynamic Scheduler, which produces the Final Mapping.]

5.3 Selection Heuristics for the Proposed Scheduler

The determination of the subset of nodes to be statically scheduled ($S_s$) and those to be dynamically scheduled ($S_d$) to obtain an optimal solution is an open problem. Therefore heuristics are used to determine these sets. In this work six heuristics are investigated. The details of the heuristics are discussed in the following sections.

5.3.1 Selection Heuristic H1

The critical path of an application provides a lower bound on the makespan of the problem. The first heuristic, called H1, selects the tasks in the critical path of the application. Hence H1 can be formally defined as:

$S_s = \{ s_i \in S \mid s_i \in CP \}, \quad S_d = \{ s_i \in S \mid s_i \notin CP \}$

where $CP$ is the set of subtasks on the critical path.

[Figure 5.2. The TFG of Figure 4.3 after Application of Selection Heuristic H1. The nodes are grouped into a cluster marked Static and a cluster marked Dynamic.]

Figure 5.2 depicts the effect of applying this heuristic to the task flow graph of Figure 4.3. The cluster marked Static contains the nodes that are to be scheduled statically and the cluster marked Dynamic contains the nodes that are to be scheduled dynamically.

Algorithm 5 Pseudocode for Translation of Selection Heuristic H1
1: CP = FindCriticalPath(TFG);
2: $S_s = \emptyset$;
3: $S_d = \emptyset$;
4: for $s_i \in S$ do
5:   if $s_i \in$ CP then
6:     $S_s = S_s \cup \{s_i\}$;
7:   else
8:     $S_d = S_d \cup \{s_i\}$;
9:   end if
10: end for

5.3.2 Selection Heuristic H2

The second heuristic, H2, simply selects the subtasks at random. That is, $S_s$ is a randomly chosen subset of $S$. This heuristic is by far the simplest heuristic. It does not take the structure of the TFG into account. The pseudocode for implementing this heuristic is outlined in Algorithm 6 and the effect after the heuristic is applied is shown in Figure 5.3.

[Figure 5.3. The TFG of Figure 4.3 after Application of Selection Heuristic H2. The nodes are grouped into a cluster marked Static and a cluster marked Dynamic.]

Algorithm 6 Pseudocode for Translation of Selection Heuristic H2
1: $S_s = \emptyset$;
2: $S_d = \emptyset$;
3: int count = Mix $\times |S|$;
4: for int i = 1 to count do
5:   s = random($S - S_s$);
6:   $S_s = S_s \cup \{s\}$;
7: end for
8: $S_d = S - S_s$;

5.3.3 Selection Heuristic H3

The third heuristic, H3, selects subtasks that have a high dependency between them for static scheduling, and the remaining subtasks are scheduled dynamically. Mathematically H3 can be stated as:

$S_s = \{ s_i, s_j \mid (s_i, s_j) \in E_T \text{ such that } \mathrm{EdgeWeight}(s_i, s_j) > \theta \}$

where $\theta$ is defined as the threshold weight and $E_T$ is the set of edges of the TFG. It is calculated as

$\theta = \frac{\sum_{e \in E_T} \mathrm{EdgeWeight}(e)}{|E_T|}$

The average bandwidth of the network can also be added as an additional constraint to evaluate $\theta$. If the weight of an edge exceeds this value then the subtasks that share this edge are declared heavily dependent and they are selected for static scheduling. The pseudocode for implementing this heuristic is outlined in Algorithm 7. Figure 5.4 shows the result of applying heuristic H3 on the TFG of Figure 4.3.

Algorithm 7 Pseudocode for Translation of Selection Heuristic H3
1: $S_s = \emptyset$;
2: $S_d = \emptyset$;
3: Evaluate $\theta$;
4: for $e \in E_T$ do
5:   if EdgeWeight($e$) > $\theta$ then
6:     $S_s = S_s \cup \{$head($e$), tail($e$)$\}$;
7:   else
8:     $S_d = S_d \cup \{$head($e$), tail($e$)$\}$;
9:   end if
10: end for

[Figure 5.4. The TFG of Figure 4.3 after Application of Selection Heuristic H3. The nodes are grouped into a cluster marked Static and a cluster marked Dynamic.]

5.3.4 Selection Heuristic H4

The fourth heuristic, H4, identifies the nodes that can execute in parallel. It is achieved by levelizing the TFG: the root node(s) are at level 0, all the nodes that can be reached from the root are at level 1, and so on. All the nodes in a level are data parallel nodes. They can be executed concurrently. If the number of such data parallel nodes exceeds a certain threshold, then the tasks in that level are scheduled dynamically. Formally:

$S_d = \{ \mathrm{Nodes}(l) \mid \mathrm{NumberOfNodes}(l) > \lambda \}$

The value $\lambda$ represents the average number of data-parallel nodes.
It is currently evaluated as the number of nodes in the TFG divided by the number of levels (NumberOfLevels). The pseudocode for translating this heuristic is outlined in Algorithm 8. Figure 5.5 depicts the effect of applying heuristic H4 to the input TFG of Figure 4.3. The function Levelize(TFG) in step 3 of Algorithm 8 returns the levels of the TFG, each being a list of the nodes in the same level. The function Nodes( ) in step 7 of the algorithm returns the list of nodes in a given level.

Algorithm 8 Pseudocode for Translation of Selection Heuristic H4
1: initialize the static set to empty;
2: initialize the dynamic set to empty;
3: Levels ← Levelize(TFG);
4: Evaluate the threshold;
5: for each level in Levels do
6:   if Length(level) exceeds the threshold then
7:     add Nodes(level) to the dynamic set;
8:   end if
9: end for
10: assign the remaining nodes to the static set;

Figure 5.5. The TFG of Figure 4.3 after Application of Selection Heuristic H4

5.3.5 Selection Heuristic H5 and H6

Heuristics H5 and H6 use graph partitioning to translate the mix. H5 divides the TFG using standard bipartitioning techniques, with the edge cost as the partition criterion. This results in two subgraphs of the TFG with minimum dependency between them. The static and dynamic sets are determined from the subgraphs. The partition criterion may also be changed to balancing the weights of the partitions. In the latter case, both schedulers have to schedule subgraphs with nearly equal computational requirements; this heuristic is named H6. Algorithm 9 shows the pseudocode for implementing these heuristics. The results of applying these heuristics to the TFG of Figure 4.3 are shown in Figures 5.6 and 5.7. Figure 5.6 is obtained by using node cost as the partition criterion and Figure 5.7 is obtained by using edge cost as the partition criterion. The partitioning of tasks was performed using the METIS partitioning software.
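The threshold tests behind heuristics H3 and H4 can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names (levelize, select_h3, select_h4) and the edge format (source, destination, weight) are hypothetical. Three assumptions are made where the text leaves room: a node's level is taken to be the length of the longest path from any root, the H3 threshold is taken to be the mean edge weight, and a node that touches at least one heavy edge stays in the static set.

```python
from collections import defaultdict, deque


def levelize(num_nodes, edges):
    """Return the levels of a DAG as lists of node ids.

    Assumption: level = length of the longest path from any root,
    so that nodes sharing a level do not depend on each other.
    """
    successors = defaultdict(list)
    indegree = [0] * num_nodes
    for src, dst, _weight in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    level = [0] * num_nodes
    ready = deque(n for n in range(num_nodes) if indegree[n] == 0)
    while ready:  # topological sweep starting from the roots
        node = ready.popleft()
        for nxt in successors[node]:
            level[nxt] = max(level[nxt], level[node] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    grouped = defaultdict(list)
    for node in range(num_nodes):
        grouped[level[node]].append(node)
    return [grouped[k] for k in sorted(grouped)]


def select_h4(num_nodes, edges):
    """H4 sketch: levels wider than the average width go to the dynamic set."""
    levels = levelize(num_nodes, edges)
    threshold = num_nodes / len(levels)  # average number of nodes per level
    dynamic = {n for lvl in levels if len(lvl) > threshold for n in lvl}
    static = set(range(num_nodes)) - dynamic
    return static, dynamic


def select_h3(num_nodes, edges):
    """H3 sketch: endpoints of heavy edges go to the static set."""
    if not edges:
        return set(), set(range(num_nodes))
    # Assumed threshold: mean edge weight of the graph.
    threshold = sum(w for _, _, w in edges) / len(edges)
    static = set()
    for src, dst, weight in edges:
        if weight > threshold:
            static.update((src, dst))
    dynamic = set(range(num_nodes)) - static
    return static, dynamic


if __name__ == "__main__":
    # Hypothetical 6-node graph: (source, destination, edge weight).
    demo = [(0, 1, 10), (0, 2, 1), (0, 3, 1), (1, 4, 10),
            (2, 4, 1), (3, 4, 1), (4, 5, 4)]
    print(select_h4(6, demo))
    print(select_h3(6, demo))
```

In this toy graph the middle level holds three nodes, which is wider than the average of 1.5 nodes per level, so H4 hands that level to the dynamic scheduler, while H3 keeps the chain joined by the two heaviest edges in the static set.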
Algorithm 9 Pseudocode for Translation of Selection Heuristics H5 and H6
1: initialize the static set to empty;
2: initialize the dynamic set to empty;
3: obtain the two sets from Partition(TFG, Criteria);

Figure 5.6. The TFG of Figure 4.3 after Application of Selection Heuristic H5 with criteria of partition being node weight

Figure 5.7. The TFG of Figure 4.3 after Application of Selection Heuristic H6 with criteria of partition being edge weight

5.4 Heuristic based Dynamic Scheduling Algorithm

The heuristic scheduler differs from the dynamic scheduler in that it imposes one additional constraint that machines must satisfy in order to qualify as potential candidates. A machine is declared a candidate for a subtask if the subtask can (i) begin its execution on the machine no sooner than its slowest parent can finish, and (ii) complete its execution on that machine and perform the data transfer to all its dependents no later than its first dependent subtask can begin its execution. This is achieved by adjusting the Window( ) function. The additional constraint ensures that the static schedule is preserved. Algorithm 10 presents the pseudocode for the proposed heuristic based dynamic scheduler, and Algorithm 11 shows the Window( ) function modified to include the additional constraint. Comparing Figure 5.8 with Figure 4.6 clearly shows the advantage of this approach: the number of buyers in the auction has decreased from 3 to 2, and the number of moves for each player has also been reduced.
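The candidate test described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the thesis code: the names Window, window and candidate_machines are hypothetical, each machine is assumed to be described by the time at which it becomes free, and the data transfer to all dependents is assumed to be summarized by a single duration per machine.

```python
from dataclasses import dataclass


@dataclass
class Window:
    lower_bound: float  # completion time of the slowest parent
    upper_bound: float  # start time of the first dependent


def window(parent_finish_times, dependent_start_times):
    """Build the scheduling window of a dynamically scheduled subtask."""
    lower = max(parent_finish_times, default=0.0)
    upper = min(dependent_start_times, default=float("inf"))
    return Window(lower, upper)


def candidate_machines(win, machine_free_at, exec_time, transfer_time):
    """Return the machines on which the subtask fits inside its window.

    machine_free_at, exec_time and transfer_time are hypothetical
    dictionaries keyed by machine name.
    """
    candidates = []
    for machine, free_at in machine_free_at.items():
        # The subtask cannot start before its slowest parent has finished.
        start = max(free_at, win.lower_bound)
        # It must execute and ship its data before the first dependent starts.
        finish = start + exec_time[machine] + transfer_time[machine]
        if finish <= win.upper_bound:
            candidates.append(machine)
    return candidates


if __name__ == "__main__":
    win = window([120, 200], [500, 650])
    free = {"m0": 0, "m1": 250, "m2": 100}
    execution = {"m0": 250, "m1": 260, "m2": 100}
    transfer = {"m0": 40, "m1": 10, "m2": 300}
    print(candidate_machines(win, free, execution, transfer))
```

Shrinking the candidate list in this way is what reduces the number of moves available to each player in the auction, since machines that would disturb the static schedule are never offered.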
Algorithm 10 Heuristic Based Dynamic Scheduling Algorithm
1: Fix the value of the mix
2: T, Y = TranslateMix(S, heuristics H1-H6)
3: StaticVSSA(S)
4: Levels = Levelize(S)
5: for List ← tasks in each Level do
6:   for t ∈ List do
7:     if t ∈ T then
8:       Append t to StaticList
9:     else
10:      Append t to DynamicList
11:    end if
12:  end for
13:  DynamicSchedule(DynamicList)
14: end for

Algorithm 11 Modified Window( )
1: Set Window[ ].LowerBound ← Completion Time of the Slowest Parent
2: Set Window[ ].UpperBound ← Start Time of the First Dependent
3: Return Window

5.5 Simulation Results and Discussion

Simulations were performed to study the effect of the mix on solution quality and to study the effectiveness of the heuristics on the solution quality of the dynamic mapping. For the first set of experiments, the data from Table 5.1 were used to generate a random task flow graph and a processor graph. Graphs are classified by their number of edges as applications with low, medium or high data dependency [2]. From Figures 5.9, 5.10 and 5.11 we can infer that the completion times start to improve once the mix value lies within a particular range, and that the solution quality is best at one particular split between statically scheduled and dynamically scheduled tasks. This is because the dynamic scheduler was able to use information about the behavior of the application, such as data transfer patterns, from the static scheduler to improve the completion times of the tasks it schedules. It can also be inferred from the graphs that for part of the mix range there is no improvement in the static schedule. To ensure balanced use of both the static and dynamic schedulers, a fixed mix value is chosen.

Table 5.1.
Data used for Evaluating the mix parameter

Number of Tasks                    10
Number of Machines                 10
Number of Edges                    3, 6 and 10
Execution Matrix Data range        1000
Communication Matrix Data range    4
Data Exchange Matrix Data range    500

Figure 5.8. Freeze-frame Illustrating the System State when si and sj are being Dynamically Scheduled using the Proposed Heuristics on the Game Theoretic Scheduler
(Diagram of machines against time showing Window(si). Legend: si: a subtask in an application; si-1: the slowest parent of si; si+1: the first child of si; sj, sk: tasks that do not depend on si; m0-m5: the machines in the network. si and sj are dynamic, sk is static. Players: si, sj. conservative(si): m1; aggressive(si): m0, m2; conservative(sj): m3; aggressive(sj): m0, m2.)

The second set of experiments was performed to compare the effectiveness of the six heuristics. For this experiment, a library of random task graphs and processor graphs was created using data from Table 5.2. Random task flow graphs were generated with node sizes of 10, 50 and 100 for low, medium and high data dependencies. Random processor graphs were generated for node sizes of 5, 10 and 20 processors.

From the graphs in Figures 5.12 to 5.20 it is observed that heuristic H2 provides better results in most situations. It is also observed that for TFGs of large size, heuristics H5 and H6 also provide better results. It should be noted that the proposed heuristics are applied to randomly generated graphs. It would be worthwhile to investigate heuristics, other than the proposed ones, that exploit a specific feature of the application. In essence, the mix and the heuristics are parameters that are highly dependent on the application being mapped to the HC suite.

Figure 5.9.
Mix vs. Completion Time with Low Dependency
Figure 5.10. Mix vs. Completion Time with Medium Dependency
Figure 5.11. Mix vs. Completion Time with High Dependency
(Figures 5.9 to 5.11, each titled "Mix vs. Completion Times", plot Completion Time x 100 against Mix values from 0.1 to 0.9 for heuristics H1 to H6.)

Table 5.2. Data used for Evaluating Heuristics

Number of Tasks                    10, 50, 100
Number of Machines                 5, 10, 20
Number of Edges
Execution Matrix Data range        1000
Communication Matrix Data range    4
Data Exchange Matrix Data range    500

(Figures 5.12 to 5.20, each titled "Problem Size vs. Average Completion Time", plot Completion Time x 100 against the number of tasks, 10, 50 and 100, for heuristics H1 to H6 and for the static completion time.)
Figure 5.12. Size vs. Completion Time with Low Data Dependency
Figure 5.13. Size vs. Completion Time with Low Data Dependency
Figure 5.14. Size vs. Completion Time with Low Data Dependency
Figure 5.15. Size vs. Completion Time with Medium Data Dependency
Figure 5.16. Size vs. Completion Time with Medium Data Dependency
Figure 5.17. Size vs.
Completion Time with Medium Data Dependency
Figure 5.18. Size vs. Completion Time with High Data Dependency
Figure 5.19. Size vs. Completion Time with High Data Dependency
Figure 5.20. Size vs. Completion Time with High Data Dependency

CHAPTER 6
CONCLUSIONS

Heterogeneous computing provides a structured methodology to exploit the diversity in the application and computational domains. There are two main reasons why HC is important for the construction of high performance computing systems: (i) most HPC systems achieve only a fraction of their peak performance on real application sets, and (ii) the economic viability of HC systems. Among the issues in developing HC systems, task assignment and scheduling is identified as the most critical. The contributions of the thesis are:

- A new auction-based scheduling framework based on game theory.
- Task assignment algorithms with six heuristics to reduce the scheduling overhead.

The techniques presented in the thesis are a first step in this direction. It is possible to enhance the speed of execution of the proposed algorithms by studying new heuristics and by using creative techniques to solve for the Nash equilibrium. It will be worthwhile to study the performance of the proposed techniques for specific high performance applications.

REFERENCES

[1] R. F. Freund, "SuperC or Distributed Heterogeneous HPC," in Computing Systems in Engineering, 1991, vol. 2, pp. 349-355.
[2] Raju D. Venkataramana, Task Assignment and Scheduling Algorithms for Heterogeneous Computing Systems, Ph.D. thesis, University of South Florida, August 2000.
[3] Muhammad Kafil and Ishfaq Ahmad, "Optimal Task Assignment in Heterogeneous Computing Systems," in Proceedings of Sixth Heterogeneous Computing Workshop (HCW'97), April 1997, pp. 135-146.
[4] H. Topcuoglu, S. Hariri, and M. Y. Wu, "Task scheduling algorithms for heterogeneous processors," in Proceedings of Heterogeneous Computing Workshop, 1999.
[5] R. F. Freund and H. J. Siegel, "Heterogeneous processing," in IEEE Computer, 1993, vol. 26, pp. 88-95.
[6] A. A. Khokhar, V. K. Prasanna, M. E. Shaaban, and C. Wang, "Heterogeneous computing: Challenges and opportunities," in IEEE Computer, 1993, vol. 26, pp. 18-27.
[7] H. J. Siegel et al., "Report of the Purdue workshop on grand challenges in computer architecture for the support of high-performance computing," in Journal of Parallel and Distributed Computing, 1992, vol. 16, pp. 199-211.
[8] H. J. Siegel et al., Parallel Computing: Paradigms and Applications, chapter 25, pp. 78-114, International Thomson Computer Press, London, 1996.
[9] C. C. Weems et al., "The image understanding architecture," in International Journal of Computer Vision, 1989, vol. 2, pp. 251-282.
[10] V. S. Sunderam, "PVM: a framework for parallel distributed computing," in Concurrency: Practice and Experience, 1990, vol. 2, pp. 315-339.
[11] Message Passing Interface Forum, "MPI: a message-passing interface standard," in International Journal of Supercomputer Applications, 1994.
[12] N. Carriero, D. Gelernter, and T. G. Mattson, "Linda in heterogeneous computing environments," in Proceedings of Workshop on Heterogeneous Processing, 1992, pp. 43-46.
[13] R. M. Butler and E. L. Lusk, "Monitors, messages and clusters: The p4 parallel programming system," in Parallel Computing, 1994, vol. 20, pp. 547-564.
[14] A. S. Grimshaw, "Easy-to-use object-oriented parallel processing with Mentat," in IEEE Computer, 1993, vol. 26, pp. 39-51.
[15] A. L. Beguelin, J. Dongarra, G. A. Geist, R. Manchek, and V. S. Sunderam, "Visualization and debugging in a heterogeneous environment," in IEEE Computer.
[16] A. Y. H. Zomaya, Parallel and Distributed Computing Handbook, McGraw-Hill Publishers, New York, NY, 1996.
[17] R. F. Freund, "Optimal Selection Theory for Superconcurrency," in Proceedings Supercomputing '89, November 1989, pp. 699-703.
[18] E. Arnould et al., "The design of Nectar: A network backplane for heterogeneous multicomputers," in Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS III), IEEE-CS Press, Los Alamitos, California, 1989, pp. 205-216.
[19] ANSI X3T9.3, "High-performance parallel interface: HIPPI-PH, HIPPI-SC, HIPPI-FP, HIPPI-LE, HIPPI-MI," in American National Standard for Information Systems, American National Standards Institute, New York, 1991.
[20] D. Arapov, A. Kalinov, A. Lastovetsky, and I. Ledovskih, "A programming environment for heterogeneous distributed memory machines," in Proceedings of Heterogeneous Computing Workshop (HCW'97), April 1997.
[21] H. S. Stone, "Multiprocessor Scheduling with the aid of Network Flow Algorithms," January 1977, vol. 3, pp. 85-93.
[22] V. M. Lo, "Heuristic Algorithms for Task Assignment in Distributed Systems," 1988, vol. 37, pp. 1384-1397.
[23] M. M. Eshaghian and Y. C. Wu, "Mapping Heterogeneous Task Graphs onto Heterogeneous System Graphs," in Proceedings of Sixth Heterogeneous Computing Workshop (HCW'97), April 1997, pp. 147-160.
[24] M. Tan et al., "Scheduling and Data Relocation for Sequentially Executed Subtasks in a Heterogeneous System," in Proceedings of Fourth Heterogeneous Computing Workshop (HCW'95), 1995, pp. 109-120.
[25] A. V. Aho, J. E. Hopcroft, and J. D.
Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, MA, 1974.
[26] C. Leangsuksun and J. Potter, "Designs and Experiments on Heterogeneous Mapping Heuristics," in Proceedings of Third Heterogeneous Computing Workshop (HCW'94), 1994, pp. 17-22.
[27] R. Armstrong, D. Hensgen, and T. Kidd, "The relative performance of various mapping algorithms is independent of sizeable variances in run-time predictions," in IEEE Heterogeneous Computing Workshop (HCW'98), March 1998, pp. 79-87.
[28] R. Freund, M. Gherrity, S. Ambrosius, M. Campbell, M. Halderman, D. Hensgen, E. Keith, T. Kidd, M. Kussow, J. Lima, F. Mirabile, L. Moore, B. Rust, and H. Siegel, "Scheduling resources in multi-user heterogeneous computing environments using SmartNet," in IEEE Heterogeneous Computing Workshop (HCW'98), March 1998, pp. 184-199.
[29] O. Ibarra and C. Kim, "Heuristic algorithms for scheduling independent tasks on nonidentical processors," in Journal of the ACM, April 1977, pp. 280-289.
[30] M. Y. Wu, W. Shu, and H. Zhang, "Segmented min-min: A static mapping algorithm for meta-tasks on heterogeneous computing systems," in Proceedings of Heterogeneous Computing Workshop.
[31] M. Wang et al., "Augmenting the Optimal Selection Theory for Superconcurrency," in Proceedings of the Workshop on Heterogeneous Processing, 1992, pp. 13-22.
[32] S. Chen et al., "Selection Theory for Methodology for Heterogeneous Supercomputing," in Proceedings of Heterogeneous Computing Workshop (HCW'93), 1993, pp. 15-22.
[33] B. Narahari, A. Youssef, and H. A. Choi, "Matching and Scheduling in a Generalized Optimal Selection Theory," in Proceedings of Heterogeneous Computing Workshop (HCW'94), 1994, pp. 3-8.
[34] M. Coli and P. Palazzari, "Real time pipelined system design through simulated annealing," December 1996, vol. 42, pp. 465-475.
[35] S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi, "Optimization by simulated annealing," in Science, May 1983, vol. 220, pp. 671-680.
[36] S. J.
Russell and P. Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall, Englewood Cliffs, NJ, 1995.
[37] I. De Falco, R. Del Balio, E. Tarantino, and R. Vaccaro, "Improving search by incorporating evolution principles in parallel tabu search," in IEEE Conference on Evolutionary Computation, 1994, vol. 2, pp. 823-828.
[38] F. Glover and M. Laguna, Tabu Search, Kluwer Academic Publishers, Boston, MA, June 1997.
[39] H. Singh and A. Youssef, "Mapping and scheduling heterogeneous task graphs using genetic algorithms," in IEEE Heterogeneous Computing Workshop (HCW'96), April 1996, pp. 86-97.
[40] L. Wang, H. J. Siegel, V. P. Roychowdhury, and A. A. Maciejewski, "Task matching and scheduling in heterogeneous computing environments using a genetic-algorithm-based approach," in Journal of Parallel and Distributed Computing, November 1997.
[41] J. H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, 1975.
[42] D. Whitley, "The Genitor Algorithm and Selection Pressure: Why Rank-based Allocation of Reproductive Trials is the Best," in Proceedings of the Third International Conference on Genetic Algorithms, 1989.
[43] A. Brindle, "GA for Function Optimization," Tech. Rep., 1981.
[44] J. Baker, "Reducing Bias and Inefficiency in the Selection Algorithm," in Proceedings of the Second International Conference on Genetic Algorithms, 1989.
[45] D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, MA, 1989.
[46] A. Mahmood, "A hybrid genetic algorithm for task scheduling in multiprocessor real-time systems," in Studies in Informatics and Control Journal, 2000, vol. 9, pp. 207-218.
[47] T. D. Braun, H. J. Siegel, N. Beck, L. L. Bölöni, M. Maheswaran, A. I. Reuther, J. P. Robertson, M. D. Theys, B. Yao, D. Hensgen, and R.
F. Freund, "A comparison study of static mapping heuristics for a class of meta-tasks on heterogeneous computing systems," in Proceedings of Heterogeneous Computing Workshop.
[48] N. S. Flann, H. Chen, and D. W. Watson, "A massively parallel SIMD approach," in IEEE Transactions on Parallel and Distributed Computing, February 1998, vol. 9, pp. 126-136.
[49] P. Shroff, D. Watson, N. Flann, and R. Freund, "Genetic simulated annealing for scheduling data-dependent tasks in heterogeneous environments."
[50] R. D. Venkataramana and N. Ranganathan, "New cost metrics for iterative task assignment algorithms in heterogeneous computing systems," in Proceedings of Heterogeneous Computing Workshop, May 2000.
[51] R. D. Venkataramana and N. Ranganathan, "Multiple Cost Optimization for Task Assignment in Heterogeneous Computing Systems Using Learning Automata," in Proceedings of Heterogeneous Computing Workshop, April 1999, pp. 137-145.
[52] M. Y. Wu and D. D. Gajski, "Hypertool: A programming aid for message-passing systems," in IEEE Transactions on Parallel and Distributed Systems, July 1990, pp. 330-343.
[53] G. L. Park, B. Shirazi, J. Marquis, and H. Choo, "Decisive path scheduling: A new list scheduling method," in Proceedings of International Conference on Parallel Processing, 1997.
[54] A. Radulescu and A. J. C. van Gemund, "On the complexity of list scheduling algorithms for distributed-memory systems," in Proceedings of ACM International Conference on Supercomputing, 1999.
[55] J. J. Hwang, Y. C. Chow, F. D. Anger, and C. Y. Lee, "Scheduling precedence graphs in systems with interprocessor communication times," in SIAM Journal on Computing, April 1989, pp. 244-257.
[56] C. Y. Lee, J. J. Hwang, Y. C. Chow, and F. D. Anger, "Multiprocessor scheduling with interprocessor communication delays," in Operations Research Letters, June 1988, pp. 141-147.
[57] A. Radulescu and A. J. C.
van Gemund, "FLB: Fast load balancing for distributed-memory machines," in Proceedings of International Conference on Parallel Processing, 1999.
[58] G. C. Sih and E. A. Lee, "A compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architectures," in IEEE Transactions on Parallel and Distributed Systems, February 1993, vol. 4, pp. 175-186.
[59] M. Iverson, F. Ozguner, and G. Follen, "Parallelizing existing applications in a distributed heterogeneous environment," in Proceedings of Heterogeneous Computing Workshop, 1995, pp. 93-100.
[60] H. El-Rewini and T. G. Lewis, "Scheduling parallel program tasks onto arbitrary target machines," in Journal of Parallel and Distributed Computing, 1990, vol. 9, pp. 138-153.
[61] M. Maheswaran and H. J. Siegel, "A dynamic matching and scheduling algorithm for heterogeneous computing systems," in Proceedings of Heterogeneous Computing Workshop, 1998.
[62] John von Neumann, "Zur Theorie der Gesellschaftsspiele," in Mathematische Annalen, 1928, pp. 100:295-320.
[63] J. Nash, "Non-cooperative games," in Annals of Mathematics, 1951, pp. 54:289-295.
[64] Anatol Rapoport, N-Person Game Theory: Concepts and Applications, The University of Michigan Press.
[65] J. Nash, "Equilibrium points in n-person games," in Proceedings of The National Academy of Sciences U.S.A., 1951, pp. 36:48-49.
[66] Richard McKelvey, Andrew McLennan, and Theodore Turocy, Gambit: Software Tools for Game Theory, Texas A&M University.
[67] A. Y. Zomaya, "Observations on using genetic algorithms for dynamic load-balancing," in IEEE Transactions on Parallel and Distributed Systems, 2001, vol. 12.
[68] Andrew J. Page and Thomas J. Naughton, "Dynamic task scheduling using genetic algorithms for heterogeneous distributed computing," in Proceedings of the 19th IEEE/ACM International Parallel and Distributed Processing Symposium, Denver, Colorado, USA, April 2005, IEEE Computer Society.
[69] M. D.
Theys, T. D. Braun, H. J. Siegel, A. A. Maciejewski, and Y.-K. Kwok, Mapping Tasks onto Distributed Heterogeneous Computing Systems Using a Genetic Algorithm Approach, John Wiley and Sons, New York, USA.
[70] R. D. Venkataramana and N. Ranganathan, "A Learning Automata Based Framework for Task Assignment in Heterogeneous Computing Systems," in ACM Symposium on Applied Computing, 1999.
[71] R. Freund and H. J. Siegel, "Heterogeneous processing," in IEEE Computer, June 1993, pp. 13-17.
[72] K. Taura and A. Chien, "A heuristic algorithm for mapping communicating tasks on heterogeneous resources," in Proceedings of Heterogeneous Computing Workshop.
[73] M. Maheswaran, S. Ali, H. J. Siegel, D. Hensgen, and R. F. Freund, "Dynamic matching and scheduling of a class of independent tasks onto heterogeneous computing systems," in Proceedings of Heterogeneous Computing Workshop.