xml version 1.0 encoding UTF8 standalone no
record xmlns http:www.loc.govMARC21slim xmlns:xsi http:www.w3.org2001XMLSchemainstance xsi:schemaLocation http:www.loc.govstandardsmarcxmlschemaMARC21slim.xsd
leader nam Ka
controlfield tag 001 001416916
003 fts
005 20061130083645.0
006 med
007 cr mnuuuuuu
008 031010s2003 flua sbm s0000 eng d
datafield ind1 8 ind2 024
subfield code a E14SFE0000054
035
(OCoLC)52846882
9
AJJ4768
b SE
SFE0000054
040
FHM
c FHM
049
FHME
090
TK7885
1 100
Namballa, Ravi K.
0 245
CHESS
h [electronic resource] :
a tool for CDFG extraction and high-level synthesis of VLSI systems /
by Ravi K. Namballa.
260
[Tampa, Fla.] :
University of South Florida,
2003.
502
Thesis (M.S.Cp.E.)--University of South Florida, 2003.
504
Includes bibliographical references.
516
Text (Electronic thesis) in PDF format.
538
System requirements: World Wide Web browser and PDF reader.
Mode of access: World Wide Web.
500
Title from PDF of title page.
Document formatted into pages; contains 97 pages.
520
ABSTRACT: In this thesis, a new tool, named CHESS, is designed and developed for control and data-flow graph (CDFG) extraction and the high-level synthesis of VLSI systems. The tool consists of three individual modules for: (i) CDFG extraction, (ii) scheduling and allocation of the CDFG, and (iii) binding, which are integrated to form a comprehensive high-level synthesis system. The first module for CDFG extraction includes a new algorithm in which certain compiler-level transformations are applied first, followed by a series of behavior-preserving transformations on the given VHDL description. Experimental results indicate that the proposed conversion tool is quite accurate and fast. The CDFG is fed to the second module, which schedules it for resource optimization under a given set of time constraints. The scheduling algorithm is an improvement over the Tabu Search based algorithm described in [6] in terms of execution time. The improvement is achieved by moving the step of identifying mutually exclusive operations to the CDFG extraction phase; this step is normally done during scheduling. The last module of the proposed tool implements a new binding algorithm based on a game-theoretic approach. The problem of binding is formulated as a non-cooperative finite game, for which a Nash-Equilibrium function is applied to achieve a power-optimized binding solution. Experimental results for several high-level synthesis benchmarks are presented which establish the efficacy of the proposed synthesis tool.
590
Adviser: PhD, N. Ranganathan.
653
cdfg extraction.
binding.
scheduling.
high-level synthesis.
690
Dissertations, Academic
z USF
x Computer Engineering
Masters.
773
t USF Electronic Theses and Dissertations.
4 856
u http://digital.lib.usf.edu/?e14.54
CHESS: A TOOL FOR CDFG EXTRACTION AND HIGH-LEVEL SYNTHESIS OF VLSI SYSTEMS

by

RAVI K. NAMBALLA

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science
Department of Computer Science and Engineering
College of Engineering
University of South Florida

Major Professor: N. Ranganathan, Ph.D.
Murali Varanasi, Ph.D.
Abdel Ejnoui, Ph.D.

Date of Approval: July 8, 2003

Keywords: High-Level Synthesis, Resource Optimization, Low Power Binding, CDFG Extraction, Tabu Search, Game Theory

© Copyright 2003, Ravi K. Namballa
DEDICATION

To My Mother
ACKNOWLEDGEMENTS

I would like to express gratitude to my major professor, Dr. N. Ranganathan, for his encouragement, guidance, support and friendship throughout my Master's program. Without his patience and his valuable suggestions, this thesis would not have been completed. I would also like to thank Dr. Varanasi and Dr. Abdel for guiding me as my committee members. I would also like to thank Ashok Murugavel for his ideas and his help throughout my thesis work. I wish to thank Sarju Mohanty for providing me with his collection of related works in VLSI. I would also like to thank all members of the VCAPP group for their help and support. I really appreciate the invaluable support that I received from my brother, without which this work would not have been possible. Also, I would like to acknowledge the support of my roommates and friends.
TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1 INTRODUCTION
  1.1 System Description and Intermediate Representation
  1.2 Scheduling
    1.2.1 Time-Constrained Scheduling
    1.2.2 Resource Constrained Scheduling
    1.2.3 Other Scheduling Approaches
  1.3 Allocation and Binding
  1.4 Motivation for Our Thesis
  1.5 Thesis Outline
CHAPTER 2 RELATED WORK
  2.1 Compiler Level Transformations in High-Level Synthesis
  2.2 DFG-Based Works
    2.2.1 Scheduling
    2.2.2 Allocation and Binding
  2.3 CDFG-Based Works
    2.3.1 Scheduling
    2.3.2 Allocation/Binding
    2.3.3 Our Work
CHAPTER 3 CDFG EXTRACTION FROM VHDL
  3.1 Introduction
  3.2 Preliminaries
    3.2.1 Control and Data Flow Graph
    3.2.2 Basic VHDL Constructs
  3.3 Implementation Details
    3.3.1 Methodology
    3.3.2 Algorithmic Description
    3.3.3 Transforming VHDL Constructs
      3.3.3.1 Operational Statements
      3.3.3.2 Assignment Statements
      3.3.3.3 Conditional Statements
      3.3.3.4 Loop Statements
    3.3.4 Output Formats
      3.3.4.1 Adjacency-List Representation
      3.3.4.2 Set Representation
      3.3.4.3 Visual Representation
  3.4 Unhandled Features
  3.5 Summary
CHAPTER 4 SCHEDULING THE CDFG
  4.1 Introduction
  4.2 Mutual Exclusion Among Operations
  4.3 Penalty Weights
  4.4 Scheduling Algorithm
CHAPTER 5 POWER-OPTIMIZED BINDING
  5.1 Power Optimization During Binding
  5.2 Basic Concepts
    5.2.1 Game Theory
    5.2.2 Auction Theory
  5.3 Problem Formulation
    5.3.1 Algorithmic Description
  5.4 Summary
CHAPTER 6 EXPERIMENTAL RESULTS
  6.1 CDFG Extraction From Behavioral VHDL
  6.2 Scheduling
    6.2.1 The Differential Equation Benchmark
    6.2.2 Elliptic Filter
  6.3 Binding
CHAPTER 7 CONCLUSIONS
REFERENCES
LIST OF TABLES

Table 6.1. Experimental Results for CDFG Extraction From Behavioral VHDL Specification
Table 6.2. Comparison of Schedules for the Differential Equation Benchmark Circuit
Table 6.3. Comparison of Schedules for the Elliptic Filter Benchmark Circuit
Table 6.4. Power and Delay Values of the Library Cells
Table 6.5. Comparison of Binding Results
LIST OF FIGURES

Figure 1.1. VLSI Design Flow
Figure 2.1. Taxonomy of Related Works in High-Level Synthesis
Figure 3.1. CDFG Representation
Figure 3.2. Steps Involved in Extraction of the CDFG From VHDL Code
Figure 3.3. Extraction of CDFG From a Sample VHDL Code
Figure 3.4. Algorithm for CDFG Extraction
Figure 3.5. CDFG: AND Operation
Figure 3.6. CDFG: Variable Assignment
Figure 3.7. CDFG: Signal Assignment
Figure 3.8. CDFG: If-Then-Else Statement
Figure 3.9. CDFG: Loop Statement
Figure 4.1. Mutually Exclusive Nodes
Figure 4.2. Penalty Weights
Figure 4.3. Life-Time and Number of Buses From CDFG
Figure 4.4. Scheduling Algorithm
Figure 5.1. Scheduled CDFG and its Binding Matrix
Figure 5.2. Algorithm for Finding the Nash Equilibrium
Figure 5.3. Algorithm for Finding the Cost Matrix
Figure 5.4. Binding Algorithm
Figure 6.1. CDFG Extracted for the Differential Equation Benchmark Circuit
Figure 6.2. CDFG Extracted for the Elliptic Filter Benchmark Circuit
Figure 6.3. CDFG Extracted for the Greatest Common Divisor Benchmark Circuit
Figure 6.4. CDFG Extracted for the Fast Fourier Transform Benchmark Circuit
Figure 6.5. ASAP Schedule for Differential Equation Benchmark
Figure 6.6. ALAP Schedule for Differential Equation Benchmark
Figure 6.7. Optimal Schedule for the Differential Equation Benchmark in 4 Control Steps
Figure 6.8. Schedule for the Differential Equation Benchmark in 6 Control Steps
Figure 6.9. Scheduled CDFG for the Elliptic Filter Benchmark Circuit
Figure 6.10. Improvement in Memory Requirement
CHESS: A TOOL FOR CDFG EXTRACTION AND HIGH-LEVEL SYNTHESIS OF VLSI SYSTEMS

RAVI K. NAMBALLA

ABSTRACT

In this thesis, a new tool, named CHESS, is designed and developed for control and data-flow graph (CDFG) extraction and the high-level synthesis of VLSI systems. The tool consists of three individual modules for: (i) CDFG extraction, (ii) scheduling and allocation of the CDFG, and (iii) binding, which are integrated to form a comprehensive high-level synthesis system. The first module for CDFG extraction includes a new algorithm in which certain compiler-level transformations are applied first, followed by a series of behavior-preserving transformations on the given VHDL description. Experimental results indicate that the proposed conversion tool is quite accurate and fast. The CDFG is fed to the second module, which schedules it for resource optimization under a given set of time constraints. The scheduling algorithm is an improvement over the Tabu Search based algorithm described in [6] in terms of execution time. The improvement is achieved by moving the step of identifying mutually exclusive operations to the CDFG extraction phase; this step is normally done during scheduling. The last module of the proposed tool implements a new binding algorithm based on a game-theoretic approach. The problem of binding is formulated as a non-cooperative finite game, for which a Nash-Equilibrium function is applied to achieve a power-optimized binding solution. Experimental results for several high-level synthesis benchmarks are presented which establish the efficacy of the proposed synthesis tool.
CHAPTER 1 INTRODUCTION

VLSI technology has advanced to a level where it would be extremely difficult to design digital systems starting at the transistor level or at the physical level. The increasing complexity of designs and the ever-growing competitiveness of the design market have made it inevitable to take the design process to much higher levels of abstraction, where the design trade-offs of time and efficiency can be carefully evaluated by the design engineer. This led to the automation of the design process based on a top-down methodology, starting from the conceptualization of the design and ending with its realization on silicon. VLSI technology has now gradually evolved to a point where the high-level synthesis of VLSI design systems has become more cost-effective and less time-consuming than the traditional method of designing everything by hand.

A typical VLSI design flow is shown in Figure 1.1. The first level of the design flow is the system-level specification, which is the most abstract form of representation of the design and mostly gives its description in plain English. The next level is the behavioral description, which gives a functional description of the design while avoiding its structural details. The RTL description, on the other hand, is composed of instances of modules such as adders, multipliers, registers, etc. that provide the structural details of the design. The process of translating a behavioral description into a structural description is termed High-Level Synthesis.

The process of synthesizing an RTL structure from the functional description during high-level synthesis involves three phases:

- Allocation: determining the number of instances of each resource needed.
- Binding: assignment of resources to computational operations.
- Scheduling: timing of computational operations.
[Figure 1.1. VLSI Design Flow. Levels shown: System Specification, Behavioral Description, RTL Description, Gate Level Description, Physical Layout. Steps shown: System-Level Design, High Level Synthesis (Transformation, Compilation, Scheduling, Allocation/Binding), Logic Synthesis, Layout Synthesis.]

High-level synthesis starts at the system level and proceeds downwards to the RTL level, passing through each of the above phases, each time adding information needed at the next level. Behavioral synthesis requires transformation of the VHDL code into an internal representation which extracts registers, combinational logic equations and macros like '+', '-', etc. for the scheduling, allocation and binding processes. Most systems use a representation like the control flow graph and/or the data flow graph, or a combination of the two like the CDFG, as their intermediate format.

1.1 System Description and Intermediate Representation

The system to be designed is described at the most abstract level in plain English, i.e., in a form most easily understood by the user. The behavior of the system is captured at the algorithmic level
through a programming language such as Ada or Pascal, or a hardware description language such as VHDL, HardwareC [73], MIMOLA [114] and SILAGE [40]. The behavior of the system, specified in a high-level language, is compiled into an internal representation that is suitable for the rest of the synthesis process. The transformation of the behavioral specification into its unique graphical representation is analogous to the non-optimizing compilation of a programming language. The data representation adopted by several behavioral synthesis systems may vary slightly in style and structure, but, in general, the control and data dependencies are encapsulated in one or two graphs. The data flow graph is a directed graph which depicts the flow of data, while the control flow graph is a directed graph which indicates the sequence of operations.

1.2 Scheduling

Scheduling is defined as the step in high-level synthesis in which the operations are grouped into control steps based on their types and dependencies, in such a way that the operators in the same control step can be executed simultaneously. A wide variety of approaches exist for efficient scheduling, directed at either reducing the total execution time or minimizing the number of resources needed for the design. Broadly, these approaches can be classified into four categories: basic scheduling, time-constrained scheduling, resource-constrained scheduling and miscellaneous scheduling.

The control and data flow graphs depict the inherent parallelism in a design, based on which each node can be assigned a range of control steps. Most scheduling algorithms require the earliest and the latest bounds that define the range of control steps for each node in the CDFG. Two simple schemes that are widely used to determine these bounds are the As Soon As Possible (ASAP) and the As Late As Possible (ALAP) algorithms.

The ASAP algorithm begins by scheduling the initial nodes, i.e., nodes without any predecessors, in the first time step, and assigns the time steps in increasing order as it proceeds downwards. The algorithm is guided by the simple principle that a particular node can be executed only if all of its predecessors have been executed. Ignoring resource constraints, this algorithm gives the least
number of control steps required for the design, and hence can be used for near-optimal microcode compilation [84].

The ALAP algorithm is analogous to the ASAP scheme, except that the operations are intentionally postponed to the latest possible control step. The algorithm begins at the bottom of the CDFG, i.e., with nodes that have no successors, and proceeds upwards to nodes that have no predecessors. This algorithm gives the slowest possible schedule for a given design.

1.2.1 Time-Constrained Scheduling

The time-constrained scheduling approach is often adopted for designs targeted towards applications in real-time systems, like digital signal processing systems, which are often limited by the response time. Here, the main objective is to realize the design with the minimum possible hardware while meeting the time constraint. Time-constrained scheduling is usually implemented using three different techniques:

- Mathematical programming
- Constructive heuristics
- Iterative refinement

Integer Linear Programming

The ILP method is a mathematical formulation of the scheduling problem, which applies a branch-and-bound search algorithm with backtracking to find the optimal schedule.

[Equation (1.1): not recoverable from the extracted text.]

The ILP approach begins with finding the earliest ($E_i$) and the latest ($L_i$) time bounds for each operation $o_i$ using the ASAP and ALAP algorithms, respectively. From these, the mobility range for each operation is calculated as

$$\{\, s_j \mid E_i \le j \le L_i \,\} \tag{1.2}$$

where $s_j$ denotes control step $j$, and the scheduling problem is formulated as

$$\text{Minimize } \sum_{k=1}^{m} c_k N_k \quad \text{subject to} \quad \sum_{j=E_i}^{L_i} x_{i,j} = 1, \quad 1 \le i \le n, \quad n = \text{no. of operations} \tag{1.3}$$

where $1 \le k \le m$ operation types are available, $N_k$ is the number of functional units of operation type $k$, and $c_k$ is the cost of each FU. $x_{i,j}$ is equal to 1 if operation $i$ is assigned to control step $j$, and 0 otherwise. Such a formulation can be extended to include resource and data dependency constraints using the equations

$$\sum_{i \,:\, \mathrm{type}(o_i) = k} x_{i,j} \le N_k \quad \text{and} \quad (p \cdot x_{i,p}) - (q \cdot x_{l,q}) \le -1 \tag{1.4}$$

where the first constraint holds for every control step $j$ and operation type $k$, the second holds for every operation $o_i$ that must precede an operation $o_l$, and $p$ and $q$ are the control steps assigned to the operations $o_i$ and $o_l$, respectively.

One major drawback of the ILP formulation is that its complexity increases rapidly with the number of control steps. For a single additional control step, $n$ additional $x$ variables have to be considered. The ILP approach is computationally intensive and hence can be applied only to very small problems.

Another approach for time-constrained scheduling is a heuristic method called force-directed scheduling. This algorithm tries to reduce the total number of functional units used by uniformly distributing the operations of the same type over the available control steps.

1.2.2 Resource Constrained Scheduling

Resource-constrained scheduling algorithms are used in applications where the design is restricted by the silicon area. The goal of these algorithms is to minimize the number of control steps while satisfying the resource constraints. The schedule is built one operation at a time, so that the resource constraints and data dependencies are not violated. The total number of control steps is minimized in such a way that the number of operations scheduled in any control step does not exceed the number of FUs available. Two popular approaches for scheduling operations with resource constraints are list-based scheduling and static-list scheduling.

List-based scheduling is based on including resource constraints in the ASAP algorithm. A priority list of ready nodes is maintained, and each such list is associated with a priority function that resolves any resource conflicts. A ready node is a node whose predecessors have already been scheduled.
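The ASAP and ALAP bounds, and the mobility derived from them, recur throughout this section, so a small illustration may help. The following is a minimal Python sketch, not the implementation used in CHESS: it assumes every operation takes exactly one control step, and the graph and the node names (m1, a1, ...) are hypothetical.

```python
from collections import defaultdict

def asap(preds):
    """Earliest control step per node (steps start at 1, unit latency assumed)."""
    step = {}
    def visit(n):
        if n not in step:
            # A node runs one step after its latest predecessor.
            step[n] = 1 + max((visit(p) for p in preds[n]), default=0)
        return step[n]
    for n in preds:
        visit(n)
    return step

def alap(preds, last_step):
    """Latest control step per node that still finishes by last_step."""
    succs = defaultdict(list)
    for n, ps in preds.items():
        for p in ps:
            succs[p].append(n)
    step = {}
    def visit(n):
        if n not in step:
            # A node runs one step before its earliest successor.
            step[n] = min((visit(s) for s in succs[n]), default=last_step + 1) - 1
        return step[n]
    for n in preds:
        visit(n)
    return step

# Hypothetical data flow graph: node -> list of predecessors.
preds = {"m1": [], "m2": [], "a1": ["m1", "m2"], "m3": [], "a2": ["a1", "m3"]}

early = asap(preds)
late = alap(preds, max(early.values()))
mobility = {n: late[n] - early[n] for n in preds}
```

In this example only m3 has a non-zero mobility; the remaining operations lie on the critical path and have a single feasible control step.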
The list-based algorithm proceeds by first scheduling operations with higher priority, while the lower-priority operations are deferred to later control steps. At every step, the successors of a scheduled operation are added to the priority list of ready nodes. The efficiency of such a list scheduling algorithm depends mostly on the priority function employed. A simple priority function assigns a priority that is inversely proportional to the mobility of the operation, thereby ensuring that operations with large mobility are deferred to later control steps, since they can go into a larger number of control steps. Alternatively, we could assign a priority based on the length of the longest path from the operation node to a node with no immediate successor. One major drawback of list-based scheduling is the increased time and space complexity caused by the several lists that have to be maintained dynamically.

Static-list scheduling is based on building a single list of operations statically, as opposed to normal list-based scheduling, where the list grows dynamically. The ASAP and ALAP algorithms are applied initially to find the mobility range for each operation. The operations are sorted in ascending order based on their greatest control step assignment, and then the operations with the same greatest control step value are sorted in descending order of their least control step value. The operations are then scheduled sequentially in descending order of their priority. Operations that cannot be scheduled in a control step due to unavailability of resources are deferred to later control steps.

1.2.3 Other Scheduling Approaches

Apart from the previously discussed scheduling algorithms, several other approaches, like simulated annealing, have been successfully used to solve the scheduling problem.
In the simulated annealing based approach [26], scheduling is treated as a placement problem, where the operations are to be placed in a two-dimensional table of control steps versus available functional units. The algorithm begins with an initial placement of operations, and iteratively modifies the table by displacing an operation. The new schedule is evaluated based on the cost of displacement, and is accepted with a certain probability even when it is not better than the previous one, in order to overcome local minima in the solution space. The simulated annealing approach is thus suitable for obtaining globally optimum solutions, but requires long execution times to find them.

Another approach is path-based scheduling [13], which is based on minimizing the number of control steps needed to execute the critical path in a CDFG. Initially, all possible paths of the CDFG are extracted and scheduled independently, and later these schedules are combined to get the final schedule. The algorithm transforms the problem of introducing the minimum control step constraints into a clique-partitioning problem. A clique-partitioning solution indicates the minimum overlapping of intervals in a given path.

1.3 Allocation and Binding

Allocation is the process of determining the number of functional units of each type for performing the operations, while binding is the process of assigning each operation to a particular functional unit. Allocation ensures that a sufficient number of resources are available for executing the operations, and binding decides the actual components to be used for each operation. Binding has an impact on the amount of multiplexing and interconnections in the final design. Allocation and binding can be classified into three categories based on their objective: allocation and binding for functional units, for memory units, and for interconnections.

Allocation and binding for functional units consists of grouping operations in such a way that each group consists of mutually exclusive operations while the total number of groups is minimized. In memory unit allocation/binding, values that are generated in one control step and used in another are assigned to memory units for storage. Here, the objective is to minimize the number of memory units and also to simplify the communication paths. Interconnection allocation and binding includes the assignment of buses, multiplexer and demultiplexer connections to perform the data transfer in each time step.
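To make functional-unit allocation and binding concrete, the sketch below groups operations of the same type so that no two operations in a group occupy the same control step. It is a simple greedy illustration in Python, not the game-theoretic binding algorithm developed later in this thesis; the schedule, the operation names and the single-step latency are assumptions made for the example.

```python
from collections import defaultdict

def bind(schedule, op_type):
    """Greedily bind operations to functional units (FUs).

    schedule: operation name -> control step (unit latency assumed)
    op_type:  operation name -> FU type, e.g. "mul" or "add"
    Returns operation name -> (type, index) naming the FU instance.
    """
    busy = defaultdict(set)  # (type, step) -> FU indices already in use
    binding = {}
    for op in sorted(schedule, key=lambda o: (schedule[o], o)):
        key = (op_type[op], schedule[op])
        unit = 0
        while unit in busy[key]:  # pick the lowest-numbered free FU
            unit += 1
        busy[key].add(unit)
        binding[op] = (op_type[op], unit)
    return binding

# Hypothetical scheduled operations.
schedule = {"m1": 1, "m2": 1, "m3": 2, "a1": 2, "a2": 3}
op_type = {"m1": "mul", "m2": "mul", "m3": "mul", "a1": "add", "a2": "add"}

binding = bind(schedule, op_type)

# Allocation follows from the binding: one more than the highest index per type.
allocation = defaultdict(int)
for kind, unit in binding.values():
    allocation[kind] = max(allocation[kind], unit + 1)
```

Under the unit-latency assumption, the number of units allocated per type equals the largest number of operations of that type scheduled in any one control step, which is the minimum possible. The greedy choice ignores power and interconnect cost.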
1.4 Motivation for Our Thesis

The automation of the design process has been made necessary by the increasing complexity of designs and the decreasing time-to-market requirements of the design market. Shifting the design process to higher levels of abstraction has been the motivating factor for several research works in the high-level synthesis phase. Despite the availability of several tools for synthesizing behavioral descriptions of designs, their application in research work is quite limited, since most of them are commercially oriented tools. Moreover, most of the previous works on high-level synthesis target data-dominated designs, but are not adequate to handle control-dominated designs. Control-flow intensive behaviors with inherent loops and conditionals are quite possible in network-centered systems. This has motivated us to develop a comprehensive high-level synthesis system that can be used for both data-flow and control-flow intensive designs. The system, generating outputs at different stages of the synthesis process, aids researchers by providing them with the flexibility of several entry and exit points in the system.

The high-level synthesis process requires the compilation of the behavioral description of the design into a graphical representation capturing the control and data dependencies. The derivation of such a Control and Data Flow Graph (CDFG) has mostly been done manually, which makes this process time-consuming and error-prone, at least in the earlier stages of synthesis. Our synthesis system, therefore, includes a tool for automatic conversion of a given behavioral VHDL description into its corresponding CDFG. Such a CDFG is generated in several formats to accommodate different implementation approaches.

Traditionally, design automation tools were developed with the objective of reducing area and improving the speed of designs. However, with the introduction of portable wireless devices and other micro equipment like laptop computers, the power dissipation of circuits has slowly evolved into a major concern of the design process. Such a trend has placed the problem of power optimization early in the design cycle. We have addressed this problem of power optimization in the binding phase of our synthesis system.
1.5 Thesis Outline

The rest of the thesis is organized as follows. We enumerate some of the previous works related to this field in Chapter 2. An automatic conversion tool that is used to extract a CDFG from the given behavioral description is described in Chapter 3. Chapter 4 gives a brief overview of the scheduling approach used in our synthesis system. Chapter 5 describes our game-theory based binding algorithm that incorporates power optimization. Experimental results obtained on some of the standard high-level synthesis benchmark circuits are presented in Chapter 6. Finally, we give the concluding remarks in Chapter 7.
CHAPTER 2 RELATED WORK

The advent of design automation has resulted in a significant amount of work at many levels of design abstraction. A number of techniques have been proposed for high-level synthesis, some of which are briefly discussed here. A taxonomy of related works in high-level synthesis (HLS) is provided in Figure 2.1. The related works are classified on the basis of the intermediate representation they use (DFG or CDFG), and the tasks that they target in HLS (scheduling, allocation or binding). We have also enumerated works on transformations of initial behavioral descriptions. We now present a summary of these works according to our classification.

2.1 Compiler Level Transformations in High-Level Synthesis

In this section, we cite various works on compiler-level transformations of original behavioral descriptions that aid in the next steps of HLS. Aho et al. [7] proposed the application of several compiler optimization techniques, such as constant folding and redundant operator elimination, on the flow-graph representation. Arrayed variables were another source of compiler-level optimizations for HLS considered in [36] and [79]. Since arrays in the behavioral descriptions get mapped to memories, it was proposed in [55] that reducing the number of array accesses decreases the overhead resulting from accessing memory structures.

Lis and Gajski [67] identified some of the advantages of capturing design requirements in a behavioral form. These include:

- Technology-dependent details of implementation are not embedded in the design specification.
[Figure 2.1. Taxonomy of Related Works in High-Level Synthesis. The figure groups the cited works into compiler-level transformations, DFG-based works (scheduling; allocation/binding) and CDFG-based works (scheduling; allocation/binding), with this work (2003) listed under CDFG-based scheduling/allocation/binding.]
- The behavioral description can be applied to a simulator to verify the correctness of a new design, or to validate an existing design specification.
- As the implementation technologies change, the available behavioral description can be used to redesign a circuit to make it compatible with the new technology.
- Behavioral synthesis increases productivity, minimizes errors and decreases design time without requiring any technology-specific expertise from the designer.

With the increasing use of VHDL for design description, some approaches have been proposed that are specific to transformations on VHDL. Bhasker and Lee [10] proposed approaches to identify specific syntactic constructs and replace them with attributes on signals and nets to indicate their functions. In order to reduce the syntactic variation of descriptions with the same semantics, Chaiyakul et al. [14] proposed a transformation technique that uses assignment decision diagrams to minimize syntactic variance in the given description.

The COMET (Cluster-Oriented and Minimum Execution Time) design system proposed by Chang, Rose and Walker [20] synthesizes synchronous pipeline ASICs. It uses VHDL for describing the behavioral specifications. Such a description is restricted to statements with arithmetic and logical operations and control constructs like if, case and loops. They designed a subsystem, named VCOMP, for converting such a behavioral description into a DFG representation. The DFG is then subject to optimizing transformations.

Another approach for flow-graph transformations is tree height reduction [39], which tries to improve the parallelism of the design. A similar method described in [77] uses the commutativity and distributivity properties of the language operators to decrease the height of a long expression chain, exposing the inherent parallelism in a complicated data-flow graph.
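The effect of tree height reduction can be seen on a chain of additions. The Python sketch below is only an illustration and is not the method of [39] or [77]: it relies solely on the associativity of a single operator, and the tuple-based tree encoding and function names are assumptions made for the example.

```python
def chain(operands, op="+"):
    """Left-deep expression tree: (((a + b) + c) + d) ..."""
    tree = operands[0]
    for x in operands[1:]:
        tree = (op, tree, x)
    return tree

def balanced(operands, op="+"):
    """Balanced tree over the same operands; valid only if op is associative."""
    if len(operands) == 1:
        return operands[0]
    mid = len(operands) // 2
    return (op, balanced(operands[:mid], op), balanced(operands[mid:], op))

def height(tree):
    """Number of operator levels, i.e. control steps with unlimited resources."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(height(tree[1]), height(tree[2]))

operands = list("abcdefgh")
chain_height = height(chain(operands))        # sequential chain of 7 additions
balanced_height = height(balanced(operands))  # same sum as a balanced tree
```

For eight operands the chain needs seven dependent additions, while the balanced form needs three levels, at the cost of requiring up to four adders in the first level.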
Other commonly used transformations include pipelining [81 ], loop folding [33 ], softw are pipelining [35 ] and retiming [70 ]. Some patternmatching transformations were applied by Rosenstiel in [90 ], which are based on R T semantics of the hardw are components corresponding to o wgraph operators. W alk er et al. [107 ] applied system le v el transformations to di vide parts of the o w graph into separate processes that run concurrently or in a pipelined f ashion. Mek enkamp et al. described a system, called TRADES (T ransformational DEsign System) in [72 ], which uses a syntax based translation to transform a subset of VHDL constructs into a CDFG on a per statement basis. Due to such a 12
syntax-based approach, the VHDL event mechanism appears in the CDFG without imposing any guidelines on the synthesis process. Nijhar and Brown [78] identify significant differences between the optimization of VHDL code and that of a conventional, sequential programming language, which are often assumed to be along the same lines. According to them, transformations applied to sequential programming languages are limited by a fixed target architecture, i.e., the architecture on which the program is to be run. VHDL optimization, however, has an extra degree of freedom in that it can manipulate the executing hardware itself. Potkonjak et al. [70] proposed methods for transforming a behavioral description so that synthesis of the new description requires less area overhead. They proposed a two-stage objective function for estimating the area and testability as well as for evaluating the effects of a transformation. From there, a randomized branch-and-bound steepest descent algorithm was employed to search for the best sequence of transformations. In [91], Rosien et al. present a method to automatically generate a CDFG from C/C++ source code. Such a CDFG is used to automate the programming of a Field Programmable Function Array (FPFA), which is a flexible and energy-efficient reconfigurable device. Their CDFG is represented using the hypergraph model, in which the operations are represented by edges (hyperedges) and the inputs and outputs are represented by the nodes which connect the edges. With such a representation, an operation can have any number of distinguishable inputs/outputs. Also, a hypergraph itself can be used as a definition of a new hyperedge, and a whole hierarchical graph can be created this way. The authors have divided the process of generating such a CDFG from C/C++ code into several steps.
First, a parse tree is generated from the code, from which the language constructs are converted into a list of hypergraph templates. A complete CDFG is built from these templates, which is then subjected to a series of behavior-preserving transformations. Finally, a clean CDFG is obtained in which the control lines and the state space are trimmed as much as possible.

2.2 DFG-Based Works

The works discussed in this section have used a data flow graph as their intermediate representation.
2.2.1 Scheduling

Paulin and Knight [85] introduced force-directed scheduling (FDS), which uses a global selection criterion to choose the next operation for scheduling. Their FDS algorithm relied on the ASAP and ALAP scheduling algorithms to determine the range of control steps for every operation. The algorithm achieves its objective of reducing the number of functional units by uniformly distributing the operations of the same type into all the available control steps. The HAL system, which is based on their force-directed scheduling approach, performs behavioral synthesis on a global scheme with stepwise refinement. Some of the constraints and features supported by the FDS algorithm include:

- multicycle and chained operations
- mutually exclusive operations
- scheduling with fixed global timing constraints, aimed at minimizing functional unit costs, register costs and global interconnect requirements
- scheduling with local timing constraints
- scheduling with fixed resource constraints
- functional pipelining
- structural pipelining

The FDS scheme does not take into account future scheduling of operations into the same control step, which leads to a lack of compromise between early and late decisions and may result in a suboptimal solution. Park et al. [81] overcome this weakness by iteratively rescheduling some of the operations in the given schedule. An initial solution is obtained using a standard algorithm, and that solution is improved by rescheduling a sequence of operations until no further improvement is attainable. The COMET system [20] applies the concept of force-directed scheduling while interacting with cluster structure information. Their system is based on a tool called Cluster Oriented Scheduling (COS), which uses pattern matching techniques to recognize the
cluster structures of a new algorithm as an instance of a dependency structure for mapping an algorithm to an architecture. In [38], Gupta et al. present a latency-constrained scheduling algorithm to optimize a design for dynamic power. Their work is motivated by the force-directed scheduling algorithm proposed by Paulin and Knight [84]. Their algorithm reduces dynamic power by reducing switched capacitance inside resources, after evaluating the switched capacitance of combinations among DFG operations that could share resources. A force is associated with each feasible combination corresponding to the power consumption, and a distribution of such forces is obtained, whose mean, standard deviation and skew are used to produce a power-optimized schedule. Rim and Jain [89] demonstrate a performance extension tool that computes a lower-bound completion time for the non-pipelined resource-constrained scheduling problem, given a dataflow graph, a set of resources, a specified resource delay and a clock cycle. Chaudhuri and Walker [18] produced an algorithm for computing lower bounds on the number of functional units of each type required to schedule a dataflow graph in a specified number of control steps. The bounds are found by relaxing either the precedence constraints or the integrity constraints on the scheduling problem. A list-based scheduling algorithm that uses information from a DFG to guide its search for optimal or near-optimal schedules is presented in [100]. A DFG analysis is performed initially, which includes finding the successors and predecessors of every node and the tree to which the node belongs. With this knowledge, the scheduler is supposed to make a perfect choice of the operation to be scheduled next.
The most basic constructive approaches for HLS, the ASAP and ALAP algorithms, assign no priority to operations, while the list scheduling approaches use a global criterion for selecting the next operation to be scheduled. Pangrle et al. [12] used the mobility of an operation as its global priority function, where mobility is defined as the difference between the ASAP and ALAP values of that operation. Another priority function, named urgency, was used by Girczyc et al. in [33]; it is defined as the minimum number of control steps from the bottom at which an operation can be scheduled before a timing constraint is violated. The list of ready operations is ordered according to these priority functions and processed for each state.
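The ASAP and ALAP values and the mobility priority described above can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions that are not taken from the cited works: every operation takes exactly one control step, control steps are numbered from 1, and the DFG is given as a hypothetical dictionary `preds` mapping each operation to its list of predecessors.

```python
def asap(preds):
    """Earliest control step of each operation (unit delay, steps start at 1)."""
    step = {}

    def visit(n):
        if n not in step:
            step[n] = 1 + max((visit(p) for p in preds[n]), default=0)
        return step[n]

    for n in preds:
        visit(n)
    return step


def alap(preds, latency):
    """Latest control step of each operation for a given latency bound."""
    succs = {n: [] for n in preds}
    for n, ps in preds.items():
        for p in ps:
            succs[p].append(n)
    step = {}

    def visit(n):
        if n not in step:
            step[n] = min((visit(s) for s in succs[n]), default=latency + 1) - 1
        return step[n]

    for n in preds:
        visit(n)
    return step


def mobility(preds, latency):
    """Mobility = ALAP step - ASAP step; 0 means the operation is critical."""
    early, late = asap(preds), alap(preds, latency)
    return {n: late[n] - early[n] for n in preds}


# Hypothetical DFG: a1 = m1 + m2, a2 = a1 + m3
dfg = {"m1": [], "m2": [], "a1": ["m1", "m2"], "m3": [], "a2": ["a1", "m3"]}
slack = mobility(dfg, latency=3)  # only m3 has mobility 1
```

A list scheduler would sort the ready operations by increasing mobility, so that critical operations (mobility 0) are placed first.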
Other scheduling approaches were proposed to address the problems of memory and storage. Kim and Liu [52] laid emphasis on minimizing the interconnection and then tried to group the variables to form memory modules. Lee and Hwang [65] proposed taking multiport memory into account as early as during scheduling. A multiport access variable (MAV) was defined for a control step, and the MAVs across all the control steps were equalized in order to achieve better memory utilization. Aloqeely and Chen [5] proposed a sequencer-based architecture, where a sequencer is a stack or queue connecting one functional unit to another. High-quality datapaths could be synthesized for many signal processing and matrix computation algorithms by letting the variables either stay in or flow through the sequencers for future use. Achatz [1] proposed an extension to the ILP formulation so that it can handle multifunctional units as well as units with different execution times for different instances of the same operation type. Wang and Grainger [109] came up with a method to reduce the number of constraints in the original ILP formulation without reducing the explored design space, thereby making the computation more efficient and more applicable to larger problems. Chaudhuri et al. [19] described a well-designed ILP formulation for exploiting the structure of the assignment, timing, and resource constraints, and they further improved the well-structured formulation by adding new valid inequalities. Landwehr et al. proposed the OSCAR system in [63], which represents a 0/1 integer programming model for solving the three tasks of HLS. In [32], Gebotys proposed an integer programming model for the synthesis of multichip architectures which can simultaneously deal with partitioning, scheduling, and allocation. Wilson et al. [110] generalized the ILP approach in an integrated solution to scheduling, allocation and binding in datapath synthesis. Ly et al.
[68] proposed a method for using behavioral templates for scheduling, where each template locks a number of operations into a relative schedule with respect to one another. It eases the handling of timing constraints, sequential operation modeling, pre-chaining of certain operations, and hierarchical scheduling. Unaltuna et al. presented a three-phase neural-network-based scheduling algorithm in [105], while Kawaguchi and Tokada combined simulated annealing with neural networks in [50] for solving the scheduling problem. Dhodhi et al. [28] proposed the application of a problem-space genetic algorithm for datapath synthesis that performs concurrent scheduling and allocation with the objective of minimizing the resource cost and the total execution time. Ly et al. [69] adapted the simulated annealing procedure
to high-level synthesis that explores the design space by repeatedly ripping up parts of a design in a probabilistic manner and reconstructing them using application-specific heuristics. Sharma et al. [96] combined the allocation and scheduling of functional, storage and interconnect units into a single phase, using the concept of register state (free, busy or undecided) for optimizing registers in an incomplete schedule where the lifetimes of variables are not yet available. Langevin and Cerny [64] described a recursive method for estimating a lower bound on the performance of schedules under resource constraints for acyclic finite data flow graphs. The recursive method is based on the greedy lower bound estimator of Rim and Jain [89], which was formulated as resolving a relaxation of the general scheduling problem and allowed for chaining of operations, and pipelined and multicycle operations. Kollig and Al-Hashimi [54] described a new simulated-annealing-based algorithm capable of solving scheduling, allocation and binding tasks simultaneously without the need for independent interconnect optimization. Their algorithm begins with an initial solution and proceeds by generating new solutions, which are either accepted or rejected based on an acceptance criterion defined in the algorithm.
The probability of accepting solutions with increasing cost depends on a cost parameter which is gradually lowered as the annealing process proceeds [25]. The moves applied in the annealing procedure are chosen so as to cover scheduling, allocation and binding tasks simultaneously. The four different kinds of moves that could be applied to a randomly chosen operation in the algorithm at any time are:

- Randomly schedule the operation one control step earlier or later.
- Bind the value to a new register from the set of available registers.
- Bind the operation to a new functional module from a set of available modules.
- Swap the inputs of the operation if it is commutative.

Zhu and Gajski [113] established a theoretical framework for another concept of scheduling called soft scheduling. In soft scheduling, the decisions made are soft, i.e., they could be adjusted later. The authors discuss the applicability of soft scheduling to alleviate the phase coupling problem of HLS.
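The acceptance step of the annealing procedure described earlier in this section can be sketched as follows. The text does not state the exact criterion used in [54], so this sketch assumes the standard Metropolis rule with a geometric cooling schedule; the names `acceptance_probability`, `accept` and `cool`, and the cooling factor 0.95, are hypothetical.

```python
import math
import random


def acceptance_probability(delta_cost, temperature):
    """Metropolis rule (an assumption): improvements are always accepted,
    cost increases are accepted with probability exp(-delta / T)."""
    if delta_cost <= 0:
        return 1.0
    return math.exp(-delta_cost / temperature)


def accept(delta_cost, temperature, rng=random):
    """Randomly decide whether to keep a candidate solution."""
    return rng.random() < acceptance_probability(delta_cost, temperature)


def cool(temperature, factor=0.95):
    """Geometric cooling: the control parameter is lowered after each round."""
    return temperature * factor


# The same cost increase becomes less likely to be accepted as T drops.
hot = acceptance_probability(2.0, temperature=10.0)
cold = acceptance_probability(2.0, temperature=1.0)
```

Early in the run, the high temperature lets the search escape poor local schedules and bindings; as the parameter is lowered, the procedure behaves more and more like a greedy descent.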
Shantnawi et al. [4] presented a novel technique to obtain a rate-optimal and processor-optimal schedule for a fully-static data flow graph onto a multiprocessor system. The authors employ the Floyd-Warshall shortest path algorithm to evaluate the relative firing times of the nodes of the DFG.

2.2.2 Allocation and Binding

Thepayasuwan et al. [102] proposed a novel technique for resource binding and operation scheduling to maximize the latency of the digital hardware such that its simultaneous switching noise is kept within feasible limits. The technique involves the automatic generation of performance models for each input specification and then applying an exploration algorithm to find the best resource binding and operation scheduling alternative. Tseng and Siewiorek [104] divided the allocation problem into three tasks of storage, functional-unit, and interconnection allocation, which are solved independently by mapping each task to the popular clique-partitioning problem of graphs. In the graph formulation, operations, values, or interconnections are represented by nodes. An edge between two nodes indicates that those two nodes can share the same hardware. The allocation problem is thus transformed into the problem of finding the minimal number of cliques in the graph. Since the problem of finding the minimal number of cliques in a graph is an NP-hard problem, Tseng and Siewiorek adopted a heuristic approach to tackle it. The clique-partitioning approach can minimize the storage requirements when applied to storage allocation. However, it totally ignores the interdependence between storage and interconnection allocation. Paulin and Knight [85] extend this approach by augmenting the graph edges with weights that reflect the impact on interconnection complexity due to register sharing among variables. Hitchcock et al.
[41] proposed an allocation system, named EMUCS, that starts with an empty datapath and builds it gradually by adding functional, storage and interconnection units as necessary. A similar approach was used in the MABAL system of Kucukacakar and Parker [59], which uses a global criterion based on functional, storage and interconnection costs to determine the next element to assign and where to assign it.
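To make the clique-partitioning formulation of allocation discussed above more concrete, here is a minimal Python sketch. It is a simple first-fit greedy heuristic chosen for brevity, not the heuristic of Tseng and Siewiorek [104]; the function name `greedy_clique_partition` and the encoding of compatibility edges as a set of two-element frozensets are assumptions for this example.

```python
def greedy_clique_partition(nodes, compatible):
    """Group nodes into cliques of the compatibility graph.

    nodes:      iterable of node names (operations, values, ...)
    compatible: set of frozenset({a, b}) edges; an edge means a and b
                may share the same hardware unit.
    Each returned clique corresponds to one shared unit. First-fit is a
    heuristic, so the number of cliques is not guaranteed to be minimal.
    """
    cliques = []
    for n in nodes:
        for clique in cliques:
            # n may join only if it is compatible with every member.
            if all(frozenset((n, m)) in compatible for m in clique):
                clique.append(n)
                break
        else:
            cliques.append([n])
    return cliques


# Hypothetical example: v1, v2, v3 are pairwise compatible; v4 only with v3.
edges = {frozenset(p) for p in [("v1", "v2"), ("v1", "v3"),
                                ("v2", "v3"), ("v3", "v4")]}
units = greedy_clique_partition(["v1", "v2", "v3", "v4"], edges)
```

In this example two units suffice: one shared by v1, v2 and v3, and one for v4.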
Kurdahi and Parker [60] used the left-edge algorithm to solve the register allocation problem. The left-edge algorithm has the advantage of polynomial time complexity when compared to the clique-partitioning approach, which is NP-complete. While the left-edge algorithm can successfully allocate the minimum number of registers, it fails to consider the impact of register allocation on the interconnection cost, which could be taken care of by a weighted version of the clique-partitioning algorithm. The register and functional-unit allocation problems have been transformed into weighted bipartite-matching problems in [43]. The authors use a polynomial-time maximum weight matching algorithm that allocates a minimum number of registers and also takes partially into consideration the impact of register allocation on interconnection allocation. Kumar and Bayoumi [3] considered the binding of functional units operating at multiple voltages. Their work is aimed at minimizing the power consumption due to switching activities on the physical components. They transformed the problem of binding into a graph-theory problem, which was later solved using two approaches: a greedy approach and an optimal approach. Shiue et al. [98] presented a novel approach to low-power binding in high-level synthesis based on linear programming methods. The binding problem was mapped onto a graph called the parallel graph (PG), upon which the linear programming techniques were applied to search all paths to find the optimal binding that minimizes the overall power consumption due to switching activity. A datapath synthesized by constructive or decomposition methods could be further improved by an iterative refinement approach, named reallocation. Tsay et al. [103] propose the application of a sophisticated branch-and-bound method by reallocating a group of different types of entities for datapath refinement.
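The left-edge register allocation mentioned at the start of this discussion can be sketched as below. This is a textbook-style version written for illustration, not code from [60]; it assumes each variable lifetime is a half-open interval `[birth, death)` in control steps, so a register can be reused in the step in which its previous value dies. The name `left_edge` and the example lifetimes are hypothetical.

```python
def left_edge(lifetimes):
    """Pack variable lifetimes into registers with the left-edge algorithm.

    lifetimes: dict mapping variable name -> (birth, death), read as the
               half-open interval [birth, death).
    Returns a list of registers, each a list of variables sharing it.
    """
    # Sort variables by their left edge (birth time).
    pending = sorted(lifetimes, key=lambda v: lifetimes[v][0])
    registers = []
    while pending:
        last_death = float("-inf")
        current, rest = [], []
        for v in pending:
            birth, death = lifetimes[v]
            if birth >= last_death:   # no overlap with the register's content
                current.append(v)
                last_death = death
            else:
                rest.append(v)        # try again in the next register
        registers.append(current)
        pending = rest
    return registers


lifetimes = {"a": (1, 3), "b": (2, 4), "c": (3, 5), "d": (4, 6), "e": (1, 2)}
registers = left_edge(lifetimes)
```

For these lifetimes at most two variables are alive at any control step, and the sketch uses exactly two registers. Note that it pays no attention to which functional units produce or consume the values, which is the interconnection blind spot pointed out above.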
In [2], the authors explored the potential of a precision-sensitive approach for the high-level synthesis of multi-precision DFGs. They focus on fixed-latency implementation of the DFGs. They present register allocation, functional unit binding and scheduling algorithms to exploit the multi-precision nature of the DFGs for optimizing the area. An iterative improvement approach is developed, with the cost function being formulated in terms of the number of bits of arithmetic operators and storage units. Dasgupta and Karri [24] proposed algorithms for scheduling and binding to minimize data bus transitions. The algorithm was based on a simulated annealing process. Hong and Kim [42] proposed the repeated application of the computation of maximum flow of minimum cost in networks
for low-power bus optimization during scheduling and binding. Chang and Pedram [16] proposed a new technique for reducing power consumption through register allocation and binding. The problem, in their algorithm, was formulated as a minimum cost clique covering problem, and solved for optimality using a max-cost flow algorithm. The same authors proposed an approach to power reduction in [17] for binding of functional units. The problem, in this case, was formulated as a max-cost multicommodity flow problem and solved for optimality. Since the multicommodity problem is NP-hard, the functional unit binding problem domain was restricted to functionally pipelined designs with shorter latency. The approaches provided in [16] and [17] have the drawback of their application being limited to a number of specific small-sized low-power problems.

2.3 CDFG-Based Works

2.3.1 Scheduling

The works described in the preceding sections have considered only blocks of straight-line code. However, in addition to blocks of straight-line code, a realistic design description usually contains both conditional and loop constructs. The problem of scheduling for control-dominated applications is considered in [111] and [58]. Several scheduling techniques based on a control flow graph (CFG) model are presented in [13] [11] [9] [31]. The CFG is basically a graphical description of a sequential-program-based implementation of the functionality. Even though such a CFG model could be comfortably used to capture the execution of instructions on a general-purpose uniprocessor, its application in exploiting the parallelism inherent in typical control-flow-intensive applications is limited. Kim et al. [53] have proposed some techniques to schedule conditional constructs. In [106], a conditional vector is used to identify mutually exclusive operations so that an operation can be scheduled in different control steps for different execution instances. Camposano et al.
[13] proposed a path-based approach, called As Fast As Possible (AFAP) scheduling, which first extracts all possible execution paths from a given behavior and schedules them independently. The schedules for the different paths are then combined by resolving conflicts among the execution paths. Similarly, different approaches have been proposed for handling loop constructs, like the pipelining method described in [82] and loop folding [33].
Some graph representations that combine control and data flow into a single graph are presented in [47][29][56]. In [47], the control dependencies are expressed, but only the data dependencies are taken into account, while the control flow is not exploited. Here, the loops are represented using a branch node at the beginning of the loop and a merge node at the end. Such a branch-merge loop construct is developed for each variable in the loop, thus resulting in a complex subgraph with a high degree of redundancy of branch and merge nodes. A similar representation is used in [29]. The Sprite Input Language described in [56] uses a single signal flow graph and is more confined to DSP applications. Amellal and Kaminska [6] presented a control and data flow graph (CDFG) model for system representation which includes a new representation of conditional branches. They developed a mutual exclusion testing procedure that provides for optimized resource sharing and critical path reduction. Some of the salient features of their CDFG representation include:

- With the representation of the data and control flows in the same graph, both the datapath and the controller can be synthesized from that same graph.
- The CDFG is an optimized representation of the control and data flows without any redundant dependency representations.
- Their CDFG representation of the behavioral descriptions does not impose any restrictions on the scheduling tasks, thereby resulting in a better exploration of the design space.
- A branch numbering scheme is developed to solve the problem of resource sharing among mutually exclusive operations of the CDFG.

Apart from the single-graph model, the authors have formulated a new mathematical approach for the scheduling problem based on penalty weights. They used the Tabu Search technique, which has been effective in finding optimal solutions for many types of large and difficult combinatorial optimization problems.
The authors claim that the fast and intelligent solution space exploration provided by the Tabu technique makes their scheduling algorithm quite powerful. The Wavesched scheduling algorithm presented in [61] uses a CDFG model that preserves the parallelism inherent in the application. This algorithm is aimed at minimizing the average execution time of control-flow-intensive behavioral descriptions. With its ability to overlap the schedules
of independent iterative constructs, the bodies of which share resources, the Wavesched algorithm could explore previously unexplored regions of the solution space. A general loop handling technique was developed to incorporate other optimization techniques like loop unrolling. Also, the algorithm can support multicycled and pipelined functional units and can use chaining to enhance the cycle time utilization. Potkonjak and Srivastava [86] introduced a transformation, named rephasing, to manipulate the timing parameters in a CDFG during the high-level synthesis of datapath-intensive applications. They use the rephasing approach to manipulate the values of the phases (or the relative times when corresponding samples are available at input and delay nodes) as an algorithmic transformation before the scheduling/allocation stage. They have shown that phase values can be chosen to transform and optimize the algorithms for factors like area, throughput, latency and power. In effect, the authors presented a technique for behavioral optimization through the manipulation of timing constraints. Sentieys et al. [94] presented an architectural synthesis tool dedicated to DSP applications, in which synthesis is achieved under time as well as silicon cost constraints. The algorithm is described in the VHDL behavioral language, from which a CDFG is obtained and synthesized into processing, control, memory and communication units. The specification of the designs in VHDL allowed for interconnection with CAD and simulation tools. Kim et al. present a verification method for VHDL behavioral-level design in [51]. To identify coding errors that a compiler cannot detect, the VHDL code is converted into a CDFG and verification patterns are applied on the CDFG. They have also proposed other algorithms, like the backward training and forward training algorithms, to actuate a coding error and propagate it.
In an attempt to bridge the gap between high-level and logic synthesis, Bergamaschi [8] presented a novel internal model, called the Behavioral Network Graph (BNG), that represents both data and control constructs, for synthesis that covers the domains of both high-level and logic synthesis. This model is an RT-level network capable of representing all possible schedules that a given behavior may assume. It allows high-level synthesis algorithms to be formulated as logic transformations and effectively overlapped with logic synthesis. The author has also addressed the problem of a lack of formal representation to be used by different algorithms and systems, which makes the sharing of benchmark examples difficult.
Kollig et al. [54] described a new simulated-annealing-based algorithm capable of solving all HLS tasks concurrently without the need for independent interconnect optimization. The HLS problem is formulated with a schedule time and a module binding for each operation, a register binding for generated values and a Boolean variable indicating whether the inputs of commutative operations are to be swapped. This formulation is subject to constraints including data dependencies, execution time and module availability. Starting with an initial solution, new solutions are generated and either accepted or rejected depending on the acceptance criterion defined in the simulated annealing algorithm. The probability of accepting solutions with increasing cost depends on a control parameter which is gradually decreased while the annealing process proceeds. In GEM (Geometric Algorithm for Scheduling), proposed by Raje and Sarrafzadeh [88], a critical-path-based approach is used for scheduling operations. The algorithm uses weighted geometric point dominance matching from the operations onto the control steps. It starts with a CDFG and converts it into directed acyclic graphs by breaking the loops and removing the feedback edges while maintaining the conditions for the loops. Such an algorithm is performed in O(np log p) steps, where n is the number of disjoint paths in the CDFG and p the number of nodes in the longest path. The problem of scheduling a path is transformed into the problem of obtaining a matching from a set of points OP_i, which represent the various operations in a path, onto a set of points C_i, which represent the control steps. Certain constraints are imposed onto the matching owing to the data dependencies of the operations to be matched. The dependencies are usually described as precedence relations. Wang et al. [108] described a comprehensive high-level synthesis system for control-flow-intensive as well as data-dominated behaviors.
Their algorithm is based on an iterative improvement strategy and performs clock selection, scheduling, module selection, resource allocation and assignment simultaneously, and also considers the interactions between these tasks to benefit completely from the design space exploration at the behavior level. Their scheduling algorithm supports concurrent loop optimization and multicycling under resource constraints. The authors use a variation of the general CDFG model for their synthesis system, where their CDFG model includes some additional nodes to represent the start of a loop, etc. However, their system assumes that the initial CDFG representation of the application is available.
2.3.2 Allocation/Binding

In [30], Elgamel et al. present a novel approach for utilizing genetic algorithms to solve high-level synthesis tasks with multiple voltages. They have incorporated a new way of modeling and encoding the resulting chromosomes. Their system takes as its inputs a CDFG, a hardware library and the time and area constraints. With this information, the algorithm solves the tasks of scheduling, allocation and binding simultaneously in order to generate a solution optimized for average and peak power. The evaluation function is formulated so as to consider the average and peak power consumption while satisfying the given constraints. Choi and Kim [21] proposed an efficient binding algorithm for power optimization in high-level synthesis. The authors claim that the traditional approach of formulating the binding problem as a multicommodity flow problem is limited to a class of small-sized problems owing to the NP-hard nature of the multicommodity flow problem. They have developed a new technique that uses the property of efficient flow computations in a network so that it is extensively applicable to practical designs while producing near-optimal results. They propose a heuristic algorithm, named BINDlp, that finds a feasible binding by utilizing the flow computation steps and later refining them incrementally. The application of increased parallelism to allow voltage reduction for the same computational throughput is depicted in [15]. This work led to several other works like [74], which uses slack to avoid unnecessary computations, and [5], which shows how a DFG might be partitioned for multiple voltages. In [7], the voltage idea was combined with an iterative improvement approach using a square switching matrix as the basis for a signal correlation matrix. Crenshaw et al.
proposed several heuristics in [22] to investigate the problem of exploiting signal correlation between operations to find a schedule and binding which minimizes switching. They describe an algorithm for scheduling communications on a bus, which reduces bus switching by up to 60% without any increase in the number of cycles required for the schedule. Their technique of capturing signal correlation information during behavioral simulation can be applied in addition to popular voltage reduction methods. The switching information, thus obtained through simulations, is stored in the form of a switching table. A cubic table represents the switching activity for conditional nodes by including data for switching from a node i to a node i+j, where j ≥ 1. In
the absence of conditional nodes, all the data represents switching between nodes i and i+1, for all i. Thus, their method could be applied to graphs with both conditional and data nodes. Mehra et al. proposed the partitioning of the CDFG into groups with minimized inter-group communication in order to reduce the switched capacitance. Zhong et al. [112] presented a general sufficient condition for register binding to ensure that a given set of functional units is perfectly power managed, i.e., does not contain any spurious switching activity. Their method is applicable to both dataflow-intensive and control-flow-intensive behaviors and leads to a straightforward power-managed register binding algorithm. The authors claim that their algorithm, being independent of the functional unit binding and scheduling steps, could be easily incorporated into existing high-level synthesis systems. In [27], a technique is proposed to redesign the control logic to configure existing multiplexer networks to minimize spurious switching activity. A register binding algorithm which guarantees an RTL circuit for control- and dataflow-intensive behaviors that is free of spurious switching activity is presented in [62].

2.3.3 Our Work

The work presented here describes a high-level synthesis tool that could be used to solve each of the aforementioned tasks. This tool is targeted at both control-flow-intensive and dataflow-intensive behaviors, and incorporates several additional features like optimization for resources and power consumption. Our system uses a single-graph representation of control and data flow dependencies as its intermediate form, and also incorporates a tool for transforming the original VHDL description into the corresponding CDFG.
CHAPTER 3

CDFG EXTRACTION FROM VHDL

The first, and often ignored, step in the high-level synthesis (HLS) process is the conversion of the original behavioral description in VHDL into an intermediate representation that captures the details of the description in a form suitable for the next steps of HLS. Some of the most commonly used intermediate representations include the Data Flow Graph (DFG), the Control Flow Graph (CFG), and the Control and Data Flow Graph (CDFG). While the DFG is the most prominent form of representation, the CDFG has the advantage of depicting both control and data constructs in a single graph, giving a better state-space exploration capability for the later steps. In this chapter, we describe a conversion tool that extracts such a CDFG from a given behavioral VHDL code. This tool is based on compiler-like transformations and other behavior-preserving transformations.

3.1 Introduction

Today, the Very High Speed Integrated Circuit (VHSIC) Hardware Description Language (VHDL) has emerged as the default hardware description language. VHDL is intended to describe hardware. This description can be fed to a simulator which simulates the behavior of the hardware modeled in the VHDL description. If the description is correct, the simulation exhibits the same behavior as the hardware. Initially, VHDL was intended for use as an input to the VHDL simulator and not for synthesis. The simulation of the code was done to accurately predict the voltage values on all nets at any time and verify timing relationships between changes on these nets. In contrast, the goal of synthesis is to implement the given behavior by interconnecting components from a given library. Hence, simulation deals with timing while synthesis deals with connectivity. For a simulation to generate the correct models for the behavior of nets of interest, the description driving those nets
need not be minimal as long as it produces the correct behavior. Therefore, different descriptions for the same functionality are likely to synthesize into designs of varying quality.

VHDL models the hardware as concurrently running processes. Each process contains an algorithmic description of the process' behavior. Thus, modeling hardware in a single process results in a purely behavioral description at the highest level of abstraction. However, as soon as a VHDL description contains more than one process, it defines some structure within the hardware. This structure implicitly adds to the model's behavior. It also lowers the level of abstraction, which can come close to the hardware if each process only has the functionality of a gate. VHDL modeling is analogous to the concept of a container. The values within these containers change with time as they are affected by varied input stimuli. Such a model is suitable for both synchronous and asynchronous behavior.

The process of synthesizing an RTL structure from the functional description during high-level synthesis involves three phases: the timing of computational operations (scheduling), determining the number of instances of each resource needed (allocation), and the assignment of resources to computational operations (binding). Behavioral synthesis requires transformation of the VHDL code into an internal representation which extracts registers, combinational logic equations and macros like '+', '*', etc., for the scheduling, allocation and binding processes. Most systems use the control flow graph and/or the data flow graph, or a combination of the two such as the CDFG, as their intermediate format. A CDFG is expected to capture all the control and data flow information of the original VHDL description while preserving the various dependencies [93].

This CDFG undergoes incremental refinement as it passes through various stages of a high-level synthesis system to finally yield a register transfer level representation of the behavioral specification. Scheduling partitions the set of arithmetic and logical operations in the CDFG into groups of operations so that the operations in the same group can be executed concurrently, while trying to minimize the total execution time and/or the hardware cost.

Researchers working on the high-level synthesis problem begin with the assumption that a flow-graph representation of the behavioral description is available. Here, an often overlooked but significant step is the conversion of the VHDL code into a CDFG. This step, normally carried out manually prior to high-level synthesis, may take a few minutes to a few days based on the size and
the complexity of the VHDL code. Also, the chances for errors in the resulting CDFG increase with the increasing complexity of the code, and since this is one of the initial steps, such errors would drastically affect the accuracy of the whole design flow. To counter this problem, we have developed a new conversion tool that automates the process of obtaining the CDFG from a VHDL code, thereby reducing the design time significantly and providing a useful aid to the developer. The task of such a VHDL to CDFG compiler would be to extract a fully behavioral description which would be fed to an architectural synthesis system that will add structural information to that description.

3.2 Preliminaries

In this section, we provide the basic definitions and concepts which form the basis for our conversion tool.

3.2.1 Control and Data Flow Graph

The Control and Data Flow Graph is a directed acyclic graph in which a node can be either an operation node or a control node (representing a branch, loop, etc.) [92]. The directed edges in a CDFG represent the transfer of a value or control from one node to another. An edge can be conditional, representing a condition while implementing the if/case statements or loop constructs. Figure 3.1 shows a CDFG representation for the following VHDL code fragment.

    A := B * C + D;
    while (A > 0) loop
        A := A - 1;
    end loop;

Nodes

In general, the nodes in a CDFG can be classified as one of the following types [92].

- Operational nodes: These are responsible for arithmetic, logical or relational operations.
- Call nodes: These nodes denote calls to subprogram modules.
- Control nodes: These nodes are responsible for operations like conditionals and loop constructs.
- Storage nodes: These nodes represent assignment operations associated with variables and signals.

In the CDFG representation shown in figure 3.1, nodes numbered 1, 2, and 6 are operational nodes, while node number 3 is a storage node, and nodes 4 and 5 are control nodes.

Dependencies

Given any two operational nodes in the CDFG, represented as N_i and N_j, where N_i is a predecessor or ancestor of N_j in the CDFG, the possible dependencies between them are defined as follows:

- N_j is flow-dependent on N_i if the output of N_i is one of the inputs of N_j.
- N_j is anti-dependent on N_i if the output of N_j is one of the inputs of N_i.
- N_j is output-dependent on N_i if the output of N_j is also the output of N_i.

A data dependency is said to exist between two nodes in the CDFG if any one of the above three types of dependencies holds between them.

3.2.2 Basic VHDL Constructs

VHDL is a language for describing digital electronic systems and is designed to fill a number of needs in the design process [45]. It allows description of the structure of a system, how it is decomposed into subsystems and how those subsystems are interconnected (structural description). It also allows the specification of the function of the design using familiar programming language forms (behavioral description).

The entity in VHDL is the most basic abstraction that represents the design model. It is a structural model of some piece of hardware. An entity consists of processes and instantiations of other entities, and when fully expanded, only the processes remain. These processes are connected through signals, and the processes and signals together define the structure of the hardware model.
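Returning briefly to the dependency definitions of Section 3.2.1, the three cases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `OpNode` record and its `output` and `inputs` fields are hypothetical names introduced here, not the data structures of the tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpNode:
    """Hypothetical operational node: the storage it writes and the storages it reads."""
    node_id: int
    output: str
    inputs: frozenset


def dependencies(pred: OpNode, succ: OpNode) -> set:
    """Kinds of data dependency of succ (N_j) on its predecessor pred (N_i)."""
    kinds = set()
    if pred.output in succ.inputs:   # output of N_i is an input of N_j
        kinds.add("flow")
    if succ.output in pred.inputs:   # output of N_j is an input of N_i
        kinds.add("anti")
    if succ.output == pred.output:   # both nodes write the same storage
        kinds.add("output")
    return kinds


# A := B * C + D;   followed later by   A := A - 1;
n1 = OpNode(1, "A", frozenset({"B", "C", "D"}))
n2 = OpNode(2, "A", frozenset({"A"}))
print(sorted(dependencies(n1, n2)))
```

A data dependency exists between the two nodes whenever the returned set is non-empty.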
Figure 3.1. CDFG Representation
A description of the internal implementation of the entity is given in the architecture body of that entity. The behavioral architecture body of an entity describes its function in an abstract way, and the concurrent statements in it are limited to process statements, subprogram calls and signal assignments. A process is a sequence of statements which can affect the values of the signals interconnecting the processes. Each such process implements a part of the behavior of the entity in which it is contained. All processes of an entity constitute the behavioral model of that entity.

The process statements are further made up of sequential statements that are much like the kinds of statements we see in a conventional programming language, such as statements evaluating expressions, statements assigning values to variables (variable assignment statements), conditional execution statements (if-then-else, case, etc.), repeated execution statements (loops) and subprogram calls. In addition, there is the signal assignment statement, which is unique to hardware modeling languages. This statement is similar to the variable assignment statement, except that it causes the value on a signal to be updated at some future time.

VHDL is defined as an event-driven model. An event is said to have occurred on a signal if updating that signal causes its current value to change [44]. Each event is the result of an assignment to a signal. Every signal assignment is bound to a specific moment in simulated time. Due to the event-driven model and the timing information in signal assignments, the concept of absolute time is buried deeply in the semantics of VHDL [44].

The tool described in this chapter handles the behavioral description and the various constructs involved in it. The process body of the behavioral architecture embodies the data and control flow of the design specification for synthesis purposes [92].
The concurrent signal assignment statements outside the process body can be viewed as equivalent process statements with the signal assignments being made within the process body.

3.3 Implementation Details

In this section, we will see the various details involved in the implementation of the tool. Section 3.3.1 describes the methodology followed in the extraction of the CDFG from VHDL code. Section 3.3.3 gives the transformations of various VHDL constructs and Section 3.3.4 presents the various outputs generated by the tool. Section 3.4 describes a couple of VHDL constructs that are not handled by the tool.

The flow-graph representation of the behavioral description is obtained by parsing the input VHDL code. Figure 3.2 depicts the various steps involved in this methodology. These steps are quite similar to the phases of a compiler, each of which transforms the source program from one representation to another. The lexical analysis phase translates the source program into a stream of tokens, where each token is a sequence of characters with collective meaning, such as an identifier, a keyword, an operator or a punctuation character [7]. This stream of tokens is further subjected to syntactic analysis, which imposes a hierarchical structure on them to verify the syntax of the program. The code for these two phases was generated by applying the standard compiler construction tools, Lex and YACC [45], to the IEEE standard VHDL syntax [44].

Figure 3.2. Steps Involved in Extraction of the CDFG From VHDL Code (VHDL source code, lexical analysis, tokens, syntax analysis, parse graph, graph compression, syntax graph, transformation, CDFG)

3.3.1 Methodology

While the parsing of the tokens using YACC code imposes a hierarchical structure that could be visualized as a tree structure (parse tree), it does not actually generate such a parse tree. The parse tree, which is assumed to be available, is just a conceptual visualization of the syntactic structure of the program. Explicit code is required to extract such a parse tree from the YACC code [92]. Specific C++ code was used for this purpose in our tool. The parse tree thus obtained is further compressed to obtain a syntax tree in which the operators appear as the interior nodes, and the operands of an operator are the children of the node for that operator. The syntax tree is transformed, through further C++ code, into the final control and data flow graph that depicts the total flow of the control and data in the original description. The various data dependencies that are inherent in the flow graph can be revealed through one full scan of the graph. Figure 3.3 illustrates the above methodology through an example.

Figure 3.3. Extraction of CDFG From a Sample VHDL Code (VHDL code, lexical analysis into tokens, YACC parsing into a parse tree, compression into a syntax tree, transformation into the final CDFG)

3.3.2 Algorithmic Description

In this section, we provide an overview of the algorithm, followed by the details. Figure 3.4 presents the pseudocode for the conversion. First, we generate code for the lexical analysis and the syntax analysis using the standard Lex and YACC tools. The YACC program includes explicit functions to create nodes at each step in the syntactic hierarchy. Each such node is a collection of information on its function or purpose at that stage in the hierarchy, and links to its predecessors and successors in that hierarchy. For example, an architecture statement would create an architecture node that stores the details of the corresponding entity node (a predecessor), process nodes (successors), nodes corresponding to statements within those processes (successors), etc. In short, each node in the hierarchy stores all the information relevant to that node.

Once we have extracted all the details regarding the various operations and their order of execution, the nodes corresponding to constructs like "architecture" and "process", which were used to obtain the links between external inputs and the operations on them, can be safely discarded. The next step in the algorithm truncates the parse tree by removing all such nodes while preserving their dependency information. Nodes representing punctuation marks and other syntactic details of the language, which are of least significance to the hardware design, are also discarded in this step. A succinct representation of the CDFG is thus obtained and further transformed into the adjacency list representation by creating a list for each node in the CDFG and adding the nodes corresponding to its input set to that list. A depth-first search procedure is then employed on the adjacency lists to obtain the set representation, from which the output file for the VCG tool [37] is generated.

Next, we use the adjacency lists and the original node representation to obtain the dependency relations. While obtaining such relations for each node, we traverse the CDFG upwards from that node until we reach a control node, so that we cover the dependencies within each control block [92].

    VHDL_lex ← Generate Lex code
    VHDL_parse ← Generate YACC code
    token_set ← VHDL_lex(VHDL code)
    parse_tree ← VHDL_parse(token_set);
    CDFG ← truncate(parse_tree);
    Generate adjacency lists
    begin
        Create an empty list for each node in the CDFG
        for each node N_i in the CDFG
            for each input k in the input set of N_i
                if k is a variable, constant or a signal, add its node to the adjacency list of N_i
                if k is another node, add N_i to the adjacency list of k
            end for
        end for
    end
    Generate set representation
    begin
        V ← ∅; E ← ∅;
        for each node N_i in the graph
            V ← V ∪ {N_i};
            for each node N_j in the adjacency list of N_i
                E ← E ∪ {(N_i, N_j)};
            end for;
        end for;
    end

Figure 3.4. Algorithm for CDFG Extraction

3.3.3 Transforming VHDL Constructs

In this section, we describe the transformations of each of the VHDL constructs that are handled by this tool. Each node in the transformation can be viewed as a 3-tuple given as (id, type, input set). The id denotes the unique identifier (usually a number) of that node. The type denotes the type of the node, for example operational (*, +, -, AND, OR, etc.) or conditional (if-then-else, case). The input set gives the set of all elements input to the node, which could be either variables/signals or expressions denoted by other node ids. The id of a node is used to denote the output of that node. The input set of each node automatically enforces a data or control dependency and thus reduces the overhead of maintaining output and dependency sets for each node. This simplifies, to a large extent, the final graph representation while preserving all the dependencies that exist in the original VHDL description. We now describe the various VHDL statements and their corresponding transformations below.

3.3.3.1 Operational Statements

These statements involve arithmetic operators (+, -, *, /) or logical operators (AND, OR, NOT, NAND, NOR, XOR, XNOR). The node type indicates the type of operator involved and the input list includes the two operands involved in the operation. In the case of a unary minus (-), the input list involves only one operand. A typical AND statement and its corresponding representation are shown below. The corresponding graphical representation is shown in figure 3.5.
    VHDL Statement:  A and B
    Textual CDFG:
        Node Id: 1
        Type   : And
        Input1 : A
        Input2 : B

Figure 3.5. CDFG: AND Operation

Figure 3.6. CDFG: Variable Assignment

3.3.3.2 Assignment Statements

A VHDL assignment statement has a storage (a variable or a signal) on the left side and an expression (consisting of primitive operations) on the right hand side.
Figure 3.7. CDFG: Signal Assignment

Figure 3.8. CDFG: If-Then-Else Statement
Figure 3.9. CDFG: Loop Statement
Variable Assignment: A variable assignment statement assigns a new value to a variable. The syntax of such a statement is given below:

    variable_assignment_statement ::= target ':=' expression ';'

The target denotes the variable which is assigned the value of the expression. The variable assignment operator (:=) is represented as a "var assign" node. The input set of such a node contains the final data flow edge of the corresponding expression tree. A typical variable assignment statement and its textual representation are given below. The corresponding graphical representation is given in figure 3.6.

    VHDL Statement:  C := A and B;
    Textual CDFG:
        Node Id: 2
        Type   : Var Assign
        Input1 : C
        Input2 : Node 1

Signal Assignment: A signal assignment statement is represented similarly to a variable assignment statement, except that the type of the node is "signal assign", as shown below. This statement is represented in the CDFG as shown in figure 3.7.
    VHDL Statement:  S <= D[0] or D[1];
    Textual CDFG:
        Node Id: 3
        Type   : Or
        Input1 : D[0]
        Input2 : D[1]

        Node Id: 4
        Type   : Signal Assign
        Input1 : S
        Input2 : Node 3

3.3.3.3 Conditional Statements

if-then-else statement: The representation for such a statement is required to capture information about the condition, the statements that are executed when the condition is true and the statements that are executed when the condition is false. Therefore, the node for an if-then-else statement is accompanied by a node describing the condition for that statement. Such a representation is shown below. The graphical representation for such a node is given in figure 3.8.
    VHDL Statement:
        if (SEL > 0) then
            C := A and B;
        else
            S <= D[0] or D[1];
        end if;
    Textual CDFG:
        Node Id: 5
        Type   : Comparator
        Input1 : SEL
        Input2 : 0

        Node Id: 6
        Type   : If
        Condition            : Node 5
        Statements When True : Node 2
        Statements When False: Node 4

Case Statement: This is similar to the if-then-else representation, except that its node is accompanied by a node for the expression in the case statement and one node each for the 'when' clauses of the case statement. An example is given here. The corresponding graphical representation is similar to that of the if-then-else statement.
    VHDL Statement:
        case SEL is
            when 0 => C := A and B;
            when others => S <= D[0] or D[1];
        end case;
    Textual CDFG:
        Node Id: 7
        Type   : Case
        Expression            : SEL
        Statements When 0     : Node 2
        Statements When Others: Node 4

3.3.3.4 Loop Statements

The input set of the nodes representing loop statements consists of node ids representing the statements in the loop. Also, these nodes are accompanied by a condition node (in the case of a while loop) or a range node (in the case of a for loop). The range node is used to depict the range used for the iteration of the loop. The representation of a while loop is given below. Such a loop is represented in the CDFG as shown in figure 3.9.
    VHDL Statement:
        while (E = 0) loop
            C := A and B;
            S <= D[0] or D[1];
        end loop;
    Textual CDFG:
        Node Id: 8
        Type   : Comparator
        Input1 : E
        Input2 : 0

        Node Id: 9
        Type   : Loop
        Condition         : Node 8
        Statements In Loop: Node 2, Node 4

(v) Subprogram Calls: The input set of such a node would consist of the name of the subprogram called and the parameter list passed to that subprogram. The type is simply "subprogram call". The resulting format is shown below:

    VHDL Statement:  VECT2INT(DATA);
    Textual CDFG:
        Node Id    : 10
        Type       : Subprogram Call
        New Control: VECT2INT
        Parameters : Data
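The (id, type, input set) view of a node used throughout Section 3.3.3 can be sketched in Python as follows. This is a minimal illustration, not the C++ structures of the tool: the `Node` class, the lowercase type names and the convention that integers stand for node ids while strings stand for variables, signals or constants are all assumptions made here.

```python
from typing import NamedTuple, Tuple, Union

# Assumption: an int refers to another node's id; a str is a variable, signal or constant.
Operand = Union[str, int]


class Node(NamedTuple):
    id: int
    type: str
    input_set: Tuple[Operand, ...]


# C := A and B;   S <= D[0] or D[1];   if (SEL > 0) then ... else ... end if;
cdfg = [
    Node(1, "and", ("A", "B")),
    Node(2, "var_assign", ("C", 1)),
    Node(3, "or", ("D[0]", "D[1]")),
    Node(4, "signal_assign", ("S", 3)),
    Node(5, "comparator", ("SEL", "0")),
    Node(6, "if", (5, 2, 4)),          # condition, true branch, false branch
]


def predecessors(node: Node) -> list:
    """Ids of the nodes whose outputs feed this node."""
    return [op for op in node.input_set if isinstance(op, int)]


for node in cdfg:
    print(node.id, node.type, predecessors(node))
```

Because a node id doubles as the name of that node's output, the input set alone is enough to recover every edge, which is the reason the text gives for not keeping separate output and dependency sets.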
3.3.4 Output Formats

The conversion tool was developed in this work with the objective of easing the process of generating an intermediate format from the behavioral description, which is used as the input to high-level synthesis. Such intermediate formats used for synthesis are described in [93], [83] and [57]. However, there is no standard representation for such an intermediate format [95], and researchers across the world employ different formats. Therefore, we have come up with different output formats to suit the needs of different users. Apart from the node representation of the flow graph described in the previous subsection, the other representations generated by the tool are described here.

3.3.4.1 Adjacency-List Representation

Probably the most widely used form of representation for the CDFG is the adjacency list representation, in which a linked-list structure is used to describe the edges of the graph. Typically, a node N_j is present in the linked list associated with node N_i if there is a directed edge from node N_i to node N_j in the graph. The adjacency list representation for the CDFG of figure is shown in figure 6(a). The dependency relations can be extracted through a simple depth-first traversal of the list structure.

3.3.4.2 Set Representation

In this, the CDFG is represented as a 2-tuple (V, E), where V is the set of nodes and E is the set of all edges in the CDFG. Set E is made up of tuples of the form (N_i, N_j) if there is a directed edge from node N_i to node N_j.

3.3.4.3 Visual Representation

The flow information and the dependencies in the CDFG are best understood through a visualization of the graph. A simple textual visualization of graphs is often too confusing and unreadable, especially when the graphs are huge. The Visualization of Compiler Graphs (VCG) tool described in [37] can be used to obtain a graphical view of the CDFG. Our tool generates an output file in the Graph Description Language (GDL) used by the VCG tool to produce a visualization of the CDFG. The graphical view obtained with the VCG tool is similar to the CDFG depicted in figure 3.3.

3.4 Unhandled Features

Our tool can efficiently handle most of the VHDL constructs as specified in the IEEE standard 1076, but fails to include a small subset of VHDL. This subset includes:

- Wait statement.
- File declaration: the concept of files is not used in hardware.
- Alias declaration.
- Attribute declaration and attribute specification: The attribute specification is mostly used to annotate entities. An attribute specification, in conjunction with an attribute declaration, allows the user to define and specify attributes. These attributes could be used to allow the user to provide information to the synthesis system. For example, time constraints could be indicated using attributes.
- Configuration specification and declaration.

It has been observed that such constructs are least likely to be used in most VHDL synthesis problems, and therefore their omission does not call into question the applicability of our tool in the synthesis flow.

3.5 Summary

We have presented a tool for the conversion of a behavioral VHDL description into its corresponding control and data flow graph representation that captures all the flow information of the original description. Experimental results presented in chapter 6 show that the tool is quite efficient in converting various VHDL constructs into their corresponding flow-graph representations. The output files generated by the tool are expected to meet the requirements of various researchers across the world. With its accuracy and speed in conversion, our tool would be a significant aid in speeding up the initial steps in the VLSI design flow.
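As a closing illustration for this chapter, the adjacency-list and set representations of Section 3.3.4 can be sketched in Python. The sketch follows the two rules of the algorithm in Figure 3.4 literally; the function names, the dictionary layout and the convention that integers are node ids while strings are variables, signals or constants are assumptions made here, not part of the tool.

```python
def adjacency_lists(nodes):
    """nodes: dict mapping a node id to its input list (int = node id, str = storage)."""
    adj = {nid: [] for nid in nodes}
    for nid, inputs in nodes.items():
        for k in inputs:
            if isinstance(k, int):
                adj[k].append(nid)       # k is another node: add N_i to the list of k
            else:
                adj.setdefault(k, [])    # give the storage a (possibly empty) list
                adj[nid].append(k)       # k is a storage: add its node to the list of N_i
    return adj


def set_representation(adj):
    """Return (V, E), with E holding one (N_i, N_j) pair per adjacency entry."""
    vertices = set(adj)
    edges = {(u, v) for u, targets in adj.items() for v in targets}
    return vertices, edges


# C := A and B;  as node 1 (and) and node 2 (var assign)
nodes = {1: ["A", "B"], 2: ["C", 1]}
adj = adjacency_lists(nodes)
V, E = set_representation(adj)
print(adj)
print(len(V), len(E))
```

The nested comprehension plays the role of the two nested loops in the pseudocode; a depth-first traversal of `adj`, as mentioned in the text, would visit the same edges.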
CHAPTER 4
SCHEDULING THE CDFG

In this chapter we describe a scheduling approach which is based on the Tabu search method and is aimed at reducing the number of resources under the given time constraints. This scheduling algorithm is an adaptation of the Tabu-search based scheduling algorithm described in [6], which uses a new mathematical formulation based on penalty weights. The algorithm is applied to the CDFG representation obtained from the first module to find the optimum number of resources that would be needed for the design.

4.1 Introduction

Scheduling and allocation are two important steps in the synthesis process after translating the algorithmic description into an internal representation. Scheduling is the process of assigning a control step to each operation of the Control and Data Flow Graph. A control step refers to the clock cycle in which the corresponding operation would be executed. Allocation is the process of assigning various functional units to operations, storage units to values, and buses to data transfers. A control unit could then be synthesized to synchronize the execution of operations based on the way that operations are scheduled and the hardware units allocated.

A number of scheduling and allocation methods have been proposed so far. Amellal and Kaminska [6] enumerate certain quality measures that could be used to evaluate the performance of scheduling and/or allocation algorithms:

- The quality of the solution produced; an optimal or a sub-optimal design.
- The complexity of the algorithm; the CPU runtime.
- The solution space exploration.
- The possibility of handling large applications efficiently.
- The controllability of the synthesis process through designer constraints: area, delay, design rules, power consumption, etc.
- The possibility of predicting, with a maximum degree of accuracy, the previous parameters at a high level.

The scheduling approach used in our synthesis system is derived from the one described in the TASS synthesis system [6]. In this approach, a time-constrained scheduling is performed based on a popular mathematical optimization technique called Tabu Search. The mathematical formulation of this approach is developed in a way that overcomes some of the limitations of previous algorithms. The Tabu search technique provides for an efficient and fast solution space exploration. Upon obtaining a satisfactory schedule, hardware units are allocated to the graph. From the scheduled and allocated graph, an RTL structure could be extracted and a Finite State Machine synthesized for the controller.

This scheduling approach differs from other techniques since it is based on a control and data flow graph model, rather than just a data flow graph, where the control flow is implemented using more powerful procedures than existing ones. Moreover, the representation of the whole behavioral description by a single CDFG results in a low synthesis CPU runtime, as well as an optimized design due to the availability of global information in a single graph. Also, the mathematical formulation used in this approach is based on penalty weights instead of cost estimation. A complete formulation of the scheduling problem is given as a mathematical description which takes into account almost all the area parameters, without any increase in the complexity. The cost functions used in most of the other scheduling approaches impose a restriction on the solution space by fixing the Functional Units (FUs) performing each type of operation. With the scheduling approach adopted in our synthesis system, module selection could be performed after scheduling in order to better optimize the design that is to be generated.

The Tabu Search technique used in the current scheduling approach is a metaheuristic procedure originally developed by Glover for solving combinatorial optimization problems [34], in which heuristics are applied at different levels and certain non-optimal moves are made to escape from locally optimal solutions. The application of this technique to a number of large and difficult combinatorial optimization problems [34] has shown that the Tabu Search technique is faster and
more efficient than better known optimization techniques like Simulated Annealing for finding optimal solutions.

Most of the ILP-based synthesis systems [97] [66] [23] characterize the cost of the hardware units needed by the scheduling. The ILP models tend to simplify the objective function, sometimes including only the functional units [97] and often ignoring the area cost of storage, interconnections and control circuitry. The inclusion of these parameters increases the complexity of the formulation and would not be manageable for large applications. Also, their search space is restricted by their rigid cost functions and the prior specification of Functional Units for each operation. Better solutions could be obtained by considering the availability of different ALUs for executing the same type of operations. The area cost could be reduced by executing different operations on the same Functional Unit, and when more than one ALU is available to execute the same type of operation.

The mathematical formulation of the current scheduling approach is based on penalty weights rather than on cost evaluation. These penalty weights, which include the real costs of the hardware units, make it possible to take into consideration different area parameters of the design. An objective function based on these penalty weights can be used to optimize the number of functional and storage units, as well as the number of interconnections.

Most scheduling algorithms proceed by assigning control steps to operations one at a time, and hence their results are strongly affected by the order of such assignments. The ILP method, however, being a global method, is more suitable for scheduling control and data inter-dependent nodes than other scheduling approaches like list scheduling. The transformational approach used in our system begins with an initial schedule, and then the nodes are moved in order to minimize an objective function, which is defined so as to reduce the area required by the current schedule. The assignment of control steps to nodes is done simultaneously instead of one at a time. In order to avoid the inflexibility associated with other cost-based formulations, a more complete and efficient formulation is developed, taking advantage of the use of real unit costs. The objective function for our scheduling approach is given as follows:

    $f = \sum_{i} W_i$    (4.1)

where each weight $W_i$ is a cost-based penalization of a worst case assignment of operation nodes requiring the greatest amount of a given hardware resource [6]. The scheduling approach, adopted from [6], is formulated as follows. Given the number of control steps K,

    Minimize  $f = \sum_{i} W_i$    (4.2)

satisfying the constraints

    $S_i \le x_i \le L_i$,  for $1 \le i \le n$    (4.3)

    $x_i + d_i \le x_j$,  for $i \to j$, $1 \le i \le n$ and $1 \le j \le n$    (4.4)

where $n$ is the total number of nodes of the control and data flow graph, $x_i$ is the control step to which node i is assigned, $S_i$ is the As Soon As Possible step and $L_i$ is the As Late As Possible step of node i, $d_i$ is the propagation delay associated with an operation node i, and $i \to j$ denotes that node j is data dependent on node i. The penalty weights $W_i$ are selected in such a way that minimizing these weights would optimize the assignment of operation nodes, reducing the number of resources of a given type required for these operations, while utilizing the mutual exclusion among nodes in a control step arising from control dependencies.

4.2 Mutual Exclusion Among Operations

The control dependencies inherent in a CDFG induce a certain amount of mutual exclusion among the operations in the CDFG. The mutual exclusion property could be exploited for potential resource sharing among such operations.

Definition: Two nodes, N_i and N_j, in a CDFG are said to be mutually exclusive if and only if they cannot exist in any execution path of the CDFG at the same time.

It can be clearly seen that the nodes N_i and N_j have to be present in different branches of the same conditional block.
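The definition of mutual exclusion can be checked by brute force on a small graph, as in the Python sketch below. It enumerates every execution path and tests whether two nodes ever share one. This is only an illustration of the definition, not the way the tool identifies mutual exclusion; the successor-list encoding, the function names and the example graph (two consecutive if-then-else blocks, numbered 0 to 7) are assumptions made here.

```python
def execution_paths(succ, start, end):
    """Enumerate every start-to-end path of an acyclic graph given as successor lists."""
    if start == end:
        return [[end]]
    return [[start] + rest
            for nxt in succ[start]
            for rest in execution_paths(succ, nxt, end)]


def mutually_exclusive(succ, start, end, a, b):
    """True if no execution path contains both node a and node b."""
    return not any(a in path and b in path
                   for path in execution_paths(succ, start, end))


# Hypothetical graph: node 0 branches to 1 or 2, which join at 3;
# node 4 branches to 5 or 6, which join at 7.
succ = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: [5, 6], 5: [7], 6: [7], 7: []}

paths = execution_paths(succ, 0, 7)
print(len(paths))
print(mutually_exclusive(succ, 0, 7, 1, 2))
print(mutually_exclusive(succ, 0, 7, 2, 7))
```

Path enumeration grows exponentially with the number of conditional blocks, so it is suitable only for checking small examples by hand.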
Figure 4.1. Mutually Exclusive Nodes
[CDFG with two conditional blocks (T/F branches), nodes (0) through (7), addition and multiplication operations.]
Consider the CDFG with two conditional statements shown in figure 4.1. Four possible execution paths exist for the given CDFG: (0) (1) (3) (4) (5) (7), (0) (2) (3) (4) (5) (7), (0) (1) (3) (4) (6) (7) and (0) (2) (3) (4) (6) (7). We can see that nodes (1) and (2) cannot exist together in any of these paths, and hence they are mutually exclusive. Similarly, nodes (5) and (6) are mutually exclusive. However, node (2) is not mutually exclusive with node (7), and neither is node (3) with node (6). Hence, only nodes in the same conditional block can be mutually exclusive. We observe that nodes (1) and (2) can be scheduled in the same control step with a single adder allocated to them, since we know that only one of them would be executed. Similarly, nodes (5) and (6) could be scheduled in one control step with a single multiplier allocated to them. Hence, it is advantageous to schedule two mutually exclusive operations of the same type in the same control step. The CDFG representation described in the previous chapter helps in determining all possible mutual exclusions among the operations in the CDFG, which are then used in the scheduling algorithm described in the later sections.

4.3 Penalty Weights

As discussed earlier, the objective function for the scheduling algorithm is based on a set of penalty weights that penalize any assignment that requires more hardware resources. Four factors that could affect the number of resources are considered.

For each operation type m, the control step that has the maximum number of operation nodes of type m is penalized, since that dictates the maximum number of resources executing operations of type m. Such a penalty weight is calculated as:

    $W_1 = \sum_{m=1}^{M} C_m \, \max_{s} N_{m,s}$    (4.5)

where $N_{m,s}$ is the maximum number of operations of type m that can be executed in control step s, $C_m$ is the cost of the functional unit executing operations of type m, and M is the number of operation types. The value of $N_{m,s}$ is obtained by considering the number of non-mutually-exclusive operations of type m in each control step.
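The first penalty weight (equation 4.5) can be sketched in a few lines. The following is an illustration only, under a simplifying assumption that is ours and not the thesis's: each operation carries an optional branch tag `(conditional_id, branch_label)`, and operations of the same type in different branches of the same conditional share one unit. The function name, the data layout, and the unit costs are hypothetical.

```python
from collections import defaultdict


def w1_penalty(ops, unit_cost):
    """ops: list of (op_type, control_step, branch) tuples.

    branch is None for unconditional operations, or a
    (conditional_id, branch_label) pair for operations inside a conditional.
    """
    plain = defaultdict(int)                      # (type, step) -> count
    cond = defaultdict(lambda: defaultdict(int))  # (type, step, cond) -> label -> count
    for op_type, step, branch in ops:
        if branch is None:
            plain[(op_type, step)] += 1
        else:
            cond_id, label = branch
            cond[(op_type, step, cond_id)][label] += 1

    # Units needed per (type, step): unconditional operations add up, while
    # mutually exclusive branches only contribute their largest branch.
    demand = defaultdict(int)
    for slot, count in plain.items():
        demand[slot] += count
    for (op_type, step, _), by_label in cond.items():
        demand[(op_type, step)] += max(by_label.values())

    # Peak demand over all control steps, per operation type.
    peak = defaultdict(int)
    for (op_type, _), count in demand.items():
        peak[op_type] = max(peak[op_type], count)
    return sum(unit_cost[op_type] * count for op_type, count in peak.items())


ops = [
    ("+", 2, None), ("+", 2, None),        # two additions in step 2
    ("*", 5, None), ("*", 5, None),        # two multiplications in step 5
    ("+", 3, ("c1", "T")), ("+", 3, ("c1", "F")),  # mutually exclusive pair
]
unit_cost = {"+": 1, "*": 4}               # hypothetical unit costs
print(w1_penalty(ops, unit_cost))          # 2 * cost[+] + 2 * cost[*]
```

The mutually exclusive pair in step 3 counts as a single addition, so it does not raise the peak of two adders set by step 2.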
For the CDFG depicted in figure 4.2, there are a maximum of 2 multiplications in control step 5 and a maximum of 2 additions in control step 2. Hence, the associated penalty weight would be W = 2 × cost[*] + 2 × cost[+].

The next weight penalizes control steps which have a maximum number of non-mutually-exclusive operations of the same equivalence class.

Definition: Two operations of type i and j are said to belong to the equivalence class m if and only if both of them can be executed by a functional unit of type m.

For example, addition and subtraction operations could be performed by a single adder/subtracter module. Two operations of type i and j belonging to equivalence class m have to be executed by two functional units, of type i and j, when assigned to the same control step, but they could be executed by a single functional unit of type m when assigned to different control steps. The penalty for assigning them to the same control step is calculated as

    $C_i + C_j - C_m$    (4.6)

For the CDFG in figure 4.2, the cost of assigning both an addition and a subtraction to the same control step 4 is given as 1 × (C[+] + C[-] - C[+/-]).

Another penalty weight is associated with the lifetimes of variables. Whenever a node j uses the output of node i, the number of control steps separating i and j is the lifetime of the variable storing that value. The penalty factor for storing such values is calculated as:

    [equation (4.7) is not recoverable from the source text]

where the indicator term of the equation is 1 if $i \rightarrow j$, and 0 otherwise. In figure 4.3, the output of node (1) is used by nodes (2) and (4). Node (4) is 2 control steps away from node (1), and hence it requires the output to be stored, leading to a weight of 1 × cost[storage].
The final penalty weight is associated with the number of buses needed, which depends on the maximum number of distinct inputs used in a control step. In figure 4.3, the maximum number of inputs is 4 (used in control steps 1 and 3). Hence, the penalty weight associated with it is 4 × cost[bus]. Developing the objective function from the above-mentioned penalty weights adds flexibility to the scheduling algorithm.

4.4 Scheduling Algorithm

Pseudocode for the Tabu-search-based scheduling algorithm, extracted from [6], is shown in figure 4.4. The ASAP and ALAP algorithms are employed initially to obtain the least and greatest control steps for each node in the CDFG. The output from either of these algorithms can be chosen as the initial solution S. Such a solution is then evaluated based on the penalty weights described in the previous section. The algorithm then undergoes a number of iterations, in each of which the best solution is selected from the neighborhood of the current solution. If such a solution is better than the best one found so far, it is saved as the best solution. The new neighborhood is calculated from the new solution. If this solution was visited during the previous iterations, it is discarded to prevent cycling of solutions. The iterations are repeated until a fixed number of iterations pass without any improvement in the solution. This whole procedure can be repeated with different time constraints until a good schedule is obtained.

The neighborhood of a solution is obtained by moving nodes from their current control step to another control step within the ASAP and ALAP values of that node. The nodes considered for movement are chosen from the control steps that contribute most to the penalty weights in the current solution. Such nodes are moved to control steps that contain fewer nodes of the same operation type and of the same equivalence class.

With the large solution space explored by the Tabu search method, the scheduling algorithm is able to perform global optimization of the number of resources used. Some experimental results that demonstrate the efficiency of this approach are presented in chapter 6.
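The iteration scheme described in this section can be sketched compactly. The following is a simplified illustration and not the CHESS or TASS implementation: it assumes unit operation delays, uses only a peak-usage resource cost (in the spirit of the first penalty weight), ignores mutual exclusion and equivalence classes, takes the full set of single-node moves as the neighborhood, and keeps the set of visited schedules as the tabu list. All names and unit costs are hypothetical.

```python
from collections import Counter


def asap_alap(nodes, preds, num_steps):
    """Unit-delay ASAP/ALAP steps (1-based); `nodes` must be topologically ordered."""
    succs = {n: [] for n in nodes}
    for n in nodes:
        for p in preds[n]:
            succs[p].append(n)
    asap, alap = {}, {}
    for n in nodes:
        asap[n] = 1 + max((asap[p] for p in preds[n]), default=0)
    for n in reversed(nodes):
        alap[n] = min((alap[s] for s in succs[n]), default=num_steps + 1) - 1
    return asap, alap


def resource_cost(schedule, op_type, unit_cost):
    """Sum over operation types of unit cost times peak usage in any control step."""
    usage = Counter((op_type[n], step) for n, step in schedule.items())
    peak = Counter()
    for (kind, _), count in usage.items():
        peak[kind] = max(peak[kind], count)
    return sum(unit_cost[kind] * count for kind, count in peak.items())


def is_feasible(schedule, preds):
    """Every node must be scheduled strictly after all of its predecessors."""
    return all(schedule[p] < schedule[n] for n in schedule for p in preds[n])


def neighborhood(schedule, preds, asap, alap):
    """All feasible schedules obtained by moving one node within its ASAP/ALAP range."""
    for n in schedule:
        for step in range(asap[n], alap[n] + 1):
            if step != schedule[n]:
                candidate = dict(schedule)
                candidate[n] = step
                if is_feasible(candidate, preds):
                    yield candidate


def fingerprint(schedule):
    return tuple(sorted(schedule.items()))


def tabu_schedule(nodes, preds, op_type, unit_cost, num_steps, patience=10):
    asap, alap = asap_alap(nodes, preds, num_steps)
    current = dict(asap)                      # ASAP schedule as initial solution
    best, best_cost = current, resource_cost(current, op_type, unit_cost)
    visited = {fingerprint(current)}
    stalled = 0
    while stalled < patience:
        candidates = [c for c in neighborhood(current, preds, asap, alap)
                      if fingerprint(c) not in visited]
        if not candidates:
            break
        current = min(candidates, key=lambda c: resource_cost(c, op_type, unit_cost))
        visited.add(fingerprint(current))
        cost = resource_cost(current, op_type, unit_cost)
        if cost < best_cost:
            best, best_cost, stalled = current, cost, 0
        else:
            stalled += 1
    return best, best_cost


nodes = ["m1", "m2", "a1", "a2"]
preds = {"m1": [], "m2": [], "a1": ["m1"], "a2": ["m2"]}
op_type = {"m1": "*", "m2": "*", "a1": "+", "a2": "+"}
unit_cost = {"*": 4, "+": 1}
best, best_cost = tabu_schedule(nodes, preds, op_type, unit_cost, num_steps=3)
print(best, best_cost)
```

In this toy example the ASAP schedule needs two multipliers and two adders, while the search finds a three-step schedule that needs one of each. Accepting the best unvisited neighbor even when it is worse than the current solution is what lets the search leave local minima.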
Figure 4.2. Penalty Weights
[CDFG with nodes (0) through (10) scheduled in control steps 1 to 6.]

Figure 4.3. Lifetime and Number of Buses From CDFG
[CDFG with nodes (0) through (5) scheduled in control steps 1 to 4.]

begin
    Determine ASAP and ALAP times for each node in CDFG
    S ← Generate initial solution
    while fewer than a fixed number of iterations have passed without any improvement on the best solution
        S' ← choose best solution from neighborhood of S
        if S' not visited in previous iterations then
            if S' better than best solution found then
                save S' as the best solution S*
        else
            discard S'
    end while
end

Figure 4.4. Scheduling Algorithm
CHAPTER 5
POWER-OPTIMIZED BINDING

In this chapter we describe a game-theoretic binding algorithm for power optimization. An approach based on Game Theory and Auction Theory was proposed by Murugavel et al. in [75] for minimizing the average power of a circuit during scheduling and binding in high-level synthesis. They formulated these two problems as auction-based noncooperative games, the solutions to which were obtained using the theory of Nash Equilibrium. We have adopted this approach to binding and extended it to include control constructs in order to apply it to the scheduled CDFG (SCDFG) obtained from the previous module.

5.1 Power Optimization During Binding

The concept of power reduction has been taken to higher levels of abstraction with the automation of the design process. In fact, power consumption is given special attention in behavioral synthesis, because the decisions made at this level can have significant effects on the final design. Also, previous research indicates that designing at higher levels of abstraction leads to a greater potential for power reduction [99]. Power reduction can be considered at various levels of the high-level synthesis process. Specifically, the reduction of power due to switching activity can be addressed while binding the functional units to operations in a scheduled Control and Data Flow Graph (SCDFG). Here, we formulate the problem of power-optimized binding using the game-theoretic approach [76].

The binding algorithm is applied to the scheduled CDFG to obtain a binding matrix. The binding matrix (B) is a matrix ordered by the control steps and functional units, with each entry being either a '0' or a '1'. An entry $B_{ij}$ takes the value '1' if functional unit j is bound to an operation in control step i. A scheduled CDFG and its corresponding binding matrix are shown in figure 5.1.
Power optimization is achieved through the concept of functional unit sharing, in which neighborhood operations with at least one common input are assigned to the same functional unit so that the number of changing inputs is reduced [101].

Figure 5.1. Scheduled CDFG and its Binding Matrix
[(a) Scheduled CDFG with operations o1 through o6 in control steps 1 to 3. (b) Corresponding binding matrix, with one row per control step and one column per functional unit: comp1, mult1, add1, add2, add3.]

5.2 Basic Concepts

We now give a brief introduction to general game theory and to the auction theory used for the formulation of the binding problem.

5.2.1 Game Theory

Game theory was developed as a distinct approach by von Neumann [76] to study the interaction among rational humans, each trying to maximize his or her profits in the given circumstances. Initially the games were considered to be noncooperative, in which each player maximizes his or her reward irrespective of the results of others. Nash [46] later extended game theory to include cooperative games, in which the players agree upon a specific set of rules for the game and coordinate their strategies to obtain the best result for the whole group. The players in a noncooperative game cannot collaborate, and hence the profits are decentralized among them. The formal agreements among the players in cooperative games, which are enforced by an external entity, centralize the cost function among the players, as opposed to noncooperative games, in which the control function is decentralized among players trying to achieve optimization at the individual level.

In an n-player game, each player i is provided with a set of alternatives (or strategies) $S_i$, from which he can choose his move, and a payoff function $P_i$ that determines the player's payoff for choosing a particular strategy. Each player i tries to apply a strategy $s_i \in S_i$ such that his or her payoff is maximized in the given circumstance, which is described by the set of strategies selected by the other n - 1 players. The game is played until it reaches a stable point where the strategy of each player is optimal when compared to those of the others.
Nash has proposed a stable point solution, called the Nash Equilibrium [46], at which no player can improve his payoff by deviating alone from that point. In other words, a Nash equilibrium is a set of actions $\{a_1^*, \ldots, a_n^*\}$, one for each player, such that no player i can gain more by choosing an action $a_i$ different from $a_i^*$ when it is known that each other player j (j ≠ i) is adhering to his or her strategy. Such an equilibrium can be calculated based on the number of players in the game and the strategies available to each player. The Nash equilibrium point indicates the optimal solution for all players in the game.

5.2.2 Auction Theory

Auction theory [49] is the study of selling objects among different buyers in a way that is optimal for both the seller and the buyers. Each buyer bids a value for the object, which is the value that he is willing to pay for that object, and the seller selects the best bid from the available set of bids. Several auction methods have been proposed, among them the English auction, the Dutch auction, the first-price sealed-bid auction and the second-price sealed-bid auction. When auction theory is applied to the binding problem, the power consumption of each module is used as the bid value. The bid value changes during the application of the English, Dutch or second-price sealed-bid auction models. However, since the power consumed by a module for an operation is a constant value, none of these models can be applied for binding. Hence, the most suitable model is the first-price sealed-bid auction model, in which the bids are sealed and cannot be changed once submitted to the seller.

5.3 Problem Formulation

It is suggested in [46] that a very wide range of situations, like firms, action prices, etc., may be modeled as strategic games. All such situations are characterized by the presence of a fundamental conflict among the players. A conflict arises in the binding problem too, with the availability of more than one module for a single operation type. Hence, it is proposed to capture such a conflict using game theory. The game-theoretic model captures interactions among players by allowing each player to be affected by the actions of all players and not just by his/her own action.
This quality suits the problem of binding well, since the binding of an operation is affected by that of other operations of the same type. Applying game theory and auction theory, the low-power binding problem is transformed into a problem of finding the Nash equilibrium for the bidding strategies of an auction problem [75].
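The Nash equilibrium condition from section 5.2.1 can be made concrete with a brute-force check for small finite games. The following is a minimal sketch restricted to pure strategies; it is not the method used by the Gambit solver, and the function name and the example game (a textbook prisoner's dilemma, unrelated to binding) are our own illustration. A profile is reported as an equilibrium when no single player can increase his or her payoff by changing strategy alone.

```python
from itertools import product


def pure_nash_equilibria(strategies, payoff):
    """Return all pure-strategy Nash equilibria of a finite game.

    strategies: list with one list of strategies per player.
    payoff(i, profile): payoff of player i under the strategy profile (a tuple).
    """
    equilibria = []
    for profile in product(*strategies):
        stable = True
        for i, own in enumerate(strategies):
            current = payoff(i, profile)
            for alternative in own:
                deviated = profile[:i] + (alternative,) + profile[i + 1:]
                if payoff(i, deviated) > current:   # profitable unilateral deviation
                    stable = False
                    break
            if not stable:
                break
        if stable:
            equilibria.append(profile)
    return equilibria


# Illustrative two-player game: C = cooperate, D = defect.
table = {
    ("C", "C"): (-1, -1),
    ("C", "D"): (-3, 0),
    ("D", "C"): (0, -3),
    ("D", "D"): (-2, -2),
}
equilibria = pure_nash_equilibria(
    [["C", "D"], ["C", "D"]],
    lambda i, profile: table[profile][i],
)
print(equilibria)
```

The enumeration visits every strategy profile, so its cost grows with the product of the strategy-set sizes; it is meant only to illustrate the stability condition.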
The available functional units are considered as the sellers and the operations as the buyers in the auction problem. For an operation $o_i$, each functional unit submits its power consumption for that operation as its sealed bid value, and the lowest bidder is bound to that operation.

The game-theoretic algorithm described in [75] was aimed at dataflow descriptions, and hence their approach is applicable to scheduled data flow graphs (SDFGs). However, since the CHESS tool is intended to deal with both control and data flow descriptions, we extend the algorithm described in [75] to make it applicable to a scheduled CDFG (SCDFG). The extension stems from the inclusion of control nodes in the graph and from considering the corresponding functional units as sellers in the bidding problem. Also, the approach for SCDFGs includes the flexibility of binding the same functional unit to two mutually exclusive operations of the same type in a control step. The cost functions for each functional unit are evaluated in the same way as described in [75].

The cost functions associated with each module are calculated initially and are assumed to be available to all the operations in the SCDFG. The cost functions are chosen so as to represent the power consumption of that module for that operation. The cost is calculated as the difference between the switching cost ($SC$) and the cost due to changing inputs ($CI_i$). Since each module is considered to be a two-input module, the number of changing inputs can be one, two or none. Also, the switching cost ($SC$) is calculated as the power consumption due to two changing inputs. The values $CI_i$ and $SC$ of the available modules are determined through simulations. Thus, the cost of executing an operation o on module m is given as

    $C(o, m) = SC - CI_i$    (5.1)

The binding strategy is applied individually to each control step, independent of the other control steps.
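The sealed-bid selection for a single operation can be illustrated as follows. This is a sketch under stated assumptions, not the cost function of equation 5.1: each module's bid is read from a hypothetical lookup table indexed by the number of changing inputs (0, 1 or 2), standing in for the simulated power values, and a module that has not been used yet is assumed to see two changing inputs. Module names, operand names, and power figures are invented for the example.

```python
def changing_inputs(previous, current):
    """Number of the two input ports whose operand differs from the module's last use."""
    if previous is None:              # assumption: an unused module sees two changes
        return 2
    return sum(1 for old, new in zip(previous, current) if old != new)


def lowest_bidder(op_inputs, modules):
    """Collect one sealed bid per module and return (winner, bids).

    modules: name -> (last_inputs, power_by_changes), where power_by_changes[k]
    is the hypothetical power drawn when k inputs change.
    """
    bids = {
        name: power[changing_inputs(last, op_inputs)]
        for name, (last, power) in modules.items()
    }
    winner = min(sorted(bids), key=bids.get)   # ties broken by module name
    return winner, bids


modules = {
    "add1": (("a", "b"), (0.05, 0.15, 0.26)),
    "add2": (("c", "d"), (0.10, 0.30, 0.475)),
}

print(lowest_bidder(("a", "e"), modules))   # shares input "a" with add1's last use
print(lowest_bidder(("c", "d"), modules))   # same operands as add2's last use
```

In the second call the nominally more expensive module wins because neither of its inputs changes, which is the effect functional unit sharing aims for.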
Moreover, it is assumed that a module can be assigned to only one operation in each control step. Given a set of p modules and a set of q operations, a cost matrix is obtained initially which gives the cost of all possible binding pairs among them. The cost matrix C is represented as

    $C = \begin{bmatrix} c(1,1) & \cdots & c(1,q) \\ \vdots & \ddots & \vdots \\ c(p,1) & \cdots & c(p,q) \end{bmatrix}$

where c(a,b) gives the cost of binding module a to operation b. Given a set of modules $M_i = \{m_1, \ldots, m_{p_i}\}$ and a set of operations $O_i = \{o_1, \ldots, o_{q_i}\}$, where $p_i$ is the number of modules compatible with operations of type i and $q_i$ is the number of operations of that type, we consider the feasible sets of module-operation pairs and arrive at the optimal one using the Nash equilibrium. If S is the set of all possible module-operation pairs, then there can be many feasible sets $A \subseteq S$, such as $\{(m_1, o_1), (m_2, o_2), \ldots\}$. The total cost associated with such a set A is given as

    $C(A) = \sum_{(m, o) \in A} c(m, o)$

where m is the module assigned to operation o. The problem of binding is now reduced to finding a feasible set $A^*$ such that $C(A^*) = \min_A C(A)$, and we claim that such a set is the Nash equilibrium for the module-operation pairs, since at that point none of the modules can improve its cost with a different assignment or, in other words, by deviating from the stable point.

If we assume that there are three adders, one multiplier and a comparator available for the SCDFG shown in figure 5.1, then the bidding strategy can be applied to find the optimal binding for the three add operations. Since there is only one multiplier, it is bound to the multiplication operations. Similarly, the comparator is assigned to operation o1. However, we have a choice of three adders for the two add operations in control step 2. If $a_1$, $a_2$ and $a_3$ are the three adder modules, and o and o' stand for these two add operations, then the cost matrix for these operations would be written as

    $C = \begin{bmatrix} c(a_1, o) & c(a_1, o') \\ c(a_2, o) & c(a_2, o') \\ c(a_3, o) & c(a_3, o') \end{bmatrix}$

Each of the modules $a_1$, $a_2$ and $a_3$ bids for the operations o and o', and the Nash equilibrium of such a bidding strategy gives the optimal binding solution. When the same strategy is applied for control step 3, the Nash equilibrium selects, for the add operation of that step, the same module that was assigned to an earlier add operation, since this reduces the number of input changes at that module.
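The search for a minimum-cost feasible set of module-operation pairs in one control step can be sketched by exhaustive enumeration. This is an illustration only: CHESS obtains the solution through the Nash equilibrium computed by Gambit, whereas the code below simply tries every assignment in which each operation receives a distinct module. The module names, operation names, and cost values are hypothetical.

```python
from itertools import permutations


def best_feasible_set(cost, modules, operations):
    """Return (pairs, total) for the cheapest one-module-per-operation assignment.

    cost[(m, o)] is the cost of binding module m to operation o.
    Requires len(modules) >= len(operations).
    """
    best_pairs, best_total = None, float("inf")
    for chosen in permutations(modules, len(operations)):
        total = sum(cost[(m, o)] for m, o in zip(chosen, operations))
        if total < best_total:
            best_pairs, best_total = list(zip(chosen, operations)), total
    return best_pairs, best_total


modules = ["a1", "a2", "a3"]      # three adders
operations = ["oA", "oB"]         # two add operations in the same control step
cost = {
    ("a1", "oA"): 0.26,  ("a1", "oB"): 0.26,
    ("a2", "oA"): 0.475, ("a2", "oB"): 0.20,
    ("a3", "oA"): 0.816, ("a3", "oB"): 0.816,
}
pairs, total = best_feasible_set(cost, modules, operations)
print(pairs, round(total, 3))
```

Exhaustive enumeration is factorial in the number of modules, so it is practical only for the handful of compatible modules that typically compete within a single control step.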
The binding problem, once formulated as an auction-based noncooperative finite game, is solved for the Nash equilibrium with the aid of a computational game solver named Gambit [71]. Gambit is a tool for solving finite games which takes, as its input, a matrix representation of a strategic-form game and arrives at all the Nash equilibria for that game through an iterative elimination of strongly dominated strategies.
The binding strategy at each control step is treated as a separate game in itself, and hence our algorithm needs to be applied individually at each control step in the appropriate sequence. The effects of the bindings from previous control steps are considered during the calculation of the payoff matrix for the current step.

5.3.1 Algorithmic Description

The pseudocode for the game-theoretic binding algorithm is given in figure 5.4. The algorithm takes the SCDFG and gives a binding matrix for that graph. Since the bidding strategy is applied separately for each control step, the Nash equilibrium has to be calculated for each set of applicable modules in each control step. Thus, the number of times the Nash equilibrium is computed depends on the number of control steps in the SCDFG as well as the number of modules applicable to an operation in that control step.

The evaluation of the Nash equilibrium using the Gambit tool requires the set of all possible strategies in the form of a matrix. When applied to our problem, this corresponds to a matrix representation of the costs of all possible module-operation pairs in that step. A polynomial-time algorithm for obtaining such a matrix representation is described in figure 5.3. It can be seen that the effective cost includes the cost due to changing inputs based on the previous bindings. The power and delay based cost matrix at each iteration is calculated by considering each operation $o_i$ in a control step and all the compatible modules for the operation $o_i$. Each entry (i, j) in the matrix lists the absolute and cumulative power values of assigning module j to the operation $o_i$, along with the cumulative delay of assigning that module to operation $o_i$. The power values are calculated based on equation 5.1, taking into consideration the number of changing inputs for each module assignment.
Given the strategies available to each player and the number of alternatives of each player, the Nash equilibrium can be obtained through the following procedure:

(i) determine all the possible outcomes for the game based on the set of alternatives, $S_i$, for each player i.

(ii) for each player i, derive the inequality showing his/her equilibrium point, a deviation from which would not result in any increase of gain for that player. Such an inequality is given as

    $P_i(s_1^*, \ldots, s_i^*, \ldots, s_N^*) \ge P_i(s_1^*, \ldots, s_i, \ldots, s_N^*)$

(iii) the set of strategies satisfying all the inequalities from step (ii) gives the Nash equilibrium for the model.

input : Cost Matrix C, Number of players N, set of strategies S for each player
output: Nash Equilibrium Solution NE
begin
    NE ← ∅;
    for each player i do
        for each set of strategies (s_1 ∈ S_1), ..., (s_N ∈ S_N) do
            Calculate the payoff P_i(s_1, ..., s_N) from the cost matrix C;
        end for
        for each strategy (s_i ∈ S_i) do
            s_i* ← Nash equilibrium for player i satisfying the inequality
                P_i(s_1*, ..., s_i*, ..., s_N*) ≥ P_i(s_1*, ..., s_i, ..., s_N*);
        end for
        NE ← NE ∪ s_i*;
    end for
end

Figure 5.2. Algorithm for Finding the Nash Equilibrium

The most time-intensive step of the algorithm lies in finding the Nash equilibrium, the algorithm for which is given in figure 5.2. It is stated in [80] that there is a guaranteed existence of an equilibrium point when a number of alternative strategies are considered. Due to this, it is shown that the complexity of an algorithm for finding the Nash equilibrium is between P and NP, while the problem of finding a feasible binding solution itself is NP-complete [48].

5.4 Summary

The problem of power-optimized binding can be successfully formulated as an auction-based game, and the optimal solution obtained using the concept of Nash Equilibrium. The Nash equilibrium indicates a stable point for a set of players, with the optimization function distributed among them, and hence is suitable for finding a globally optimal binding solution for the given scheduled CDFG. The efficiency of such an algorithm is discussed in the next chapter.
input : Set of operations O, set of modules M in control step i
output: Payoff matrix C
begin
    for each module m ∈ M do
        for each operation o ∈ O do
            P : power to execute operation o on module m in control step i;
            D : time delay for executing operation o on module m in control step i;
            CP : cumulative power of assigning module m to operation o, including P;
            CD : cumulative delay of assigning module m to operation o, including D;
            C(m, o) = (P, CP, CD);
        end for
    end for
end

Figure 5.3. Algorithm for Finding the Cost Matrix

input : Scheduled Control Data Flow Graph (SCDFG), Set of modules, Power and Delay Values
output: Binding Matrix B
begin
    for each control step i do
        % Consider all the modules that can execute that operation and select the best one
        for each set of compatible modules M in control step i do
            % Use algorithm in figure 5.3 to find the cost matrix based on equation 5.1
            CM ← Calculate the power and delay based cost matrix
            % Use algorithm in figure 5.2 to find the Nash Equilibrium solution
            NE ← Determine the Nash Equilibrium based on M, O and CM
            Represent the Nash equilibrium solution as a binding matrix B
        end for
    end for
end

Figure 5.4. Binding Algorithm
CHAPTER 6
EXPERIMENTAL RESULTS

This chapter presents a set of experimental results to verify the accuracy and efficiency of each module in the CHESS tool. Each phase of the synthesis system was coded and tested separately, and later these codes were integrated to form a comprehensive tool for synthesis. For integration of the different codes, the output of each code was made compatible with the next step. However, other output formats are available at each step in a way suitable for the user. The user has the option of multiple entry and exit points in our tool. For example, the user can either begin with a behavioral VHDL code or with a CDFG representation, if already available. In the former case, the CDFG for the VHDL code is extracted by the tool and used for further steps. The modularization of the synthesis tool, combined with the feature of multiple entry and exit points, has provided us with an opportunity to test the efficiency of each part extensively and individually; the outcomes are presented here.

6.1 CDFG Extraction From Behavioral VHDL

The proposed conversion tool has been implemented and tested successfully with various VHDL source files. These source files were chosen carefully to cover, in different combinations, all the VHDL constructs that the tool can handle. It was observed that the tool could extract the flow information from these files quite accurately. The code for lexical analysis of the VHDL code was developed using GNU Flex, and the code for the syntactic analysis using the GNU Bison tool. The output from these is then passed to a sequence of C++ codes which extract the parse tree and trim it to obtain the final CDFG.

We tested our tool with some of the standard HLSynthesis VHDL benchmarks. Two different classes of examples were considered for testing: operation-dominated examples and control-dominated examples. From the first class of examples, we used standard VHDL programs like the differential equation (diffeq) and the elliptic wave filter (ellipf). From the second class of examples, we used algorithms like the Greatest Common Divisor (GCD), Kalman filter, Fast Fourier Transform (fft), etc. The number of tasks extracted from these codes and the time required to extract them are presented in table 6.1.

Table 6.1. Experimental Results for CDFG Extraction From Behavioral VHDL Specification

Benchmark            No. of nodes                           Total execution
                     operational  control  call  storage    time
diffeq               10           1        0     15         1 sec
ellipf               26           1        0     37         2 sec
radix512             16           3        12    23         2 sec
fft                  21           8        0     48         2 sec
dhrc                 13           1        18    33         2 sec
controller counter   17           12       0     19         3 sec
gcd                  4            5        0     10         3 sec
kalman filter        17           11       0     30         5 sec
barcode              6            18       0     26         4 sec
display              26           2        0     37         5 sec
falsepath            4            10       0     52         4 sec
beamformer           15           10       0     8          4 sec

The numbers of operational, control and other nodes give an idea of the size of the problem. The corresponding CPU times shown in the last column give the runtimes required for the extraction of these CDFGs. It can be seen from the table that the runtimes are reasonably short, even when the model is used for large examples. All the experiments were run on a 40MHz Sun Sparc station running SunOS with 256MB RAM. The results show that CDFG extraction for VHDL descriptions that are more operation-intensive is faster than that for control-intensive descriptions. The graphical representations of the CDFGs obtained for some of these circuits are shown in figures 6.1 through 6.4. In these figures, data dependencies are shown by solid arrows, and the control edges by dashed arrows. In most of the figures, the storage nodes are not shown for the sake of simplicity.

6.2 Scheduling

The scheduling algorithm was also implemented in C++ on a Sparc station. The algorithm takes, as its inputs, the CDFG, the types of functional units available, and the time constraints.
To illustrate the efficiency of this algorithm, we concentrate on two of the most widely used benchmark circuits, the diffeq and the elliptic filter, and compare the results to some of the existing systems. The following types of functional units were assumed to be available for scheduling the operations: adder, subtractor, adder/subtractor, multiplier, divider and comparator.

Figure 6.1. CDFG Extracted for the Differential Equation Benchmark Circuit
[Nodes N1 through N11; legend: data edge, control edge, comparator, subtractor, adder, multiplier.]

Figure 6.2. CDFG Extracted for the Elliptic Filter Benchmark Circuit
[Nodes N1 through N28; same legend as figure 6.1.]

Figure 6.3. CDFG Extracted for the Greatest Common Divisor Benchmark Circuit
[Nodes N1 through N19; same legend as figure 6.1.]

Figure 6.4. CDFG Extracted for the Fast Fourier Transform Benchmark Circuit
[Nodes N1 through N31; legend as in figure 6.1, plus divider and variable assignment (:=).]

6.2.1 The Differential Equation Benchmark

The ASAP and ALAP schedules for the diffeq are shown in figures 6.5 and 6.6 respectively. It can be seen that the CDFG can be scheduled in four control steps, but this requires more resources. Our algorithm tries to optimize the number of resources by using the functional unit that can perform both addition and subtraction instead of separate adder and subtractor modules (see section 4.3). Such a schedule obtained by our algorithm is shown in figure 6.7. Also, alternative time constraints are tried to achieve further resource optimization, as shown in figure 6.8. The number of components given by this algorithm is compared against that of the HAL [85] and SPLICER [12] systems for the same benchmark circuit in table 6.2.

Figure 6.5. ASAP Schedule for Differential Equation Benchmark
[Operations o1 through o11 in control steps 1 to 4.]
Figure 6.6. ALAP Schedule for Differential Equation Benchmark
[Operations o1 through o11 in control steps 1 to 4.]

Figure 6.7. Optimal Schedule for the Differential Equation Benchmark in 4 Control Steps
[Operations o1 through o11 in control steps 1 to 4, with additions and subtractions mapped to +/- units.]

Figure 6.8. Schedule for the Differential Equation Benchmark in 6 Control Steps
[Operations o1 through o11, with additions and subtractions mapped to +/- units.]
Table 6.2. Comparison of Schedules for the Differential Equation Benchmark Circuit

Scheduling            No. of          No. of    No. of FUs
Algorithm             control steps   buses     (+)   (-)   (+/-)   (*)   (<)
ASAP                  4               10        2     1     -       3     1
ALAP                  4               8         1     1     -       2     1
HAL [85]              4               3         1     1     -       2     1
SPLICER [12]          4               6         1     1     -       2     1
CHESS (Solution 1)    4               6         -     -     1       2     1
CHESS (Solution 2)    6               4         -     -     1       1     1

HAL [85] and SPLICER [12] could schedule the diffeq benchmark with two multipliers, an adder, a subtractor and a comparator, but with CHESS we were able to combine the adders and the subtractors by using a single ALU from the library that performs both operations. This could be achieved because of the notion of equivalence classes of operations used during the search for an optimized design. With six control steps, only one multiplier, one comparator and a single ALU performing addition and subtraction are required. Such a schedule was generated in 15 iterations in a CPU time of around 1 second.

6.2.2 Elliptic Filter

The elliptic filter involves more computations than the diffeq benchmark. The scheduled CDFG for the elliptic filter obtained by this algorithm is shown in figure 6.9. Here, the number of control steps was constrained to 18. The outcomes with this and other time constraints are compared to those of other systems in table 6.3.

Table 6.3. Comparison of Schedules for the Elliptic Filter Benchmark Circuit

Scheduling            No. of          No. of    No. of FUs
Algorithm             control steps   buses     (+)   (*)
ASAP                  17              8         4     2
ALAP                  17              8         3     2
HAL [85]              19              6         2     2
FDLS [84]             17              -         3     2
CHESS (Solution 1)    17              8         3     3
CHESS (Solution 2)    18              6         2     2
CHESS (Solution 3)    19              5         2     1
CHESS (Solution 4)    28              4         1     1
PAGE 85
* o17 o17 + o22 + o22 o17 + o22 + o22 o17 + o22 + o22 o17 o17 + o22 o17 o17 o17 + o22 o17 o17 + o22 + o22 + o22 o1 + + + o14 o16 + o6 + o3 o7 + o5 o8 o9 o11 + o11 + 1 2 3 4 5 6 7 8 9 13 14 15 16 17 18 12 11 10 Figure 6.9.. Scheduled CDFG for the Elliptic Filter Benchmark Circuit 76
PAGE 86
Se v eral design alternati v es ha v e been generated for 17, 18, 19 and 28 control steps. Only tw o adders, tw o multipliers and six b uses were required for the schedule in 18 control steps. The schedule with 17 control steps does not pro vide signicant impro v ement o v er other approaches because of the limited solution space e xplored in this case, 17 being the critical path length. W ith increasing control steps, the solution space is e xtended, and after a number of trials, the best solution is obtained with 28 control steps, where only a single adder and a single multiplier are required. These solutions were produced in a CPU time of around 6 to 14 seconds. The CDFG based scheduling algorithm, adopted from [6] w as further impro v ed for lesser e x ecution times. The determination of mutual e xclusi v eness among the operations is no w performed during the initial stages of e xtracting the CDFG from VHDL specication. This step, being remo v ed from the scheduling algorithm has enhanced the speed of the algorithm, while adding little or no timeo v erhead to the CDFG e xtraction step. In the original w ork, a branch numbering approach, similar to the node coloring approach, w as used for the mutual e xclusion test. That approach requires the assignment of pairs of numbers to each node, depicting its branch le v el in the graph, follo wed by a simple comparision procedure determining the mutual e xclusi v eness between each pair of nodes. W e a v oid this step by performing a similar test simultaneously while e xtracting each node of the CDFG in the rst phase of our tool. One other aspect of this scheduling algorithm is the memory requirement. The algorithm is based on an iterati v e impro v ement scheme, and hence, requires the storage of the best set of solutions from the pre vious iterations, where each solution is gi v en as a list of control step v alues assigned to the operations. This demands huge lists for designs in v olving more operations. 
To overcome this problem of excessive memory requirement, instead of saving a whole solution, we save only the modification characterizing the move at each iteration. Such a modification can be stored as the node involved in the move, its previous control step and its new control step. Finally, the optimal solution is obtained by applying the stored sequence of changes to the initial solution. The graph in figure 6.10 shows the improvement in memory utilization obtained with this optimization. It can be seen that the memory requirement has been reduced by nearly 30% with our approach.
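The move-based storage described above can be sketched as follows. This is only an illustration under stated assumptions: the names Move and apply_moves, the dictionary representation of a schedule, and the example operations are hypothetical and are not taken from the tool's source code.

```python
from typing import Dict, List, NamedTuple


class Move(NamedTuple):
    """One stored modification: which node moved, from which step, to which."""
    node: str
    old_step: int
    new_step: int


def apply_moves(initial: Dict[str, int], moves: List[Move]) -> Dict[str, int]:
    """Rebuild a schedule by replaying the stored moves on the initial one."""
    schedule = dict(initial)  # copy, so the initial solution stays intact
    for move in moves:
        if schedule[move.node] != move.old_step:
            raise ValueError(f"move for {move.node} does not match the schedule")
        schedule[move.node] = move.new_step
    return schedule


# Hypothetical initial schedule: operation name -> control step.
initial = {"o1": 1, "o2": 1, "o3": 2, "o4": 3}

# Three integers' worth of data per iteration instead of a full schedule.
moves = [Move("o2", 1, 2), Move("o3", 2, 3), Move("o2", 2, 1)]

final = apply_moves(initial, moves)
```

Each stored move costs a constant amount of memory regardless of the number of operations in the design, whereas a full solution grows with the size of the CDFG.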
Figure 6.10. Improvement in Memory Requirement

6.3 Binding

The algorithm described in section 5.4 was coded in C++ and tested on the scheduled CDFGs obtained previously. The Nash equilibrium was calculated at every iteration of the algorithm using the Gambit tool for game theory [71]. To provide alternative binding strategies for the operations, a library of FUs was developed using the Cadence design tool in 0.35 µm MOSIS SCN3M SCMOS technology. These cells were characterized for power and delay through simulations. Each simulation was performed using the HSPICE simulation tool on 100,000 random input vectors that were generated using MATLAB. Some other values were borrowed from [87]. The library cells and their power and delay values are listed in table 6.4.

The scheduled CDFGs (SCDFGs) of the benchmark circuits discussed in the previous section were subjected to the game-theoretic binding strategy. Before this, the total power for the SCDFGs was calculated using a random binding strategy and a greedy approach. The power values obtained through our binding strategy are then compared against these, as illustrated in table 6.5.

Table 6.4. Power and Delay Values of the Library Cells

S.No.  Functional Unit          Power (mW)  Delay (ns)
1      Add1 (ripple-carry)      0.260       10.71
2      Add2 (carry-lookahead)   0.475       9.52
3      Sub1 (ripple-carry)      0.358       8.58
4      Sub2 (carry-lookahead)   0.549       7.02
5      Add Sub (ripple-carry)   0.816       15.4
6      Mult1 (Parallel)         1.238       43.7
7      Mult2 (Wallace tree)     5.275       21.71
8      Comp                     0.187       17.0

Table 6.5. Comparison of Binding Results

Benchmark  Random        Greedy        Our approach  %Red. over  %Red. over
Circuit    binding (mW)  binding (mW)  (mW)          Random      Greedy
FIR        24.005        18.300        15.005        37.49       18.005
IIR        16.290        15.304        10.290        36.83       32.76
diffeq     24.191        9.641         9.304         61.54       3.04
ellipf     34.595        28.890        25.595        26.02       11.41
fft        43.804        25.105        22.005        49.76       12.34
kalman     33.585        27.780        24.890        25.87       13.85
gcd        2.624         2.216         2.216         15.55       0

We can see that, on average, our tool achieves an 18% reduction in power consumption over the greedy approach for sufficiently large benchmark circuits. The first four benchmark circuits are more data-dominated applications, while the last three involve more control structures. The improvement obtained with the control-dominated circuits is smaller than that of the others because a single comparator is used for the control nodes. It is also observed that the reduction in power obtained through our approach is smaller for small circuits than for large ones. Small circuits, when formulated with the game-theoretic approach, provide fewer strategies to explore while optimizing the cost function, as in the case of the gcd benchmark circuit. From these results, it can be concluded that the problem of binding for low power can be successfully formulated using the game-theoretic approach.
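The percentage columns of table 6.5 can be read as the relative saving of our approach with respect to each baseline. The short sketch below recomputes them for a few rows of the table. The function name percent_reduction and the choice of rows are ours for illustration, and the formula is the usual relative-reduction definition, which we assume is the one used for the table.

```python
def percent_reduction(baseline_mw: float, optimized_mw: float) -> float:
    """Relative power saving, in percent, of optimized_mw over baseline_mw."""
    return 100.0 * (baseline_mw - optimized_mw) / baseline_mw


# (random binding, greedy binding, our approach), all in mW, from table 6.5.
rows = {
    "FIR": (24.005, 18.300, 15.005),
    "IIR": (16.290, 15.304, 10.290),
    "ellipf": (34.595, 28.890, 25.595),
    "gcd": (2.624, 2.216, 2.216),
}

reductions = {
    name: (percent_reduction(rnd, ours), percent_reduction(greedy, ours))
    for name, (rnd, greedy, ours) in rows.items()
}

for name, (over_random, over_greedy) in reductions.items():
    print(f"{name}: {over_random:.2f}% over random, {over_greedy:.2f}% over greedy")
```

For gcd the greedy binding and our approach yield the same power, so the reduction over greedy is zero, which matches the observation that small circuits leave few strategies to explore.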
CHAPTER 7
CONCLUSIONS

We have presented a comprehensive tool for high-level synthesis with the flexibility of multiple entry and exit points. Our tool includes an automatic conversion of a behavioral VHDL description into its corresponding CDFG representation, which captures all the flow information of the original description. We have demonstrated, through several examples, that the tool is quite efficient in converting various VHDL constructs into their corresponding flow-graph representations. A new technique using the Tabu search method was adapted to solve the scheduling problem for global resource optimization. The large design spaces explored through the tabu search result in better-optimized designs. A game-theoretic, power-optimizing binding algorithm was proposed that takes advantage of the decentralized optimization features of noncooperative games to obtain an optimal assignment for each operation. The output files generated by the tool are expected to meet the requirements of various researchers across the world. With its accuracy and speed, our tool would be a significant aid in speeding up the initial steps of the VLSI design flow.
REFERENCES

[1] H. Achatz. "Extended 0/1 IP Formulation for the Scheduling Problem in High Level Synthesis". In EURO-DAC'93, pages 226–231, 1993.
[2] V. Agarwal, A. Pande, and M. Mehendale. "High Level Synthesis of Multi-Precision Data Flow Graphs". In Fourteenth Int. Conf. on VLSI Design, pages 411–416, 2001.
[3] A. Kumar and M. Bayoumi. "Low-power binding of function units in high-level synthesis". In 42nd Midwest Symp. on Circuits and Systems, volume 1, pages 214–217, 1999.
[4] M. A. Ali Shatnawi and M. Swamy. "Scheduling of DSP Data Flow Graphs onto Multiprocessors for Maximum Throughput". In IEEE Int. Symposium on Circuits and Systems, volume 6, pages 386–389, 1999.
[5] M. Aloqeely and C. Chen. "Sequencer-based Datapath Synthesis of Regular Iterative Algorithms". In Design Automation Conference, pages 155–160, San Diego, CA, 1994.
[6] S. Amellal and B. Kaminska. "Functional Synthesis of Digital Systems with TASS". In IEEE Transactions on Computer Aided Design of Integ. Circuits and Systems, volume 13, pages 537–552, May 1994.
[7] R. A. V. Aho and J. D. Ullman. "Compilers: principles, techniques and tools". Addison-Wesley, 1997.
[8] R. Bergamaschi. "Behavioral Network Graph Unifying the Domains of High-Level and Logic Synthesis". In 36th Design Automation Conference, pages 213–218, June 1999.
[9] R. Bergamaschi, S. Raje, I. Nair, and L. Trevillyan. "Control-Flow versus Data-Flow Scheduling". IEEE Trans. on VLSI Systems, pages 491–496, June 1994.
[10] J. Bhasker and H. Lee. "An Optimizer for Hardware Synthesis". IEEE Des. Test Comput., 7(5):20–36, October 1990.
[11] S. Bhattacharya, S. Dey, and F. Brglez. "Performance Analysis and Optimization of Schedules for Conditional and Loop-Intensive Specifications". In Design Automation Conference, pages 491–496, June 1994.
[12] B. M. Pangrle and D. D. Gajski. "Design tools for intelligent silicon compilation". In IEEE Trans. on Computer Aided Design, volume CAD-6, pages 1098–1112, November 1987.
[13] R. Camposano. "Path-Based Scheduling for Synthesis". In IEEE Trans. on Computer Aided Design of Integ. Circuits and Systems, volume 10, pages 85–93, January 1991.
[14] V. Chaiyakul, D. Gajski, and L. Ramachandran. "High Level Transformation for Minimizing Syntactic Variances". In Design Automation Conference, pages 413–418, Dallas, TX, June 1993.
[15] A. Chandrakasan, M. Potkonjak, R. Mehra, J. Rabaey, and R. Brodersen. "Optimizing Power Using Transformations". IEEE Trans. on CAD, 14:12–31, 1995.
[16] J. Chang and M. Pedram. "Register Allocation and Binding for Low Power". In DAC, 1995.
[17] J. Chang and M. Pedram. "Module Assignment for Low Power". In EDAC, 1996.
[18] S. Chaudhuri and R. Walker. "Computing Lower Bounds on Functional Units before Scheduling". In IEEE Trans. on Very Large Scale Integ. of VLSI Systems, volume 4, pages 273–279, June 1996.
[19] S. Chaudhuri, R. Walker, and J. Mitchell. "Analyzing and Exploiting the Structure of the Constraints in the ILP Approach to the Scheduling Problem". IEEE Trans. on VLSI Systems, 2(4):456–471, 1994.
[20] K. R. Ching-Tang Chang and R. A. Walker. "High-Level DSP Synthesis Using the COMET Design System". In Sixth Annual IEEE Int. ASIC Conference and Exhibit, pages 408–411, September 1993.
[21] Y. Choi and T. Kim. "An efficient low-power binding algorithm in high-level synthesis". In IEEE Int. Symp. on Circuits and Systems, volume 4, pages IV-321–IV-324, 2002.
[22] J. Crenshaw and M. Sarrafzadeh. "Low-Power Driven Scheduling and Binding". In GLSVLSI'98, Lafayette, Louisiana, February 1998.
[23] J. C. T. Hwang and Y. C. Hsu. "A formal approach to scheduling problem in data path synthesis". In IEEE Int. Conf. on CAD, volume 10, pages 464–475, Apr 1991.
[24] A. Dasgupta and R. Karri. "High-Reliability Low-Energy Microarchitecture Synthesis". In TCAD, 1998.
[25] A. Dekkers and E. Aarts. "Global Optimisation and Simulated Annealing". Math. Program., 50:367–393, 1991.
[26] S. Devadas and A. R. Newton. "Algorithms for Allocation in Datapath Synthesis". In IEEE Trans. on CAD on Integ. Cir. and Systems, volume 8, pages 768–781, July 1989.
[27] S. Dey, A. Raghunathan, N. Jha, and K. Wakabayashi. "Controller-based Power Management for Control-Flow Intensive Designs". In IEEE Trans. on Computer Aided Design, volume 18, pages 1496–1508, October 1999.
[28] M. Dhodhi, F. Hielscher, R. Storer, and J. Bhasker. "Datapath Synthesis Using a Problem-Space Genetic Algorithm". In IEEE Trans. on Computer Aided Design of Integ. Circuits and Systems, volume 14, pages 934–944, August 1995.
[29] J. Eijndhoven and L. Stok. "A Data Flow Graph Exchange Standard". In European Design Automation Conference, pages 193–199, 1992.
[30] M. Elgamel and M. Bayoumi. "On Low Power High Level Synthesis Using Genetic Algorithms". In 9th Int. Conf. on Electronics, Circuits and Systems, volume 2, pages 725–728, September 2002.
[31] Y. Fann, M. Rim, and R. Jain. "Global Scheduling for High-Level Synthesis Applications". In Design Automation Conference, pages 542–546, June 1994.
[32] C. Gebotys. "An Optimization Approach to the Synthesis of Multi-Chip Architectures". In IEEE Trans. on VLSI Systems, volume 2, pages 11–20, 1994.
[33] E. Girczyc. "Loop Winding - A Data Flow Approach to Functional Pipelining". In Int. Symp. on Circuits and Systems, pages 382–385, 1987.
[34] F. Glover. "Tabu Search: A Tutorial". Interfaces, 20(4):74–94, 1990.
[35] G. Goossens, J. Rabaey, J. Vandewalle, and H. D. Man. "An Efficient Microcode Compiler for Application Specific DSP Processors". IEEE Trans. on Computer Aided Design Integ. Circuits and Systems, 9(9):925–937, September 1990.
[36] D. Grant and P. Denyer. "Address Generation for Array Access Based on Modulus m Counter". In European Conference on Design Automation, pages 118–123, Paris, France, 1991.
[37] G. Sander. "VCG: visualization of compiler graphs". Universität des Saarlandes, Feb 1995.
[38] S. Gupta and S. Katkoori. "Force-Directed Scheduling for Dynamic Power Optimization". In IEEE Computer Society Annual Symposium on VLSI, pages 68–73, 2002.
[39] R. Hartley and A. Casavant. "Tree-height Minimization in Pipelined Architectures". In Int. Conf. on Computer Aided Design, pages 112–115, Santa Clara, CA, November 1989.
[40] P. Hilfinger. "A High-Level Language and Silicon Compiler for Digital Signal Processing". In International Symposium on Circuits and Systems, pages 213–216, 1985.
[41] C. Hitchcock and D. Thomas. "A Method of Automatic Data Path Synthesis". In Design Automation Conference, pages 484–489, June 1983.
[42] S. Hong and T. Kim. "Bus Optimization for Low Power Data Path Synthesis Based on Network Flow Method". In ICCAD, 2000.
[43] C. Huang, Y. Chen, Y. Lin, and Y. Hsu. "Data Path Allocation Based on Bipartite Weighted Matching". In Int. Conf. on Computer Aided Design, pages 499–504, Orlando, FL, June 1990.
[44] Institute of Electrical and Electronics Engineers, New York. "IEEE Standard VHDL Language Reference Manual", IEEE Std. 1076-1987 edition, 1988.
[45] J. Bhaskar. "A VHDL primer". Addison Wesley, 1999.
[46] J. F. Nash. "Equilibrium Points in N-Person Games". In National Academy of Sciences of the United States of America, volume 36, pages 48–49, Jan 1950.
[47] G. D. Jong. "Data Flow Graph: System Specification with the most Unrestricted Semantics". In European Design Automation Conference, pages 401–405, 1991.
[48] J. Teich, T. Blickle, and L. Thiele. "An Evolutionary Approach to System-Synthesis". In First Online Workshop on Soft Computing, 1996.
[49] J. W. Friedman. "Game theory with applications to economics". Oxford University Press, 1986.
[50] T. Kawaguchi and T. Todaka. "Operation Scheduling by Annealed Neural Networks". In IEICE Trans. Fund. Electr. Commun. Comput. Sci., pages 656–663, June 1995.
[51] J. Kim, S. Park, Y. Seo, and D. Kim. "Pattern Generation for Verification of VHDL Behavioral-Level Design". In The First IEEE Asia Pacific Conference on ASICs, pages 332–335, August 1999.
[52] T. Kim and C. Liu. "A New Approach to the Multiport Memory Allocation Problem in Datapath Synthesis". VLSI Integration, 19(3):133–160, 1995.
[53] T. Kim, N. Yonezawa, J. Liu, and C. Liu. "A Scheduling Algorithm for Conditional Resource Sharing - A Hierarchical Reduction Approach". In IEEE Trans. on Computer Aided Design of Integ. Circuits and Systems, volume 13, pages 425–437, April 1994.
[54] P. Kollig and B. Al-Hashimi. "Simultaneous scheduling, allocation and binding in high level synthesis". In Electronic Letters, volume 33, pages 1516–1518, August 1997.
[55] D. Kolson, A. Nicolau, and N. Dutt. "Integrating Program Transformations in the Memory-based Synthesis of Image and Video Algorithms". In Int. Conf. on Computer Aided Design, pages 27–30, San Jose, CA, 1994.
[56] T. Krol, J. Meerbergen, C. Niessen, W. Smits, and J. Huisken. "The Sprite Input Language: An Intermediate Format for High-Level Synthesis". In European Design Automation Conference, pages 186–192, 1991.
[57] K. S. Vallerio and N. K. Jha. "Task graph extraction for embedded system synthesis".
[58] D. Ku and G. Micheli. "Relative Scheduling Under Timing Constraints". IEEE Trans. on Computer Aided Design, 11:696–718, June 1992.
[59] K. Kucukcakar and A. Parker. "Data Path Tradeoff using MABAL". In Design Automation Conference, pages 511–516, Orlando, FL, June 1990.
[60] F. Kurdahi and A. Parker. "REAL: A Program for Register Allocation". In Design Automation Conference, pages 210–215, Miami Beach, FL, June 1987.
[61] G. Lakshminarayana, K. Khouri, and N. Jha. "Wavesched: A Novel Scheduling Technique for Control-Flow Intensive Behavioral Descriptions". In IEEE Trans. on Computer Aided Design, pages 244–250, 1997.
[62] G. Lakshminarayana, A. Raghunathan, N. Jha, and S. Dey. "Transforming Control-Flow Intensive Designs to Facilitate Power Management". In Int. Conf. on Computer Aided Design, pages 657–664, November 1998.
[63] B. Landwehr, P. Marwedel, and R. Domer. "OSCAR: Optimum Simultaneous Scheduling, Allocation and Resource Binding Based on Integer Programming". In EURO-DAC'94, pages 90–95, 1994.
[64] M. Langevin and E. Cerny. "A Recursive Technique for Computing Lower Bound Performance of Schedules". In ACM Transactions on Design Automation of Electronic Systems, volume 1, pages 443–456, October 1996.
[65] H. Lee and S. Hwang. "A Scheduling Algorithm for Multiport Memory Minimization in Datapath Synthesis". In The Asia and South-Pacific Design Automation Conference, pages 93–100, 1995.
[66] J. Lee, Y. Hsu, and Y. Lin. "A New Integer Linear Programming Formulation for the Scheduling Problem in Data Path Synthesis". In Int. Conf. on Computer Aided Design, pages 20–23, 1989.
[67] J. S. Lis and D. D. Gajski. "Synthesis from VHDL". In IEEE Int. Conf. on Computer Design, pages 378–381, 1988.
[68] T. Ly, D. Knapp, R. Miller, and D. MacMillen. "Scheduling Using Behavioral Templates". In Design Automation Conference, pages 101–106, 1995.
[69] T. Ly and J. Mowchenko. "Applying Simulated Evolution to High-Level Synthesis". In IEEE Trans. on Computer Aided Design of Integ. Circuits and Systems, volume 12, pages 389–409, March 1993.
[70] S. D. M. Potkonjak and R. Roy. "Synthesis-for-testability using transformations". In Asia and South-Pacific Design Automation Conference, pages 485–490, 1995.
[71] R. McKelvey, A. McLennan, and T. Turocy. "Gambit: Software Tools for Game Theory". California Inst. of Tech. and Univ. of Minnesota and Texas A&M Univ., September 2002.
[72] G. Mekenkamp, P. Middelhoek, E. Molenkamp, J. Hofstede, and T. Krol. "A Syntax-based VHDL to CDFG Translation Model for High-Level Synthesis". In VHDL Int. Users Forum, Santa Clara, March 1996.
[73] G. D. Micheli and D. C. Ku. "HERCULES - A System for High-Level Synthesis". In Design Automation Conference, pages 483–488, 1988.
[74] J. Monteiro, S. Devadas, P. Ashar, and A. Mauskar. "Scheduling Techniques to Enable Power Management". In DAC, 1996.
[75] A. K. Murugavel and N. Ranganathan. "A Game-Theoretic Approach for Power Optimization during Behavioral Synthesis". 16th Int. Conf. on VLSI Design, pages 452–458, January 2003.
[76] J. V. Neumann. "Zur Theorie der Gesellschaftsspiele", 1928.
[77] A. Nicolau and R. Potasman. "Incremental Tree Height Reduction for High Level Synthesis". In Design Automation Conference, pages 770–774, San Francisco, CA, June 1991.
[78] T. Nijhar and A. Brown. "Application of Source Code Transformations in a High Level Synthesis Environment". Design Automation Group Research Journal, 1995.
[79] A. Orailoglu and D. Gajski. "Coactive Scheduling and Checkpoint Determination During High Level Synthesis of Self-Recovering Microarchitectures". In IEEE Trans. on Very Large Scale Integration Systems, volume 2, pages 304–311, September 1994.
[80] C. Papadimitriou. "Algorithms, Games and the Internet". In ACM Symp. on Theory of Computing, pages 749–753, 2001.
[81] I. Park and C. Kyung. "Fast and Near Optimal Scheduling in Automatic Data Path Synthesis". In Design Automation Conference, pages 680–685, San Francisco, CA, June 1991.
[82] N. Park and A. Parker. "Sehwa: A Software Package for Synthesis of Pipelined Data Path from Behavioral Specification". In IEEE Trans. on Computer Aided Design of Integ. Circuits and Systems, volume 7, pages 356–370, March 1988.
[83] S. Park. "VDT: VHDL developer's toolkit". http://poppy.snu.ac.kr/vdt, 2002.
[84] P. G. Paulin and J. P. Knight. "Force Directed Scheduling for the Behavioral Synthesis of ASIC's". In IEEE Trans. Computer Aided Design, volume 8, pages 661–679, June 1989.
[85] P. G. Paulin and J. P. Knight. "Algorithms for High-Level Synthesis". In IEEE Design and Test of Computers, volume 6, pages 18–31, December 1999.
[86] M. Potkonjak and M. Srivastava. "Rephasing: A Transformation Technique for the Manipulation of Timing Constraints". In Design Automation Conference, pages 107–112, San Francisco, CA, 1995.
[87] V. Raghunathan, S. Ravi, A. Raghunathan, and G. Lakshminarayana. "Transient Power Management through High-Level Synthesis". In IEEE Int. Conf. on Computer Aided Design, pages 545–552, 2001.
[88] S. Raje and M. Sarrafzadeh. "GEM: A Geometric Algorithm for Scheduling". In IEEE Int. Symposium on Circuits and Systems, volume 3, pages 1991–1994, May 1993.
[89] M. Rim and R. Jain. "Lower bound performance estimation for the high level synthesis scheduling problem". In IEEE Trans. Computer Aided Design of Integrated Circuits and Systems, volume 13, pages 451–458, April 1994.
[90] W. Rosenstiel. "Optimizations in High Level Synthesis", 1986.
[91] M. Rosien, G. Smit, and T. Krol. "Generating a CDFG from C/C++ Code". In 3rd PROGRESS Workshop on Embedded Systems, pages 200–202. STW Technology Foundation, October 2002.
[92] S. Amellal and B. Kaminska. "Scheduling of a control and data flow graph". In IEEE Int. Symp. on Circuits and Systems, volume 3, pages 1666–1669, May 1993.
[93] S. Amellal and B. Kaminska. "A CDFG model for synthesis from VHDL". Technical report, Ecole Polytechnique de Montreal, August 1997.
[94] O. Sentieys, E. Martin, and J. Philippe. "VLSI Architectural Synthesis for an Acoustic Echo Cancellation Application". In Workshop on VLSI Signal Processing, pages 84–92, October 1993.
[95] N. S. Govindarajan and R. Vemuri. "Dependency analysis and operation graph generation for high-level synthesis from behavioral VHDL".
[96] A. Sharma and R. Jain. "InSyn: Integrated Scheduling for DSP Applications". In IEEE Trans. on Signal Process., volume 43, pages 1966–1977, August 1995.
[97] H. Shin and N. Woo. "A Cost Function based Optimization Technique for Scheduling in Data Path Synthesis". In Int. Conf. on Computer Aided Design, pages 424–427, 1989.
[98] W. T. Shiue, J. Denison, and A. Horak. "Low Power Binding using Linear Programming". In 43rd IEEE Midwest Symp. on Circuits and Systems, volume 2, pages 980–983, 2000.
[99] W. T. Shiue, J. Denison, and A. Horak. "Low Power Binding using Linear Programming". In IEEE Midwest Symp. on Circuits and Systems, volume 2, pages 980–983, 2000.
[100] A. Sllame and V. Drabek. "An Efficient List-Based Scheduling Algorithm for High-Level Synthesis". In Euromicro Symp. on Digital System Design, pages 316–323, September 2002.
[101] V. Srikantam, N. Ranganathan, and S. Srinivasan. "CREAM: combined register and module assignment with floor planning for low power datapath synthesis". In IEEE Int. Conf. on VLSI Design, pages 223–228, 2000.
[102] N. Theypayasuwan, H. Tang, and A. Doboli. "An Exploration-Based Binding and Scheduling Technique for Synthesis of Digital Blocks for Mixed-Signal Applications". In 2003 Int. Symp. on Circuits and Systems, volume 5, pages 629–632, May 2003.
[103] F. Tsay and Y. Hsu. "Data Path Construction and Refinement". In Int. Conf. on Computer Aided Design, pages 308–311, Santa Clara, CA, November 1990.
[104] C. Tseng and D. Siewiorek. "Automatic synthesis of data path on digital systems". In IEEE Trans. Computer Aided Design of Integrated Circuits and Systems, volume 5, pages 379–395, July 1986.
[105] M. Unaltuna and V. Pitchumani. "ANSA: A New Neural Net based Scheduling Algorithm for High-Level Synthesis". In IEEE Symp. on Circuits and Systems, volume 1, pages 385–388, 1995.
[106] K. Wakabayashi and T. Yoshimura. "A Resource Sharing Control Synthesis Method for Conditional Branches". In Int. Conf. on Computer Aided Design, pages 62–65, Santa Clara, CA, November 1989.
[107] R. Walker and D. Thomas. "Behavioral Transformations for Algorithmic Level IC Design". IEEE Trans. on Computer Aided Design of Integ. Circuits and Systems, 8(10):1115–1128, October 1989.
[108] W. Wang, T. Tan, J. Luo, Y. Fei, L. Shang, K. Vallerio, L. Zhong, A. Raghunathan, and N. Jha. "A Comprehensive High-Level Synthesis System for Control-Flow Intensive Behaviors". In GLVLSI'03, 2003.
[109] X. Wang and S. Grainger. "The Reduction of the Number of Equations in the ILP Formulation for the Scheduling Problem in High-Level Synthesis". In The Second International Conference on Concurrent Engineering and Electronic Design Automation, pages 483–487, 1994.
[110] T. Wilson, N. Mukherjee, M. Garg, and D. Benerji. "An ILP Solution for Optimum Scheduling, Module and Register Allocation, and Operation Binding in Datapath Synthesis". VLSI Design, 3(1):21–36, 1995.
[111] W. Wolf, A. Takach, C. Huang, and R. Mano. "The Princeton University Behavioral Synthesis System". In Design Automation Conference, pages 182–187, June 1992.
[112] L. Zhong, J. Luo, Y. Fei, and N. Jha. "Register Binding based Power Management for High-Level Synthesis of Control Flow Intensive Behaviors". In IEEE Int. Conf. on Computer Design: VLSI in Computers and Processors, 2002.
[113] J. Zhu and D. D. Gajski. "Soft Scheduling in High Level Synthesis". In 36th Design Automation Conference, pages 219–224, 1999.
[114] G. Zimmermann. "Mds: The mimola design method". Journal of Digital Systems, 4(3):337–369, 1980.
