Genetic algorithm based design and optimization of VLSI ASICs and reconfigurable hardware

Material Information

Title:
Genetic algorithm based design and optimization of VLSI ASICs and reconfigurable hardware
Physical Description:
Book
Language:
English
Creator:
Fernando, Pradeep Ruben
Publisher:
University of South Florida
Place of Publication:
Tampa, Fla
Publication Date:
2009

Subjects

Subjects / Keywords:
Evolutionary algorithms
Multi-objective VLSI floorplanning
FPGA IP core implementation
Evolvable hardware design
Extreme environment electronics
Dissertations, Academic -- Computer Science and Engineering -- Doctoral -- USF   ( lcsh )
Genre:
non-fiction   ( marcgt )

Notes

Summary:
ABSTRACT: Rapid advances in integration technology have tremendously increased the design complexity of very large scale integrated (VLSI) circuits, necessitating robust optimization techniques in many stages of VLSI design. A genetic algorithm (GA) is a stochastic optimization technique that uses principles derived from the evolutionary process in nature. In this work, genetic algorithms are used to ease the hardware design process of VLSI application specific integrated circuits (ASICs) and reconfigurable hardware. VLSI ASIC design suffers from high design complexity and a large number of optimization objectives, requiring hierarchical design approaches and multi-objective optimization techniques. The floorplanning stage of the design cycle becomes highly important in hierarchical design methods. In this work, a multi-objective genetic algorithm based floorplanner has been developed with novel crossover operators to address the multi-objective floorplanning problem for VLSI ASICs. The genetic floorplanner achieves significant wirelength savings (>19% on average) with little or no increase in area (<3% penalty) over previous floorplanners that perform simultaneous area and wirelength minimization. Hardware implementation of genetic algorithms is gaining importance because of their proven effectiveness as optimization engines for real-time applications. Earlier hardware implementations suffer from major drawbacks such as absence of GA parameter programmability, rigid pre-defined system architecture, and lack of support for multiple fitness functions. A compact IP core that implements a general purpose GA engine has been designed to realize evolvable hardware in field programmable gate array devices. The designed GA core achieved a speedup of around 5.16x over an analogous software implementation. Novel reconfigurable analog architectures have been proposed to realize extreme environment analog electronics.
In this work, a digital framework has been developed to realize self-reconfigurable analog arrays (SRAA) where genetic algorithms are used to evolve the required analog functionality and compensate for performance degradation in extreme environments. The framework supports two methods of compensation, namely model-based lookup and genetic algorithm based compensation, and is scalable in terms of the number of fitness evaluation modules. The entire framework has been implemented as a digital ASIC in a leading industry-strength silicon-on-insulator (SOI) technology to obtain high performance and a small form factor.
Thesis:
Dissertation (Ph.D.)--University of South Florida, 2009.
Bibliography:
Includes bibliographical references.
System Details:
Mode of access: World Wide Web.
System Details:
System requirements: World Wide Web browser and PDF reader.
Statement of Responsibility:
by Pradeep Ruben Fernando.
General Note:
Title from PDF of title page.
General Note:
Document formatted into pages; contains 131 pages.
General Note:
Includes vita.

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 002069509
oclc - 608537361
usfldc doi - E14-SFE0003289
usfldc handle - e14.3289
System ID:
SFS0027605:00001


Full Text



Genetic Algorithm Based Design and Optimization of VLSI ASICs and Reconfigurable Hardware

by

Pradeep Ruben Fernando

A dissertation submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Department of Computer Science and Engineering
College of Engineering
University of South Florida

Major Professor: Srinivas Katkoori, Ph.D.
Nagarajan Ranganathan, Ph.D.
Hao Zheng, Ph.D.
Sanjukta Bhanja, Ph.D.
Stephen Suen, Ph.D.

Date of Approval:
October 17, 2008

Keywords: Evolutionary Algorithms, Multi-objective VLSI Floorplanning, Hardware IP Core Implementation, Evolvable Hardware Design, Extreme Environment Electronics

Copyright 2009, Pradeep Ruben Fernando

DEDICATION

To my family

ACKNOWLEDGEMENTS

I would like to thank Dr. Katkoori immensely for all his support and encouragement throughout the entire course of my research. I would like to thank all my committee members, Dr. Ranganathan, Dr. Zheng, Dr. Bhanja, and Dr. Suen, for agreeing to be on my committee and for their insightful comments. I would like to thank Dr. Keymeulen, Dr. Zebulum and Dr. Stoica of Jet Propulsion Labs for all their help while working on the Self Reconfigurable Electronics for Extreme Environments project. I would also like to thank the entire CSE technical support team consisting of Daniel, Peter, and Brian, for their quick responses to all my computer and software needs. I would like to specially thank all my friends and all my VLSI lab colleagues for their friendship, support and encouragement, which helped me through many hard times. Last but not least, I would like to thank my parents and my entire family for their unconditional emotional and financial support.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1 INTRODUCTION
  1.1 Hardware Implementation Methodologies
  1.2 Motivation
  1.3 Contributions of the Dissertation
  1.4 Organization of the Dissertation
CHAPTER 2 GENETIC ALGORITHMS
  2.1 Components of a GA-Based Optimization Engine
    2.1.1 Individual Encoding
    2.1.2 Fitness of an Individual
    2.1.3 Selection Mechanism
    2.1.4 Genetic Operators
    2.1.5 Crossover Operators
      2.1.5.1 One-Point Crossover
      2.1.5.2 Two-Point Crossover
      2.1.5.3 Uniform Crossover
  2.2 Elitism in Genetic Algorithms
  2.3 Multi-Objective Genetic Algorithms
  2.4 Summary
CHAPTER 3 MULTI-OBJECTIVE GENETIC FLOORPLANNING FOR VLSI ASICS
  3.1 Multi-Objective Optimization
  3.2 Floorplanning Using Sequence Pair Representation
    3.2.1 Conversion from a Floorplan to a Sequence Pair
    3.2.2 Conversion from a Sequence Pair to a Floorplan
  3.3 Related Work
  3.4 Proposed Multi-Objective Genetic Floorplanner
  3.5 Proposed Crossover Operators
    3.5.1 Modified Two-Point Order-Based Crossover (MTOX) Operator
    3.5.2 Heuristic One-Point Order-Based Crossover (HOOX) Operator
  3.6 Empirical Determination of GA Parameter Settings
  3.7 Experimental Results
    3.7.1 Performance of the Proposed Genetic Operators
    3.7.2 Comparisons Against State-of-the-Art Floorplanners
  3.8 Summary
CHAPTER 4 DESIGN OF AN FPGA-BASED GENERAL PURPOSE GENETIC ALGORITHM IP CORE
  4.1 Background and Related Work
    4.1.1 Prior Work
    4.1.2 Pseudo-Random Number Generation and GA Performance
    4.1.3 Basics of Evolvable Hardware
  4.2 Proposed FPGA-Based Genetic Algorithm IP Core
    4.2.1 Implementation and Interfacing Details
    4.2.2 Design Considerations for ASIC Implementation and Space Applications
  4.3 Experimental Results
    4.3.1 RT-Level Simulations
    4.3.2 FPGA Implementation Results
    4.3.3 Runtime Comparison With Software Implementation
  4.4 Summary
CHAPTER 5 A DIGITAL FRAMEWORK FOR EVOLUTIONARY DESIGN, AUTONOMOUS MONITORING AND PERFORMANCE COMPENSATION OF EXTREME ENVIRONMENT ELECTRONICS
  5.1 Related Work
  5.2 Architecture and Implementation of Proposed Digital Framework
    5.2.1 Overall System-Level Architecture and Operation
    5.2.2 Detailed Architecture of the Proposed Framework
  5.3 Digital ASIC Implementation
    5.3.1 Design Methodology
  5.4 Simulations and Results
    5.4.1 Layout-Level Simulations
  5.5 Summary
CHAPTER 6 CONCLUSIONS
REFERENCES
ABOUT THE AUTHOR

LIST OF TABLES

Table 3.1. Brief summary of relevant floorplanning works that simultaneously optimize area and wirelength
Table 3.2. Best area and total wirelength results obtained by the proposed genetic floorplanner for the n300 benchmark for various crossover and mutation rates (A = Area in sq. units, W = Total Wirelength in units)
Table 3.3. Area and wirelength comparisons with the AdaptGA, QP LFF and Parquet floorplanners (A = Area in mm², W = Total Wirelength in mm, %S = Percentage
Table 3.4. Area, wirelength and runtime results for the proposed genetic floorplanner on the GSRC benchmarks
Table 3.5. Percentage savings in Area and Wirelength of the Proposed Genetic Floorplanner compared against the PARQUET Floorplanner for the GSRC benchmarks
Table 4.1. Review of existing literature on FPGA implementation of a general purpose genetic algorithm
Table 4.2. Port interface of the proposed GA core
Table 4.3. Index values of GA c
Table 4.4. Preset modes available in the proposed GA core
Table 4.5. RT-level simulation results obtained for the three test functions (BF6, F1, and F2) under various GA parameter settings
Table 4.6. Post place and route statistics for the proposed GA core on Virtex II Pro device (xc2vp30-7ff896)
Table 4.7. Best fitness values obtained by the GA for the mBF6_2 function for different parameter settings (XR = Crossover Rate)
Table 4.8. Best fitness values obtained by the GA for the mBF7 function for different parameter settings (XR = Crossover Rate)
Table 4.9. Best fitness values obtained by the GA for the Shubert function for different parameter settings (XR = Crossover Rate)

LIST OF FIGURES

Figure 1.1. High-level view of a typical design cycle for VLSI hardware
Figure 2.1. High-level view of a typical GA optimization cycle
Figure 2.2. Example illustrating single point crossover
Figure 2.3. Two-point crossover on binary encoded chromosomes
Figure 2.4. Example illustrating uniform crossover
Figure 3.1. Pareto optimal solutions and non-domination levels in multi-objective optimization
Figure 3.2. Gridding process to obtain the Sequence Pair corresponding to a floorplan
Figure 3.3. Pseudo code of the proposed hybrid elitist non-dominated sorting GA-based floorplanner
Figure 3.4. Computing crowding distance of a solution in the objective space
Figure 3.5. Procedure to compute the crowding distances of solutions belonging to a non-domination front
Figure 3.6. Procedure for Slack_Move1 used for local optimization by the proposed floorplanner
Figure 3.7. Legality violations when using two-point crossover on permutation-based chromosomes
Figure 3.8. Order crossover proposed for permutation-based chromosomes
Figure 3.9. Generation of the first offspring using the MTOX operator
Figure 3.10. Generation of sub-floorplans using single cut point in the HOOX operator (steps 1-3 of Procedure HOOX)
Figure 3.11. Generation of offspring configurations in the HOOX operator (steps 4-5 of Procedure HOOX)
Figure 3.12. Convergence plot of wirelength for n300 benchmark with cxRate=0.8, mutRate=0.1
Figure 3.13. Convergence plot of area for n300 benchmark with cxRate=0.8, mutRate=0.1
Figure 3.14. Convergence plot of wirelength for n300 benchmark with cxRate=0.5, mutRate=0.1
Figure 3.15. Convergence plot of area for n300 benchmark with cxRate=0.5, mutRate=0.1
Figure 3.16. Convergence plot of wirelength for n300 benchmark with cxRate=0.5, mutRate=0.01
Figure 3.17. Convergence plot of area for n300 benchmark with cxRate=0.5, mutRate=0.01
Figure 3.18. Floorplan of ami33 benchmark with area=1.21 mm² and wirelength=35.43 mm
Figure 3.19. Floorplan of n100 GSRC benchmark with area = 205 758 sq. units and wirelength = 133 497 units
Figure 4.1. High-level view of the implemented GA optimization cycle
Figure 4.2. Typical system showing the communication between the different modules and the GA core (signal numbers are in reference to Table 4.2)
Figure 4.3. Implementation of a hybrid intrinsic EHW system (with internal and external fitness modules) using the proposed GA core
Figure 4.4. (Zoomed in) Plot of the modified binary F6 [5] test function
Figure 4.5. Convergence plot for the BF6 test function using number of generations=32 (Run #3 of Table 4.5)
Figure 4.6. Convergence plot for the BF6 test function with initial seed for RNG=1567 (Run #4 of Table 4.5)
Figure 4.7. Convergence plot for the BF6 test function with crossover rate=0.75 (Run #5 of Table 4.5)
Figure 4.8. Convergence plot for the test function F2 with population size=32 (Run #6 of Table 4.5)
Figure 4.9. Convergence plot for the test function F2 with population size=64 (Run #7 of Table 4.5)
Figure 4.10. Convergence plot for the test function F3 with initial seed for RNG=10593 (Run #9 of Table 4.5)
Figure 4.11. Convergence plot for the test function F3 with initial seed for RNG=1567 (Run #10 of Table 4.5)
Figure 4.12. Convergence plot for the test function mBF6_2(x) with initial RNG seed=(061F)_16, crossover threshold=10, and popSize=64 (data collected from hardware execution)
Figure 4.13. Convergence plot for the test function mBF6_2(x) with initial RNG seed=(A0A0)_16, crossover threshold=10, and popSize=64 (data collected from hardware execution)
Figure 4.14. Convergence plot for the test function mBF7_2(x,y) with initial RNG seed=(AAAA)_16, crossover threshold=12, and popSize=64 (data collected from hardware execution)
Figure 4.15. Convergence plot for the test function mShubert2D(x1,x2) with initial RNG seed=(AAAA)_16, crossover threshold=10, and popSize=64 (data collected from hardware execution)
Figure 5.1. Block diagram of overall system (including analog and digital systems)
Figure 5.2. High-level functional view of the proposed digital framework
Figure 5.3. Architecture of the proposed digital framework
Figure 5.4. FSM for the digital framework controller module
Figure 5.5. FSM for the system monitor module
Figure 5.6. Chromosome encoding used by the GA-based compensation algorithm
Figure 5.7. Slew rate measurement by the internal fitness evaluation module of the proposed digital framework
Figure 5.8. FSM of the FEM_Controller module in the internal fitness evaluation module
Figure 5.9. Design flow for digital ASIC implementation
Figure 5.10. Layout-level simulation of the digital framework controller module
Figure 5.11. Layout-level simulation of the system monitor module (in a monitoring cycle)
Figure 5.12. Layout-level simulation of the system monitor module (in a compensation cycle)
Figure 5.13. Layout-level simulation of the internal fitness evaluation module (containing FEM_Controller, excitation and slew rate modules)

GENETIC ALGORITHM BASED DESIGN AND OPTIMIZATION OF VLSI ASICS AND RECONFIGURABLE HARDWARE

PRADEEP RUBEN FERNANDO

ABSTRACT

Rapid advances in integration technology have tremendously increased the design complexity of very large scale integrated (VLSI) circuits, necessitating robust optimization techniques in many stages of VLSI design. A genetic algorithm (GA) is a stochastic optimization technique that uses principles derived from the evolutionary process in nature. In this work, genetic algorithms are used to ease the hardware design process of VLSI application specific integrated circuits (ASICs) and reconfigurable hardware.

VLSI ASIC design suffers from high design complexity and a large number of optimization objectives, requiring hierarchical design approaches and multi-objective optimization techniques. The floorplanning stage of the design cycle becomes highly important in hierarchical design methods. In this work, a multi-objective genetic algorithm based floorplanner has been developed with novel crossover operators to address the multi-objective floorplanning problem for VLSI ASICs. The genetic floorplanner achieves significant wirelength savings (>19% on average) with little or no increase in area (<3% penalty) over previous floorplanners that perform simultaneous area and wirelength minimization.

Hardware implementation of genetic algorithms is gaining importance because of their proven effectiveness as optimization engines for real-time applications. Earlier hardware implementations suffer from major drawbacks such as absence of GA parameter programmability, rigid pre-defined system architecture, and lack of support for multiple fitness functions. A compact IP core that implements a general purpose GA engine has been designed to realize evolvable hardware in field programmable gate array devices. The designed GA core achieved a speedup of around 5.16x over an analogous software implementation.

Novel reconfigurable analog architectures have been proposed to realize extreme environment analog electronics. In this work, a digital framework has been developed to realize self-reconfigurable analog arrays (SRAA) where genetic algorithms are used to evolve the required analog functionality and compensate for performance degradation in extreme environments. The framework supports two methods of compensation, namely model-based lookup and genetic algorithm based compensation, and is scalable in terms of the number of fitness evaluation modules. The entire framework has been implemented as a digital ASIC in a leading industry-strength silicon-on-insulator (SOI) technology to obtain high performance and a small form factor.

CHAPTER 1
INTRODUCTION

Hardware solutions are required for many applications in a wide variety of problem domains. Optimal design and implementation of hardware solutions is a complex and difficult process that is usually broken down into a sequence of smaller design stages. Each of these design stages has well-defined objectives to help the designer concentrate on particular issues in each stage. Figure 1.1 shows a high-level abstract view of a typical design cycle for developing VLSI (Very Large Scale Integrated) hardware solutions. A brief outline of the design cycle is below:

System Specification: The input to the design cycle is a set of user specifications that details the functionality, performance, reliability, and other requirements of the system.
Behavioral Modeling: In this stage, the behavior/functionality of the entire system is modeled using a high-level specification language such as C, VHDL, etc.
Circuit Synthesis: In this stage, the minimal logic required to implement the system functionality is obtained. A minimal circuit description is synthesized to implement the optimized logic of the entire system. Optimization techniques are required for obtaining minimized logic and circuit solutions of the system.
Physical Design: In this stage, a geometrical description of the circuit is developed that satisfies all the user-defined requirements including functionality, performance and reliability. Efficient optimization techniques are needed for obtaining the geometrical circuit description that has the optimal values for multiple objectives such as area, performance, etc.
Fabrication: The final hardware solution is obtained from the geometrical description of the entire system using the chosen hardware implementation methodology.

Figure 1.1. High-level view of a typical design cycle for VLSI hardware

1.1 Hardware Implementation Methodologies

Hardware solutions can be implemented using different design methodologies such as ASICs (Application Specific Integrated Circuits) and FPGAs (Field Programmable Gate Arrays). The hardware design cycle differs from the generic cycle shown in Figure 1.1 based on the choice of implementation methodology, which in turn depends on the nature of the application. VLSI ASICs provide the highest design flexibility of all the implementation methodologies as they allow the user to decide the placement of all the logic gates and the interconnections. This allows the designer to achieve the best performance possible at the expense of long design turn-around times and high costs. Applications that require high performance irrespective of the cost and design time involved, such as critical real-time applications, will opt for the VLSI ASIC implementation methodology.

FPGAs consist of arrays of programmable logic modules with pre-wired interconnections between them. FPGAs are programmed to perform a logic function by first mapping the logic function onto a subset of the programmable logic modules and then programming switchboxes, which control the connectivity of the pre-wired interconnections, to achieve the desired connectivity. FPGAs provide the lowest design flexibility as the placement of both the logic gates and the interconnections is fixed. Hardware design using FPGAs entails realizing the entire logic of the target application using the logic gates and interconnections that are available in the FPGA device. However, FPGAs provide the flexibility of changing the physical realization of the system just by changing the configuration of the logic gates and the switchboxes controlling the interconnections. Although FPGAs suffer from lower performance due to the switchboxes in the interconnection paths, they can be (re)programmed immediately with a new design, resulting in the lowest design turn-around times and a low cost. Thus, applications with reconfiguration needs and lower priority on system performance will opt for FPGA-based implementation.

The complexity of the VLSI hardware design process has also increased significantly with technological advances [1]. With the advent of the nanometer era, more than a billion transistors are being fabricated on a single integrated circuit.
Hence, the complexity of the designs being implemented has increased significantly, leading to entire systems being implemented on a single chip. To ease the difficulty of the entire design process, designers have resorted to a number of techniques throughout the hardware design cycle. To ease the complexity of the behavioral modeling stage, designers partition the systems being designed into a number of smaller design units with well-defined functionality so that they can be implemented using existing IP modules. Hence, development of efficient IP modules is becoming an important task, especially in FPGA-based designs.

1.2 Motivation

Irrespective of the hardware implementation choice, optimization is required at all the design levels to obtain the best implementation of the target application. Many of the optimization problems in the design stages are NP-complete problems. Greedy heuristics can be used to quickly obtain a solution for the NP-complete problems. However, greedy heuristics do not possess any mechanism to escape from locally optimal solutions. Hence, they may produce solutions of very poor quality as they can get stuck in local optima. Stochastic optimization methods such as simulated annealing and genetic algorithms make use of random moves to escape from local optima. Given enough time, the stochastic optimization methods generally have the ability to reach the globally optimal solution.

Although the need for a good global optimization technique in the hardware design process is apparent, the criticality of the optimization techniques will vary with the design levels and the type of hardware implementation. In VLSI ASIC design, the circuit synthesis and physical design stages are the most crucial stages as they play a significant role in obtaining the best system implementation. In FPGA-based hardware design, design flexibility and optimization opportunities are limited compared to full custom ASICs as the placement of logic cells and interconnections is fixed. Hence, the FPGA design cycle is very sensitive to the high-level model of the system. Thus, producing an accurate and succinct behavioral model of the system is very critical in FPGA-based hardware design. This reiterates the necessity for well-designed IP modules that can be reused by the designer during high-level modeling. The availability of an IP module library reduces the complexity of high-level modeling.

With entire systems being implemented on the same FPGA, many FPGA designs require stochastic optimization cores as part of the application itself. Since the requirements of the applications are diverse, a robust stochastic optimization technique that is easy to implement in hardware is needed. A genetic algorithm (GA) [2-5] is a stochastic optimization technique modeled on the theory of evolution in nature. It has been successfully employed for a wide variety of problems including NP-complete problems such as the Traveling Salesman problem [6], real-time problems such as reconfiguration of evolvable hardware [7], and other optimization problems that have complex constraints. Genetic algorithms are easy to implement in software and hardware. They can also be easily adapted to a wide variety of problems. They can make use of existing knowledge about a problem by incorporating successful existing operators into their optimization framework. They are a multi-agent optimization technique and hence can be easily modified for multi-objective optimization.

1.3 Contributions of the Dissertation

In this dissertation, genetic algorithm based optimization is applied to the VLSI hardware design domain at various design levels:

VLSI ASIC Design: A multi-objective genetic floorplanner has been developed for simultaneous optimization of area and wirelength in VLSI ASICs (layouts).
FPGA-based Hardware Applications: A customizable general purpose GA IP core has been developed for FPGA-based applications that need a stochastic optimization engine.
Extreme Environment Electronics: A genetic algorithm based VLSI ASIC has been developed to evolve analog circuits, monitor their performance in extreme environments, and compensate for any performance degradation.

1.4 Organization of the Dissertation

The remainder of this dissertation is organized as follows:

Chapter 2 introduces genetic algorithms in detail and discusses both the weighted-sum and Pareto-front based multi-objective genetic algorithms.
Chapter 3 defines the problem of multi-objective (area and wirelength) VLSI floorplanning and presents the genetic floorplanner developed to solve the problem.
Chapter 4 discusses the development of a hardware IP core for a general purpose genetic algorithm that can be used as a stochastic optimization engine for a wide variety of applications.
Chapter 5 describes the design and development of a digital framework for evolution, autonomous monitoring and automatic compensation of extreme environment electronics.
Chapter 6 draws conclusions from the research work of this dissertation.

PAGE 18

7 CHAPTER 2 GENETIC ALGORITHMS A genetic algorithm is a stochastic optimiza tion technique inspired by the principles of evolution. John Holland proposed the first simulated evolution algorithm [2] that mimicked the evolutionary process in nature. Since then, g enetic algorithms have been developed to successfully a ddress many dif ferent problems including combinatorial optimization problems [ 6 ], real time problems [ 7 ], and problems with unknown characteristics [ 3 ]. The phenomenon of evolution in nature refers to the process by which living organisms change their physical character istics, referred to as their phenotype, in response to the changing environment. According to the theory of genetics, the physical characteristics of all organisms are based on the makeup of cellular structures called chromosomes. The chromosome structure is a collection of smaller organic structures called genes. The structure of each gene is directly responsible for a particular physical characteristic of the organism. All living organisms share the finite amount of resources present in this world. The s urvival of an individual is dependent on it s ability to compete with other individuals for these limited resources. The physical characteristics of the individual that help it to compete with other individua ls are directly dependent on its genetic makeup. W hen individuals re produce in nature the chromosomes of the parents combine in a random manner to create a new chromosome for the genes (and hence physical characteristics) only from the gene pool b elonging to its parents and their ancest ors If the offspring chromosome obtained a good mixture of its parents genes, then its ability to compete for resources will be Text page 1 sample: 1 inch top margin, 2 3 hard returns before title. Left justify text

good, leading to an increased period of survival. Moreover, fitter individuals have a higher probability of producing fitter offspring and preserving the genetic makeup of that species. Offspring that get a poor mixture of their parents' chromosomes will not be able to compete with fitter individuals and will die soon. In the long run, species with such a poor chromosomal makeup will become extinct. This is the principle of survival of the fittest in nature.

A genetic algorithm mimics this natural evolutionary process in its optimization cycle. Figure 2.1 shows a high-level abstract view of a typical GA optimization cycle. It models the problem at hand as an ecosystem with limited resources. It models the solutions to the problem as chromosomes of individuals that are competing for the limited available resources. The solutions that have better values for the objective functions relevant to the problem are deemed to be fitter individuals. Thus, each individual is assigned a fitness measure that denotes the quality of the solution that it encodes.

A genetic algorithm begins its optimization cycle with an initial population of randomly generated individuals. Genetic algorithms evolve the population during each generation using genetic operators such as crossover and mutation. In each generation, highly fit individuals from the current generation are selected as parents to produce one or more offspring using an operator called crossover. This operator mimics the mating process in nature. The offspring produced by the crossover operator replace individuals with the worst fitness in the current population. This tends to increase the average fitness of the population over generations, as the fitter individuals survive through the generations. To prevent premature convergence to the best individual in the initial population, an operator called mutation introduces random changes into the population.

After a pre-specified number of generations, the best individual found in the entire genetic optimization cycle is output as the solution to the problem.
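The optimization cycle described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this dissertation: the function name `evolve`, its parameters, the 2-way tournament used to pick parents, and the OneMax toy problem (maximize the number of 1 bits) are all assumptions made for the example.

```python
import random

def evolve(fitness, random_individual, crossover, mutate,
           pop_size=20, generations=50, mutation_rate=0.1, seed=0):
    """Generic GA cycle: select fit parents, cross over, replace the worst, mutate."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Each parent is the winner of a 2-way tournament (an assumed selection scheme).
        parents = [max(rng.sample(population, 2), key=fitness) for _ in range(2)]
        child = crossover(parents[0], parents[1], rng)
        # The offspring replaces the individual with the worst fitness.
        worst = min(range(pop_size), key=lambda k: fitness(population[k]))
        population[worst] = child
        # Mutation occasionally introduces a random change to preserve diversity.
        if rng.random() < mutation_rate:
            k = rng.randrange(pop_size)
            population[k] = mutate(population[k], rng)
        # Remember the best individual seen during the whole run.
        best = max(population + [best], key=fitness)
    return best

# Hypothetical toy problem (OneMax) used only to exercise the loop.
def onemax(bits):
    return sum(bits)

def random_bits(rng):
    return [rng.randint(0, 1) for _ in range(16)]

def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def flip_one_bit(bits, rng):
    k = rng.randrange(len(bits))
    return bits[:k] + [1 - bits[k]] + bits[k + 1:]

best = evolve(onemax, random_bits, one_point, flip_one_bit)
print(best, onemax(best))
```

Because the best individual is tracked across all generations, the returned solution is never worse than the best member of the initial population.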

Figure 2.1. High-level view of a typical GA optimization cycle

2.1 Components of a GA-Based Optimization Engine

A simple genetic algorithm based optimizer is characterized by the following components:

individual encoding, individual fitness, selection mechanism, and genetic operators.

2.1.1 Individual Encoding

Genetic algorithms encode solutions to the given problem as chromosomal strings and operate on these encodings during the optimization process. This helps minimize the amount of problem-specific information needed during the optimization process of a genetic algorithm. An encoding scheme that maps each chromosome string to a unique solution is preferred, as the genetic algorithm will not waste time evaluating multiple encodings of the same solution. The chosen encoding scheme should also be easy to decode using minimal time and memory resources. For example, for a traveling salesman problem (TSP), a permutation of all the cities in the problem instance can be used as a solution encoding scheme.

2.1.2 Fitness of an Individual

A fitness measure is needed to evaluate each chromosome encountered by the GA optimizer. The fitness measure of the chromosome should reflect the quality of the corresponding solution to the problem. For example, in a TSP instance, the length of the overall tour represented by the permutation encoding could be assigned as the fitness measure. The genetic algorithm then works on finding the permutation encoding that results in a minimal tour length for the given problem instance. Since genetic algorithms follow natural selection, the fitness value of an individual will decide whether or not it will survive through the generations. The fitness of an individual may also decide whether or not the individual will be chosen to participate in genetic operations.
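The TSP example can be made concrete with a short sketch. The city coordinates below are a hypothetical four-city instance, and `tour_length` is an illustrative helper rather than anything defined in this work; it simply decodes a permutation into a closed tour and measures it.

```python
import math

def tour_length(tour, coords):
    """Length of the closed tour that visits the cities in the order given by `tour`."""
    return sum(
        math.dist(coords[tour[k]], coords[tour[(k + 1) % len(tour)]])
        for k in range(len(tour))
    )

# Hypothetical instance: four cities on the corners of a unit square.
coords = [(0, 0), (0, 1), (1, 1), (1, 0)]
perimeter_tour = [0, 1, 2, 3]  # walks around the square
crossing_tour = [0, 2, 1, 3]   # uses both diagonals

print(tour_length(perimeter_tour, coords))
print(tour_length(crossing_tour, coords))
```

Since the goal is a minimal tour length, a GA that maximizes fitness could use the negated tour length (or its reciprocal) as the fitness measure.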

2.1.3 Selection Mechanism

The selection mechanism is the scheme used by a genetic algorithm to select two individuals for crossover (mating). The purpose of these operations is to allow substrings in the fit individuals in a population to survive for many generations. Hence, the parent individuals for these operations are generally selected based on their fitness values. This promotes the survival of fitter genes in the offspring and should lead to fitter individuals in future generations. Many selection mechanisms exist, including proportionate selection, roulette wheel selection, and rank-based selection. The selection mechanism is chosen based on the nature of the problem being optimized and other factors including computation time and memory requirements. Two of the most effective parent selection schemes [3] are described below:

Roulette Wheel Selection: In this scheme, the probability that an individual will be chosen as a parent for the current crossover operation is equal to the proportion of the individual's fitness value as compared to the total fitness value of the current population. The proportion of offspring produced in this scheme by a fit individual as compared to the offspring produced by a less fit individual will be proportional to the ratio of their corresponding fitnesses.

Tournament Selection: In this scheme, n individuals are randomly chosen from the current population to play in a tournament against each other. The winner of the tournament is selected as the parent. A tournament involving n individuals is called an n-way tournament.

2.1.4 Genetic Operators

Genetic algorithms use two kinds of genetic operators called crossover and mutation. The crossover operator performs a probabilistic exchange of chromosomal information between two individuals to produce a new individual. The crossover operator selects two parent individuals

from the population based on a selection scheme. It then produces an offspring individual by using certain information from the first parent and the rest of the information from the second parent. Thus the offspring individual inherits a subset of properties from both of its parents. The mutation operator typically picks a random individual from the population and performs an inversion or some other random operation on the individual chromosome. After a certain number of generations, the crossover operator tends to produce offspring that are very similar to the parent individuals. Then the mutation operator plays a critical role in restoring lost genetic material or providing diversity in the current population. Thus the mutation operator helps prevent convergence to local optima.

2.1.5 Crossover Operators

Traditional genetic algorithms worked on binary encodings of the problem instances. These genetic algorithms use simple crossover operators such as the uniform crossover and the one-point crossover to produce the offspring individual.

2.1.5.1 One-Point Crossover

The one-point crossover operator randomly generates a single cut point on both the parent chromosomes. An offspring is produced by concatenating the first part of the first parent's chromosome and the second part of the second parent's chromosome. A second offspring may be produced from the remaining parts, i.e., the first part of the second parent and the second part of the first parent. Figure 2.2 illustrates the one-point crossover.
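A minimal Python sketch of the one-point crossover on list-encoded chromosomes follows. The function name and the choice to return the cut point alongside the two offspring are assumptions made for illustration.

```python
import random

def one_point_crossover(parent1, parent2, rng=random):
    """Cut both parents at the same random point and swap the tails."""
    assert len(parent1) == len(parent2) >= 2
    cut = rng.randrange(1, len(parent1))  # 1 .. len-1, so both parts are non-empty
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2, cut

# Parents of all zeros and all ones make the inherited parts easy to see.
zeros, ones = [0] * 8, [1] * 8
child1, child2, cut = one_point_crossover(zeros, ones, random.Random(1))
print(cut, child1, child2)
```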

Figure 2.2. Example illustrating single-point crossover

2.1.5.2 Two-Point Crossover

The one-point crossover operator cannot produce offspring that contain discontinuous fragments of their parents. To provide more flexibility in producing all combinations of offspring, the two-point crossover was proposed [3]. The two-point crossover operator randomly chooses two cut points that split both the parents into three parts each. Two offspring are produced from the two parents as shown in Figure 2.3. The first offspring is formed by concatenating the first part of the first parent, the second part of the second parent, and the third part of the first parent. The second offspring is formed by concatenating the first part of the second parent, the second part of the first parent, and the third part of the second parent.
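The two-point operator can be sketched in the same style. As before, the function name and return convention are illustrative assumptions; the chromosomes here are strings so that the three parts are easy to read.

```python
import random

def two_point_crossover(parent1, parent2, rng=random):
    """Split both parents at two random cut points and exchange the middle parts."""
    n = len(parent1)
    assert n == len(parent2) and n >= 3
    a, b = sorted(rng.sample(range(1, n), 2))  # two distinct cut points with a < b
    child1 = parent1[:a] + parent2[a:b] + parent1[b:]
    child2 = parent2[:a] + parent1[a:b] + parent2[b:]
    return child1, child2, (a, b)

child1, child2, (a, b) = two_point_crossover("AAAAAAAA", "BBBBBBBB", random.Random(3))
print(a, b, child1, child2)
```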

Figure 2.3. Two-point crossover on binary encoded chromosomes

2.1.5.3 Uniform Crossover

The uniform crossover operator offers the most flexibility in obtaining all combinations of the parent individuals in the offspring by assigning equal probability to both the parents to pass each of their genes to the offspring. In a simple implementation of the uniform crossover operator, a random binary string of the same length as the chromosomes is generated. If the bit value at a position i of the string is 0, then the gene at position i of the offspring chromosome is filled with the i-th gene from the first parent. If the value is 1, then the i-th gene from the second parent is used. Uniform crossover is a very disruptive operator [4] and tends to lose important genetic information after a certain number of generations. An example of the uniform crossover operator is shown in Figure 2.4.
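The mask-based implementation described above might look as follows in Python. The function name and the decision to return the mask together with the offspring are assumptions for the sake of the example.

```python
import random

def uniform_crossover(parent1, parent2, rng=random):
    """Build one offspring gene by gene, guided by a random binary mask."""
    assert len(parent1) == len(parent2)
    mask = [rng.randint(0, 1) for _ in parent1]
    # Mask bit 0 takes the gene from the first parent, 1 from the second parent.
    child = [g1 if bit == 0 else g2 for bit, g1, g2 in zip(mask, parent1, parent2)]
    return child, mask

# With all-zero and all-one parents, the offspring is identical to the mask.
child, mask = uniform_crossover([0] * 8, [1] * 8, random.Random(7))
print(mask, child)
```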

Figure 2.4. Example illustrating uniform crossover

2.2 Elitism in Genetic Algorithms

An elitist genetic algorithm preserves some of the best (elite) solutions of the current generation into the next generation. The number of elite solutions preserved in the following generation is called the degree of elitism. Rudolph [8] proved that an elitist GA will converge to the global optimal solution in a single-objective optimization problem where the objective is a real-valued function. Rudolph [9] also proved that an elitist evolutionary algorithm will converge to the Pareto-optimal front of a multi-criterion optimization problem where all the objectives are real-valued functions. Intuitively, preserving the elite individuals increases the probability of generating better offspring in each generation, thus leading to faster convergence towards the optimal front.
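One simple way to realize elitism in a single-objective GA is sketched below. The replacement policy (keep the best parents, fill the remaining slots with the best offspring) and the name `elitist_replacement` are assumptions chosen for illustration; other elitist schemes exist.

```python
def elitist_replacement(population, offspring, fitness, degree_of_elitism):
    """Next generation = the best `degree_of_elitism` parents plus the best offspring."""
    elites = sorted(population, key=fitness, reverse=True)[:degree_of_elitism]
    slots = len(population) - degree_of_elitism
    survivors = sorted(offspring, key=fitness, reverse=True)[:slots]
    return elites + survivors

# Toy example where an individual is a number and its fitness is its own value.
population = [3, 9, 4, 7]
offspring = [1, 2, 8, 5]
next_population = elitist_replacement(population, offspring, lambda v: v, 1)
print(next_population)
```

With a degree of elitism of at least one, the best fitness in the population can never decrease from one generation to the next.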

2.3 Multi-Objective Genetic Algorithms

Many different templates, both elitist and non-elitist, have been proposed for multi-objective optimization using genetic algorithms [10]. The Vector Evaluated Genetic Algorithm (VEGA) [11] was one of the first multi-objective genetic algorithms. VEGA, a non-elitist multi-objective GA, used a vector of objective function values instead of a scalar weighted-sum fitness value. Later, other non-elitist GA templates such as the Non-Dominated Sorting GA [12] and the Niched Pareto GA [13] were proposed. Elitist multi-objective GAs such as the Strength Pareto Evolutionary Algorithm (SPEA) [14] and the Non-Dominated Sorting based GA II (NSGA-II) [15] outperformed the non-elitist multi-objective GAs.

2.4 Summary

Genetic algorithms have been extensively researched and applied to a wide variety of single-objective and multi-objective optimization applications. In this dissertation, the application of genetic algorithms to three VLSI design problems is investigated.

CHAPTER 3
MULTI-OBJECTIVE GENETIC FLOORPLANNING FOR VLSI ASICS

The continuous technology scaling over the years has led to the nanometer era in VLSI design. The 2007 edition of the ITRS (International Technology Roadmap for Semiconductors) [1] predicts that transistor lengths will scale down to 10 nanometers by 2015. This implies that the number of transistors that can be packed into the same area of an integrated circuit will increase tremendously. This results in a significant increase in the complexity of the systems that can be fabricated on a single chip. As of 2008, entire systems are being fabricated on a single chip under the System-on-a-Chip (SOC) paradigm. This high design complexity affects the circuit synthesis phase of VLSI ASICs significantly, since technology scaling introduces new issues regarding performance and reliability. To handle the high complexity of designing VLSI ASICs, designers utilize hierarchical design methods so that only portions of the entire design have to be considered at any one time. Also, designers are increasingly identifying components in the target application that can be mapped to pre-defined IP (Intellectual Property) modules which have been previously optimized.

With this trend in VLSI design, the floorplanning phase of the VLSI physical design cycle has become a critical step, as it has a major influence on interconnect issues including wiring congestion, crosstalk, and performance. These issues are the bottleneck to realizing the full potential of the technology improvements [1]. Traditional floorplanners work on optimizing either the bounding box area or the total wirelength of the floorplan. Current VLSI floorplanners must include multiple metrics in their objective function, including interconnect metrics such as total

wirelength and longest path delay, in addition to the area metrics. Hence an optimization engine that can handle multiple metrics is necessary for the floorplanning problem. Genetic algorithms are a natural choice for multi-objective optimization, as they maintain a population of individuals at all times and can be easily modified to perform multi-objective optimization [10].

In this work, a multi-objective genetic algorithm is proposed for the outline-free, macro-cell based VLSI floorplanning problem that simultaneously optimizes both area and total wirelength. The main contributions of this work are: (1) an NSGA-II (Elitist Non-dominated Sorting based Genetic Algorithm) based multi-objective genetic floorplanner that performs simultaneous minimization of area and wirelength, (2) a novel heuristic crossover operator (HOOX) that promotes the multiplication of good sub-floorplans, and (3) a new crossover operator (MTOX) that is a novel combination of the classical two-point crossover operator and the order crossover operator.

The proposed multi-objective GA based floorplanner is one of very few floorplanners [38], [39] to use non-domination concepts to rank floorplan solutions. The efficiency of the proposed method is demonstrated by the wirelength savings obtained with marginal or no area penalty. The proposed floorplanner obtained an average wirelength savings of 25.3% over an existing genetic floorplanner [29] that uses the same Sequence Pair encoding for its floorplan solutions, illustrating the efficiency of the proposed crossover operators. The proposed floorplanner obtained 26.9% average wirelength savings over a Quadratic Programming based floorplanner [24] for the MCNC benchmarks for a 2.63% average increase in area. The proposed floorplanner obtained an average savings of 19.17% in wirelength and 1 37 5% in area over a simulated annealing based floorplanner [21], averaged over both the MCNC and GSRC benchmarks. To the best of the

author's knowledge, the results reported here are the best results compared to those reported in the literature on simultaneous area and wirelength optimization during outline-free floorplanning for the MCNC and GSRC benchmark suites.

3.1 Multi-objective Optimization

Multi-objective optimization is the process of finding a set of solutions that are optimal in terms of more than one conflicting objective. If the multiple objectives are non-conflicting in nature, then the problem is the same as a single-objective optimization problem. This is due to the fact that there exists exactly one optimal solution for such a problem that can be found by optimizing any one of the objectives. For problems that require multiple conflicting objectives to be optimized, there exists a set of globally optimal solutions called the Pareto-optimal set.

Domination: A solution $p \in S$ is said to dominate another solution $q \in S$, denoted as $p \prec q$, if and only if $\forall i \in \{1, \ldots, n\}$, $f_i(p) \le f_i(q)$, and $\exists j \in \{1, \ldots, n\}$: $f_j(p) < f_j(q)$, where $f_i$ is the $i$-th objective function under consideration and $S$ is the solution space for the problem under consideration. Without any loss of generality, all the $n$ objective functions are assumed to be minimization functions in the above definition.

Non-dominated Set: A set of solutions $P \subseteq Q$ is called a non-dominated set if $\forall p \in P$, $\forall q \in Q$, $q$ does not dominate $p$. All the solutions $p \in P$ in the above definition are said to lie in the same non-domination level or front. When $Q = S$, the solutions in set $P$ are not dominated by any other solution in $S$ and are said to lie in the non-domination level zero. In this case, every other solution in $S$ is dominated by some solution in $P$, and the set $P$ is called the global Pareto-optimal set.

Traditional optimization techniques transform the multi-objective optimization problem into a single-objective optimization problem due to a dearth of solution methodologies that are

directly applicable to the multi-objective optimization problem. Consider the multi-objective optimization problem in Equation 3.1.1:

$$\min_{x \in S} \; \{ f_1(x), f_2(x), \ldots, f_n(x) \} \qquad (3.1.1)$$

where $f_i$, $i \in \{1, \ldots, n\}$, form the set of objective functions to be minimized. In the traditional single-objective optimization method, the problem would be transformed into

$$\min_{x \in S} \; \sum_{i=1}^{n} w_i \, \frac{f_i(x)}{m_i} \qquad (3.1.2)$$

where $w = (w_1, \ldots, w_n)$ is a weight vector provided by the user to specify the preferences among the objectives, and $m = (m_1, \ldots, m_n)$ is a normalization vector used to transform the range of values that each objective function can take. In general, $m_i$ is set to the maximum value of the corresponding objective function $f_i$.

One of the major disadvantages of the single normalized weighted sum (SNWS) approach is that finding the correct weight vector to be used and predicting the maximum value, or even a tight upper bound, for some objective functions might be non-trivial. For instance, normalizing the wirelengths of floorplans is difficult, as a tight upper bound for the total wirelength of circuits does not exist. In such a case, the preference given to the objective functions by the user-assigned weight vector ($w$) may result in an undesirable bias towards a particular objective. The weighted single-objective transformation will bias search engines such as Simulated Annealing towards a particular solution of the Pareto-optimal set. This might lead to rejection of nearby solutions that are closer to the global Pareto-optimal set, causing increased difficulties in finding the desired solution. In fact, a simulated annealing based optimization engine will only consider solutions in a small subset of the entire solution space based on the user-assigned preference vector. Hence even a minute discrepancy in the user-assigned preference vector might lead to large sub-optimalities.
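The sensitivity of the weighted-sum transformation to the weight vector can be illustrated with a small sketch. The three (area, wirelength) pairs, the normalization values, and the function names are hypothetical and chosen only to show how the preferred solution shifts with the weights.

```python
def snws_cost(objectives, weights, maxima):
    """Single normalized weighted sum: sum of w_i * f_i / m_i over all objectives."""
    return sum(w * f / m for f, w, m in zip(objectives, weights, maxima))

# Hypothetical mutually non-dominated (area, wirelength) pairs, in generic units.
pareto_set = [(10, 50), (12, 30), (20, 20)]
maxima = (20, 50)  # assumed normalization values m_1 and m_2

def preferred(weights):
    """The single solution that a weighted-sum optimizer would favor."""
    return min(pareto_set, key=lambda s: snws_cost(s, weights, maxima))

for weights in [(0.5, 0.5), (0.9, 0.1), (0.1, 0.9)]:
    print(weights, preferred(weights))
```

Each weight vector singles out exactly one member of the set, so a user who wants to see the trade-off has to rerun the optimizer with different weights.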

Another major disadvantage of the single normalized weighted sum approach to multi-objective optimization is that it can yield only one of the multiple Pareto-optimal solutions. A more direct approach to multi-objective optimization is to find the Pareto-optimal set and then allow the user to choose one solution from this set. Consider an optimization problem with two conflicting objectives. Figure 3.1 plots the solution space (with six solutions numbered 1 to 6) of such an optimization problem with two conflicting objectives, namely area and wirelength, both of which have to be minimized. It is clear from Figure 3.1 that the solutions numbered 1 and 2 dominate all the other solutions. But solution 1 cannot be deemed better than solution 2, as solution 1 is better than solution 2 only in terms of area but not in terms of wirelength. Hence both solutions belong to the global Pareto-optimal front and are assigned a non-domination rank of zero. If solutions 1 and 2 are removed from consideration, then the solutions numbered 3 and 4 are better than solutions 5 and 6 in terms of both objectives. But solution 3 is better than solution 4 only in terms of wirelength. Hence solutions 3 and 4 form the next local Pareto-optimal front and are assigned a non-domination rank of 1. Similarly, solutions 5 and 6 are assigned a non-domination rank of 2.
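The ranking procedure walked through above can be sketched as follows. The coordinates of the six solutions are hypothetical values chosen to be consistent with the description of Figure 3.1 (the actual plotted values are not given in the text), and the simple front-peeling loop is an illustrative implementation rather than the faster sorting procedure of NSGA-II.

```python
def dominates(p, q):
    """True if p dominates q when every objective is to be minimized."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def non_domination_ranks(points):
    """Peel off successive non-dominated fronts; rank 0 is the Pareto-optimal front."""
    ranks = {}
    remaining = set(points)
    rank = 0
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks

# Hypothetical (area, wirelength) values for solutions 1 to 6, in generic units.
solutions = {1: (2, 5), 2: (4, 3), 3: (6, 6), 4: (5, 7), 5: (8, 9), 6: (9, 8)}
ranks = non_domination_ranks(solutions)
print(ranks)
```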

Figure 3.1. Pareto-optimal solutions and non-domination levels in multi-objective optimization. Note: All values in generic units.

3.2 Floorplanning using Sequence Pair Representation

A Sequence Pair [16] is a pair of permutations of the n module indices. Without loss of generality, it can be assumed that all the indices of the n modules belong to the set $P = \{1, \ldots, n\}$.

3.2.1 Conversion from a Floorplan to a Sequence Pair

The Sequence Pair representation is obtained from its floorplan by a process called gridding. The diagonals of each module are extended to the chip boundary using upward (downward) and left (right) extensions without any intersections with the loci of the other modules. The first sequence is obtained by extending the SW-NE diagonals of all the modules and ordering the loci of these diagonals from left to right.

The second sequence is obtained by extending the NW-SE diagonals of all the modules and ordering these loci from left to right. Figure 3.2 illustrates this gridding process with an example.

Figure 3.2. Gridding process to obtain the Sequence Pair corresponding to a floorplan

3.2.2 Conversion from a Sequence Pair to a Floorplan

The geometric information of the modules can be obtained from the relative order of the modules in the two sequences. Consider two modules i and j. If the Sequence Pair is of the form (..i..j.., ..i..j..), i.e., i precedes j in both sequences, then module i is to the left of module j in the floorplan corresponding to this Sequence Pair. If the Sequence Pair is of the form (..j..i.., ..i..j..), i.e., j precedes i in the first sequence and i precedes j in the second, then module i is below module j. The geometric relation between any two modules in the floorplan can be obtained using these two rules. The actual coordinates of the modules can be obtained in $O(n^2)$ time by constructing horizontal and vertical constraint graphs based on the horizontal and vertical relations between the modules, where n is the number of modules in the floorplan. Tang et al. [17] proposed a faster algorithm to obtain module coordinates using longest common subsequences. A Sequence Pair is characterized by the following two properties:

Property 1 (Existence): Each module index in $P = \{1, \ldots, n\}$ must be present in both the sequences. If $Q_1 = \{x : x \text{ is in the first sequence}\}$ and $Q_2 = \{x : x \text{ is in the second sequence}\}$, then $Q_1 \cap P = P$ and $Q_2 \cap P = P$.

Property 2 (Uniqueness): Each module must be present exactly once in each of the two sequences.

A legal Sequence Pair is one that satisfies both the existence and uniqueness properties.

3.3 Related Work

VLSI floorplanning is a well-studied problem for which a variety of optimization techniques have been applied, including simulated annealing [17-22], mathematical programming [23-24], and genetic algorithms [25-31]. Early floorplanners dealt with area optimization alone. But with the advent of the deep sub-micron regime, floorplanners shifted their focus to optimizing wirelength. But if wirelength is the only objective to be optimized, the resulting floorplan will have a lot of unused space. Hence, some floorplanners attempted to optimize both area and wirelength. Table 3.1 gives a brief summary of some of the recent floorplanning works in the literature that simultaneously optimize floorplan area and total wirelength. In the single normalized weighted sum (SNWS) approach to multi-objective optimization, simultaneous optimization of two objectives implies that the optimizer uses equal weights to multiply the normalized objectives before adding them together to obtain the single normalized weighted sum. It is to be noted that numerous floorplanning works exist in the literature that work either on area optimization alone or on wirelength optimization alone. These

works will not be discussed here, as the focus of this dissertation is on simultaneous optimization of area and wirelength.

Classical (outline-free) floorplanners based on Simulated Annealing [17-22] use the single normalized weighted sum approach to optimize the two objectives, namely area and total wirelength. These SA based floorplanners differ only in the data structure, Sequence Pair [17, 18], O-tree [19], or Transitive Closure Graph [20], used to represent the floorplans. Such floorplanners form a single scalar objective function using the two normalized objectives and the user-defined weights for each objective, as shown in Equation 3.3.1 below:

$$\text{Cost} = w_1 \, \frac{\text{Area}}{m_1} + w_2 \, \frac{\text{Wirelength}}{m_2} \qquad (3.3.1)$$

where $w_1$ and $w_2$ are the user-defined weights and $m_1$ and $m_2$ are the normalization values for area and wirelength, respectively.

Mathematical programming based floorplanners use a wirelength estimate as the objective function to be minimized with a constraint on the floorplan area. Kim and Kim [23] proposed a linear programming approach to optimize area and wirelength simultaneously. Their approach uses a linear program with area constraints to optimize wirelength, followed by a low-temperature annealing process using the single normalized weighted sum approach to improve the solution quality. Sheqin et al. [24] proposed a quadratic programming based floorplanner to optimize wirelength, followed by a deterministic algorithm based on Less Flexibility First (LFF) principles to produce the final floorplan.

Genetic algorithms were first proposed for circuit placement by Cohoon and Paris [25]. The first genetic algorithm for floorplanning was proposed by Cohoon et al. [26] and used Normalized Polish expressions to represent floorplans. Later, many other genetic floorplanners [27-31] were proposed that developed novel crossover techniques for different floorplan representation schemes. Esbensen [27] proposed a genetic macro-cell placer for a binary tree based representation. Valenzuela and Wang [31] also proposed a GA for floorplan area optimization that uses the normalized Polish expression representation. Hatta et al. [28] proposed the first genetic

floorplanner based on the Sequence Pair representation that optimized floorplan area. Their genetic floorplanner used two new crossover operators, namely One-point Partially Matched Crossover (OPX) and Uniform Partially Matched Crossover (UPX). Hatta et al. combined the well-known Partially Matched Crossover [3] (meant for permutation-based chromosomes) with two binary crossover operators to formulate two new crossover operators (OPX and UPX) that could handle the Sequence Pair floorplan representation. OPX is a combination of the one-point crossover and the Partially Matched Crossover. On the other hand, UPX is a combination of the uniform crossover and the Partially Matched Crossover operators.

Nakaya et al. [29] later proposed a genetic floorplanner that also used the Sequence Pair floorplan representation. Their work used two novel crossover operators, Common Topology Preserving Crossover (CTPX) and Placement-based Partially Exchanging Crossover (PPEX), that were also specifically designed to work on Sequence Pairs. CTPX computes the longest common subsequences between the Sequence Pairs of the two parents and preserves them in the offspring. PPEX randomly selects a sub-floorplan from one parent and orders the modules making up the sub-floorplan according to their relative positions in the other parent. Nakaya et al. report both area and wirelength results obtained by their genetic floorplanner on the MCNC benchmarks but do not report the results for simultaneous area and wirelength optimization.
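Since several of the operators above build on the Partially Matched Crossover, a sketch of the classical operator for permutation chromosomes may be helpful. This is the textbook PMX only, with the two cut points passed in explicitly for clarity; it is not the OPX or UPX operator of Hatta et al., whose exact definitions are not reproduced here.

```python
def pmx(parent1, parent2, a, b):
    """Partially matched crossover: keep parent1[a:b], fill the rest from parent2."""
    n = len(parent1)
    child = [None] * n
    child[a:b] = parent1[a:b]
    # Maps each gene copied from parent1 to the gene parent2 holds at that position.
    mapping = {parent1[k]: parent2[k] for k in range(a, b)}
    for k in list(range(0, a)) + list(range(b, n)):
        gene = parent2[k]
        # Follow the mapping until the gene no longer clashes with the copied segment.
        while gene in mapping:
            gene = mapping[gene]
        child[k] = gene
    return child

parent1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
parent2 = [9, 3, 7, 8, 2, 6, 5, 1, 4]
child = pmx(parent1, parent2, 3, 7)
print(child)
```

The repair step guarantees that the offspring is again a permutation, which is what makes the operator usable on each sequence of a Sequence Pair.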

Table 3.1. Brief summary of relevant floorplanning works that simultaneously optimize area and wirelength

Optimization Technique | Work | Optimization Objectives (Area / Wire / Both) | Floorplan Representation | Remarks
Simulated Annealing | Parquet [21] | x / x / x | Sequence Pair/B*-tree | SNWS approach
Simulated Annealing | Guo et al. [18] | x / x / x | O-tree | SNWS approach
Simulated Annealing | Lin and Chang [19] | x / x / x | TCG | SNWS approach
Simulated Annealing | Lin and Chang [20] | x / x / possible | TCG-S | SNWS approach
Linear Programming | Kim and Kim [23] | x | |
Quadratic Programming | Sheqin et al. [24] | x | |
Genetic Algorithms | Chatterjee and Manikas [38] | x (+ temp) | Sequence Pair | Multi-objective GA (SPEA) for optimizing area and temperature
Genetic Algorithms | Chatterjee, Manikas, and Markov [39] | x | Sequence Pair | Multi-objective GA (SPEA) for optimizing area, wirelength and temperature
Genetic Algorithms | This work | x | Sequence Pair | Multi-objective GA (NSGA-II) for optimizing area and total wirelength
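Several entries in Table 3.1, including this work, rely on the Sequence Pair representation, so it may help to see how the two rules of Section 3.2.2 and the legality properties translate into code. The sketch below is a plain $O(n^2)$ longest-path evaluation, not the faster algorithm of Tang et al. [17]; the names `seq_x`, `seq_y`, `is_legal`, and `decode`, as well as the three-module example, are assumptions made for illustration.

```python
def is_legal(seq_x, seq_y, modules):
    """Existence and uniqueness: each sequence is a permutation of the module set."""
    return sorted(seq_x) == sorted(modules) and sorted(seq_y) == sorted(modules)

def decode(seq_x, seq_y, width, height):
    """Return the lower-left (x, y) coordinate of every module."""
    pos_x = {m: k for k, m in enumerate(seq_x)}
    pos_y = {m: k for k, m in enumerate(seq_y)}
    x, y = {}, {}
    # i is left of j when i precedes j in both sequences.
    for j in seq_x:
        x[j] = max((x[i] + width[i] for i in x if pos_y[i] < pos_y[j]), default=0)
    # i is below j when j precedes i in the first sequence and i precedes j in the second.
    for j in seq_y:
        y[j] = max((y[i] + height[i] for i in y if pos_x[i] > pos_x[j]), default=0)
    return {m: (x[m], y[m]) for m in seq_x}

# Hypothetical three-module instance.
width = {1: 4, 2: 3, 3: 2}
height = {1: 2, 2: 3, 3: 5}
seq_x, seq_y = [1, 2, 3], [2, 1, 3]

coords = decode(seq_x, seq_y, width, height)
chip_w = max(coords[m][0] + width[m] for m in coords)
chip_h = max(coords[m][1] + height[m] for m in coords)
print(coords, chip_w * chip_h)
```

In this example module 2 sits at the origin, module 1 is stacked on top of it, and module 3 is placed to the right of both.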

Modern VLSI floorplanning, as defined by Kahng [32], focuses on wirelength optimization within a fixed chip outline. With the chip complexity increasing with the improving integration technology, hierarchical design methods have become imperative. In a hierarchical design flow, floorplanning at the topmost level might have a flexible chip outline. But the floorplans for the modules of the higher levels will fix the floorplan outline for the lower-level sub-modules. This has led to an increased importance for the modern fixed-outline floorplanning problem. It is to be noted that in modern floorplanning, wirelength is the primary objective, while area is no longer an objective but rather a constraint. There have been many fixed-outline floorplanners proposed in the literature [21, 33-35]. We will not review the fixed-outline floorplanning works here, as this work focuses on outline-free floorplanning. However, we will review a publicly available tool called Parquet [21, 22], as it is capable of both fixed-outline and outline-free floorplanning.

Adya and Markov [21] proposed a novel simulated annealing based hybrid floorplanning tool called Parquet that is capable of both fixed and flexible outline floorplanning. Parquet uses the single normalized weighted sum approach during simulated annealing based optimization but also has some heuristic operators that drive the optimization engine towards solutions that obey the fixed-outline constraint. They proposed the notion of slack of a module in Sequence Pair based floorplanning [21]. Generally, all the modules in a floorplan are compacted to the bottom left corner of the space that they can occupy. To compute slack, all the modules are additionally compacted to the top right corner of the floorplan. The slacks of a module i are computed as:

$$X\text{-}slack(i) = xc\_topRight(i) - xc\_botLeft(i) \qquad (3.3.2)$$

$$Y\text{-}slack(i) = yc\_topRight(i) - yc\_botLeft(i) \qquad (3.3.3)$$

where xc_topRight(i) and yc_topRight(i) are the x and y coordinates of the lower left corner of module i when the floorplan (i.e., all its modules) is compacted to the top right corner, and

xc_botLeft(i) and yc_botLeft(i) are the x and y coordinates of the lower left corner of module i when the floorplan is compacted to the bottom left corner. Adya and Markov [21] used the slack values of modules to estimate the amount of empty space around them. For instance, an X-slack value of zero for a module implies that it cannot be moved in the horizontal direction without altering the floorplan dimensions. A large Y-slack value for a module implies that there exists a lot of space for the module to move in the vertical direction, and this empty space can be reduced by moving in a small block that could fit in the empty space. Based on similar observations, Adya and Markov proposed novel slack-based heuristic operators to reduce the area and wirelength of the floorplan. Additionally, they used some of these operators to bias the simulated annealing engine to search for floorplans that obeyed the fixed-outline constraint.

Multi-objective genetic algorithms and non-domination based solution ranking concepts have been successfully used for various problems belonging to different domains. In the VLSI domain, Dick and Jha [37] proposed a multi-objective hardware-software co-synthesis tool. Esbensen and Kuh [36] proposed a non-domination based solution ranking scheme for IC and MCM placement. Multi-objective genetic algorithms for the floorplanning problem were proposed by Chatterjee and Manikas [38], and Chatterjee, Manikas, and Markov [39]. These works use the SPEA [14] multi-objective GA template. Chatterjee and Manikas [38] work on simultaneous optimization of chip area and maximum on-chip temperature and do not consider wirelength as an objective or constraint. Chatterjee, Manikas, and Markov [39] use the SPEA multi-objective GA template to optimize wirelength and temperature for both the fixed-outline and classical floorplanning problems. Neither of these works proposes any new crossover or mutation operators. They apply the SPEA multi-objective GA to the floorplanning problem and use traditional genetic operators.

Of the numerous floorplanning works in the literature, very few floorplanners work explicitly on simultaneous area and wirelength optimization. Among these floorplanners, all SA based and most GA based floorplanners use the single normalized weighted sum methodology to perform multi-objective optimization by assigning equal weights to the area and wirelength objectives. The proposed floorplanner works explicitly on simultaneous optimization of area and wirelength using the Elitist Non-dominated Sorting based Genetic Algorithm. It is to be noted that the proposed floorplanner can be easily extended to perform fixed-outline floorplanning by incorporating a penalty function or by using a modified fitness assignment.

3.4 Proposed Multi-objective Genetic Floorplanner

The proposed floorplanner is an elitist non-dominated sorting based multi-objective genetic algorithm employing two novel crossover operators, a set of mutation operators, and a local optimization operator. The pseudo-code of the proposed genetic floorplanner is shown in Figure 3.3. The following sections describe the various features of the proposed genetic floorplanner in detail.

Individual Representation: Each individual in the population corresponds to a valid floorplan represented by a legal Sequence Pair. Sequence Pairs represent floorplans using two permutations of the module indices. In this work, all modules are considered to be fixed modules in terms of their width and height, but rotation of modules (by 0 or 90 degrees) is allowed. This rotation of modules is represented using a Boolean orientation vector. The benchmarks in the macro-cell floorplanning benchmark suite do not contain pin information for the modules. In this case, other orientations and mirroring of modules will not affect the wirelength of the nets to which the module is connected. Thus the chromosomal encoding of an individual consists of two sequences, namely the X sequence, and the

Proposed Multi Objective Genetic Floorplanner (cx_rate, mut_rate, N_generations, popSize, tourneySize)
{
 1. population <- Initial Population (popSize);
 2. for gen in N_generations do {
 3.   EliteSet <- Non Dominated Sort (population, Area, Wire);
 4.   MatePool <- Mating Pool Selection (population);
 5.   for i in 1 to cx_rate do {
 6.     (P1, P2) <- Parent Selection (MatePool, tourneySize);
 7.     (Off1, Off2) <- Crossover (P1, P2);
 8.   }
 9.   for i in 1 to mut_rate do {
10.     mutIndex <- Non Elite Individual (population);
11.     Mutate (mutIndex);
12.     Update Population (population, mutIndex);
13.   }
14.   Perform Local Optimization (offspring);
15.   Update Population (population, offspring);
16. }
17. return (Elite Set (population));
}

Figure 3.3. Pseudo code of the proposed hybrid elitist non-dominated sorting GA based floorplanner

Initial Population

The proposed genetic floorplanner starts with an initial population of randomly generated Sequence Pairs and orientation vectors (line 1 of the procedure in Figure 3.3). The size of the population used by the proposed genetic floorplanner is not fixed. Empirical studies (to be discussed in Section 3.6) with a training set of benchmark circuits from both the MCNC and GSRC benchmark suites were conducted to find the values for the GA parameters that give the best results. Based on the results of the empirical studies, Equation 3.3.4 has been derived to determine the GA population size based on the problem size, i.e., the number of modules (n). This equation was then applied to size the

population by the proposed floorplanner for all the benchmarks. To the best of the author's knowledge, [...] according to the number of modules (n) present in the benchmark circuit.

$\mathit{popSize} = 10 \cdot n$    (3.3.4)

Non-dominated Sorting of the Population

The proposed floorplanner sorts the entire population into various non-domination levels (described in Section 3.1) in terms of area and total wirelength at the beginning of every generation (line 3 of the procedure in Figure 3.3). All the individuals in the current population are assigned a non-domination rank starting from zero. The fittest individuals have the lowest non-domination ranks.

Elite Individuals

The number of elite individuals in the current population of the proposed GA varies with each generation. All the individuals with the lowest non-domination rank (zero) form the current set of elite individuals. These individuals are not subject to mutation. Thus individuals containing genetic information contributing to reduced area or wirelength will be preserved for the future generations.

Mating Pool Selection

In every generation, the proposed floorplanner selects a pool of individuals to use as parents for the crossover operations in that generation (line 4 of the procedure in Figure 3.3). The size of this pool is set to half the population size as recommended in [5]. If the number of elite individuals is greater than this size, then the elite individuals with the largest crowding distance [10][15] are picked to form the mating pool to ensure that a diverse set of individuals is maintained in the population.

Crowding Distance

The crowding distance ($d_i$) of a solution $s_i$ is defined as the distance between the solutions $s_{i-1}$ and $s_{i+1}$ that belong to the same non-domination front as $s_i$ and are its immediate neighbors. Crowding distance can be measured in either the solution encoding space or the objective function space. The objective space in this work is two-dimensional as floorplan area and total wirelength are

the objectives considered. Each floorplan solution can be considered as a point in this two-dimensional objective space, as shown in Figure 3.4. In the proposed GA, the crowding distance of a solution is measured in the objective space. Since the objective space is two-dimensional, the procedure in Figure 3.5 can be used to assign crowding distances to solutions. In the proposed GA, all the individuals in the population are assigned a crowding distance in addition to the non-domination ranks. The crowding distance of an individual is a measure of the proximity of neighboring solutions in the current population. In multi-objective genetic algorithms, premature convergence of a non-dominated front to a small section of the actual front must be prevented. This can be achieved by preserving solutions that do not have close neighboring solutions in the current population.

Figure 3.4. Computing crowding distance of a solution in the objective space. Note: All values are in generic units.
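The ranking and crowding-distance assignment described above can be sketched in a few lines of Python. This is a minimal illustration rather than the floorplanner's actual code: the function names, the representation of a solution as an `(area, wire)` tuple, and the sample points are all assumptions made for the example. The crowding distance follows the two-objective form of the procedure in Figure 3.5 (sort the front by area, give the two boundary solutions an infinite distance, and sum the area and wirelength gaps between the neighbors of every interior solution).

```python
import math

def dominates(a, b):
    """True if a = (area, wire) is no worse than b in both objectives and better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

def non_domination_ranks(points):
    """Rank 0 for the non-dominated points, rank 1 for the next front, and so on."""
    ranks = [None] * len(points)
    remaining = set(range(len(points)))
    rank = 0
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks

def crowding_distances(front):
    """Crowding distance of each (area, wire) point of a single non-domination front."""
    if not front:
        return []
    order = sorted(range(len(front)), key=lambda i: front[i][0])  # sort by area
    d = [0.0] * len(front)
    d[order[0]] = d[order[-1]] = math.inf  # boundary solutions are always preserved
    for k in range(1, len(order) - 1):
        prev_pt, next_pt = front[order[k - 1]], front[order[k + 1]]
        # Area grows and wirelength shrinks along a front sorted by area.
        d[order[k]] = (next_pt[0] - prev_pt[0]) + (prev_pt[1] - next_pt[1])
    return d

# Hypothetical (area, wirelength) values in generic units.
points = [(10, 50), (12, 40), (15, 30), (13, 45), (20, 60)]
ranks = non_domination_ranks(points)
elite = [p for p, r in zip(points, ranks) if r == 0]
print(ranks, crowding_distances(elite))
```

The quadratic ranking loop is chosen for readability; a production implementation would normally use the faster bookkeeping of the NSGA-II sorting procedure.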

Procedure Compute_CrowdingDistance (front, frontsize)
1. for i in 0 to frontsize-1 do
   a. d[i] = 0;
2. front = Sort (front, frontsize, area);
3. d[0] = d[frontsize-1] = INF;
4. for j in 1 to frontsize-2 do
   a. d[j] = (area[j+1] - area[j-1]) + (wire[j-1] - wire[j+1]);
5. return;

Figure 3.5. Procedure to compute the crowding distances of solutions belonging to a non-domination front

Parent Selection

The proposed GA uses crowded tournament selection [10][15] to select the two parents for crossover from the mating pool of individuals. In tournament selection, a group of parent candidates is selected randomly from the mating pool. A tournament is played between these candidate individuals to determine the fittest two individuals of the group. In crowded tournament selection, individuals are first compared according to their non-domination ranks. The individual with the smaller non-domination rank is considered the winner. If the individuals being compared have the same non-domination rank, then the individual with the largest crowding distance is declared the winner. The proposed [...] crowded tournament selection. Since the population size is set to 10*n, the size of the parent candidate pool is n/10.

Crossover

The crossover operator (lines 5-8 of the procedure in Figure 3.3) is used by genetic algorithms to combine good traits from the parents to form highly fit offspring. A good crossover operator must also ensure that the offspring does not closely resemble either parent. These properties ensure that the crossover operator explores different promising areas in the solution space. The crossover rate (cx_rate) limits the amount of

crossover performed in any generation. This value was set to 1.0 after empirical studies that will be described in Section 3.6. Two new crossover operators are proposed to work on Sequence Pairs; they are described in detail in Section 3.5.

Mutation Operators

The mutation operator (lines 9-13 of the procedure in Figure 3.3) is used by genetic algorithms to produce diversity in the population. It also helps in avoiding a quick convergence to local optima. In elitist genetic algorithms, mutation is not applied to the elite individuals. Three mutation operators are used in the proposed genetic floorplanner. A non-elite individual is randomly chosen from the population and one of the three mutation operators is applied to the individual. Each of the mutation operators has an equal selection probability. The mutation rate (mut_rate) limits the number of mutation operations performed in any generation. A mutation rate of 0.1 gave the best results in empirical studies and was maintained for all the experiments. The three mutation operators used in the proposed floorplanner are described below.

Mutation Operator 1 (random module pair exchange in both sequences): This mutation operator picks 2 random modules and exchanges their positions in both sequences of the chosen individual. If the individual chosen for mutation is a legal Sequence Pair, then this mutation operator does not introduce any new modules in either sequence. Also, this mutation operator neither duplicates nor erases any module in either sequence. Hence, the Sequence Pair of the mutated individual will remain legal.

Mutation Operator 2 (random module pair exchange in the first sequence alone): This mutation operator picks 2 random modules and exchanges their positions only in the first sequence. If the individual chosen for mutation is a legal Sequence Pair, then this mutation operator does not introduce any new modules in the first sequence, and neither duplicates nor erases any module in the first

sequence. Also, the second sequence is not changed at all. Hence the Sequence Pair of the mutated individual will remain legal.

Mutation Operator 3 (random module orientation change): This mutation operator picks a random module, $b_i$, and changes its orientation by exchanging the width and the height of the chosen module. This operator does not change the Sequence Pair at all and hence will maintain the legality of the Sequence Pair of the mutated individual.

Local Optimization

The proposed genetic floorplanner uses a heuristic local optimization technique to find better solutions that are located near the generated offspring. Genetic algorithms are not very efficient in local neighborhood search [4]. Hybridization of genetic algorithms with local search operators will result in faster convergence to better solutions. The proposed genetic floorplanner uses a slack-based local optimization operator, based on a strategy proposed in [21]. Although different strategies for slack-based moves were discussed in [21], no procedures were given for the implementation of the slack-based moves used in [21]. In this work, a single slack-based move is utilized, which has been implemented using the procedure described in Figure 3.6. Local optimization using Procedure Slack_Move1 is a greedy procedure that accepts the modified individual if there is an improvement in either area or wirelength. If there is no improvement in either objective, the original individual from the population (currInd in the procedure) is returned. The local optimization move, Slack_Move1, uses the slack measures defined in Equation 3.3.2 and Equation 3.3.3 as whitespace estimates around the modules. It identifies the smallest module with zero whitespace surrounding it using lines 3 and 4 of the procedure. It then identifies a large module surrounded by a lot of whitespace in line 5 of the procedure. Using lines 6-8, the procedure moves the small module adjacent to the

larger module in either the horizontal or vertical direction based on the whitespace along those directions.

Procedure Slack_Move1 (currInd, newInd)
 1. Copy (currInd, newInd);
 2. Evaluate Slacks (newInd);
 3. findLeastSlackModules (newInd, pList);
 4. p = findSmallestAreaModule (pList);
 5. q = findLargestSlackModule (newInd);
 6. If (X Slack(q) > Y Slack(q)) then
    i. Move module p next to module q and arrange them in a horizontal fashion.
 7. Else if (X Slack(q) < Y Slack(q)) then
    i. Move module p next to module q and arrange them in a vertical fashion.
 8. Else
    i. Move module p next to module q and maintain their existing geometric relations.
 9. Compute Fitness (newInd);
10. If (area(newInd) >= area(currInd)) OR (wirelength(newInd) >= wirelength(currInd))
    i. copy (currInd, newInd);
11. return;

Figure 3.6. Procedure for Slack_Move1 used for local optimization by the proposed floorplanner

Population Update

The new population for the next generation is formed by choosing the required number of individuals from the combined pool of the current population and the newly formed offspring population. The combined pool is sorted into non-dominated fronts, and all the individuals belonging to a particular non-domination front are copied into the new generation, starting from the non-domination front with rank zero. If the addition of all the individuals in a certain non-domination front results in violation of the population size, then the required number of individuals are chosen using the crowded

selection operator so that a uniformly distributed front, with respect to the crowding distance, is obtained.

3.5 Proposed Crossover Operators

The proposed genetic floorplanner uses two novel crossover operators, namely the Modified Two-Point Order-based Crossover Operator (MTOX) and the Heuristic One-Point Order-based Crossover Operator (HOOX). The MTOX operator is an unbiased operator that attempts to search the entire solution space for a good solution by randomly combining segments from the parents to form the offspring. The HOOX operator tries to bias the search towards promising regions of the solution space by promoting the transfer of good sub-floorplans from the parents to the offspring.

3.5.1 Modified Two-Point Order-based Crossover (MTOX) Operator

The MTOX operator is a novel crossover operator proposed specifically to work on Sequence Pairs. This operator is a combination of two classical crossover operators, namely the two-point crossover operator and the order crossover operator. The original two-point crossover operator [3, 4] was proposed for use with individuals that were encoded as binary strings, as illustrated in Figure 2.3. If this traditional method is used with Sequence Pairs, it results in illegal sequences due to duplication (violation of the uniqueness property) and deletion of modules (violation of the existence property), as shown in Figure 3.7. These violations must be removed to obtain a Sequence Pair that represents a valid floorplan.
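The legality problem can be reproduced with a short Python sketch. The helper names and the two sample permutations are assumptions chosen for illustration; the crossover itself is the classical segment swap between two cutpoints, applied here to permutations instead of binary strings.

```python
def two_point_crossover(p1, p2, c1, c2):
    """Classical two-point crossover: swap the middle segment [c1:c2) between the parents."""
    off1 = p1[:c1] + p2[c1:c2] + p1[c2:]
    off2 = p2[:c1] + p1[c1:c2] + p2[c2:]
    return off1, off2

def violations(seq, modules):
    """Modules that occur more than once (uniqueness) or not at all (existence) in seq."""
    duplicated = sorted(m for m in modules if seq.count(m) > 1)
    missing = sorted(m for m in modules if m not in seq)
    return duplicated, missing

# Two hypothetical permutations of eight module indices.
parent1 = [1, 3, 4, 2, 7, 6, 8, 5]
parent2 = [4, 1, 5, 3, 7, 8, 6, 2]
off1, off2 = two_point_crossover(parent1, parent2, 3, 5)
print(off1, violations(off1, range(1, 9)))
print(off2, violations(off2, range(1, 9)))
```

With these inputs the first offspring contains module 3 twice and loses module 2, so it no longer encodes a floorplan over all eight modules.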

Figure 3.7. Legality violations when using two-point crossover on permutation-based chromosomes

To eliminate the occurrence of such violations in the offspring's Sequence Pair, the traditional two-point crossover operator is combined with the order crossover operator to yield the Modified Two-point Order Crossover (MTOX). The original order crossover operator [3] was proposed to work on individuals encoded as a single permutation, as shown in Figure 3.8. To form the first offspring (Offspring 1 in Figure 3.8), the order crossover operator cuts the parent chromosomes at two points. For the genes present in between the two cuts of the second parent, the operator then identifies their positions in the first parent and fills these positions with holes (denoted by H in Figure 3.8). The holes are then slid to the spaces between the two cuts in a wrap-around fashion. Finally, the holes are filled with the genes in the order they are present in the second parent. Offspring 2 is produced by a similar process after exchanging the two parents. In this manner, order crossover preserves the relative positions of the genes from both the parents in the offspring.
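The hole-sliding description above corresponds to the common textbook form of order crossover, which can be sketched as follows. This is one reading of the description and not code from the original work; the function name, the cut convention (the middle segment is `[c1:c2)`), and the example parents are assumptions.

```python
def order_crossover(p1, p2, c1, c2):
    """First offspring of order crossover on two permutations of equal length.

    The genes of p2 between the cuts keep their places. The remaining genes of p1
    are read starting after the second cut (wrapping around) and written back in
    that order, again starting after the second cut.
    """
    n = len(p1)
    segment = p2[c1:c2]
    in_segment = set(segment)
    # Genes of p1 that are not holes, in wrap-around order from the second cut.
    rest = [p1[(c2 + k) % n] for k in range(n) if p1[(c2 + k) % n] not in in_segment]
    child = [None] * n
    child[c1:c2] = segment  # the holes, filled in p2's order
    for k, gene in enumerate(rest):
        child[(c2 + k) % n] = gene
    return child

# Hypothetical parents and cutpoints.
p1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
p2 = [4, 5, 2, 1, 8, 7, 6, 9, 3]
offspring1 = order_crossover(p1, p2, 3, 7)
offspring2 = order_crossover(p2, p1, 3, 7)  # same process with the parents exchanged
print(offspring1, offspring2)
```

Because every gene is written exactly once, the result is always a permutation, which is the property MTOX borrows from this operator.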

Figure 3.8. Order crossover proposed for permutation-based chromosomes

The MTOX operator is not a simple extension of the two-point crossover to handle the two permutations of the Sequence Pair. The MTOX operator couples [...] contributed by the same parent are maintained in the offspring. The orientation of a module is contributed by the parent that dictates the position of the module. This ensures that a good configuration within a parent is preserved in the offspring. The MTOX crossover operator accepts two parents, P1 and P2, as input and generates two offspring, Off1 and Off2. The MTOX crossover operation is formally described in the procedure below.

Procedure MTOX (P1, P2, Off1, Off2)

Step 1. Generate 2 random cutpoints in the X sequence of P1. Let $x_1^{P1}$, $x_2^{P1}$ and $x_3^{P1}$ be the three segments obtained from the two cutpoints in the X sequence of P1.

Step 2. Find the order of the modules in segment $x_1^{P1}$ in the Y sequence of P1.

Generate segment $y_1^{P1}$ using this ordering of modules. Similarly, generate segments $y_2^{P1}$ and $y_3^{P1}$ using the Y sequence ordering of the modules in the segments $x_2^{P1}$ and $x_3^{P1}$.

Step 3. Find the order of the modules in segment $x_1^{P1}$ in the X sequence of P2. Generate segment $x_1^{P2}$ using this ordering of modules. Similarly, generate segments $x_2^{P2}$ and $x_3^{P2}$ using the X sequence ordering in P2 of the modules in the segments $x_2^{P1}$ and $x_3^{P1}$.

Step 4. Find the order of the modules in segment $x_1^{P1}$ in the Y sequence of P2. Generate segment $y_1^{P2}$ using this ordering of modules. Similarly, generate segments $y_2^{P2}$ and $y_3^{P2}$ using the Y sequence ordering in P2 of the modules in the segments $x_2^{P1}$ and $x_3^{P1}$.

Step 5. The concatenation of the sequences $x_1^{P1}$, $x_2^{P2}$ and $x_3^{P1}$ forms the X sequence of the first offspring (Off1).

Step 6. The concatenation of the sequences $y_1^{P1}$, $y_2^{P2}$ and $y_3^{P1}$ forms the Y sequence of the first offspring (Off1).

Step 7. Module orientations in the offspring are copied over from the respective parent that contributes the position of the module.

Step 8. Steps 1-7 are repeated after exchanging the two parents to obtain the second offspring, Off2.

Figure 3.9 illustrates the MTOX operator with an example. It can be formally proven that the MTOX procedure above always produces legal offspring, as shown in Theorem 3.1.
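The steps of Procedure MTOX can be sketched in Python as follows. This is an illustrative reading of the procedure, not the original C++ implementation: the dictionary layout of an individual (`x`, `y`, `rot`), the function names, and the use of Python's `random` module are assumptions. The sample parents are the sequences that appear in Figure 3.9, with the assignment of sequences to X and Y of each parent inferred from the segment values listed in that figure.

```python
import random

def project(seq, modules):
    """Sub-sequence of seq containing only `modules`, in the order they occur in seq."""
    keep = set(modules)
    return [m for m in seq if m in keep]

def mtox_offspring(p1, p2, c1, c2):
    """Steps 1-7 for one offspring; p1 dictates the outer segments, p2 the middle one."""
    x1, x2, x3 = p1["x"][:c1], p1["x"][c1:c2], p1["x"][c2:]  # Step 1
    y1, y3 = project(p1["y"], x1), project(p1["y"], x3)      # Step 2
    x2_p2 = project(p2["x"], x2)                             # Step 3 (middle segment only)
    y2_p2 = project(p2["y"], x2)                             # Step 4 (middle segment only)
    off_x = x1 + x2_p2 + x3                                  # Step 5
    off_y = y1 + y2_p2 + y3                                  # Step 6
    middle = set(x2)
    # Step 7: orientation comes from the parent that positions the module.
    rot = {m: (p2["rot"][m] if m in middle else p1["rot"][m]) for m in off_x}
    return {"x": off_x, "y": off_y, "rot": rot}

def mtox(p1, p2, rng=random):
    """Step 8: build both offspring, drawing fresh cutpoints for each one."""
    n = len(p1["x"])
    cuts_a = sorted(rng.sample(range(1, n), 2))
    cuts_b = sorted(rng.sample(range(1, n), 2))
    return mtox_offspring(p1, p2, *cuts_a), mtox_offspring(p2, p1, *cuts_b)

modules = range(1, 9)
P1 = {"x": [1, 3, 4, 2, 7, 6, 8, 5], "y": [2, 6, 3, 7, 1, 5, 8, 4],
      "rot": {m: False for m in modules}}
P2 = {"x": [4, 1, 5, 3, 7, 8, 6, 2], "y": [2, 3, 7, 4, 5, 8, 6, 1],
      "rot": {m: True for m in modules}}
off1 = mtox_offspring(P1, P2, 3, 5)  # cutpoints c1 = 3, c2 = 5 as in Figure 3.9
print(off1["x"], off1["y"])
```

Only the middle segments of P2 are computed because Steps 5 and 6 use no other segment from the second parent. The offspring is a permutation in both sequences by construction, since the three segments partition the module set.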

Figure 3.9. Generation of the first offspring using the MTOX operator (cutpoints c1 = 3, c2 = 5)

Parents: X(P1) = 1 3 4 2 7 6 8 5, Y(P1) = 2 6 3 7 1 5 8 4; X(P2) = 4 1 5 3 7 8 6 2, Y(P2) = 2 3 7 4 5 8 6 1
Step 1: $x_1^{P1}$ = {1, 3, 4}, $x_2^{P1}$ = {2, 7}, $x_3^{P1}$ = {6, 8, 5}
Step 2: $y_1^{P1}$ = {3, 1, 4}, $y_2^{P1}$ = {2, 7}, $y_3^{P1}$ = {6, 5, 8}
Steps 3-4: $x_1^{P2}$ = {4, 1, 3}, $x_2^{P2}$ = {7, 2}, $x_3^{P2}$ = {5, 8, 6}; $y_1^{P2}$ = {3, 4, 1}, $y_2^{P2}$ = {2, 7}, $y_3^{P2}$ = {5, 8, 6}
Steps 5-7: Off1 is assembled from $x_1^{P1}$, $x_2^{P2}$, $x_3^{P1}$ and $y_1^{P1}$, $y_2^{P2}$, $y_3^{P1}$; X(Off1) = 1 3 4 7 2 6 8 5

Theorem 3.1. Given legal sequence pairs as the parents, the MTOX operator always produces legal sequence pairs as the offspring.

Proof: Let $X_{P1}$ and $Y_{P1}$ denote the X and Y sequences of the first parent. Let $X_{P2}$ and $Y_{P2}$ denote the X and Y sequences of the second parent. Let $M = \{1, 2, \ldots, n\}$ and, without loss of generality, let us assume that there is a bijection between the module names and the set $M$. For a sequence $Z$, let $\mathrm{set}(Z)$ denote the set of modules that occur in $Z$. Since both the parents correspond to legal floorplans, we have

$\mathrm{set}(X_{P1}) = \mathrm{set}(Y_{P1}) = \mathrm{set}(X_{P2}) = \mathrm{set}(Y_{P2}) = M$    (3.5.1)

Step 1 of Procedure MTOX generates three segments $x_1^{P1}$, $x_2^{P1}$ and $x_3^{P1}$ from $X_{P1}$ using the two random cutpoints. Let $s_i = \mathrm{set}(x_i^{P1})$ for $i = 1, 2, 3$. By construction and Equation 3.5.1,

$s_1 \cup s_2 \cup s_3 = M$    (3.5.2)

Let $s \subseteq \{1, \ldots, n\}$. We define $X(s)$ [$Y(s)$] as the subsequence of $X$ ($Y$) that contains the modules in $s$ in the order they occur in the sequence $X$ ($Y$). For example, assume $X_{P1} = \{5, 4, 1, 3, 2\}$ and $s = \{1, 2, 5\}$; then $X_{P1}(s) = \{5, 1, 2\}$.

Step 2 of Procedure MTOX generates three segments $y_1^{P1} = Y_{P1}(s_1)$, $y_2^{P1} = Y_{P1}(s_2)$, and $y_3^{P1} = Y_{P1}(s_3)$. Let $t_i = \mathrm{set}(y_i^{P1})$ for $i = 1, 2, 3$. By construction and Equation 3.5.1 we get

$t_i = s_i, \quad i = 1, 2, 3$    (3.5.3)

$t_1 \cup t_2 \cup t_3 = M$    (3.5.4)

Step 3 generates three segments $x_1^{P2} = X_{P2}(s_1)$, $x_2^{P2} = X_{P2}(s_2)$, and $x_3^{P2} = X_{P2}(s_3)$. Let $q_i = \mathrm{set}(x_i^{P2})$ for $i = 1, 2, 3$. By construction and Equation 3.5.1,

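$q_i = s_i, \quad i = 1, 2, 3$    (3.5.5)

$q_1 \cup q_2 \cup q_3 = M$    (3.5.6)

where $s_i$ and $q_i$ denote the sets of modules in the segments $x_i^{P1}$ and $x_i^{P2}$, respectively.

Step 4 generates three segments $y_1^{P2} = Y_{P2}(s_1)$, $y_2^{P2} = Y_{P2}(s_2)$, and $y_3^{P2} = Y_{P2}(s_3)$. Let $r_i$ denote the set of modules in $y_i^{P2}$ for $i = 1, 2, 3$. By construction and Equation 3.5.1,

$r_i = s_i, \quad i = 1, 2, 3$    (3.5.7)

$r_1 \cup r_2 \cup r_3 = M$    (3.5.8)

Step 5 forms the X sequence ($X_{Off1}$) of the offspring using segments $x_1^{P1}$, $x_2^{P2}$ and $x_3^{P1}$. Let $F_1$ denote the set of modules in $X_{Off1}$. By construction, $F_1 = s_1 \cup q_2 \cup s_3$. From Equation 3.5.5, $q_2 = s_2$. Hence, $F_1 = s_1 \cup s_2 \cup s_3 = M$, and the X sequence of the offspring is legal. Step 6 forms the Y sequence ($Y_{Off1}$) of the offspring using the segments $y_1^{P1}$, $y_2^{P2}$ and $y_3^{P1}$. Let $F_2$ denote the set of modules in $Y_{Off1}$. By construction, $F_2 = t_1 \cup r_2 \cup t_3$. From Equation 3.5.3 and Equation 3.5.7 we get $t_1 = s_1$, $r_2 = s_2$ and $t_3 = s_3$. This yields $F_2 = s_1 \cup s_2 \cup s_3 = M$, so the Y sequence of the offspring is also legal. Hence the offspring floorplan represented by the sequence pair formed using the proposed MTOX crossover operator is always a legal floorplan.

3.5.2 Heuristic One-Point Order-based Crossover (HOOX) Operator

The HOOX operator is a heuristic crossover operator, proposed specifically for Sequence Pairs, that identifies good sub-floorplans in the parents and preserves them in the offspring. The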

HOOX operator uses the traditional one-point crossover operator [3] to partition both the parents into two sub-floorplans, as shown in Figure 3.10. The order crossover operator is used to eliminate any violations in the Sequence Pairs for the resulting sub-floorplans. The two sub-floorplans with the better area usage are then combined to form the offspring, as shown in Figure 3.11. The HOOX crossover operation is described in the procedure below.

Procedure HOOX (P1, P2, Off1)

Step 1. Generate a random cutpoint in the X sequence of P1. Let $x_1^{P1}$ and $x_2^{P1}$ be the two segments obtained from the one cutpoint.

Step 2. Find the order of the modules in segment $x_1^{P1}$ in the Y sequence of P1. Generate segment $y_1^{P1}$ using this ordering of modules. Similarly, generate segment $y_2^{P1}$ using the Y sequence ordering of the modules in the segment $x_2^{P1}$. The Sequence Pairs ($x_1^{P1}$, $y_1^{P1}$) and ($x_2^{P1}$, $y_2^{P1}$) correspond to two sub-floorplans, FP11 and FP21.

Step 3. Find the order of the modules in segment $x_1^{P1}$ in the X sequence of P2. Generate segment $x_1^{P2}$ using this ordering of modules. Similarly, generate segment $x_2^{P2}$ using the X sequence ordering in P2 of the modules in the segment $x_2^{P1}$. Similarly, generate segments $y_1^{P2}$ and $y_2^{P2}$ using the Y sequence ordering in P2 of the modules in the segments $x_1^{P1}$ and $x_2^{P1}$. The Sequence Pairs ($x_1^{P2}$, $y_1^{P2}$) and ($x_2^{P2}$, $y_2^{P2}$) correspond to two sub-floorplans, FP12 and FP22.

Step 4. Sub-floorplans FP11 and FP12 contain the same modules and form two alternatives for building a sub-floorplan using the modules in segment $x_1^{P1}$. Sub-floorplans FP21 and FP22 form two alternatives for building a sub-floorplan using the modules in segment $x_2^{P1}$. The sub-floorplan alternatives with the better area usage are picked to form the offspring floorplan.

Step 5. Four different offspring configurations are possible using the two sub-floorplan alternatives. The best configuration is chosen as the final offspring.

Figure 3.10. Generation of sub-floorplans using a single cutpoint in the HOOX operator (steps 1-3 of Procedure HOOX)

The module orientations in the parents are preserved in the generated sub-floorplans and are copied over to the offspring. To speed up the area computations of the offspring configurations, each sub-floorplan is regarded as a super-module. The offspring can now be considered to be made of just two super-modules. It is to be noted that in step 5 of Procedure HOOX, no modules are added or deleted. Only the positions of the individual modules from the selected segments are determined. It can be formally proven that the offspring produced by the proposed HOOX operator is always legal, as shown in Theorem 3.2.
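A compact Python sketch of Procedure HOOX is given below under several stated assumptions. The names, the representation of a parent as a tuple `(x_seq, y_seq)`, and the module dimensions are hypothetical, and orientation handling is omitted (`dims` is taken to hold the already-oriented width and height). The area evaluation uses a plain quadratic longest-path computation with the usual Sequence Pair convention (a module that precedes another in both sequences lies to its left; one that precedes in X but follows in Y lies above), not the longest common subsequence method cited in this chapter. Step 5 is interpreted here as trying the four relative placements of the two chosen sub-floorplans treated as super-modules; the text does not spell out the four configurations, so this reading is an assumption.

```python
def project(seq, modules):
    """Sub-sequence of seq containing only `modules`, in the order they occur in seq."""
    keep = set(modules)
    return [m for m in seq if m in keep]

def packing_size(x_seq, y_seq, dims):
    """(width, height) of the packing encoded by a sequence pair; dims[m] = (w, h)."""
    pos_x = {m: i for i, m in enumerate(x_seq)}
    left, bottom = {}, {}
    for b in y_seq:  # modules already visited precede b in the Y sequence
        # a is left of b if it also precedes b in X; a is below b if it follows b in X.
        lb = max((left[a] + dims[a][0] for a in left if pos_x[a] < pos_x[b]), default=0)
        bb = max((bottom[a] + dims[a][1] for a in bottom if pos_x[a] > pos_x[b]), default=0)
        left[b], bottom[b] = lb, bb
    width = max(left[m] + dims[m][0] for m in y_seq)
    height = max(bottom[m] + dims[m][1] for m in y_seq)
    return width, height

def area(pair, dims):
    width, height = packing_size(pair[0], pair[1], dims)
    return width * height

def hoox(p1, p2, dims, cut):
    """One HOOX offspring (x_seq, y_seq) for a cutpoint with 1 <= cut < len(x_seq)."""
    chosen = []
    for seg in (p1[0][:cut], p1[0][cut:]):                    # Step 1
        from_p1 = (seg, project(p1[1], seg))                  # Step 2: FP11 / FP21
        from_p2 = (project(p2[0], seg), project(p2[1], seg))  # Step 3: FP12 / FP22
        chosen.append(min(from_p1, from_p2, key=lambda sp: area(sp, dims)))  # Step 4
    (ax, ay), (bx, by) = chosen
    # Step 5 (assumed reading): A left of B, B left of A, A above B, B above A.
    candidates = [(ax + bx, ay + by), (bx + ax, by + ay),
                  (ax + bx, by + ay), (bx + ax, ay + by)]
    return min(candidates, key=lambda sp: area(sp, dims))

dims = {1: (2, 3), 2: (4, 1), 3: (1, 1), 4: (3, 2)}  # hypothetical module sizes
parent1 = ([1, 2, 3, 4], [2, 1, 4, 3])
parent2 = ([4, 3, 2, 1], [4, 3, 2, 1])
child = hoox(parent1, parent2, dims, cut=2)
print(child, area(child, dims))
```

Whatever alternative is picked for each segment, both sub-floorplans are permutations of the same two module sets, so every candidate in step 5 is a legal Sequence Pair.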

Figure 3.11. Generation of offspring configurations in the HOOX operator (steps 4-5 of Procedure HOOX)

Theorem 3.2. Given legal sequence pairs as the parents, the HOOX operator always produces legal sequence pairs as the offspring.

Proof: Let $X_{P1}$ and $Y_{P1}$ denote the X and Y sequences of the first parent. Let $X_{P2}$ and $Y_{P2}$ denote the X and Y sequences of the second parent. Let $M = \{1, 2, \ldots, n\}$. Without loss of generality, we can assume that there exists a bijection between the module names and the set $M$. For a sequence $Z$, let $\mathrm{set}(Z)$ denote the set of modules that occur in $Z$. Since both parents correspond to legal floorplans, we have

$\mathrm{set}(X_{P1}) = \mathrm{set}(Y_{P1}) = \mathrm{set}(X_{P2}) = \mathrm{set}(Y_{P2}) = M$    (3.5.9)

Step 1 of Procedure HOOX generates two segments $x_1^{P1}$ and $x_2^{P1}$ from $X_{P1}$ using the single random cutpoint.

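Let $s_1$ and $s_2$ denote the sets of modules in $x_1^{P1}$ and $x_2^{P1}$. By construction and Equation 3.5.9,

$s_1 \cup s_2 = M$    (3.5.10)

Step 2 of Procedure HOOX generates two segments $y_1^{P1} = Y_{P1}(s_1)$ [the notation used here is the same as defined in the proof of Theorem 3.1] and $y_2^{P1} = Y_{P1}(s_2)$. Let $t_1$ and $t_2$ denote the sets of modules in $y_1^{P1}$ and $y_2^{P1}$. By construction and Equation 3.5.9,

$t_1 = s_1, \quad t_2 = s_2$    (3.5.11)

$t_1 \cup t_2 = M$    (3.5.12)

Step 3 of Procedure HOOX first generates two segments $x_1^{P2} = X_{P2}(s_1)$ and $x_2^{P2} = X_{P2}(s_2)$. Let $q_1$ and $q_2$ denote the sets of modules in $x_1^{P2}$ and $x_2^{P2}$. By construction and Equation 3.5.9,

$q_1 = s_1, \quad q_2 = s_2$    (3.5.13)

$q_1 \cup q_2 = M$    (3.5.14)

Step 3 of Procedure HOOX also generates two more segments $y_1^{P2} = Y_{P2}(s_1)$ and $y_2^{P2} = Y_{P2}(s_2)$. Let $r_1$ and $r_2$ denote the sets of modules in $y_1^{P2}$ and $y_2^{P2}$. By construction and Equation 3.5.9,

$r_1 = s_1, \quad r_2 = s_2$    (3.5.15)

$r_1 \cup r_2 = M$    (3.5.16)

Step 4 of Procedure HOOX picks either $s_1$ and $t_1$ or $q_1$ and $r_1$ for the first sub-floorplan of the offspring. Step 4 also picks either $s_2$ and $t_2$ or $q_2$ and $r_2$ for the second sub-floorplan of the offspring. Four different offspring configurations are possible. In all of the cases, let $F_1$ denote the set of modules in the X sequence of the offspring and let $F_2$ denote the set of modules in its Y sequence.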

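Case 1: $s_1$, $t_1$, $s_2$ and $t_2$ are chosen to form the offspring. The X sequence of the offspring is formed using the modules in $s_1$ and $s_2$. Hence $F_1 = s_1 \cup s_2$. From Equation 3.5.10, we have $F_1 = M$. Hence the X sequence of the offspring is legal. The Y sequence of the offspring is formed using the modules in $t_1$ and $t_2$. Hence $F_2 = t_1 \cup t_2$. From Equation 3.5.12 we have $F_2 = M$. Hence the Y sequence of the offspring is legal.

Case 2: $s_1$, $t_1$, $q_2$ and $r_2$ are chosen to form the offspring. The X sequence of the offspring is formed using modules in $s_1$ and $q_2$. Hence $F_1 = s_1 \cup q_2$. From Equations 3.5.10 and 3.5.13 we have $q_2 = s_2$ and $s_1 \cup s_2 = M$. Hence the X sequence of the offspring is legal. The Y sequence of the offspring is formed using modules in $t_1$ and $r_2$. Hence $F_2 = t_1 \cup r_2$. From Equations 3.5.11, 3.5.12, and 3.5.13, $r_2 = t_2$ and $t_1 \cup t_2 = M$. Thus the Y sequence of the offspring is legal.

Case 3: $q_1$, $r_1$, $q_2$ and $r_2$ are chosen to form the offspring. The X sequence of the offspring is formed using modules in $q_1$ and $q_2$. Hence $F_1 = q_1 \cup q_2$. From Equation 3.5.14 we have $q_1 \cup q_2 = M$ and therefore $F_1 = M$. Thus, the X sequence of the offspring is legal. The Y sequence of the offspring is formed using the modules in $r_1$ and $r_2$. Hence $F_2 = r_1 \cup r_2$. From Equation 3.5.16 we have $r_1 \cup r_2 = M$ and therefore $F_2 = M$. Thus, the Y sequence of the offspring is also legal.

Case 4: $q_1$, $r_1$, $s_2$ and $t_2$ are chosen to form the offspring. The X sequence of the offspring is formed using the modules in $q_1$ and $s_2$. Hence $F_1 = q_1 \cup s_2$. From Equations 3.5.10 and 3.5.13, we have $q_1 = s_1$ and $s_1 \cup s_2 = M$. Thus, the X sequence of the offspring is legal.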

The Y sequence of the offspring is formed using the modules in $r_1$ and $t_2$. Hence $F_2 = r_1 \cup t_2$. From Equations 3.5.11 and 3.5.15 we have $t_2 = s_2$ and $r_1 = s_1$, so that $F_2 = s_1 \cup s_2 = M$. Thus the Y sequence of the offspring is also legal. Since the above four cases are the only ways in which the HOOX operator produces an offspring, it is proved that the proposed HOOX operator always produces legal offspring.

3.6 Empirical Determination of GA Parameter Settings

The values for all the GA parameters were determined from empirical studies conducted on a training set of circuits selected from both the MCNC and GSRC benchmark suites. The hp and ami33 circuits were used as training circuits from the MCNC suite, while the n100 and n300 circuits were used as training circuits from the GSRC benchmark suite.

Table 3.2. Best area and total wirelength results obtained by the proposed genetic floorplanner for the n300 benchmark for various crossover and mutation rates (A = Area in sq. units, W = Total Wirelength in units)

             CX_RATE = 0.5         CX_RATE = 1.0
MUTN_RATE    A         W           A         W
0.01         328755    598997      323464    573560
0.05         325511    587297      317633    522298
0.1          319061    592562      313045    510862

Generally speaking, crossover rates are set high (in the range [0.5, 1.0]) in genetic algorithms to allow the inheritance of good genetic information from parents to offspring, while mutation rates are kept low (in the range [0.001, 0.1]) to prevent the loss of good genetic information from the population [3, 4]. In the proposed genetic floorplanner, two crossover rates (0.5 and 1.0) and three mutation rates (0.01, 0.05, and 0.1) were used for the experimental studies.
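The entries of Table 3.2 can be compared with the same dominance test that the floorplanner applies to individuals. The short Python check below is only an illustration: the dictionary layout and names are assumptions, and the area for a crossover rate of 0.5 and a mutation rate of 0.1 is read as 319061.

```python
def dominates(a, b):
    """True if a = (area, wire) is no worse than b in both objectives and better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])

# (cx_rate, mut_rate) -> (area, total wirelength) for the n300 benchmark, from Table 3.2.
table_3_2 = {
    (0.5, 0.01): (328755, 598997), (1.0, 0.01): (323464, 573560),
    (0.5, 0.05): (325511, 587297), (1.0, 0.05): (317633, 522298),
    (0.5, 0.1): (319061, 592562), (1.0, 0.1): (313045, 510862),
}

non_dominated = [setting for setting, result in table_3_2.items()
                 if not any(dominates(other, result) for other in table_3_2.values())]
print(non_dominated)
```

Only one setting survives the test, because the result for a crossover rate of 1.0 and a mutation rate of 0.1 has both the smallest area and the smallest wirelength in the table.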

Table 3.2 lists the results obtained by the proposed genetic floorplanner on the n300 GSRC benchmark for the different settings of the crossover and mutation rates. For all the training circuits, the best results were obtained when the crossover rate was set to 1.0 and the mutation rate was set to 0.1. The genetic floorplanner also exhibited good convergence for these values. Figure 3.12 through Figure 3.17 show the convergence plots of the proposed genetic floorplanner for various settings of the crossover and mutation rates.

Figure 3.12. Convergence plot of wirelength for n300 benchmark with cxRate=0.5, mutRate=0.1. Note: All values are in generic units.

Figure 3.13. Convergence plot of area for n300 benchmark with cxRate=0.5, mutRate=0.1. Note: All values are in generic units.

Figure 3.14. Convergence plot of wirelength for n300 benchmark with cxRate=1.0, mutRate=0.05. Note: All values are in generic units.

Figure 3.15. Convergence plot of area for n300 benchmark with cxRate=1.0, mutRate=0.05. Note: All values are in generic units.

Figure 3.16. Convergence plot of wirelength for n300 benchmark with cxRate=1.0, mutRate=0.1. Note: All values are in generic units.

Figure 3.17. Convergence plot of area for n300 benchmark with cxRate=1.0, mutRate=0.1. Note: All values are in generic units.

3.7 Experimental Results

The proposed genetic floorplanner was implemented using C++/STL and compiled with g++ version 3.4.6 (using the -O2 flag). For all the experiments, the proposed genetic floorplanner was run with a crossover rate of 1.0 and a mutation rate of 0.1. These GA parameter values were empirically determined as discussed in Section 3.6. All the experiments were run on a Linux machine with a 3.2 GHz Intel Pentium 4 processor and 2 GB RAM.

Table 3.3. Area and wirelength comparisons with the AdaptGA, QP-LFF and Parquet floorplanners (A = Area in mm^2, W = total wirelength, %S = percentage savings obtained by the proposed floorplanner, "-" = not reported)

            Proposed           AdaptGA                          QP-LFF                           SA-based (Parquet)
Circuit     A        W         A       %S      W       %S       A       %S      W       %S       A        %S      W         %S
apte        48.48    319.81    -       -       -       -        -       -       -       -        46.56    -1.93   390.57    18.12
hp          10.052   146.94    -       -       -       -        -       -       -       -        10.288   2.29    164.542   10.7
xerox       20.56    424.48    -       -       -       -        -       -       -       -        22.22    7.49    492.002   13.72
ami33       1.20     31.33     1.22    1.64    39.37   20.4     1.177   -1.95   45.3    30.84    1.327    9.54    57.72     45.72
ami49       37.81    677.9     37.16   -1.75   971.3   30.2     36.6    -3.3    879.9   22.96    40.66    7.01    803.89    15.67
Average Savings                        -0.05           25.3             -2.63           26.9              4.88              20.79
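The savings columns of Table 3.3 are consistent with the relative difference between the competing result and the proposed result, expressed as a percentage of the competing result. The formula below is inferred from the table entries rather than quoted from the text, and the function name is an assumption; the check uses the ami33 row.

```python
def percent_savings(proposed, other):
    """Savings of the proposed result relative to the other floorplanner, in percent.

    A negative value means that the proposed floorplanner is worse in that metric.
    """
    return 100.0 * (other - proposed) / other

# ami33 row of Table 3.3: proposed A = 1.20, W = 31.33.
print(round(percent_savings(1.20, 1.22), 2))    # area against AdaptGA
print(round(percent_savings(31.33, 39.37), 1))  # wirelength against AdaptGA
print(round(percent_savings(1.20, 1.177), 2))   # area against QP-LFF
print(round(percent_savings(31.33, 45.3), 2))   # wirelength against QP-LFF
```

The area comparison against QP-LFF comes out negative, which agrees with the marginal area increase against QP-LFF that the text reports alongside the wirelength savings.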

Floorplan area is computed using the longest common subsequence method [16] proposed for floorplans represented by Sequence Pairs. Half-perimeter wirelengths (HPWL) are computed for all the nets to estimate the total wiring required for the floorplans. The positions of the module pins are available for the MCNC benchmarks. Hence, pin-to-pin half-perimeter wirelengths are [...] an outline for the chip. If the aspect ratio of the packing produced by the proposed genetic [...] along the chip boundary to obtain their new positions. This method has been previously used by many researchers, as reported in [40]. The module pin positions are not specified for the GSRC benchmarks. Hence the module centers are used to measure HPWL in this case. The area and wirelength results shown for the proposed genetic floorplanner belong to a single valid floorplan chosen from the final Pareto-optimal set. These results belong to the best floorplan solution found out of three independent runs (using different random number seeds) of the genetic floorplanner.

3.7.1 Performance of the Proposed Genetic Operators

To demonstrate the performance of the proposed crossover operators, the proposed genetic floorplanner is compared with another genetic algorithm based floorplanner [29] (referred to as AdaptGA in the rest of this chapter) that also uses the Sequence Pair based solution encoding. [...] knowledge, for a genetic floorplanner that encodes its solutions using the Sequence Pair representation. This ensures that the performance improvements obtained can be attributed to the proposed crossover operators, as the solution encoding is the same for both the floorplanners. The proposed floorplanner easily outperformed AdaptGA, as can be seen in Table 3.3, obtaining an average wirelength savings of 25.3% with almost no increase in area. In fact, the proposed genetic

floorplanner produces better area than AdaptGA for the ami33 benchmark but increases the floorplan area slightly (1.75%) for the ami49 benchmark. The area and total wirelength results reported for the proposed genetic floorplanner belong to a single individual present in the best non-dominated front of the final population. Note also that AdaptGA does not perform simultaneous area and wirelength optimization. The results from the proposed genetic floorplanner are compared against the best area and wirelengths reported by AdaptGA. The runtimes of the proposed GA were faster than those of AdaptGA but are not directly comparable, as AdaptGA was run on an UltraComp model60 workstation (clock speed and memory not reported). Specifically, the average running time (over 3 runs) of the proposed genetic floorplanner per Pareto-front solution was approximately 0.433 seconds and 1.12 seconds for the ami33 and ami49 benchmarks, respectively, on the previously mentioned Linux machine.

3.7.2 Comparisons against State-of-the-art Floorplanners

To demonstrate the effectiveness of the proposed floorplanner for simultaneous area and wirelength optimization in outline-free floorplanning, the proposed floorplanner is compared with two state-of-the-art floorplanners that perform simultaneous optimization of area and wirelength. The proposed genetic floorplanner is compared with Sheqin et al. [24] (referred to as QP-LFF in the rest of the chapter), which claims the best results for the MCNC benchmarks among floorplanners that simultaneously optimize both area and wirelength, outperforming Enhanced O-tree [18], TCG [19], and SA-LP [23] (refer to [24] for details). Comparisons are also made with the publicly available SA-based Parquet floorplanner, which uses the single normalized weighted sum (SNWS) approach. Tables 3.3 and 3.4 summarize the results of the area and total wirelength comparisons for the MCNC and GSRC benchmarks, respectively.
The proposed genetic floorplanner outperforms QP LFF in terms of wirelength for both the ami33 and ami49 benchmarks. QP LFF does not report results for the other MCNC

benchmarks. The proposed floorplanner yielded an average wirelength savings of 26.9% when compared to QP LFF for a marginal 2.63% increase in area, which is justifiable given the significant wirelength savings obtained.

Comparisons against Parquet

The most recent version (4.5.23) of the publicly available Parquet tool [21][22] was obtained, compiled (with the -O3 flag) and installed on the same Linux compute cluster mentioned above. The outline-free floorplanning results for Parquet were obtained using the following flag settings: -FPrep SeqPair -minWL -areaWeight 0.5 -wireWeight 0.5

The population-based approach and the NSGA-II template of the proposed genetic floorplanner ensure that multiple floorplan solutions belonging to a Pareto optimal set are available to the user. Thus the user can choose from numerous Pareto optimal solutions and analyze the trade-offs between the area and wirelength objectives for the particular problem instance. The Parquet floorplanner, in contrast, yields only a single solution from each of its runs. Since the Parquet tool has to be run multiple times with different objective weight [...] the time (t_pf) to obtain a single solution of the final Pareto front for comparison against the runtime of the SA-based optimizer. The size of the final Pareto front was recorded for each run of the proposed floorplanner to compute the runtime per Pareto front solution (last column of Table 3.4). Table 3.4 reports the area, wirelength and runtime results obtained by the proposed genetic floorplanner on the GSRC benchmarks. The Parquet tool was then run for [...] of three independent runs are used for the comparisons shown in Table 3.5. The ratio of runtimes shown in Table 3.5 is the ratio of the runtime of the proposed genetic floorplanner to the runtime of the Parquet floorplanner to obtain the reported savings.

Table 3.4. Area, wirelength and runtime results for the proposed genetic floorplanner on the GSRC benchmarks

Circuit  Area                Wirelength           Best Individual       Total         Runtime/PF
         Min      Avg        Min      Avg         Area      WL          Runtime (s)   soln (s)
n10      228492   242455.7   30375    32407.27    238120    32138.33    46.67         0.47
n30      224186   226355.3   86223    89287.27    227319.7  89065.33    651.48        2.17
n50      213498   219323.7   118757   122803      218961    122731.3    1745.55       3.49
n100     198592   200955     191626   195489.3    200776.3  197819      9176.95       9.18
n200     201600   204484     354726   359090.3    203908.3  366423.7    42619.37      21.31
n300     314130   320387     505139   521463.4    320045.6  517739.2    60822.62      1645.47

Table 3.5. Percentage savings in area and wirelength of the proposed genetic floorplanner compared against the Parquet floorplanner for the GSRC benchmarks

Circuit  Best Individual     Ratio of
         Area     WL         Runtimes
n10      0.44     16.90      0.9881
n30      1.44     22.69      1.06104
n50      0.01     14.98      1.05365
n100     0.78     15.38      1.08947
n200     3.18     22.42      1.0373
n300     7.82     12.94      1.06869
Average  2.13     17.55

For the MCNC benchmarks, the proposed floorplanner outperforms Parquet in terms of both area and wirelength, producing 20.79% average wirelength savings and an average area savings of 4.88%. Fig. 3.18 shows one of the best floorplans obtained using the proposed GA for the ami33 benchmark. For the GSRC benchmarks, the proposed genetic floorplanner outperforms Parquet in terms of total wirelength but at the cost of a small area penalty. Specifically, the proposed genetic floorplanner obtains 17.55% average wirelength savings for an average area increase of 2.13%. Note that the proposed genetic floorplanner produces better wirelength results for all the benchmarks. Fig. 3.19 shows one of the best floorplans

obtained by the proposed floorplanner for the n100 GSRC benchmark. Considering both benchmark suites, the proposed genetic floorplanner obtains an average savings of 19.17% in wirelength and 1.375% in area over the Parquet floorplanner.

Figure 3.18. Floorplan of ami33 benchmark with area = 1.21 mm² and total wirelength = 35.43 mm

Figure 3.19. Floorplan of n100 GSRC benchmark with area = 205758 sq. units and wirelength = 133497 units
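The combined figures quoted above appear consistent with a plain mean of the two per-suite averages, with the GSRC area increase counted as a negative saving. The short Python check below is only a sketch of that reading: the variable names are illustrative, and the equal weighting of the two suites is an assumption rather than something the text states.

```python
# Per-suite average savings over Parquet, in percent, as quoted in the text.
# A negative area saving denotes an area increase (penalty).
mcnc = {"wirelength": 20.79, "area": 4.88}
gsrc = {"wirelength": 17.55, "area": -2.13}

def mean_saving(metric):
    """Unweighted mean of the two suite averages (assumed weighting)."""
    return (mcnc[metric] + gsrc[metric]) / 2

overall_wl = round(mean_saving("wirelength"), 3)
overall_area = round(mean_saving("area"), 3)
print(overall_wl, overall_area)
```

Under this assumption the two means match the 19.17% wirelength and 1.375% area figures reported for the combined suites.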

3.8 Summary

VLSI floorplanning has transformed into a multi-objective optimization problem with the recent advances in integration technology. Genetic algorithms have been extensively used in different forms to solve various multi-objective optimization problems. In this work, the NSGA-II multi-objective genetic algorithm has been applied to tackle the VLSI floorplanning problem considering the floorplan area and total wirelength objectives. Novel crossover operators have been developed for effective floorplanning using Sequence Pairs. The hybridized multi-objective floorplanner achieves very good results for the MCNC and GSRC benchmark suites as compared to other floorplanners that perform simultaneous optimization of area and wirelength. Thus, genetic algorithms can be used effectively for multi-objective optimization in VLSI design when equipped with well-designed genetic operators.

CHAPTER 4
DESIGN OF AN FPGA BASED GENERAL PURPOSE GENETIC ALGORITHM IP CORE

Genetic algorithms have been shown to be a robust search mechanism that can be used as an effective optimization engine in a wide variety of applications [6, 7], including reconfigurable hardware applications and real-time applications. Genetic algorithms can explore [...] but can incur significant run times due to their population-based search mechanism. A natural solution to speed up the genetic algorithm is to implement it in hardware. Another advantage of a hardware implementation of a GA is that it eliminates the complex, time- and resource-consuming communication protocols that an equivalent software implementation needs to interface with the main application. This is particularly advantageous to real-time applications such as reconfiguration of evolvable hardware. Moreover, with the rapidly increasing FPGA logic density, efficiently designed hardware genetic algorithms can be implemented on a single FPGA alongside the target application, resulting in a less bulky apparatus.

In this work, a robust parameterized genetic algorithm IP core is proposed that is readily synthesizable using standard FPGA design tools at both the RT level and the gate level. The programmable IP core can be easily integrated with any application that requires a search engine and can also be implemented in a system-on-a-chip configuration. Its architecture is extremely flexible and easy to integrate with target applications, allowing seamless integration of user-defined

blocks such as fitness function modules. The core has been implemented on a Xilinx Virtex2Pro FPGA device (xc2vp30-7ff896). It has a very small footprint (only 13% slice utilization) and runs at a high speed (50 MHz). This IP core is highly suitable for Evolvable Hardware (EHW) [59] applications. It is one of the building blocks of the Self Reconfigurable Analog Array architecture and is used to compensate extreme temperature effects on VLSI electronics [68][69].

The novel contributions of this work are that the proposed GA IP core:
- is available at different design levels, RT level and gate level, to provide the end user flexibility in choosing the design level at which to include the GA IP core,
- supports user-defined fitness functions without the need for re-synthesis of the entire design,
- allows programming of the initial seed for the Random Number Generator (RNG), which enables different convergence characteristics to be obtained for the same GA parameter settings,
- provides PRESET modes, allowing the user to readily experiment with a varied set of predefined GA parameter settings, and
- can be directly implemented as a digital ASIC using standard ASIC design tools, with simple scan chain testability built into the core and with basic fault tolerance in the form of PRESET modes to bypass parameter initialization failure.

In addition, the proposed GA IP core has several highly desirable features:
- Programmability: values of important GA parameters including population size, number of generations, crossover rate, and mutation rate can be programmed to accommodate the requirements of a wide variety of applications,
- High Probability of Convergence to Optimal Solution: an elitist GA model is used that can converge to the global optimum [8, 9], and

- Easy Interfacing: a simple two-way handshaking protocol is used to interface with the user-defined fitness evaluation module and the target application.

4.1 Background and Related Work

In this section, we review previous hardware implementations of general purpose genetic algorithms. We also review some background material on Evolvable Hardware and discuss how the previous hardware implementations fail to address key design issues of Evolvable Hardware.

4.1.1 Prior Work

There have been many hardware implementations of both general purpose [41-46] and application-specific [52] genetic algorithms. This section reviews previous FPGA implementations of a general purpose genetic algorithm. Table 4.1 summarizes the existing works on FPGA implementation of general purpose genetic algorithms. Several application-specific hardware implementations of a genetic algorithm [52] exist in the literature, tailored to the particular application in terms of chromosome encoding, crossover and mutation operations. These implementations are not reviewed here as they cannot be reused with other applications, even for prototyping purposes.

Table 4.1. Review of existing literature on FPGA implementation of a general purpose genetic algorithm

Work | Elitist | Pop. Size | No. Gens | Selection | Crossover/Mutation rates | Crossover Operators | RNG/Seed | Preset Modes | Initialize Mode | FPGA platform
Scott et al [5] | N | fixed (16) | fixed | Roulette | fixed | 1 point | CA/fixed | none | none | BORG board
Tommiska and Vuori [6] | N | fixed (32) | fixed | Round robin | fixed | 1 point | LSHR/fixed | none | none | Altera
Shackleford et al [7] | N | fixed | fixed | Survival | fixed | 1 point | CA/fixed | none | none | Aptix
Yoshida et al [8] | N | 64 or 128 | fixed | Simplified tourney | | 1 point | CA/fixed | none | none | SFL (HDL)
Tang and Yip [9] | | prog. | prog. | Roulette | prog. | 1 point, 4 point, uniform | fixed | none | | PCI card based system
Aportewan et al [10] | N/A | fixed (256) | N/A | N/A | N/A | N/A | CA/fixed | none | none | Xilinx Virtex1000
Proposed | Y | prog. (8 bit) | prog. (32 bit) | Roulette | prog. (4 bit) | 1 point | CA/prog. | 3 diff. modes | separate init. mode (two way handshake) | Xilinx Virtex2Pro FPGA

The first FPGA implementation of a general purpose GA engine was proposed by Scott et al [41], who described a modular hardware implementation of a simple genetic algorithm that used roulette wheel selection and one-point crossover, with a fixed population size of 16 and a member width of 3 bits. The genetic algorithm was broken into simpler modules and each module was described using behavioral VHDL. The overall GA design was implemented on multiple Xilinx FPGAs on a BORG board. The main goal of [41] was to illustrate the issues of hardware implementation of a general purpose GA. Tommiska and Vuori [42] implemented a general purpose GA system with round robin parent selection and one-point crossover and used a fixed population size of 32. The GA was [...] using high performance PCI buses. Experimentation on various fitness functions involved rewriting the AHDL code and reprogramming the FPGAs. Shackleford et al [43] implemented a survival-based, steady-state GA in VHDL to achieve higher performance and tested it on set covering and protein folding problems. The prototype GA machine used for the set cover problem was designed using the Tsutsuji logic synthesis system and implemented on an Aptix AXB MP3 Field Programmable Circuit Board (FPCB) populated with six FPGAs. Yoshida et al [44] implemented a Genetic Algorithm Processor with a steady-state architecture that supports efficient pipelining and a simplified tournament selection. Tang and Yip [45] implemented a PCI-based hardware GA system using two Altera FPGAs mounted on a PCI board. The PCI-based GA system has multiple crossover and mutation operators implemented with programmable crossover and mutation thresholds. Tang and Yip also discussed different parallel implementations of the PCI-based GA system. In contrast to the simple GA, Aportewan et al [46] implemented a compact GA in Verilog HDL, as it is more amenable to a hardware implementation. However, compact GAs suffer

from a severe limitation in that their convergence to the optimal solution is guaranteed only for the class of applications that possess tightly coded, non-overlapping building blocks [46]. Hardware acceleration techniques such as pipelining and parallel architectures have been applied to the design of hardware genetic algorithms [47-49]. Such advanced techniques are not the main focus of this work and will not be discussed here.

The motivations of all the previous FPGA implementations fall under one or more of the following categories:
- Basic Hardware Acceleration: to obtain speedup over the corresponding software implementation [41-45],
- Novel GA Templates: to propose a novel genetic algorithm template or architecture [46] that is more suited for a hardware implementation,
- Advanced Hardware Acceleration Techniques: to accelerate a genetic algorithm using pipelined and/or parallel implementations of GA architectures [47-49].

The primary goal of the above efforts is to demonstrate the speedup that can be achieved by a hardware GA implementation. As a result, the prototypes developed in the above FPGA-based implementations suffer from one or more of the following limitations:
- Lack of programmability for GA parameters: Some or all of the GA parameters are [...] customizable parameters is the GA machine proposed by Tang and Yip [45].
- Lack of scalability in terms of fitness functions: Only a single fitness function is supported. To accommodate a new fitness function, the entire design has to be re-synthesized. All the previous implementations suffer from this limitation.
- Pre-defined system architecture/organization: The GA architecture is defined based on a specific development environment, imposing serious restrictions on the target application. For example, in Tang and Yip [45], the GA machine can only be implemented on a PCI

card that contains two FPGAs with a local PCI bus providing the communication between the different modules.

The proposed GA IP core overcomes all of the above mentioned limitations with the following features:
- Independent Parameter Initialization Phase: The proposed GA core has a separate initialization phase that enables the user to program the desired GA parameters, including population size, number of generations, crossover threshold, mutation threshold, and the initial seed used by the random number generator.
- Compact IP core with Simple Communication Interface: The proposed GA core does not impose any hardware requirements or system architecture on the user. It can be instantiated within the main application as a drop-in IP module and synthesized along with the application.
- Support for External Fitness Functions: The proposed core allows the user to provide external fitness values by multiplexing between the internal and external fitness values and the interfacing signals. It can select between the existing internal fitness function and an external fitness function supplied by the user from another FPGA or from a PC. This eliminates the need to re-synthesize the entire design just to accommodate a new fitness function. This is a very desirable feature, especially for intrinsic EHW applications and other space applications that cannot re-program the on-board FPGA without significant effort and downtime.

Besides the FPGA implementations, ASIC implementations of GAs [50, 51] have also been proposed to improve the performance of the GA using hardware acceleration. Wakabayashi et al [50] proposed a Genetic Algorithm Accelerator (GAA) chip that implements an elitist generational GA with on-the-fly adaptive selection between the two-point and uniform crossover operators. The GAA chip was fabricated using 0.5 um standard CMOS technology.

Chen et al [51] developed a GA chip using a 0.18 um TSMC cell library. They developed a software tool called Smart GA to tailor the GA module to user specifications accepted through a front-end GUI. Any re-programming of the GA parameters requires re-synthesizing the GA netlist and repeating the physical design process to obtain the ASIC. This is a significant problem as, in most cases, the user cannot predict the best GA parameter settings for the application. If the current user settings do not offer the best performance, the user has to re-synthesize the entire GA netlist with new parameter settings and re-design the entire ASIC. The programmable GA IP core proposed in this work eliminates the need for such re-synthesis.

4.1.2 Pseudo-random Number Generation and GA Performance

A genetic algorithm requires random numbers for generation of the initial population, crossover, and mutation. True random numbers can be generated using specialized hardware that extracts the random numbers from a non-deterministic source such as clock jitter in digital circuits [53]. Pseudo-random number generators (PRNGs) use a deterministic algorithm to generate the random numbers. Hence, the sequence of random numbers can be predicted if the initial seed is known. The choice of random number generator depends upon the application at hand. Applications that require high security will use true random number generators. Applications that require a quick response and cannot afford the high area overhead of true random number generators will use pseudo-random number generators. The effect of the quality of the random number generators on the performance of genetic algorithms has been previously studied [54-56].
A high quality random number generator is generally characterized by a long period (before repetition of the random numbers), uniformly distributed random numbers, absence of correlations between consecutive numbers, and structural properties such as organization of the numbers in lattices. Meysenburg [54] and Meysenburg and Foster [55] reported little or no improvement in the performance of GAs using good PRNGs over

those using poor PRNGs. However, Cantu-Paz [56] later found significant improvements in the performance of a simple binary GA when using a good PRNG. Cantu-Paz found that the quality of the random numbers used to generate the initial population has a major impact on the performance of the GA, while it did not significantly affect performance for the other operations such as crossover and mutation.

The seed used by a Random Number Generator (RNG) influences the sequence of numbers generated. Although RNG characteristics such as cycle length and uniform distribution remain the same with a different seed, the sequence of random numbers generated will differ. A poorly chosen seed can lead to poor-quality random numbers even from a good RNG. Guidelines for choosing a good seed can be found in Garfinkel and Spafford [57]. The performance of algorithms that depend on random numbers has been shown to vary with the RNG seed. Elsner [58] studied the influence of RNG seeds on the performance of four different graph partitioning algorithms. In a particular instance, Elsner observed that the performance worsened by up to 5 times when only the RNG seed was changed.

Theoretically, a good RNG will produce random numbers that are uniformly distributed for a large enough sample size. But due to the time constraints of real-time applications, the distribution of the random numbers actually generated might be non-uniform. This might lead to poor results for resource-constrained hardware genetic algorithms. It has been observed by Meysenburg and Foster [55] and Cantu-Paz [56] that poor RNGs can sometimes outperform good RNGs for particular seeds. Thus, a user will have to experimentally determine the RNG seed value for the particular application. This is particularly necessary for hardware implementations, where simple RNG implementations are preferred due to tight resource and response time requirements.
Hence, the proposed IP core allows the user to program the RNG seed in addition to providing three built-in seeds to select from.
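To illustrate why a programmable seed matters, the following Python sketch steps a small pseudo-random generator from two different seeds. It uses a 16-bit Fibonacci LFSR with a commonly used maximal-length tap set purely as a stand-in: the proposed core uses a cellular automaton based RNG whose rule is not given here, so the generator, the function names and the seed values are all assumptions made for illustration.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one step."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def sequence(seed, count):
    """Return the first `count` states produced from a non-zero seed."""
    out, state = [], seed
    for _ in range(count):
        state = lfsr16_step(state)
        out.append(state)
    return out

def period(seed):
    """Count the steps taken until the generator returns to its seed."""
    state, steps = lfsr16_step(seed), 1
    while state != seed:
        state = lfsr16_step(state)
        steps += 1
    return steps

seed_a, seed_b = 0xACE1, 0x1234  # hypothetical seed values
print(sequence(seed_a, 4))
print(sequence(seed_b, 4))
print(period(seed_a), period(seed_b))
```

Both seeds give the same cycle length, since that is a property of the generator, but the numbers observed in any short window differ, which is the effect a GA with a small population and few generations would be sensitive to.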

4.1.3 Basics of Evolvable Hardware

The basics of Evolvable Hardware [59] are briefly discussed here, as space applications increasingly employ it to adapt on-board electronics to changing environmental conditions. Evolvable Hardware (EHW) is a class of hardware that adapts itself to changing conditions using evolutionary algorithms. There are two major divisions of evolvable hardware, namely extrinsic EHW and intrinsic EHW. Extrinsic EHW refers to hardware that is evolved using software simulations (and behavioral models of the hardware). The best configuration found in the simulations is then downloaded onto the actual hardware. Intrinsic EHW refers to the adaptation and re-configuration of previously configured hardware because of changes observed or required in the actual hardware. The growing number of remote space applications has increased the demand for intelligent and adaptive space systems [67]. Thus, intrinsic EHW is becoming popular in space applications. Intrinsic EHW has been targeted for different platforms including ASICs [60-63] and specialized platforms [64]. Due to the flexibility and scalability requirements of space applications, most of the existing works on intrinsic EHW have been implemented on FPGAs [37, 38].

Intrinsic EHW can be classified into four different classes based on the location of the reconfigurable hardware and the evolutionary algorithm, as proposed by Lambert et al [65]:
- PC-based Intrinsic EHW: The reconfigurable hardware application is located on an FPGA and the monitoring system is located in the PC. The reconfiguration of the evolvable hardware is done from the PC. This system suffers from a slow runtime because of the communication with the PC.
- Complete Intrinsic EHW: Both the reconfigurable hardware and the evolutionary algorithm are situated on the same (FPGA) chip. This system will yield the best performance, as the communication delays are due only to intra-chip wires.

- Multi-chip Intrinsic EHW: The reconfigurable hardware and the evolutionary algorithm are located on different (FPGA) chips. The performance of this system is lower than that of the complete intrinsic EHW solution due to the inter-chip communication delay.
- Multi-board Intrinsic EHW: The reconfigurable hardware and the evolutionary algorithm are located in FPGA chips on different boards. The performance of this system is lower than that of both the multi-chip and complete intrinsic EHW solutions due to the inter-board communication delays.

Although the Complete Intrinsic EHW implementation (especially on an ASIC) yields the best performance and smallest form factor, it is not widely adopted, as it suffers from low scalability and flexibility with respect to the fitness function computation. The multi-chip and multi-board implementations are considered better for intrinsic EHW due to the dynamic reconfiguration features available in FPGAs (see Lambert et al [65] for more details). The proposed core alleviates this problem by supporting the interfacing of fitness functions housed on other chips (or boards) to the existing system, thus allowing the realization of a hybrid system, as shown in Figure 4.3. The proposed core allows the user to select between internal and external fitness functions. Hence, even if the existing system is implemented on an ASIC, new fitness functions can be added externally to the system. Note that the proposed IP core can be used to realize all classes of intrinsic EHW systems (excluding PC-based) both on an FPGA and on an ASIC without losing flexibility.

4.2 Proposed FPGA Based Genetic Algorithm IP Core

This section describes in detail the implementation and interfacing of the proposed core and the design issues considered for ASIC development and space applications.

4.2.1 Implementation and Interfacing Details

In this sub-section, the design methodology, behavioral modeling, and interfacing of the proposed GA IP core are presented in detail.

Design Methodology

The entire behavior of the proposed GA core was modeled in VHDL and simulated to test its correctness. A Register Transfer (RT) level VHDL model of the GA core was synthesized from the behavioral model using an in-house High Level Synthesis tool called Automatic Design Instantiation (AUDI). The RT level VHDL model was simulated thoroughly to test the correctness of the synthesized netlist. A gate level Verilog model was then synthesized from the RT level model using in-house flattening scripts and the Berkeley SIS tool [66]. The gate level Verilog model uses simple Boolean gates such as NAND, NOR, AND, OR, XOR, and SCAN_REGISTER. The gate level Verilog model was also simulated using Cadence NC Launch to verify the functionality and the timing. This design methodology ensures that the RT level and the gate level netlists are completely synthesizable by standard synthesis tools such as the Xilinx ISE tool. Thus, a synthesizable Genetic Algorithm FPGA IP core is available to the user at two levels of design abstraction, namely RT level and gate level.

Behavioral Modeling

The proposed GA core implements the GA optimization cycle shown in Figure 4.1. An initial population of randomly generated individuals is formed. A 16-bit cellular automaton based Random Number Generator, similar to the implementation in [41], is used to generate all the required random numbers. In each generation, a new population of candidate solutions is generated using crossover and mutation operators. Elitism is provided by copying the best individual in the current generation into the next generation.

Figure 4.1. High level view of the implemented GA optimization cycle

Parent Selection

The parent individuals required for crossover are selected from the current population using the Proportionate Selection scheme [3]. A threshold value is computed by scaling down the sum of the fitnesses of all the individuals in the current population using a random number. A cumulative sum of the fitnesses of the individuals in the new population is computed and compared to the threshold value. The individual whose fitness causes the cumulative sum to exceed the threshold is chosen as the parent. This ensures that highly fit individuals have a selection probability that is proportional to their fitness. To speed up computations, the sum of the fitnesses of the new population is accumulated when the fitness is computed.

Single Point Binary Crossover

The GA core implements the single point binary crossover technique [3], illustrated in Figure 2.2, to combine parents from the current

generation and produce offspring for the next generation. The GA core performs crossover only if the random number generated is less than the specified crossover threshold. Since the 4-bit crossover threshold is user programmable, the user can control the crossover rate in the GA core. The single point crossover is implemented using a bit mask vector that generates the first portion of the offspring from the first parent and the other portion of the offspring from the second parent. A random number n is generated to denote the random crossover point. A mask with 1s from position 0 to n-1 and 0s from position n onward is ANDed with the first parent to obtain the first part of the offspring. The mask is then logically inverted and ANDed with the second parent to obtain the second part of the offspring.

Mutation

Mutation is performed after crossover in the proposed GA core. The GA generates a 4-bit random number and compares it with the selected mutation threshold to decide if mutation should be performed. If the random number is smaller than the mutation threshold, a random bit mutation is performed. A randomly chosen mutation point dictates the appropriate bit mask to be used in an XOR operation with the candidate solution. This XOR operation flips the bit at the mutation point. The fitness of the resultant offspring is then computed using a simple two-way handshaking communication between the GA core and the fitness evaluation module. The candidate and its fitness are then stored in the GA memory as part of the new population.

The above cycle of parent selection, crossover, and mutation is repeated in every generation until the new population is completely filled. The GA optimization ends when the generation index is equal to the user-programmed number of generations. Then, the GA core exits the optimization cycle and outputs the best individual found.
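The three operators described above can be sketched in software as follows. This is a minimal behavioral illustration in Python, not the core's VHDL: the function names, the use of Python's `random` module in place of the hardware RNG, the OR used to join the two masked halves, and the choice to pass the first parent through unchanged when crossover is skipped are all assumptions.

```python
import random

WIDTH = 16  # candidate width in bits, matching the 16-bit candidate bus

def select_parent(population, fitnesses, rng):
    """Proportionate selection: scale the fitness sum by a random number to
    get a threshold, then return the individual whose fitness pushes the
    cumulative sum past that threshold."""
    threshold = rng.random() * sum(fitnesses)
    cumulative = 0
    for individual, fitness in zip(population, fitnesses):
        cumulative += fitness
        if cumulative > threshold:
            return individual
    return population[-1]  # guard for an all-zero fitness sum

def crossover(parent_a, parent_b, point):
    """Single point crossover with a bit mask holding 1s in bits 0..point-1."""
    mask = (1 << point) - 1
    inverted = ~mask & ((1 << WIDTH) - 1)
    return (parent_a & mask) | (parent_b & inverted)  # OR join is assumed

def mutate(candidate, point):
    """Flip the bit at the mutation point by XORing with a one-hot mask."""
    return candidate ^ (1 << point)

def make_offspring(population, fitnesses, xover_threshold, mut_threshold, rng):
    """One pass of selection, crossover and mutation with 4-bit thresholds."""
    a = select_parent(population, fitnesses, rng)
    b = select_parent(population, fitnesses, rng)
    child = a  # assumed behavior when crossover is skipped
    if rng.randrange(16) < xover_threshold:
        child = crossover(a, b, rng.randrange(1, WIDTH))
    if rng.randrange(16) < mut_threshold:
        child = mutate(child, rng.randrange(WIDTH))
    return child

rng = random.Random(1)
pop = [0x00FF, 0xFF00, 0x0F0F, 0xF0F0]
fit = [bin(x).count("1") for x in pop]  # toy fitness: number of set bits
print(hex(crossover(0xFFFF, 0x0000, 4)), hex(mutate(0x0000, 3)))
print(hex(make_offspring(pop, fit, 12, 2, rng)))
```

Because the thresholds are compared against a 4-bit random number, a threshold of 12 corresponds to a crossover rate of roughly 12/16 in this model, which is how a programmable threshold translates into a programmable rate.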

Table 4.2. Port interface of the proposed GA core

No.  Port          Input/Output  Width (bits)  Functionality
1    reset         I             1             System reset
2    sys_clock     I             1             System clock
3    ga_load       I             1             Load GA parameters
4    index         I             3             Index of GA parameter
5    data_valid    I             1             Initialization handshake signal
6    data_ack      O             1             Initialization handshake signal
7    fit_value     I             16            Fitness value bus
8    fit_request   O             1             Fitness request signal
9    fit_valid     I             1             Fitness value validity signal
10   candidate     O             16            Candidate solution bus
11   mem_address   O             8             GA memory address
12   mem_data_out  O             32            Data to GA memory
13   mem_wr        O             1             GA memory write signal
14   mem_data_in   I             32            Data from GA memory
15   start_GA      I             1             GA start signal
16   GA_done       I             1             GA completion signal
17   test          I             1             Scan chain test signal
18   scanin        I             1             Scan chain input
19   scanout       O             1             Scan chain output
20   preset        I             2             Preset mode selector
21   rn            I             16            Random number

Programmable GA parameters

The proposed GA core has a port interface as shown in Table 4.2. The performance and runtime of a genetic algorithm depend on the GA parameters, namely population size, number of generations, crossover rate, and mutation rate. Large values for the population size and the number of generations generally yield the best results at the expense of long runtimes. But if the target application is simple, a few generations and a small population size may suffice to find the best solution.

Table 4.3. Programmable parameters

Index  Programmable Parameter
0      Number of Generations [15:0]
1      Number of Generations [31:16]
2      Population Size
3      Crossover Rate
4      Mutation Rate
5      RNG Seed

An efficient GA core implementation must be able to cater to the needs of individual applications by allowing the user to change these parameters according to the application. The crossover and mutation rates that produce the best results in the shortest amount of time also vary with the application. Hence the proposed GA core also has the capability to program both the crossover and mutation rates. The quality of the random numbers generated for the execution of the genetic operators also has an impact on the performance of the GA. The proposed core allows the user to program the initial seed of the Random Number Generator (RNG), which enables the user to obtain different sequences of random numbers using the same RNG module.

All the programmable parameters of the GA core must be initialized before it can be used. Initialization of the GA core is done using a simple two-way handshake. The user first asserts the init_GA signal to put the GA core in the initialization mode. Then, all the programmable parameters can be initialized using the handshaking process described below. Each programmable parameter has an index associated with it, as shown in Table 4.3. The user places the value of the programmable parameter on the fit_value bus and the corresponding index value on the index bus. The user then asserts the data_valid signal. The GA core reads the fit_value bus, decodes the index and stores the value in the appropriate register. The GA core then asserts the data_ack signal and waits for data_valid to be de-asserted before de-asserting data_ack.
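The initialization handshake above can be modeled behaviorally as in the following Python sketch. It is a software stand-in for illustration only: the class and function names and the register names are hypothetical, the step-by-step `tick` model ignores real clocking, and only the index assignments and the data_valid/data_ack ordering are taken from the text and Table 4.3.

```python
# Index values follow Table 4.3; the register names are illustrative.
PARAM_INDEX = {
    0: "generations_lo",   # Number of Generations [15:0]
    1: "generations_hi",   # Number of Generations [31:16]
    2: "population_size",
    3: "crossover_rate",
    4: "mutation_rate",
    5: "rng_seed",
}

class CoreInitModel:
    """Behavioral stand-in for the core's side of the initialization handshake."""

    def __init__(self):
        self.registers = {}
        self.data_ack = 0

    def tick(self, index, value, data_valid):
        """Evaluate one step given the user's index, bus value and data_valid."""
        if data_valid and not self.data_ack:
            # Decode the index, latch the 16-bit bus value, then acknowledge.
            self.registers[PARAM_INDEX[index]] = value & 0xFFFF
            self.data_ack = 1
        elif not data_valid and self.data_ack:
            # The user released data_valid, so the acknowledge is released too.
            self.data_ack = 0
        return self.data_ack

def program_parameter(core, index, value):
    """User side: drive the buses, assert data_valid, see data_ack, release."""
    assert core.tick(index, value, data_valid=1) == 1
    assert core.tick(index, value, data_valid=0) == 0

def program_core(core, generations, population, xover, mutation, seed):
    """Load all six parameters; the 32-bit generation count takes two writes."""
    program_parameter(core, 0, generations & 0xFFFF)
    program_parameter(core, 1, (generations >> 16) & 0xFFFF)
    for index, value in ((2, population), (3, xover), (4, mutation), (5, seed)):
        program_parameter(core, index, value)

core = CoreInitModel()
program_core(core, generations=70000, population=32, xover=12, mutation=1,
             seed=0xACE1)
total_generations = (core.registers["generations_hi"] << 16) \
    | core.registers["generations_lo"]
print(total_generations, core.registers)
```

The example value of 70000 generations was chosen because it does not fit in 16 bits, so it exercises both halves of the generation count carried over the 16-bit bus.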


Interfacing details of the GA core

The overall GA optimizer consists of three modules, namely the GA core, the GA memory, and a random number generator (RNG). The GA core communicates with a fitness evaluation module and the actual application using a simple two-way handshaking protocol. A typical system with the communication between all these modules is shown in Figure 4.2.

The GA memory module is a single-port memory module that stores both the individuals and their fitnesses. To store an individual and its fitness, the GA core places the memory contents on the memory bus and asserts the memory write signal. To read an individual and its fitness, the GA core places the memory address on the address bus and reads the memory contents in the next clock cycle.

The RNG module is implemented using a cellular automaton. It is to be noted that the operation of the GA core is independent of the RNG implementation. The initial seed of the RNG module can be provided by the user. One of three preset initial seeds can also be selected in the PRESET mode. The GA core reads the output register of the RNG module when it needs a random number. Based on the number of random bits needed, the GA core selects the bits from predefined positions.

The GA core uses a simple two-way handshaking protocol for its communication with the fitness evaluation module (FEM). When the GA core requires the fitness of a candidate solution, it places the individual on the candidate bus and then asserts the fit_request signal. The FEM of the target application should then read the candidate port and compute the fitness of the individual. The computed fitness is then placed on the fit_value port of the GA core and the fit_valid signal is asserted by the target application. On assertion of the fit_valid signal, the GA core reads the fitness value and de-asserts the fit_request signal. The simplicity of the interfacing protocol is a major advantage to the user as it reduces timing issues during implementation of the entire application.
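The fitness handshake above can be traced step by step with a short model. The sketch below is illustrative only: the function names are hypothetical, the bit-count fitness is a stand-in for a real application function, and the exchange is collapsed into sequential calls rather than clocked signals.

```python
# Behavioral sketch of the fit_request / fit_valid handshake.
# Function names and the toy fitness function are hypothetical.

def toy_fitness(candidate):
    """Stand-in fitness: number of set bits in the 16-bit candidate."""
    return bin(candidate & 0xFFFF).count("1")


def fitness_module(candidate_bus, fit_request):
    """Application side: answer a request with (fit_value, fit_valid)."""
    if fit_request:
        return toy_fitness(candidate_bus), 1
    return 0, 0


def request_fitness(candidate):
    """Core side: one complete handshake; returns the fitness and a signal trace."""
    trace = []
    fit_request = 1                      # core drives candidate bus, asserts request
    trace.append(("fit_request", 1))
    fit_value, fit_valid = fitness_module(candidate, fit_request)
    trace.append(("fit_valid", fit_valid))
    assert fit_valid == 1
    fitness = fit_value                  # core latches the fitness value
    fit_request = 0                      # then de-asserts the request
    trace.append(("fit_request", fit_request))
    return fitness, trace


fitness, trace = request_fitness(0x00FF)
```

Because the core only proceeds once fit_valid is seen, the fitness module is free to take as many cycles as it needs, which is what allows it to sit on another chip or board.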


Figure 4.2. Typical system showing the communication between the different modules and the GA core (signal numbers are in reference to Table 4.2)

Usage details of the GA core

The GA core starts its optimization cycle when it receives the start_GA pulse from the target application. If the programmable parameters of the GA have been initialized, it uses these values. Otherwise, the GA core can use one of the three preset modes. During its optimization cycle, the GA core requests fitness computations for candidate individuals from the fitness evaluation module using the handshaking protocol described. Once the GA core has computed the best candidate, it is placed on the candidate bus and the GA_done signal is asserted.

4.2.2 Design Considerations for ASIC Implementation and Space Applications

The proposed GA core is well suited for ASIC development as it is available as a Verilog gate-level netlist and also has three preset modes and a scan chain built into the design.
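The choice between user-programmed values and the preset modes can be sketched as a simple lookup. The preset values are taken from Table 4.4; the dictionary layout and function name are hypothetical. The division by 16 is an inference from the reported pairs (a crossover threshold of 12 is quoted as a rate of 0.75, and 10 as 0.625), not a documented formula.

```python
# Sketch of parameter selection: user-programmed values versus preset modes.
# Preset values follow Table 4.4; names and structure are hypothetical.

PRESETS = {
    0b01: {"pop_size": 32, "generations": 512, "xover_threshold": 12, "mut_threshold": 1},
    0b10: {"pop_size": 64, "generations": 1024, "xover_threshold": 13, "mut_threshold": 2},
    0b11: {"pop_size": 128, "generations": 4096, "xover_threshold": 14, "mut_threshold": 3},
}


def resolve_parameters(preset, user_params=None):
    """Return the parameter set the core would run with for a 2-bit preset value."""
    if preset == 0b00:
        # User mode: the values programmed through the handshake are used.
        if user_params is None:
            raise ValueError("user mode requires programmed parameters")
        return dict(user_params)
    return dict(PRESETS[preset])


def threshold_to_rate(threshold):
    """Assumed mapping of a 4-bit threshold (0 to 15) onto a rate in sixteenths."""
    return threshold / 16.0


params = resolve_parameters(0b10)
rate = threshold_to_rate(params["xover_threshold"])
```

Keeping the presets in a fixed table mirrors the motivation given for them: the core remains usable even when the initialization path is unavailable.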


Table 4.4. Preset modes available in the proposed GA core

Mode    Preset  Pop. size  No. of generations  Crossover threshold  Mutation threshold
User    00      < 256      < 2^32              0-15                 0-15
Preset  01      32         512                 12                   1
Preset  10      64         1024                13                   2
Preset  11      128        4096                14                   3

Preset Modes

The proposed GA core has three preset modes, as shown in Table 4.4. The values for the programmable GA parameters in the preset modes have been set so that the GA core can be used for a varied set of applications without compromising on performance or runtime. The user can select any one of the three preset modes based on the target application. When the 2-bit preset mode selector is set to 00 (user mode), the core uses the user-programmed values for all the programmable parameters.

Scan Chain Testing

The proposed GA core has a scan chain connecting all the registers used in the design. A scan chain test can be run on the core by asserting the test signal and feeding the user test pattern into the scanin port. The output of the scan chain can be observed on the scanout port. This scan chain can also be connected to a top-level scan chain in a system-level design.

The increasing number of remote space missions has necessitated autonomous spacecraft that are capable of handling unexpected situations and adapting to new environments [67]. This requires deployment of intrinsically evolvable hardware whose adaptation and reconfiguration are controlled by on-board evolutionary algorithms. In [67], Stoica et al. identify the following characteristics as the most critical in space-oriented EHW:
- Systems Approach to EHW Design: The EHW system only helps to reconfigure and adapt the higher-level application to the changing environment.


- Flexibility in Fitness Calculation: The means of computing and the context of the fitness of a candidate solution need to be considered.
- Response Time: Most space applications are time-critical applications that must adapt to the changing environment quickly, before serious damage is done to the system and/or the mission itself.
- Safety: Space systems are very expensive and highly sensitive to even small errors and/or environmental changes. Hence, safety is the most critical characteristic, as a space system can be permanently lost or damaged by the slightest of problems.

The proposed GA core addresses these issues in the following ways:
- The design of the GA core as a drop-in IP module and its capability to be integrated at various design levels enable a systems approach to EHW design.
- The proposed GA core supports external fitness functions.
- By giving the user the ability to program the number of generations according to the criticality of the application, the runtime of the GA can be controlled. Moreover, the best candidate of every generation is always output to the application for use in case of an emergency.


Figure 4.3. Implementation of a hybrid intrinsic EHW system (with internal and external fitness modules) using the proposed GA core

Implementation of a Hybrid Intrinsic EHW System

The proposed GA core can be used to implement a scalable hybrid intrinsic EHW system, as shown in Figure 4.3. The GA core enables the user to multiplex between an internal fitness function and an external fitness function. The fitness value (shown in bold in Figure 4.3) and the handshaking signals are available to the external fitness function module, which may be on a different chip or board and can be added later by the user to expand the functionality of the system. The external fitness function can be housed on a reconfigurable fabric such as an FPGA if more external fitness functions are to be supported.


4.3 Experimental Results

The GA core was simulated and tested thoroughly at each level of design abstraction (behavioral, RT level, and gate level). Moreover, an analogous GA optimization cycle was implemented in software (in the C programming language) to compute the speedup obtained by the FPGA implementation. This section discusses in detail the experiments conducted at each design level and the results obtained from simulation and hardware execution runs.

4.3.1 RT-Level Simulations

At the RT level, the GA core was simulated using Cadence NC Launch to verify its functionality. The effectiveness of the GA core was tested by optimizing the three maximization test functions shown below using various parameter settings. All the experiments use a chromosome length of 16. Hence all the single-variable experiments have an x range of 0 to 65535, and the two-variable experiments have equal ranges (0 to 255) for both variables.

Test Function #1 (Binary F6)

This is a very difficult test function that has numerous local maxima, as can be seen in Figure 4.4, and exactly one global maximum with a value of 4271 at x = 65522. It is a standard test function used to test the effectiveness of genetic algorithms and other optimization algorithms [5].
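The 16-bit chromosome described above can be unpacked into one or two variables as in the sketch below. The function names are hypothetical, and the byte order for the two-variable case (x in the low byte, y in the high byte) is an assumption, chosen to agree with the candidate 65516 that Section 4.3.2 reports as y = FF and x = EC.

```python
# Sketch of decoding a 16-bit chromosome into problem variables.
# Function names and the byte order are assumptions for illustration.

CHROMOSOME_BITS = 16


def decode_single(chromosome):
    """Single-variable case: the whole chromosome is x, range 0 to 65535."""
    return chromosome & 0xFFFF


def decode_pair(chromosome):
    """Two-variable case: two 8-bit fields, each with range 0 to 255."""
    x = chromosome & 0xFF          # assumed: x occupies the low byte
    y = (chromosome >> 8) & 0xFF   # assumed: y occupies the high byte
    return x, y


x_only = decode_single(65522)
x, y = decode_pair(65516)
```

Splitting the chromosome evenly is what gives the two-variable experiments their equal 0 to 255 ranges.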


Figure 4.4. (Zoomed-in) plot of the modified Binary F6 [5] test function

Test Function #2

This is a simple minimax test function that has to maximize one variable (x) and minimize the other variable (y) to obtain the optimal objective function value of 3060.

Test Function #3

This is a simple maximax test function that has to maximize both variables (x and y) to obtain the optimal objective function value of 3060.

Table 4.5 summarizes the best results obtained for the three test functions under various parameter settings. Convergence is taken to occur when the difference in average fitness between the current generation and the next generation is less than 5%. It can be clearly seen that the proposed GA core finds the optimal values for all the test functions. But the optimum is found only for certain parameter settings, underlining the need for programmability of the GA parameters. The random numbers used by the GA also play a vital role in determining the performance. For instance, in run #1 from Table 4.5, the GA core converges prematurely to a local optimum. But when the RNG seed is changed from 45890 to 10593 (run #3), the convergence of the GA is better and the global optimum is found under the exact same settings for the other parameters.

Table 4.5. RT-level simulation results obtained for the three test functions (BF6, F2, and F3) under various GA parameter settings

Test      Run  Initial   Population  Crossover  Best fitness       Convergence
function  no.  RNG seed  size        threshold  Value  Generation  (gen. num.)
BF6       1    45890     32          10         4047   1           8
BF6       2    45890     64          10         4271   14          30
BF6       3    10593     32          10         4271   3           16
BF6       4    1567      32          10         4146   2           26
BF6       5    1567      32          12         4047   2           10
F2        6    45890     32          10         3060   15          18
F2        7    45890     64          10         2096   1           10
F3        8    10593     64          10         3060   10          26
F3        9    10593     32          12         3060   5           12
F3        10   1567      32          10         3060   16          20

The convergence plots for the three test functions under different parameter settings are shown in Figures 4.5 through 4.11. In these plots, the X axis plots the generation number and the Y axis plots the fitness, so that a point P(i, j) is a population member in generation i with fitness j. For the sake of clarity, the plots show only one of multiple members with the same fitness in any generation. Hence, as the population converges to the best few candidates in the later generations, the number of points reduces. Although many inferior members are present in the initial random population, the final generations contain highly fit individuals and very few inferior individuals (due to mutation).
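The convergence column of Table 4.5 rests on the 5% criterion on average fitness mentioned earlier in this section. A minimal sketch of such a check follows. The function names are hypothetical, and two details are assumptions rather than statements from the text: the change is measured relative to the current generation's average, and the generation reported is the later one of the pair.

```python
# Sketch of a 5% average-fitness convergence check.
# The normalization and the reported index are assumptions.

def converged(current_avg, next_avg, tolerance=0.05):
    """True when the relative change in average fitness is below the tolerance."""
    if current_avg == 0:
        return next_avg == 0
    return abs(next_avg - current_avg) / abs(current_avg) < tolerance


def convergence_generation(avg_fitness_per_gen, tolerance=0.05):
    """Return the first generation whose average is within tolerance of the previous one."""
    for gen in range(1, len(avg_fitness_per_gen)):
        if converged(avg_fitness_per_gen[gen - 1], avg_fitness_per_gen[gen], tolerance):
            return gen
    return None


# Made-up averages purely to exercise the function (not data from Table 4.5).
example_averages = [1000, 1500, 2000, 2080, 2100]
gen = convergence_generation(example_averages)
```

A criterion of this kind says nothing about whether the population settled on the global optimum, which is why runs such as #1 can converge early on an inferior value.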


Figure 4.5. Convergence plot for the BF6 test function using number of generations=32 (Run #3 of Table 4.5)

Figure 4.6. Convergence plot for the BF6 test function with initial seed for RNG=1567 (Run #4 of Table 4.5)


Figure 4.7. Convergence plot for the BF6 test function with crossover rate=0.75 (Run #5 of Table 4.5)

Figure 4.8. Convergence plot for the test function F2 with population size=32 (Run #6 of Table 4.5)


Figure 4.9. Convergence plot for the test function F2 with population size=64 (Run #7 of Table 4.5)

Figure 4.10. Convergence plot for the test function F3 with initial seed for RNG=10593 (Run #9 of Table 4.5)


Figure 4.11. Convergence plot for the test function F3 with initial seed for RNG=1567 (Run #10 of Table 4.5)

Figures 4.8 and 4.9 show the convergence results for test function #2, the minimax objective function. Figures 4.10 and 4.11 show the convergence results for test function #3, the maximax objective function. Figures 4.8 through 4.11 show that a small population size and fewer generations are sufficient to solve simple problems. The convergence characteristics for the BF6 test function in Figures 4.5 through 4.7 show that finding the optimal parameter settings for a difficult problem is non-trivial and that the initial seed for the RNG module is an important factor in GA convergence. For instance, changing the initial seed for the RNG module from 45890 in run #1 to 10593 in run #3 (while using the same values for all the other programmable parameters) improved the best solution found for the BF6 test function by about 5.5%, while for test function F2 the optimal result was found quickly with the initial RNG seed set at 45890. Thus it is clear that the optimal GA parameter settings differ widely from function to function, reiterating the need for a customizable GA core.


4.3.2 FPGA Implementation Results

The gate-level Verilog netlist of the GA core was synthesized using the Xilinx ISE 10.1i tool and mapped to a Virtex2Pro (xc2vp30-ff896, speed grade -7) device. Table 4.6 shows the area utilization, clock speed, and block memory utilization for the placed and routed GA core on this Xilinx device. Block memory utilization is reported separately as it is not included in the logic utilization computation. It is to be noted that the dedicated block memory in the Xilinx Virtex-II Pro device implements both the GA memory module and the lookup-based fitness evaluation module. The post-place-and-route simulation model for the designed GA IP core was extracted and simulated using ModelSim to verify the functionality and timing. The design was then downloaded onto the FPGA device and its functionality was verified using the Chipscope Pro 10.1i tools.

Table 4.6. Post-place-and-route statistics for the proposed GA core on the Virtex-II Pro device (xc2vp30-7ff896)

Design Attribute                                  Value
Logic Utilization (% Slices Used)                 13%
Clock Speed                                       50 MHz
Block Memory Utilization (GA Memory)              1%
Block Memory Utilization (Fitness Lookup Module)  48%

The effectiveness of the GA core was then tested by optimizing the three difficult maximization test functions shown below. The RT-level simulations used simple minimax and maximax functions and one difficult optimization test function. These simulations verified the functionality and the convergence characteristics of the proposed GA core. For the FPGA experiments, more complex test functions have been used to test the effectiveness of the GA core. These functions have been modified to enable easy hardware implementation.


Modified and Scaled Binary F6

This function is a modified and scaled version of the maximization test function from Haupt and Haupt [5]. It has a single globally optimal solution at x = 65521 with a value of 8183.

Modified Binary F7

This function is a modified version of the minimization function in [5]. It has been modified into a maximization function that has a single optimal solution with a value of 63904 at x = 247 and y = 249.

Modified 2D Shubert Function

This function is a minimization function (derived from [50]) modified into a maximization function with a global optimal value of 65535. The function has 48 global optimal solutions and numerous local maxima.

The experimental setup is similar to the one shown in Figure 4.2. The Xilinx ISE 10.1i tool achieved a clock speed of 50 MHz for the GA module (GA core, RNG module, and the GA memory). The initialization module and the application (fitness) module are separate entities that communicate with the GA module using handshaking. The Xilinx ISE tool was able to achieve a clock speed of 200 MHz for these modules. A Digital Clock Manager (DCM) core is used to generate the two clocks from the on-board 100 MHz clock. The initialization module consists of a simple Finite State Machine (FSM) that performs the two-way data_valid/data_ack handshaking to initialize the various GA parameters one by one. The application module contains a simple FSM to perform the two-way handshaking with the GA core, together with the hardware implementation of the fitness function. A lookup-based implementation has been used for the fitness functions as this resulted in better operational speed than a combinational implementation. In the lookup-based fitness computation method, block ROMs within the FPGA device are populated with the fitness values corresponding to each solution encoding. This approach is used only to demonstrate the effectiveness of the proposed GA IP core in optimizing difficult maximization test functions without having to implement the actual test functions in hardware. The entire experimental setup was implemented on the Xilinx Virtex2Pro (xc2vp30-7ff896) FPGA device. Chipscope Pro 10.1 tools were used to build cores to observe and record the [...]

The proposed GA core was run with 12 different parameter settings, as shown in Table 4.7 through Table 4.9. It is to be noted that the mutation rate and the number of generations were set to 0.0625 and 64, respectively, for all the experiments. The number of generations was set to 64 as the population converged within 64 generations for all three fitness functions.

Table 4.7 tabulates the results obtained by the GA core for the mBF6_2 test function using different settings for the programmable GA parameters. In the experiments conducted, the best solution found by the proposed GA core for the mBF6_2 test function was 65345. This solution evaluates to a fitness of 8135, which is approximately 0.59% lower than the globally optimal fitness value of 8183. It is to be noted that the best solution found by the proposed GA core lies within approximately 0.27% distance of the globally optimal solution in the solution space.
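The two percentages quoted above follow from simple arithmetic, reproduced here as a check. The helper name is hypothetical, and normalizing the solution-space distance by the 65536-point range of a 16-bit chromosome is an assumption about how the 0.27% figure was obtained.

```python
# Worked check of the mBF6_2 gap figures quoted in the text.
# The helper name and the distance normalization are assumptions.

def percent_gap(found, optimum, scale=None):
    """Absolute gap between found and optimum, as a percentage of scale."""
    scale = optimum if scale is None else scale
    return 100.0 * abs(optimum - found) / scale


# Fitness gap: best fitness 8135 against the global optimum 8183.
fitness_gap = percent_gap(8135, 8183)

# Distance in the solution space: x = 65345 against the optimum x = 65521,
# normalized by the 2**16 points of the 16-bit search space.
distance_gap = percent_gap(65345, 65521, scale=2 ** 16)
```

Both values round to the figures in the text: about 0.59% in fitness and about 0.27% in position.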


Table 4.7. Best fitness values obtained by the GA for the mBF6_2 function for different parameter settings (XR = Crossover Rate)

RNG seed       PopSize=32      PopSize=64
(hexadecimal)  XR=10   XR=12   XR=10   XR=12
2961           7999    7813    7824    7819
061F           6175    7578    8134    8129
B342           7612    7497    7612    7719
AAAA           7534    7534    7578    7864
A0A0           8104    7406    8135    8039
FFFF           7291    7623    7847    7669

Table 4.8. Best fitness values obtained by the GA for the mBF7_2 function for different parameter settings (XR = Crossover Rate)

RNG seed       PopSize=32      PopSize=64
(hexadecimal)  XR=10   XR=12   XR=10   XR=12
2961           56835   56835   48135   56456
061F           59648   53432   59648   60656
B342           55000   59928   59480   57184
AAAA           55560   52704   55000   61496
A0A0           58136   53040   58024   56624
FFFF           60880   61384   56344   60768

Table 4.8 tabulates the results obtained by the GA core for the mBF7_2 test function using different settings for the programmable GA parameters. The best candidate found by the proposed GA core for the mBF7_2 test function was 65516. The corresponding solution is y = (FF)_16 and x = (EC)_16 with a fitness of 61496. This is approximately 3.7% lower than the globally optimal fitness value of 63904. The best solution found by the proposed GA core lies within 4.3% and 2.35% distance of the globally optimal solution along the x direction and y direction, respectively, of the solution space.

Table 4.9 tabulates the results obtained by the GA core for the mShubert2D test function using different settings for the programmable GA parameters. The proposed GA core found more than one globally optimal solution for many different parameter settings, as seen in Table 4.9. The GA core found two different globally optimal solutions, (x1 = C2, y1 = 4A) and (x2 = DB, y2 = 4A), during the experimental run with RNG seed = (AAAA)_16, population size = 64, and crossover threshold = 10.


Table 4.9. Best fitness values obtained by the GA for the Shubert function for different parameter settings (XR = Crossover Rate)

RNG seed       PopSize=32      PopSize=64
(hexadecimal)  XR=10   XR=12   XR=10   XR=12
2961           56835   56835   48135   56835
061F           56835   55095   65535   58227
B342           56487   56487   54051   63795
AAAA           63795   56487   65535   65535
A0A0           56835   63795   65535   53355
FFFF           53355   65535   48135   56835

Figures 4.12 through 4.15 plot the data collected from the hardware runs and illustrate the convergence of the GA optimizer for the three test functions. Both the best fitness and average fitness values are plotted for every generation. It can be seen that the GA core finds the best solution within the first 20 generations for all three test functions. From Figures 4.12 and 4.13 it can be seen that the GA core evaluates at most (10 generations + 1 initial population = 11) x (population size = 64) = 704 candidate solutions before finding the best solution. Although the size of the entire solution space is only 65536, the GA core evaluates less than 1.1% of the solution space before finding the best solution. This is a major speedup over an exhaustive search and is very important for real-time applications and other applications that have time-consuming fitness evaluation procedures, such as EHW.
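The evaluation count above is a direct product, written out below as a small check. The function and constant names are hypothetical; the formula itself (generations to the best solution plus the initial population, times the population size) is the one used in the text.

```python
# Worked check of the number of candidate evaluations before the best
# solution is found. Names are hypothetical; the formula follows the text.

SOLUTION_SPACE = 2 ** 16  # all encodings of a 16-bit chromosome


def evaluations(generations_to_best, pop_size):
    """Candidates evaluated: the initial population plus one population per generation."""
    return (generations_to_best + 1) * pop_size


count = evaluations(10, 64)
percent_of_space = 100.0 * count / SOLUTION_SPACE
```

For 10 generations and a population of 64 this gives 704 evaluations, a little over 1% of the 65536 possible encodings.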


Figure 4.12. Convergence plot for the test function mBF6_2(x) with initial RNG seed=(061F)_16, crossover threshold=10, and popSize=64 (data collected from hardware execution)

Figure 4.13. Convergence plot for the test function mBF6_2(x) with initial RNG seed=(A0A0)_16, crossover threshold=10, and popSize=64 (data collected from hardware execution)


Figure 4.14. Convergence plot for the test function mBF7_2(x,y) with initial RNG seed=(AAAA)_16, crossover threshold=12, and popSize=64 (data collected from hardware execution)

Figure 4.15. Convergence plot for the test function mShubert2D(x1,x2) with initial RNG seed=(AAAA)_16, crossover threshold=10, and popSize=64 (data collected from hardware execution)


From Figure 4.14 it can be seen that the GA core evaluates at most (18 generations + 1 initial population = 19) x (population size = 64) = 1216 candidate solutions before finding the best solution for the mBF7_2 test function. From Figure 4.15 it can be seen that the GA core evaluates at most (12 generations + 1 initial population = 13) x (population size = 64) = 832 candidate solutions before finding the best solution for the mShubert2D test function. Thus, it can be seen that the GA core quickly converges to a good solution after evaluating a small fraction of the solution space, even for difficult test functions. The GA is expected to find good solutions quickly due to its population-based search mechanism. The GA then converges towards the good solutions and tries to find better solutions in their vicinity. However, it has to be noted that the GA core finds the optimal solutions only for certain settings of the GA parameters. This reiterates the necessity of being able to change the values of these parameters according to the application at hand.

4.3.3 Runtime Comparison with Software Implementation

A software implementation of a GA optimizer, similar to the GA optimization algorithm in the IP core, was developed in the C programming language. Genetic algorithms, when used for hardware applications such as EHW, need to communicate with the application to evaluate the fitness of the candidate solutions. This communication overhead can be effectively modeled using the Xilinx Virtex2Pro board as it contains PowerPC processor IP cores that can execute software programs. The experimental setup consists of the GA software running on the PowerPC processor in the Xilinx Virtex2Pro board and the fitness evaluation module (implemented as the same lookup table using block RAM) on the Xilinx Virtex2Pro FPGA. This setup gives a fair comparison between the software and hardware implementations as both are implemented using the same technology node. The runtime of the GA program, averaged over 6 runs, for a population size of 32, crossover rate of 0.625, mutation rate of 0.0625, and 32 generations to optimize the modified binary F6 (mBF6_2) function was 37.615 milliseconds. In hardware, a 32-bit counter was implemented that is clocked using the 50 MHz clock used for the GA IP core. The GA execution time on the hardware was computed as the product of the counter value and the clock period. The hardware GA implementation achieved a speedup of approximately 5.16x over the software implementation.

4.4 Summary

With the rapid increase in integration technology, entire systems are now implemented on a single chip. Many systems require a stochastic optimization engine and it is highly desirable that the optimization engine is also implemented in hardware. In this work, a readily synthesizable, robust genetic algorithm core for FPGAs has been designed that is easy to interface and use with a wide range of applications. The efficiency of the designed core is illustrated by the low area utilization and the high clock speed. The effectiveness of the designed GA core is evident from the convergence characteristics obtained for the standard test functions. The gate-level Verilog implementation of the GA core is advantageous in that it can be directly used by commercial layout tools such as Cadence First Encounter for chip layout generation. The availability of preset modes and scan chain testability provides some basic fault tolerance to an ASIC designed using the proposed GA design.
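The runtime measurement of Section 4.3.3 (counter value times clock period) can be written out as below. The function names are hypothetical. The hardware runtime and counter value are not reported in the text, so the values computed here are only what the quoted 37.615 ms and 5.16x figures imply, not measured data.

```python
# Sketch of the hardware runtime computation from Section 4.3.3.
# Names are hypothetical; the "implied" values are back-calculated, not measured.

CLOCK_HZ = 50_000_000          # 50 MHz clock of the GA IP core
SOFTWARE_RUNTIME_MS = 37.615   # averaged software runtime quoted in the text
REPORTED_SPEEDUP = 5.16        # approximate speedup quoted in the text


def hw_runtime_ms(counter_value, clock_hz=CLOCK_HZ):
    """Execution time as the product of the counter value and the clock period."""
    clock_period_s = 1.0 / clock_hz
    return counter_value * clock_period_s * 1e3


def speedup(software_ms, hardware_ms):
    """Ratio of software runtime to hardware runtime."""
    return software_ms / hardware_ms


# Back-calculation: the hardware runtime and cycle count the quoted figures imply.
implied_hw_ms = SOFTWARE_RUNTIME_MS / REPORTED_SPEEDUP
implied_cycles = round(implied_hw_ms * 1e-3 * CLOCK_HZ)
```

Under these figures the hardware run would take roughly 7.3 ms, which corresponds to a counter value on the order of 364,000 cycles and fits comfortably in the 32-bit counter.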


CHAPTER 5
A DIGITAL FRAMEWORK FOR EVOLUTIONARY DESIGN, AUTONOMOUS MONITORING AND PERFORMANCE COMPENSATION OF EXTREME ENVIRONMENT ELECTRONICS

The functionality and performance of analog electronics degrade with temperature variations and the presence of radiation in the operating environment. Extreme environment electronics are necessary in many applications including automotive, geothermal, oil well drilling, and space applications. The degradation of operational characteristics such as amplitude gain or slew rate of analog components can severely affect the operation of the entire system. Identifying and compensating the performance degradation of these analog electronics is essential to guarantee the proper functionality of the entire system. Moreover, many of these systems operate in inaccessible environments. Hence, failure of these electronic components may have disastrous consequences and lead to loss of the entire system.

A simple method to avoid performance degradation is to place the temperature-sensitive electronics in protective enclosures that provide a controlled operating environment regardless of the surroundings. But such protective enclosures are bulky, consume a lot of power, and might fail themselves. An alternative method is to allow the performance degradation to occur but monitor the performance characteristics and compensate when the degradation exceeds specified limits. Without the volume and weight of the protective cover, or the power needed for thermal control, electronics can be placed close to sensors and actuators in miniature probes, sensing arrays, and smart structures. This offers a great increase in the electronic processing capability of the higher-level systems, and results in unprecedented accessibility (information and control) of extreme environments encountered in human and robotic space exploration.

In this work, a digital system is proposed for autonomous monitoring and compensation of the performance and functionality of an analog system. The proposed system continuously monitors the performance characteristics of an analog system and compensates any significant performance degradation using either a model-based compensation technique or a genetic algorithm based compensation technique. The proposed digital system has been tested at both the RT [...] m rad-hard SOI [...] V technology. The novel contributions of the proposed digital system include:
- A digital framework for evolutionary design, autonomous monitoring, and compensation of analog electronics,
- Two methods of performance compensation: Genetic Algorithm (GA) based compensation and Model Lookup Table based compensation,
- A parameterizable hardware GA module for GA based compensation,
- Support for external fitness function modules to provide scalability for an intrinsic EHW system,
- Module-level isolation and bypassing schemes that provide basic testability and fault tolerance, and
- Simple external interfaces to ensure ease of integration and ease of use.

5.1 Related Work

There have been previous design techniques for compensation of extreme environment effects, specifically extreme temperature effects. A survey of compensation design techniques for extreme temperature electronics has been presented in [73], where these techniques have been classified into compensation design techniques and technological design techniques. Most of the previous compensation schemes are specific to the target performance characteristic. In [74], current mirrors were proposed for leakage current compensation. In [75], input stage compensation schemes were proposed for operational amplifiers (op amps). Supply voltage division schemes were proposed in [76] for compensation of middle gain stages in op amps. But all the compensation schemes proposed still face temperature limits beyond which their operation is degraded. Hence, an adaptive solution is necessary that can modify the necessary system parameters and recover any loss in performance.

A simple adaptive solution may assume that the changes in the environmental conditions will cause a subset of a list of operational changes in the system and that the list of possible operational changes can be well specified during the design phase. Different solutions can then be created for these specific conditions, all of which are available at runtime and applied as the conditions change. A compensation system can then switch between those predefined solutions. However, even if the different temperatures are known in advance and suitable design solutions are created, the real-time switching of the design may not be trivial. Also, depending on the application at hand, it is not always possible to specify the temperature conditions at design time. Adapting to changing temperature conditions at runtime involves creating a new design solution tailored to the changing conditions. One possible runtime adaptive solution is to apply an evolutionary algorithm to search for the new solution.

Stoica [78] proposed Field Programmable Transistor Arrays (FPTAs), a hardware concept for reconfiguration at the transistor level. Both digital and analog circuits can be mapped onto the FPTA architecture. Performance recovery of circuits including multipliers, Gaussian shape curve generators, filters, and logic gates was successfully demonstrated [79][80]. In the initial experiments, the circuits were subjected to specific high temperatures and compensation was manually triggered. The performance recovery algorithms were implemented in software and were run on a PC. Further experiments by Zebulum et al. [81] revealed that only filters with modest roll-off characteristics could be evolved or recovered by FPTAs. They proposed a Reconfigurable Analog Array (RAA) architecture that uses Gm-C filters to implement a wider range of analog filters even at extreme temperatures. The Gm-C filters are built using Wide Range Transconductance Amplifiers (WRTAs) [82]. They tested the behavior of the WRTA for a temperature range of -180 °C to +120 °C. They also evolved a low-pass filter using the RAA and were able to recover its functionality for the same temperature range. The functionality recovery was done by manually tuning the bias voltages of the component WRTAs of the low-pass filter. Later, a hill climbing algorithm was implemented on an FPGA to recover the performance of a low-pass filter using the RAA [83].

Zebulum et al. [84] later proposed a Self-Reconfigurable Analog Array (SRAA) architecture that used multiple building blocks to build different analog circuits including a Pulse Width Modulator, Power Switch Control, Shaft Encoder, and Instrumentation Amplifier. They identified a list of essential building blocks such as Op Amps, Low-offset Op Amps, High Voltage Op Amp, Current Source, Comparator, and High-speed Comparator by analyzing previous implementations of the analog circuits. The building block analog cells were replicated in four rows to form a 4 x 6 array of functional analog cells. An extra copy of each type of analog cell is also provided to serve as a reference analog cell. This enables online monitoring and compensation of the reference analog cell while the functional analog cells continue their operation. The functionality of the analog cells can be programmed by setting their bias voltages using configuration Digital-Analog Converters (DACs). A switchbox array is used to interconnect the different analog cells and to provide external access to the analog cells. The compensated bias voltage values are applied to the analog cells using the bias voltage configuration DACs of the analog cells.

In all the previous efforts, user interaction is needed for one or more of the following: to evolve the required analog circuit using the field programmable analog chips; to monitor the performance of the evolved analog circuits and identify any significant performance degradation; and to recover the performance or functionality of the evolved circuit under changing conditions. In this work, a digital framework is proposed that interacts with a field programmable analog array to evolve custom analog circuits, autonomously monitor their performance, and automatically compensate for any performance degradation using either a model-based lookup technique or a genetic algorithm based technique. Thus, the proposed system removes the need for any user interaction with the entire system. The monitoring and compensation system has been implemented as a digital ASIC so that the form factor of the entire system is reduced.

5.2 Architecture and Implementation of Proposed Digital Framework

This section presents a detailed discussion of the design methodology and behavioral modeling of the digital framework. It also describes its various features and interfacing details.

5.2.1 Overall System Level Architecture and Operation

This section describes the overall architecture and operation of an entire system that contains the proposed digital system as the performance monitoring and compensation subsystem.

Figure 5.1. Block diagram of overall system (including analog and digital systems)

The architecture of the top-level system containing both the analog electronics and the digital monitoring and compensation system is shown in Figure 5.1. The analog system shown in the figure is a generalization of the reconfigurable analog array architectures proposed by Stoica et al. [83] and Zebulum et al. [84]. The analog subsystem consists of:
- Analog electronic components/cells, arranged in an array fashion,
- Switching and configuration logic to interconnect the various components in the analog subsystem and also to realize different functionalities, and
- Digital-to-analog and analog-to-digital converters to interface with the digital subsystem so that the performance characteristics of the analog cells can be measured.

The digital subsystem is meant to monitor the performance of the analog cells in the analog subsystem and compensate for any significant degradation using the user-specified compensation technique. It comprises:
- Monitoring logic to select the analog cells, excite them with appropriate input signals, collect their response, and calculate the performance degradation, if any,
- Compensation logic to compensate the degraded performance of the selected analog cell using the user-selected technique,
- Synchronization and control logic that regulates the communication between modules and controls the triggering of the monitor and compensate cycles, and
- Interfacing logic that contains the communication interface between the analog subsystem and the digital subsystem.

Figure 5.2 shows the high-level operational flow of the entire system. The operation of the proposed system can be divided into three operational modes:
- INIT mode: In the INIT mode, the entire system is reset and all the customizable parameters in the system are programmed.
- MONITOR mode: In this mode, an analog cell is selected by providing its address to the analog subsystem. The appropriate analog performance characteristic is measured by sending an excitation signal to the selected analog cell, recording the response, and evaluating the digitized response on the digital subsystem.
- COMPENSATE mode: The proposed system enters this mode when the digital subsystem has identified that the currently selected analog cell suffers from significant performance degradation. The digital subsystem uses either the model-based lookup technique or the GA-based technique for compensation, based on the user specification.
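
The three operational modes can be pictured as one pass over the analog cells. The following Python sketch is only an illustration under stated assumptions: the callables `measure`, `is_degraded`, and `compensate` are hypothetical stand-ins for the excitation/response measurement, the degradation check, and the user-selected compensation technique, and the function name `run_pass` is not taken from the dissertation.

```python
from enum import Enum


class Mode(Enum):
    INIT = "INIT"
    MONITOR = "MONITOR"
    COMPENSATE = "COMPENSATE"


def run_pass(cell_ids, measure, is_degraded, compensate):
    """Visit every cell once and return the sequence of modes entered.

    measure(cell)      -> digitized response of the cell (hypothetical helper)
    is_degraded(resp)  -> True if the response shows significant degradation
    compensate(cell)   -> applies the chosen compensation technique
    """
    trace = [Mode.INIT]              # reset and parameter programming
    for cell in cell_ids:
        trace.append(Mode.MONITOR)   # select the cell and measure its response
        response = measure(cell)
        if is_degraded(response):
            trace.append(Mode.COMPENSATE)
            compensate(cell)
    return trace
```

A caller would supply its own measurement and compensation routines; the returned trace simply records which mode was active at each step, which is convenient for checking the expected flow in a behavioral test bench.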

Figure 5.2. High-level functional view of proposed digital framework

5.2.2 Detailed Architecture of the Proposed Framework

The proposed system has been designed with a modular architecture, as shown in Figure 5.3, to enable ease of design and module-level isolation and testing. The proposed system consists of the following modules:
- Initialization module,
- Digital controller module,
- System monitor module,
- GA-based compensation module,
- Internal fitness evaluation module, and
- Model-based lookup table (LUT) module.

The following section describes in detail the design of each of these modules and the operation of the entire proposed system.

Figure 5.3. Architecture of the proposed digital framework

Initialization Module

The initialization module contains the logic to initialize the digital subsystem. When the init_digASIC signal of the initialization module is asserted, all the other modules of the digital subsystem are reset. The initialization module can then be used to:
- Program the number of analog cells to be monitored by the system monitor module.
- Load pre-characterized correction voltage values into the model-based lookup table module.
- Customize the programmable parameters of the GA optimization engine in the GA-based compensation module.
- Customize the programmable parameters of the internal fitness evaluation module.

Figure 5.4. FSM for the digital framework controller module

Digital Controller Module

The digital controller module acts as the main controlling unit that synchronizes and triggers the operations of all the other modules in the proposed system. It is a simple state machine, as shown in Figure 5.4, which loops the entire digital system through monitoring and compensation cycles. On system reset, the digital ASIC controller module enters an IDLE state. In the IDLE state, the digital ASIC controller waits for the analog cell array to start some operation. Once the analog cells are in operation, the app_valid signal is asserted. This triggers the controller module to start the monitor and compensate cycles for the analog cells. In the START_MONITOR state, the controller triggers the system monitor
module by asserting the start_monitoring signal. After triggering the system monitor module, the controller waits in the IDLE2 state for a compensation request. When it receives a compensation request, the controller triggers the LUT-based or GA-based compensation technique, depending on the user-programmed comp_type setting. The prog_FAC and candidate signals are then used to hand the bias voltages that were found over to the analog system, and the controller waits in the COMPENSATION_DONE state for the analog system to complete applying the bias voltages to the selected analog cell. Then the controller returns to the IDLE2 state while the system monitor module proceeds to the next analog cell to continue its monitoring cycle.

System Monitor Module

The System Monitor (SM) module interfaces with the analog electronics subsystem and determines if the functionality and performance of the system is acceptable. The System Monitor module has also been designed as a finite state machine, as shown in Figure 5.5. On system reset, the system monitor module goes into the INIT state. When the start_monitoring signal is asserted, the system monitor module starts monitoring the analog cells for performance degradation. In the MONITOR_MODE state, the system monitor signals to the entire system that the digital system is entering the monitoring cycle by asserting the monitor_mode signal. It then places the address of the selected analog cell on RAC_ID and asserts RAC_ID_valid. In the following state, the system monitor module waits for the fitness evaluation module to compute the fitness and assert the fit_valid signal. The result is reported on the fit signal. If fit
is de-asserted, it implies that the selected analog cell suffers from significant performance degradation and needs compensation. The system monitor notifies the controller module about the need for compensation in the COMPENSATE state by asserting the compensate signal. It also informs the entire system that the system is in the compensation phase by de-asserting the monitor_mode signal, and then waits for the compensation phase to complete in the WAIT_FOR_COMP_DONE state. When the compensation_done signal is asserted, the system monitor re-enters the monitoring phase by asserting monitor_mode and de-asserting compensate. If fit is asserted, the analog cell has an acceptable performance and does not need compensation. If compensation is not required for the current analog cell, the system monitor module proceeds to the next analog cell in the MONITOR_NEXT_RAC state and continues the monitoring cycle.
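
The system monitor state machine described above can be sketched as a pure transition function. This is a minimal behavioral model in Python, not the VHDL used for the ASIC. The state name `WAIT_FOR_FIT` is hypothetical (the source's name for the state that waits on the fitness evaluation module is not recoverable here), and the transition from WAIT_FOR_COMP_DONE to MONITOR_NEXT_RAC is an assumption based on the statement that the monitor proceeds to the next analog cell after compensation.

```python
from enum import Enum, auto


class SM(Enum):
    INIT = auto()
    MONITOR_MODE = auto()
    WAIT_FOR_FIT = auto()         # hypothetical name for the fitness-wait state
    COMPENSATE = auto()
    WAIT_FOR_COMP_DONE = auto()
    MONITOR_NEXT_RAC = auto()


def step(state, start_monitoring=False, fit_valid=False, fit=False,
         compensation_done=False):
    """Return the next system monitor state for the given input signals."""
    if state is SM.INIT:
        return SM.MONITOR_MODE if start_monitoring else SM.INIT
    if state is SM.MONITOR_MODE:
        return SM.WAIT_FOR_FIT            # cell address has been issued
    if state is SM.WAIT_FOR_FIT:
        if not fit_valid:
            return SM.WAIT_FOR_FIT        # fitness not computed yet
        # fit asserted: acceptable cell; fit de-asserted: needs compensation
        return SM.MONITOR_NEXT_RAC if fit else SM.COMPENSATE
    if state is SM.COMPENSATE:
        return SM.WAIT_FOR_COMP_DONE      # compensate request has been raised
    if state is SM.WAIT_FOR_COMP_DONE:
        # assumed: move on to the next cell once compensation completes
        return SM.MONITOR_NEXT_RAC if compensation_done else SM.WAIT_FOR_COMP_DONE
    return SM.MONITOR_MODE                # MONITOR_NEXT_RAC: monitor next cell


def monitor_mode_output(state):
    """monitor_mode is de-asserted only during the compensation phase."""
    return state not in (SM.COMPENSATE, SM.WAIT_FOR_COMP_DONE)
```

Keeping the transition logic free of side effects makes it easy to compare against waveform traces from an HDL simulation, one clock step at a time.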

Figure 5.5. FSM for the system monitor module

GA Module

The proposed system has a Genetic Algorithm (GA) module to perform GA-based compensation. The GA module consists of three submodules, namely, the GA optimizer core, the GA memory, and the random number generator (RNG).
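
As a rough software analogue of the GA module detailed in the rest of this subsection, the sketch below models a 16-bit chromosome made of two 8-bit bias voltage codes, elitism, proportionate selection, single-point crossover, and bit-flip mutation gated by 4-bit thresholds. Several points are assumptions made only for illustration: Python's `random` module replaces the cellular automaton RNG, the first voltage is placed in the high byte, mutation is gated once per individual, fitness values are assumed non-negative, and all function names and default parameter values are hypothetical.

```python
import random

CHROM_BITS = 16


def encode(v1, v2):
    """Concatenate two 8-bit bias voltage codes (v1 in the high byte, assumed)."""
    return ((v1 & 0xFF) << 8) | (v2 & 0xFF)


def decode(chrom):
    return (chrom >> 8) & 0xFF, chrom & 0xFF


def select(pop, fits, rng):
    """Proportionate (roulette wheel) selection over non-negative fitnesses."""
    total = sum(fits)
    if total == 0:
        return rng.choice(pop)
    pick, acc = rng.uniform(0, total), 0
    for ind, fit in zip(pop, fits):
        acc += fit
        if pick <= acc:
            return ind
    return pop[-1]


def crossover(a, b, rng):
    """Single-point binary crossover: swap the low `point` bits of a and b."""
    point = rng.randint(1, CHROM_BITS - 1)
    mask = (1 << point) - 1
    return (a & ~mask) | (b & mask), (b & ~mask) | (a & mask)


def mutate(chrom, rng):
    """Flip one randomly chosen bit."""
    return chrom ^ (1 << rng.randrange(CHROM_BITS))


def evolve(fitness, pop_size=16, generations=32, xover_thr=12, mut_thr=1, seed=1):
    rng = random.Random(seed)
    pop = [rng.getrandbits(CHROM_BITS) for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        fits = [fitness(c) for c in pop]
        nxt = [best]                                  # elitism: keep the best
        while len(nxt) < pop_size:
            a, b = select(pop, fits, rng), select(pop, fits, rng)
            if rng.getrandbits(4) < xover_thr:        # 4-bit crossover threshold
                a, b = crossover(a, b, rng)
            for child in (a, b):
                if rng.getrandbits(4) < mut_thr:      # 4-bit mutation threshold
                    child = mutate(child, rng)
                nxt.append(child)
        pop = nxt[:pop_size]
        best = max(pop + [best], key=fitness)
    return best
```

With 4-bit thresholds, a threshold value t gives a rate of t/16, so the defaults above correspond to a crossover rate of 0.75 and a mutation rate of 0.0625 per individual. In the hardware, the `fitness` callable would be replaced by the fitness evaluation module reached over the handshaking interface.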

Figure 5.6. Chromosome encoding used by the GA-based compensation algorithm

The GA optimizer in the GA module is built using the customizable GA core described in [81], which follows the optimization cycle shown in Figure 4.1. A 16-bit binary string is used to encode the chromosome used by the GA optimizer. This binary string is the concatenation of two voltage values that serve as the bias voltages for the selected analog cell, as shown in Figure 5.6. In Zebulum et al. [84], these digital values are applied to the configuration DACs of the selected analog cells to obtain the required functionality. A 16-bit chromosome was chosen because Zebulum et al. [84] were able to realize all their desired analog building block functionalities by programming the analog cells using two 8-bit bias voltages.

In each generation, a new population of candidate solutions is generated using crossover and mutation operators. The GA optimizer follows an elitist strategy by preserving the best individual found in each generation, since Rudolph [8,9] proved that elitist GAs have a better probability of convergence to the global optimum. The GA module uses the proportionate selection scheme to select parents for the crossover operation. This ensures that each individual has a selection probability proportional to its fitness, so highly fit individuals are selected more often. The GA core implements the single-point binary crossover technique [3]
illustrated in Figure 4.2 to combine parents from the current generation and produce offspring for the next generation. The GA core performs crossover only if the random number generated is less than the specified 4-bit crossover threshold. Since the crossover threshold is user programmable, the user can control the crossover rate in the GA core. The GA core uses random bit flipping as its mutation operator and uses a 4-bit mutation threshold. The RNG module implements a cellular automaton based random number generator as described in [41]. The initial seed of the RNG module can be provided by the user. The GA memory module is a single-port memory module that can store both the individuals and the fitnesses of the individuals in the population.

On system reset, the GA module waits in the IDLE state for either an initialization request or a GA start request. When the load_GA signal is asserted, the GA module enters an initialization mode in which the customizable GA parameters, such as population size, crossover rate, mutation rate, number of generations, and random seed, are programmed. When the start_GA signal is asserted, the GA module enters the GA optimization cycle. The GA module uses either the on-board or the external fitness evaluation module to compute the fitness of the individuals that it generates during the optimization cycle. When the GA optimization cycle is done, the GA module asserts the GA_done signal and outputs the best individual on the candidate signal. The GA module de-asserts GA_done once start_GA is de-asserted, which indicates that the best candidate has been read.

The GA core uses a simple two-way handshaking protocol for its communication with the fitness evaluation module. When the GA core requires the fitness of a candidate solution, it places the individual on the candidate bus and then asserts the fit_request signal. The fitness evaluation module (FEM) of the target application should then read the candidate port and compute the fitness of the individual. The computed fitness is then placed on the fit_value
port of the GA core and the fit_valid signal is asserted by the target application. On assertion of the fit_valid signal, the GA core reads the fitness value and de-asserts the fit_request signal. The simplicity of the interfacing protocols is a major advantage, as the two-way handshaking protocols used for communication avoid strict timing requirements.

Fitness Evaluation Module

The proposed digital system uses a fitness evaluation module to calculate the value of performance characteristics for the analog cells and determine whether significant performance degradation exists. The proposed system supports both internal and external fitness evaluation modules, which improves the scalability of the system. The internal fitness evaluation module implemented in the proposed system computes the slew rate of an analog cell. It compares the computed slew rate to a user-programmed threshold value to determine if the analog cell needs compensation with respect to the slew rate characteristic. The internal fitness evaluation module has a modular architecture and consists of three modules: the FEM_Controller module, the Fitness Computation module, and the Excitation module.

The FEM_Controller module contains the interfacing logic to communicate with the analog subsystem and the other modules in the digital subsystem. It has been implemented as a simple state machine, as shown in Figure 5.8. On system reset, the FEM_Controller module enters the INIT state and then waits in the IDLE state for a fitness evaluation request from either the system monitor module or the GA module. When either the fit_req_mon or the fit_req_ga signal is asserted, the controller checks the onchip_fiteval signal to determine whether the fitness
evaluation is to be done by the internal fitness evaluation module. It then reads the address of the analog cell in the READ_ANALOG_CELL_ID state and triggers the Excitation module and the slew rate module to evaluate the fitness of the selected analog cell. When the fitness has been computed, the controller asserts the fit_valid signal to indicate to the requesting module that a valid fitness value is available. It then waits until the fit_req signal is de-asserted, after which the controller returns to the IDLE state and waits for the next fitness evaluation request.

Figure 5.7. Slew rate measurement by the internal fitness evaluation module of the proposed digital framework

The Excitation module is used to send an appropriate input signal to the selected analog cell so that its response can be collected and used to obtain a value for the desired performance characteristic. The Excitation module within the internal fitness evaluation module is used to send a step input signal to the selected analog cell. It is implemented as a state machine that can be programmed to send out either a rising edge or a falling edge. The start and end voltage levels and the duration of these voltage levels can also be
programmed by the user/application. Figure 5.7 shows a rising edge excitation input being sent to the analog cell and the response expected from the analog cell.

Figure 5.8. FSM of the FEM_Controller module in the internal fitness evaluation module

The Fitness Computation module within the internal fitness evaluation module is used to compute the output slew rate of the selected analog cell. The output slew rate is characterized through the rise time of the output: the time taken for a signal to change from 10% of its maximum value to 90% of its maximum value is used as
the slew rate measure. The response of the analog cell to the rising edge input signal from the Excitation module passes through an analog-to-digital converter (ADC), and the digitized response is read by the Fitness Computation module. The module decrements an internal counter from its maximum value to measure the time elapsed between the ADC code for the 10% voltage value and the ADC code for the 90% voltage value. The remaining counter value is used as a fitness measure for the slew rate characteristic of the selected analog cell, so a faster transition leaves a larger count and therefore a higher fitness.

Model Based Lookup Table Module

The model-based lookup table module is essentially a memory module that stores the best voltage values for all the analog cells at different temperatures. This lookup table is addressed by concatenating the analog cell address and the current temperature value. The bias voltage values that produce the best performance for the analog cells at different temperatures can be obtained by pre-characterization experiments. Such pre-characterization experiments can build functional models for the analog cells and find the bias voltage values that obtain the best performance for different temperature settings. These voltage values can then be loaded into the lookup table during the initialization phase of the proposed system and read from the module during model-based compensation. The lookup table in the proposed system stores the best voltage values for eight different analog cells over a temperature range of -180 °C to +120 °C in 5 °C temperature increments.

5.3 Digital ASIC Implementation

In this section, we discuss the design methodology and tools used to develop the proposed system as a digital ASIC.
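
Before moving on to the implementation flow, the lookup table addressing described for the model-based lookup table module can be made concrete with a short calculation. The Python sketch below assumes the temperature range reads -180 °C to +120 °C, which gives 61 temperature points at 5 °C steps and therefore 6 temperature bits alongside 3 cell-address bits. The bit ordering (cell address in the upper bits), the rounding to the nearest step, and the function names are assumptions made for illustration only.

```python
T_MIN, T_MAX, T_STEP = -180, 120, 5            # degrees Celsius (assumed range)
N_CELLS = 8
N_TEMPS = (T_MAX - T_MIN) // T_STEP + 1        # number of temperature points
TEMP_BITS = (N_TEMPS - 1).bit_length()         # bits needed for a temperature index


def temp_index(temp_c):
    """Map a temperature to the index of the nearest characterized step."""
    clamped = min(max(temp_c, T_MIN), T_MAX)
    return int((clamped - T_MIN + T_STEP // 2) // T_STEP)


def lut_address(cell, temp_c):
    """Concatenate the cell address (upper bits) and the temperature index."""
    if not 0 <= cell < N_CELLS:
        raise ValueError("cell address out of range")
    return (cell << TEMP_BITS) | temp_index(temp_c)
```

Under these assumptions the table holds 8 x 61 = 488 characterized entries inside a 9-bit (512-word) address space, and each entry would hold the pair of bias voltage codes for one cell at one temperature.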

Figure 5.9. Design flow for digital ASIC implementation

5.3.1 Design Methodology

The design flow used to develop the digital ASIC from the behavioral models is shown in Figure 5.9. The behavior of each of the modules in the digital system was implemented in behavioral VHDL and simulated to test the correct functionality at the behavioral level. An RT-level VHDL description of each module was then obtained from the behavioral VHDL description using an in-house high-level synthesis tool called the Automatic Design Instantiation (AUDI) tool. The RT-level description is built using a set of simple components. The RT-level VHDL
description was simulated thoroughly using Cadence NC Launch tools to test the functionality of the GA core. A gate-level Verilog description was generated for each of the modules from the RT-level VHDL description using in-house flattening scripts and the Berkeley SIS tool [66]. The gate-level Verilog description uses simple gates such as NAND, NOR, AND, OR, XOR, and SCAN_REGISTER. The gate-level Verilog description was also simulated using Cadence NC Launch to verify the functionality and the timing details of all the modules. The functionalities of the RT-level and gate-level systems were also tested by developing and testing a top-level design. This design methodology ensures that the entire system is available to the user at two design levels, namely the RT level and the gate level.

The gate-level netlists of each of the modules were used to produce module layouts using the Cadence Silicon Ensemble standard cell place and route tool. The top-level ASIC layout was obtained by performing block placement of all the modules using the Cadence First Encounter tool. The placed and routed layout was then imported into the Cadence Virtuoso tool, and various design checks such as DRC, ERC, antenna check, soft power check, and LVS were performed on the layout using the Mentor Graphics Calibre tool. The digital ASIC passed all the tests and was fabricated using a commercial SOI-based rad-hard technology.

5.4 Simulations and Results

The digital ASIC was extensively tested at multiple design levels, namely, the behavioral level, RT level, gate level, and layout level. The functionality of the individual modules was tested at the behavioral level using Cadence NC Launch simulations. The RTL descriptions of the individual modules and the top-level design were simulated for correctness using Cadence NC Launch tools. Finally, the gate-level Verilog descriptions of the individual modules and the top-level design were simulated and verified using ModelSim.
The gate-level netlists of the individual modules were then used to produce the module layouts using Cadence Silicon Ensemble. The
individual layouts were then placed and routed to form the entire ASIC layout using Cadence SOC Encounter tools.

5.4.1 Layout Level Simulations

SPICE netlists were extracted for each of the individual modules and the top-level design. These netlists were then simulated using HSpice for simple test cases, as complete tests are exorbitantly time consuming. Simulation snapshots for the individual modules are shown in Figure 5.10 through Figure 5.13.

Figure 5.10. Layout-level simulation of the digital framework controller module

Figure 5.11. Layout-level simulation of the system monitor module (in a monitoring cycle)

Figure 5.12. Layout-level simulation of the system monitor module (in a compensation cycle)

Figure 5.13. Layout-level simulation of the internal fitness evaluation module (containing FEM_Controller, excitation, and slew rate modules)

5.5 Summary

Evolvable analog electronics are useful in many space applications. In this work, a digital framework is proposed for the realization of self-reconfigurable electronics that performs autonomous monitoring and compensation of analog electronics operating under extreme environments. The proposed system has been implemented as a digital ASIC to reduce the form factor of the overall system and is also scalable with respect to the number of fitness functions.

CHAPTER 6
CONCLUSIONS

The complexity of designing hardware systems has increased significantly with technological advances. Irrespective of the choice of hardware implementation, technology scaling has affected the design process by increasing the complexity of the designs that can be implemented on a single chip and by introducing new challenges in the form of interconnect, thermal, and reliability issues. Effective optimization techniques are required in all the stages of the hardware design cycle. Genetic algorithms have been shown to be a robust search mechanism in a wide variety of problem domains. In this dissertation, it has been shown that genetic algorithms can be used successfully to address the optimization needs of the hardware design process. Specifically, this dissertation has used genetic algorithm based optimizers to address the following hardware design problems:
- Layout optimization of VLSI ASICs: A genetic algorithm based multi-objective floorplanner has been developed for solving the outline-free macro cell based ASIC design problem. The proposed floorplanner outperforms all existing floorplanners that perform simultaneous optimization of floorplan area and wirelength.
- Reconfigurable hardware design of optimization applications: A customizable FPGA IP core of a general-purpose genetic algorithm has been developed to ease the design of hardware applications that need an effective optimization engine.
- Design, monitoring, and performance compensation of extreme environment electronics: A digital framework has been developed for the evolutionary design of analog
electronics. The proposed digital framework also performs autonomous monitoring and automatic performance compensation of the evolved analog electronics when operating in extreme environments.

REFERENCES

[1] International Technology Roadmap for Semiconductors, 2007.
[2] J. H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, 1975.
[3] David E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley Publishing Company, Inc., 1989.
[4] Lawrence Davis, Handbook of Genetic Algorithms, 1991.
[5] R. Haupt and S. E. Haupt, Practical Genetic Algorithms, 2nd ed., John Wiley and Sons, Inc., 2004.
[6] J. J. Grefenstette, R. Gopal, B. J. Rosmaita, and D. Van Gucht, "Genetic Algorithms for the Traveling Salesman Problem," 1st International Conference on Genetic Algorithms, pp. 160-168, 1985.
[7] M. Vellasco, R. Zebulum, and M. Pacheco, Evolutionary Electronics: Automatic Design of Electronic Circuits and Systems by Genetic Algorithms, 1st edition, CRC Press, 2001.
[8] G. Rudolph, "Evolutionary Search for Minimal Elements in Partially Ordered Finite Sets," Evolutionary Programming, pp. 345-353, 1998.
[9] G. Rudolph, "On a multi-objective evolutionary algorithm and its convergence to the Pareto set," IEEE International Conference on Evolutionary Computation, pp. 511-516, 1998.
[10] Kalyanmoy Deb, Multi-Objective Optimization using Evolutionary Algorithms, 1st edition, John Wiley & Sons, Ltd, 2001.
[11] J. D. Schaffer, "Multiple Objective Optimization with Vector Evaluated Genetic Algorithms," on Genetic Algorithms, 1985.
[12] N. Srinivas, "Multiobjective Optimization Using Nondominated Sorting in Genetic Algorithms," IEEE Transactions on Evolutionary Computation, vol. 2, pp. 221-248, 1994.
[13] J. "A niched Pareto genetic algorithm for multiobjective optimization," Evolutionary Computation, pp. 82-87, 1994.

[14] E. Zitzl "Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach," Evolutionary Computation, vol. 3(4), pp. 257-271, 1999.
[15] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, "A fast and elitist multiobjective genetic algorithm: NSGA-II," Evolutionary Computation, vol. 6(2), pp. 182-197, 2002.
[16] H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani, "VLSI module placement based on rectangle packing by the sequence pair," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15(12), pp. 1518-1524, 1996.
[17] "Fast evaluation of Sequence Pair in block placement by longest common subsequence computation," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 20(12), pp. 1406-1413, 2001.
[18] G. Pei-Ning, C. Chung "An O-tree representation of non-slicing floorplan and its applications," Design Automation Conference, 1999.
[19] J. M. Lin and Y. "TCG: A Transitive Closure Graph-Based Representation for Non-Slicing Floorplans," Design Automation Conference, 2001.
[20] J. M. Lin and Y. "TCG-S: orthogonal coupling of P*-admissible representations for general floorplans," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 23(6), pp. 968-980, 2004.
[21] S. N. Adya and I. L. Markov, "Fixed-outline Floorplanning: Enabling Hierarchical Design," IEEE Trans. on VLSI Systems, vol. 11(6), pp. 1120-1135, December 2003.
[22] H. H. "Are floorplan representations important in digital design?" Proceedings of the 2005 International Symposium on Physical Design, pp. 129-136, 2005.
[23] K. Jae-Gon and K. Yeong "A linear programming-based algorithm for floorplanning in VLSI design," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 22(5), pp. 584-592, 2003.
[24] S. Dong, X. Hong, Y. Wu, Y. Lin, and J. Gu, "Module placement based on quadratic programming and rectangle packing using less flexibility first principle," in International Symposium on Circuits and Systems, vol. 5, pp. V-61 to V-64, 2004.
[25] J.P. "Genetic Placement," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 6(6), pp. 956-964, 1987.
[26] J.P. Cohoon, S. Hegde, W. Martin, and D. Richards, "Distributed genetic algorithms for the floorplan design problem," Computer-Aided Design of Integrated Circuits and Systems, vol. 10(4), pp. 483-492, 1991.
[27] "A genetic algorithm for macro cell placement," European Design Automation Conference, pp. 52-57, 1992.

[28] K. "Solving the rectangular packing problem by an adaptive GA based on sequence pair," Proceedings of ASPDAC, pp. 181-184, 1999.
[29] S. "An adaptive genetic algorithm for VLSI floorplanning based on sequence pair," IEEE International Symposium on Circuits and Systems, pp. 65-68, 2000.
[30] L. Chang-Tzu, C. De-Sheng, and W. Yi "An efficient genetic algorithm for slicing floorplan area optimization," International Symposium on Circuits and Systems, 2002.
[31] C. L. Valenzuela, "VLSI placement and area optimization using a genetic algorithm to breed normalized postfix expressions," Evolutionary Computation, vol. 6(4), pp. 390-401, 2002.
[32] Andrew Kahng, "Classical Floorplanning Harmful?," Proceedings of ACM International Symposium on Physical Design, pp. 207-213, April 2000.
[33] Yan Feng, Dinesh P. Mehta, and "Constrained 'Modern' Floorplanning," Proceedings of the 2003 International Symposium on Physical Design, April 6-9, 2003, Monterey, CA, USA.
[34] Tung-Chieh Chen and Yao-Wen Chang, "Modern floorplanning based on fast simulated annealing," 03-06, 2005, San Francisco, California, USA.
[35] Yong Zhan, Yan Feng, and "A fixed-die floorplanning algorithm using an analytical approach," design automation, January 24-27, 2006, Yokohama, Japan.
[36] "A performance-driven IC/MCM placement algorithm featuring explicit design space exploration," Electronic Systems, vol. 2(1), pp. 62-80, January 1997.
[37] R. P. Dick and N. K. Jha, "MOGAC: a multiobjective genetic algorithm for the co-synthesis of hardware-software embedded systems," Digest of Technical Papers, IEEE/ACM International Conference on Computer-Aided Design, pp. 522-529, 1997.
[38] "Power Density Aware Floorplanning for Reducing Maximum On-Chip Temperature," Proceedings of 18th IASTED International Conference on Modelling and Simulation (ICMS), pp. 319-324, 2007.
[39] D. Chatterjee, T.W. Manikas, and "COOLER A Fast Multiobjective Fixed-outline Thermal Floorplanner," Proceedings of the 3rd Annual Austin Conf. on Integrated Systems & Circuits (ACISC 08), May 2008.
[40] Xin Hao and F. Brewer, "Wirelength optimization by optimal block orientation," IEEE/ACM International Conference on Computer-Aided Design, pp. 64-70, 2005.
[41] S. D. Scott, A. Samal, and S. "HGA: a hardware-based genetic algorithm," Press, New York, NY, USA, 1995.

[42] M. Tommiska and J. "Implementation of genetic algorithms with programmable logic devices," 78, 1996.
[43] B. Shackleford, G. Snider, R. Carter, E. Okushi, M. Yasuda, K. Seo, and H. Yasuura, "A High-Performance, Pipelined, FPGA-based Genetic Algorithm Machine," Algorithms and Evolvable Machines, vol. 2, pp. 33-60, 2001.
[44] N. Yoshida and T. "Multi-GAP: Parallel and Genetic Algorithms in VLSI," Proceedings of SMC, pp. 571-576, 1999.
[45] W. Tang and L. "Hardware Implementation of Genetic Algorithms using FPGA," Proceedings of the 47th MWCAS, pp. 549-552, 2004.
[46] C. Aporntewan and P. "A hardware implementation of the compact genetic algorithm," Proceedings of the Congress on Evolutionary Computation, pp. 624-629, 2001.
[47] P. Graham and B. E. "Genetic algorithms in software and hardware a performance analysis of workstation and custom machine implementation," IEEE Symposium on FPGAs for Custom Computing Machines, pp. 216-225, 1996.
[48] M. S. Jelodar, M. Kamal, S. Fakhraie, and M. Ahmadabadi, "SOPC-Based Parallel Genetic Algorithm," in Evolutionary Computation.
[49] N. Nedjah and L. "Massively parallel hardware architecture for genetic algorithms," 2005.
[50] S. Wakabayashi, T. Koide, K. Hatta, Y. Nakayama, M. Goto, and N. Toshine, "GAA: a VLSI genetic algorithm accelerator with on-the-fly adaptation of crossover operators," Proceedings of the IEEE Intl. Symposium on Circuits and Systems, pp. 268-271, 1998.
[51] P. Y. Chen, R. D. Chen, Y. P. Chang, L. S. Shieh, and H. "Hardware Implementation for a Genetic Algorithm," Measurement, vol. 57(4), pp. 699-705, April 2008.
[52] K. Sai Mohan and B. "A Diversity Controlled Genetic Algorithm for Optimization of FRM Digital Filters over DBNS Multiplier Coefficient Space," International Symposium on Circuits and Systems, 2007.
[53] "A 2.92 W hardware random number generator," Proceedings of IEEE ESSCIRC, 2006.
[54] M. "The effect of pseudo-random number generation quality on the performance of a simple genetic algorithm."
[55] M. Meysenburg and J. A. "Randomness and GA performance, revisited," Proceedings of the Genetic and Evolutionary Computation Conference, vol. 1, pp. 425-432, 1999.

[56] E. Cantú-Paz, "On random numbers and the performance of Genetic Algorithms," Proceedings of the Genetic and Evolutionary Computation Conference, pp. 311-318, July 9-13, 2002.
[57] Practical UNIX and Internet Security.
[58] "Influence of random number generators on graph partitioning algorithms," Electronic Transactions on Numerical Analysis, vol. 21, pp. 125-133, 2005.
[59] H. "Evolvable Hardware: Genetic Programming of a Darwin Machine," conference on Artificial Neural Networks and Genetic Algorithms, Springer-Verlag, 1993.
[60] I. Kajitani, T. Hoshino, D. Nishikawa, H. Yokoi, S. Nakaya, T. Yamauchi, T. Inuo, N. Kajihara, M. Iwata, D. Keymeulen, and T. Higuchi, "A gate-level EHW chip: Implementing GA operations and reconfigurable hardware on a single LSI," Proceedings of International Conference on Evolvable Systems: From Biology to Hardware, ICES 1998.
[61] "The PIG paradigm: the design and use of a massively parallel fine-grained self-reconfigurable infinitely scalable architecture," NASA/DoD Workshop on Evolvable Hardware, pp. 175-180, 1999.
[62] A. Stoica, D. Keymeulen, D. Vu, R. Zebulum, I. Ferguson, T. Daud, T. Arsian, and G. Xin, "Evolutionary recovery of electronic circuits from radiation-induced faults," Congress on Evolutionary Computation, vol. 2, pp. 1786-1793, 2004.
[63] J. Langeheine, K. Meier, J. Schemmel, and "Intrinsic evolution of digital-to-analog converters using a CMOS FPTA chip," Conference on Evolvable Hardware, pp. 18-25, 2004.
[64] V. Baumgar "PACT XPP A self-reconfigurable Data Processing Architecture," SA'01, Las Vegas, NV, CSREA Press, 2001.
[65] C. Lambert, T. Kalganova, and E. "FPGA-based systems for evolvable hardware," Engineering and Technology, vol. 12, pp. 123-129, March 2006.
[66] E. M. Sentovich, K. J. Singh, C. Moon, H. Savoj, R. K. Brayton, and A. Sangiovanni-Vincentelli, "Sequential circuit design using synthesis and optimization," Proceedings of IEEE International Conference on Computer Design, pp. 328-333, 1992.
[67] A. Stoica, A. Fukunaga, K. Hayworth, and C. Salazar, "Evolvable Hardware for Space Applications," pp. 166-173, 1998.
[68] A. Stoica, D. Keymeulen, M. Mojarradi, R. Zebulum, and T. Daud, "Progress in the Development of Field Programmable Analog Arrays for Space Applications," IEEE Aerospace Conference, pp. 1-9, 1-8 March 2008.


[69] A. Stoica, D. Keymeulen, R. Zebulum, S. Katkoori, P. Fernando, H. Sankaran, M. Mojarradi, and T. "Self-Reconfigurable Analog Array Integrated Circuit Architecture for Space Applications," Systems, June 22-25, 2008.
[70] "Exploring beyond the scope of human design: Automatic generation of FPGA configurations through artificial evolution," th Annual Advanced PLD and FPGA conference, 1998.
[71] "On routine implementation of virtual evolvable devices using COMBO6," of the 2004 NASA/DoD Conference on Evolvable Hardware, pp. 63-70, 2004.
[72] "Extreme Temperature Electronics from Materials to Bio-inspired Adaptation," Hardware and Systems, 2007.
[73] K. Mizuno, N. Oh "Analog CMOS IC for high temperature operation with leakage current compensation," 4th International High Temperature Electronics Conference, June 1998.
[74] S. Terry, B. Blalock, J. Jackson, S. Chen, C. Durisety, M. Mojarradi, and E. Kolawa, "Development of robust analog and mixed-signal electronics for extreme environment electronics," Proceedings of IEEE Aerospace Conference, March 2004.
[75] "Design considerations in high temperature analog CMOS ICs," Transactions on Components, Hybrids, and Manufacturing Technology, vol: CHMT-9, no. 3, pp. 242-251, 1986.
[76] "High Temperature Electronics using Silicon Technology," l Solid State Circuits Conference, February 1996.
[77] "Toward evolvable hardware chips: experiments with a programmable transistor array," th International Conference on Microelectronics for Neural, Fuzzy, and Bio-inspired Systems, pp. 156-162, 1999.
[78] A. Stoica, R. Zebulum, D. Keymeulen, R. Tawel, T. Daud, and A. Thakoor, "Reconfigurable VLSI Architectures for Evolvable Hardware: From experimental FPTAs to Evolution-oriented chips," 9(1), pp. 227-232, February 2001.
[79] A. Stoica, D. Keymeulen, and R. "Evolvable Hardware Solutions for Extreme Temperature Electronics," 97, 2001.
[80] R. S. Zebulum, D. Keymeulen, J. Neff, R. Rajeshuni, T. Daud, and A. "Extreme Temperature Electronics using a Reconfigurable Analog Array"
[81] Y. "Design of High Frequency integrated analogue filters"


[82] A. Stoica, R. S. Zebulum, D. Keymeulen, R. Ramesham, J. Neff, and S. Katkoori, "Temperature-Adaptive Circuits on Reconfigurable Analog Arrays," Conference on Adaptive Hardware and Systems (AHS 2006), pp. 28-31, 15-18 June 2006.
[83] P. Fernando, H. Sankaran, S. Katkoori, D. Keymeulen, A. Stoica, R. Zebulum, and R. "A customizable FPGA IP core implementation of a general purpose Genetic Algorithm engine," Proceedings of IEEE International Symposium on Parallel and Distributed Processing, 14-18 April 2008.


ABOUT THE AUTHOR

Pradeep Ruben Fernando received a Bachelor of Engineering Degree in Computer Science and Engineering from the University of Madras in 2002 and a Master of Science Degree in Computer Engineering from the University of South Florida in 2006. He was admitted into the Ph.D. program at the University of South Florida in the summer of 2005. While in the Ph.D. program, Pradeep has co-authored several technical publications and presented papers at technical conferences.