Performance Optimization in Three-Dimensional Programmable Logic Arrays (PLAs)

by

Supriya Sunki

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Engineering
Department of Computer Science and Engineering
College of Engineering
University of South Florida

Major Professor: Srinivas Katkoori, Ph.D.
N. Ranganathan, Ph.D.
Soontae Kim, Ph.D.

Date of Approval: June 7, 2005

Keywords: 3D Integrated Chips, 3D via, PLA, Inter-wafer via, MAGIC layout system, VILIC, MPLA

Copyright 2005, Supriya Sunki

DEDICATION

To my parents

ACKNOWLEDGEMENTS

It is a great pleasure to thank my advisor, Dr. Srinivas Katkoori, for his immense patience, guidance, encouragement and insightful suggestions. I am also grateful to my committee members, Dr. Ranganathan and Dr. Soontae Kim, for their critical advice and feedback. I wish to acknowledge and thank all those people who helped me through this work. Special thanks to my friends Pradeep, Ranga and Ashwath for helping me unconditionally. Thanks to all the VCAPP members and also to the technical support staff, especially Daniel, for bearing with my problems. I would also like to thank Priya, Karthik, Sruthi, Jyothi, Shukla and my best friends for being there. I thank my Dad, my inspiration; my mother, my bundle of confidence; my brother, my moral support; and my caring sister, without whom I would not be able to stand strong. I would like to thank Dr. Katkoori again, as a great mentor and a great person.
TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1 INTRODUCTION
  1.1 Interconnect Effects
  1.2 What are 3D ICs?
  1.3 Summary of Overall Idea
  1.4 Discussion About Results
  1.5 Thesis Organization
CHAPTER 2 RELATED WORK
  2.1 2D PLAs Organization
  2.2 Optimization, Row-folding, Column-folding
    2.2.1 Branch and Bound Algorithm
    2.2.2 Heuristic Algorithms
    2.2.3 Simple Folding
      2.2.3.1 Simple Column Folding (SCF)
      2.2.3.2 Simple Row Folding (SRF)
    2.2.4 Multiple Row/Column Folding
    2.2.5 Bipartite Folding
      2.2.5.1 Bipartite Column Folding
      2.2.5.2 Bipartite Row Folding
  2.3 Types of PLA
    2.3.1 Pseudo-NMOS PLA, Dynamic PLA
    2.3.3 Blair's PLA
    2.3.4 Dhong's PLA
    2.3.5 Wang's PLA
CHAPTER 3 PERFORMANCE OPTIMIZATION OF 3D PLAs
  3.1 Architecture of 3D PLAs
    3.1.1 Horizontal Partition
      3.1.1.1 PLA Optimization in Horizontal Partition
    3.1.2 Vertical Partitioning
      3.1.2.1 PLA Optimization in Vertical Partition
    3.1.3 Summary
CHAPTER 4 EXPERIMENTAL RESULTS
  4.1 Design Flow
  4.3 Results Obtained

LIST OF TABLES

Table 4.1 Optimization Algorithm Implemented on a PLA Truth Table
Table 4.2 Worst Case Delay on Vertical Partitioning of the 2D PLAs
Table 4.3 Worst Case Delay on Horizontal Partitioning of the 2D PLAs
Table 4.4 Power Savings on Vertical Partitioning of the 2D PLAs
Table 4.5 Area Savings with Horizontal Partitioning of the 2D PLAs
Table 4.6 Area Savings with Vertical Partitioning of the 2D PLAs

LIST OF FIGURES

Figure 1.1. Typical Gate and Interconnect Delays as Functions of Technology Nodes (Reproduced from [1], page 165)
Figure 1.2. Comparison of Interconnect Delay as a Function of Technology Nodes for 2D and 3D ICs (Reproduced from [1], page 165)
Figure 1.3. Schematic Representation of 3D Integration of Multilevel Wiring Network
Figure 1.4. Schematic of a 3D Chip Showing Heterogeneous Technology Integration (Reproduced from [12], page 2)
Figure 2.1. PLA Representing AND and OR Planes
Figure 2.2. General Structure of the PLA
Figure 2.3. Personality Matrix of a PLA
Figure 2.4. Column Folding of the Sample PLA
Figure 2.5. Row Folding of the Sample PLA (OR-AND-OR configuration)
Figure 2.6. Bipartite Column Folding
Figure 2.7. Bipartite Row Folding
Figure 2.8. Pseudo-NMOS PLA
Figure 2.9. Dynamic PLA
Figure 2.10. Conventional Single-phased Dynamic CMOS PLA
Figure 2.11. Blair's PLA
Figure 2.12. Dhong's PLA
Figure 2.13. Wang's PLA
Figure 3.1. General Structure of 2D PLA
Figure 3.2. AND and OR Sub-PLAs after Vertical Partitioning
Figure 3.3. Top and Bottom Sub-PLAs after Horizontal Partitioning
Figure 3.5. Greedy Algorithm for PLA Optimization
Figure 3.6. Personality Matrix Before Implementation of Greedy Algorithm
Figure 3.7. Illustration of Step 2 of the Greedy Algorithm
Figure 3.8. Personality Matrix of PLA After Implementing the Greedy Algorithm
Figure 3.10. Topology of PLA Before Implementing Algorithm
Figure 3.11. Topology of PLA After Implementing Algorithm
Figure 3.12. Top Sub-PLA Obtained After Horizontal Partition
Figure 3.13. Bottom Sub-PLA Obtained After Horizontal Partition
Figure 3.14. Merging of Top and Bottom PLAs in Third Dimension with VILICs (Vertical Inter-Layer Interconnections)
Figure 3.15. AND and OR Planes Formed After Vertical Partition of the PLA
Figure 3.16. Greedy Algorithm for PLA Optimization
Figure 3.17. Topology of PLA Before Implementation of Greedy Algorithm
Figure 3.18. AND Plane After Optimization
Figure 3.20. Merging of AND and OR Planes in Third Dimension with VILICs (Vertical Inter-Layer Interconnections)
Figure 4.1. Design Flow of the Experiment
Figure 4.2. Improvement in Delay
Figure 4.3. Improvement in Power
Figure 4.4. Improvement in Area
Figure 4.5. Improvement in Area

PERFORMANCE OPTIMIZATION OF THREE-DIMENSIONAL PROGRAMMABLE LOGIC ARRAYS (PLAs)

Supriya Sunki

ABSTRACT

Increased chip size and reduced feature size have kept the industry on the path of Moore's law for decades. This trend lengthens interconnects, which degrades chip performance. Despite the introduction of new interconnect materials with low-k dielectrics, interconnect delay is expected to substantially limit chip performance. Overcoming this problem calls for a new technology. One promising candidate is the three-dimensional integrated chip (3D IC) with multiple silicon layers.

In this thesis, three-dimensional integrated chip (3D IC) technology is applied to programmable logic arrays (PLAs).
The two-dimensional PLAs are converted to three-dimensional PLAs to realize the advantages of the third dimension. Two novel approaches for partitioning PLAs are introduced for topological optimization. A greedy algorithm is applied to the partitioned PLAs to utilize the third dimension for further enhancement in scalability factors. This concept has been implemented in the MPLA (Magic Programmable Logic Array) tool. The 3D PLA has been tested on the MCNC91 benchmark suite and the results are presented. The experimental results are compared with the 2D PLA on the same benchmark set. The results obtained indicate the efficacy of the proposed synthesis approach.

CHAPTER 1
INTRODUCTION

The complexity of very large scale integrated (VLSI) circuits has grown dramatically over the past few decades. The trend is likely to continue for future generations due to the ever-growing demand for functionality and higher performance. VLSI circuits are being aggressively scaled to meet this demand. According to the predictions of the International Technology Roadmap for Semiconductors (ITRS), VLSI technology is expected to reach terascale integration in the year 2014 [1], with a corresponding increase in the number of transistors. To accommodate such high complexity, the chip size would be well over 1000 mm² and the feature size as small as 25 nm, with as many as 10 metal layers [1]. These trends bring accompanying problems: quantum effects and short-channel effects in ultra-small devices, the complexity of global interconnections spanning a vast single device layer, and, most conspicuously, the delay due to the interconnects.

1.1 Interconnect Effects

A critical challenge in the design of high-speed next-generation circuits is interconnect delay. Interconnect delays increasingly dominate IC performance.
Decreasing wire cross-sections, smaller pitch, and longer lines to traverse larger chips have increased the resistance and the capacitance of these lines, resulting in a considerable increase in signal propagation (RC) delay. These factors have become the major hindrance to optimizing circuit performance. A significant share of the total chip power consumption can be due to the wiring network for clock distribution, which is usually realized with long global wires. Figure 1.1 illustrates this problem, showing the optimized interconnect delays and gate delays as functions of various technology nodes [1]. Historically, the industry has used low-k dielectrics to reduce the wire capacitance and copper (Cu) damascene processes to reduce the resistance. From Figure 1.1 we can observe that, in spite of the introduction of new materials such as copper and low-k dielectrics, substantial interconnect delays are expected below the 130 nm technology node, thereby severely limiting chip performance.

Figure 1.1. Typical Gate and Interconnect Delays as Functions of Technology Nodes (Reproduced from [1], page 165).

Besides interconnect delay, other factors limit the monolithic integration paradigm:

Memory bandwidth has already become a limiting factor impeding the performance of general-purpose microprocessors, multimedia and data-intensive applications.

The complexity of integrating heterogeneous components, such as microprocessors, analog/RF circuits and high-performance logic, is increasing in modern System-On-Chip (SOC) designs, because these components are targeted for different fabrication processes with very diverse configurations and manufacturing steps.

All of these issues pose serious challenges for two-dimensional technology and call for a new technology. 3D technology arrives as a viable alternative with promising advantages.
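
The dependence of RC delay on wire length, which underlies the interconnect problem described in Section 1.1, can be illustrated with a minimal sketch. It assumes the standard distributed-RC approximation, in which the 50% delay of an unrepeated wire is roughly 0.38 times the product of its total resistance and total capacitance. The per-unit-length resistance and capacitance values and the function name are hypothetical, chosen only for illustration, and are not taken from [1].

```python
# Minimal sketch: delay of an unrepeated wire modelled as a distributed RC line.
# Assumption: 50% delay ~= 0.38 * (r * L) * (c * L), the usual distributed-RC estimate.
# The default r and c values are hypothetical placeholders, not data from the thesis.

def wire_delay(length_mm, r_ohm_per_mm=100.0, c_ff_per_mm=200.0):
    """Return the approximate 50% delay of the wire in picoseconds."""
    resistance = r_ohm_per_mm * length_mm            # total resistance in ohms
    capacitance = c_ff_per_mm * length_mm * 1e-15    # total capacitance in farads
    return 0.38 * resistance * capacitance * 1e12    # seconds converted to picoseconds


if __name__ == "__main__":
    for length in (2.5, 5.0, 10.0):
        print(f"{length:5.1f} mm wire: {wire_delay(length):8.2f} ps")
```

Because both resistance and capacitance grow linearly with length, the delay in this model grows quadratically: halving the wire length cuts the delay to a quarter. This is why shortening long global wires by stacking active layers is attractive.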
Figure 1.2 gives a comparison of interconnect delay as a function of technology nodes for 2D and 3D ICs. Moving repeaters to the upper active layer reduces interconnect delay by 9%. For the 50 nm node, 3D (with the same number of interconnect levels as the 2D chip) shows a significant delay reduction of 63%. Increasing the number of metal layers in 3D reduces the interconnect delay further by 35%. The assumption made in Figure 1.2 is that the 3D chip (footprint) area equals the 2D chip area.

Figure 1.2. Comparison of Interconnect Delay as a Function of Technology Nodes for 2D and 3D ICs (Reproduced from [1], page 165)

1.2 What are 3D ICs?

Recent developments in technology have favored the fabrication of multiple device-interconnect layers stacked on top of each other on a single chip. This novel approach is commonly called 3D integration of ICs. The main idea is to integrate several device layers in the third dimension (z direction) and to decrease the interconnect delay by using vertical vias. Figure 1.3 illustrates 3D integration, which creates multiple active layers and as a result allows higher transistor packing density and reduced chip area. In the 3D design architecture, a 2D chip can be divided into logic blocks. Each block can be placed on a separate active layer, and the layers are stacked on top of each other. Each active layer is accompanied by a number of interconnect layers. These stacked layers can be connected with short vertical inter-layer interconnections (VILICs), as shown in Figure 1.3. The VILICs can eliminate the long global wires that realize inter-block communication in 2D. The 3D architecture allows extra flexibility in system design, placement and routing, because the logic gates on a critical path can be placed very close to each other using multiple active layers. This results in a significant reduction of the RC delay and can enhance the performance of logic circuits.

Figure 1.3. Schematic Representation of 3D Integration of Multilevel Wiring Network and VILICs. T1: first active layer device, T2: second active layer device, Optical I/O device: third active layer I/O device. M1 and M2 are for T1, M1 and M2 are for T2. M3 and M4 are shared by T1, T2 and the I/O device.

3D IC technology can be exploited to build SOCs by placing circuits with different technologies and performance requirements on separate layers to reduce noise, as shown in Figure 1.4. For instance, the digital and analog components of mixed-signal systems can be placed on different Si layers, thereby achieving better noise performance due to lower electromagnetic interference between the circuit blocks. From the perspective of heterogeneous integration, mixed-technology assimilation could be made less complex and more cost effective by fabricating such technologies on separate substrates followed by physical bonding.

Figure 1.4. Schematic of a 3D Chip Showing Heterogeneous Technology Integration (Reproduced from [12], page 2)

1.3 Summary of Overall Idea

As chip size increases, interconnect propagation delay increases, potentially limiting chip performance. 3D integration to create multilevel ICs is a technology that increases transistor packing density and can therefore potentially reduce chip area. In this work, 3D IC technology has been applied to programmable logic arrays (PLAs). Novel partitioning techniques have been implemented to reduce the critical delay of the PLAs, and topological optimization has been carried out by virtue of the third dimension. Area and power optimization follow from this work.

1.4 Discussion About Results

The three-dimensional PLAs are tested against two-dimensional PLAs on MCNC benchmark circuits. The comparisons made are:

Comparison of delays of PLA with horizontal partition against 2D PLA. The results obtained were a good reflection of the technique employed.
The delay is reduced by approximately 30% in most cases.

Comparison of delays of PLA with vertical partition against 2D PLA. Though the reduction in delay was not drastic, the savings obtained are a direct result of the greedy algorithm employed.

Comparison of power of vertically partitioned PLA with 2D PLA. The savings come from the polysilicon and metal lines saved in the PLA after partitioning, compared with before partitioning.

Comparison of the footprint area of horizontally partitioned PLA with the area of 2D PLA.

Comparison of the footprint area of vertically partitioned PLA with the area of 2D PLA.

1.5 Thesis Organization

In Chapter 2, the related work on two-dimensional PLAs and the need for the switch to the third dimension are discussed. In Chapter 3, the PLA optimization techniques implemented are discussed in detail. In Chapter 4, the experimental results obtained are discussed. In Chapter 5, we draw conclusions from the experimental results and discuss the scope for future research.

CHAPTER 2
RELATED WORK

2.1 2D PLAs Organization

Programmable logic devices (PLDs) have been predominantly used for the implementation of control logic due to their rapid manufacturing turnaround time, higher delay predictability, low start-up costs and ease of design changes. The two major types of PLDs are field programmable gate arrays (FPGAs) and complex programmable logic devices (CPLDs). The fine-grained programmable logic cells in FPGAs produce high logic densities and provide high design flexibility. However, the interconnect structures of FPGAs are complex and delay is often unpredictable in pre-layout stages. In contrast, the logic cells in CPLDs are coarse-grained two-level AND-OR programmable logic arrays (PLAs), as shown in Figure 2.1. Their regular structure enables automatic layout and facilitates the verification of the generated functions.
Although the logic density of PLAs is comparatively lower, their interconnect structures are much simpler and the delay is more predictable. PLAs are being rediscovered as an efficient implementation style for high-performance circuits. For instance, a critical piece of the control logic of the Intel Pentium 2 MMX processor was implemented with a PLA [2], and PLAs have also been used as quite an attractive feature in a GHz microprocessor [3].

Figure 2.1. PLA Representing AND and OR Planes

2.2 Optimization, Row-folding, Column-folding

PLA implementation design is divided into two tasks: functional design and physical design. Functional design translates the Boolean-equation specification of a multiple-output combinational logic function into a set of sum-of-products logical implicants, followed by minimization such that the resulting PLA implementation meets the design objectives (e.g., minimum silicon area, maximum switching speed, etc.). The next task is to map the logic into a topological representation of the final PLA structure. The topology of the PLA consists of two separate planes, the AND plane and the OR plane. The inputs and their complements run vertically through the matrix of circuit elements called the AND plane. The AND plane generates the product terms, which become the inputs to another matrix of circuit elements called the OR plane. The signals thus formed are the sum-of-products form of the Boolean functions of the PLA inputs. Figure 2.2 gives the general structure of the PLA.

Figure 2.2. General Structure of the PLA

A personality matrix is a symbolic representation of the PLA which has one column for every input and output line, and one row for every product term. A one in the (i,j)th position indicates that the jth input is present in the ith product term, or that the ith product term is present in the jth output. A zero indicates that the complement of the jth input is present in the ith product term.
A dash represents no connection. Refer to Figure 2.3.

Figure 2.3. Personality Matrix of a PLA. Legend: 1 = connect to the input (output); 0 = connect to the complement of the input; - = do not connect (don't care).

The physical area of the PLA is proportional to the number of columns and rows, since the spacing between adjacent columns and between adjacent rows is fixed by the technology used. Hence, the fewer the columns or rows, the smaller the area of the PLA. One technique used to minimize the area of a PLA by reorganizing its columns and rows is PLA folding. PLA folding algorithms generally follow one or a combination of the following approaches:

Branch-and-bound algorithm
Heuristic algorithms

2.2.1 Branch and Bound Algorithm

The objective of PLA folding is to find the maximum number of pairs of columns/rows that can be folded simultaneously. PLA folding has a complex functional dependence on the ordering of the rows. The optimal folding problem has been shown to be NP-complete. Many algorithms and heuristics have been developed to solve this problem. The simplest of the algorithms is the branch-and-bound algorithm [8]. Although the branch-and-bound algorithm can find the theoretically optimal solution by investigating all possible solutions, its practicality for large PLAs is questionable because of the time required by its exhaustive search. The algorithm's time complexity cannot be strictly predicted, since the algorithm backtracks to the point where the maximal objective function was determined whenever the forward search finds no better solution. The better the upper bound on the objective function, the better the performance of the algorithm. Generally speaking, branch-and-bound algorithms can only handle PLAs of moderate size (50 to 100 input/output lines). Hence many heuristics have been developed which give good but non-optimal solutions.

2.2.2 Heuristic Algorithms

A heuristic algorithm reorganizes the PLA incrementally. At each step, the best folding pair is selected based on the available information to build the PLA. A test is performed after each selection to ensure that no alternating cycle is introduced. Such algorithms do not carry out a thorough search of the solution space, and their effectiveness depends strictly on the selection rules. There is no guarantee that the solution is optimal. The PLA folding results thus obtained are only locally optimal and depend on the selection order of the folding pairs.

2.2.3 Simple Folding

Simple folding deals with the permutation of the rows (and/or columns) of the array which permits a maximal set of column pairs (and/or row pairs) to be implemented in the same column (row) of the physical array. Folding can be categorized into two types:

Column folding
Row folding

2.2.3.1 Simple Column Folding (SCF)

Splitting a PLA column into two segments, so that two inputs or two outputs share the same physical column, is the concept of simple column folding, as shown in Figure 2.4. In such a configuration, one of the inputs (outputs) runs from the top of the PLA and the other from the bottom. This kind of folding can be implemented in custom designs and master-slice designs. In custom designs, one has to consider the routing problems that may be created by the need to run inputs and outputs from both the top and the bottom of the array. Constraints would have to be put on the locations of the inputs and outputs so that the area gained by SCF is not lost in routing the signals.
The master-slice PLA has a complex structure which almost entirely solves the routing problem, leaving maximum freedom to the folding process.

Figure 2.4. Column Folding of the Sample PLA

2.2.3.2 Simple Row Folding (SRF)

Splitting a PLA row into two segments, so that two product terms may share the same row, is the concept of simple row folding, as shown in Figure 2.5. In this kind of row folding, a PLA may have two or more AND arrays and/or two or more OR arrays. The two configurations of interest are the OR-AND-OR and the AND-OR-AND configurations.

Figure 2.5. Row Folding of the Sample PLA (OR-AND-OR configuration)

2.2.4 Multiple Row/Column Folding

Multiple folding is the generalization of simple folding. The objective of multiple column (and/or row) folding is to determine a permutation of the rows (and/or columns) of the PLA which allows a set of logic columns (rows) to be implemented in each column (and/or row) of the physical array. Thus the area saving achieved by this technique can always be made better than (or, in the worst case, equal to) the gain achieved by simple folding.

2.2.5 Bipartite Folding

Bipartite folding is a special case of folding in which all the folded columns/rows split at the same horizontal/vertical level. Although such a restriction may reduce the number of foldable pairs, it offers the following advantages:

Routing of nets to and from the PLA is simplified, because the folded (input/output) lines entering from the top of the PLA are routed independently of the order of the folded (input/output) lines entering from the bottom of the PLA.

Its uniform structure helps in reducing the constraints in subsequent folding.

2.2.5.1 Bipartite Column Folding

Bipartite column folding is a folding in which all of the breaks (splits) of the columns occur at the same level, as shown in Figure 2.6.
The single break level of a bipartite folding allows the PLA to be divided into two regions: an upper folding region, which contains those folded input and output lines that are above the break, and a lower folding region, which contains the folded input and output lines that are below the break. A bipartite column folding exists if every input/output line in the upper folding region is disjoint from the input/output lines in the lower folding region. The size of the bipartite column folding is the cardinality of either of the regions.

Figure 2.6. Bipartite Column Folding

2.2.5.2 Bipartite Row Folding

Bipartite row folding is defined analogously to bipartite column folding. It is a folding in which all of the breaks (splits) of the rows occur at the same level, as shown in Figure 2.7. The single break level of a bipartite folding allows the PLA to be divided into two regions: a left folding region, which contains those folded input and/or output lines that are to the left of the break, and a right folding region, which contains the folded input and/or output lines that are to the right of the break. A bipartite row folding exists if every input/output line in the left folding region is disjoint from the input/output lines in the right folding region. The size of the bipartite row folding is the cardinality of either of the regions.

Figure 2.7. Bipartite Row Folding

2.3 Types of PLA

2.3.1 Pseudo-NMOS PLA, Dynamic PLA

PLA circuit design primarily falls into two logic categories: pseudo-NMOS logic and dynamic logic. The pseudo-NMOS design style uses a p-type transistor as a static load with its gate tied to ground, and the function is implemented as a pull-down network of n-type transistors. The AND and OR planes are realized using multiple-input NOR gates. Pseudo-NMOS logic is compact and fast, because only a single p-type transistor is needed for a single AND or OR term and the input capacitance is low.
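
The statement that both planes are built from multiple-input NOR gates can be checked with a short sketch based on De Morgan's laws: a NOR over the complements of the wanted literals yields the AND of those literals, and a NOR over product terms followed by an inverter yields their OR. The two-output function, the literal encoding and the function names are hypothetical and serve only as an illustration. The output inverter is an assumption about the surrounding circuit, not something stated in the text.

```python
# Minimal sketch: a NOR-NOR array reproduces AND-OR (sum-of-products) logic.
# Assumptions: each product term lists (input_index, polarity) literals, and
# the OR plane's NOR outputs pass through an inverting output buffer.

from itertools import product

# Hypothetical example: f0 = x0*x1 + ~x2, f1 = ~x0*x2
PRODUCT_TERMS = [
    [(0, 1), (1, 1)],   # x0 AND x1
    [(2, 0)],           # NOT x2
    [(0, 0), (2, 1)],   # NOT x0 AND x2
]
OUTPUTS = [[0, 1], [2]]  # indices of the product terms feeding each output


def nor(bits):
    """Multiple-input NOR gate."""
    return 0 if any(bits) else 1


def and_or(inputs):
    """Reference sum-of-products evaluation."""
    terms = [int(all(inputs[i] == pol for i, pol in term)) for term in PRODUCT_TERMS]
    return [int(any(terms[t] for t in out)) for out in OUTPUTS]


def nor_nor(inputs):
    """Same function using only NOR gates plus output inverters."""
    # AND plane: NOR over the complements of the wanted literals.
    terms = [nor([int(inputs[i] != pol) for i, pol in term]) for term in PRODUCT_TERMS]
    # OR plane: NOR over the selected product terms, then invert.
    return [1 - nor([terms[t] for t in out]) for out in OUTPUTS]


if __name__ == "__main__":
    assert all(and_or(x) == nor_nor(x) for x in product((0, 1), repeat=3))
    print("NOR-NOR matches AND-OR for all 8 input combinations")
```

The sketch is purely logical: it says nothing about the electrical behaviour of the pseudo-NMOS load, only that the NOR-NOR arrangement computes the same sum-of-products functions.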
However, the direct current path from Vdd to Gnd through the load and driver devices when the output is low causes pseudo-NMOS logic to consume static power. This disadvantage created an opening for alternative designs based on dynamic techniques. Dynamic CMOS PLAs dissipate less power and generate less ground noise than pseudo-NMOS PLAs, but for large PLA layouts the power dissipation is still excessive.

The early dynamic techniques were domino [4] and NORA [5]. The logic works by charging (or discharging) the logic-gate output to high (low) through a single p-type (n-type) transistor, which isolates the output when it is switched off. The logic is then resolved by selectively discharging (charging) the output through a pull-down (pull-up) network corresponding to the logic function.

When two such functions are connected in series, it is necessary to ensure that the initial state of the first output does not switch on the pull-down (pull-up) network of the second. Otherwise, the likelihood of wrongly discharging (charging) the second output before the first is resolved is very high. In domino logic, the gates are either all precharged with pull-down networks or all predischarged with pull-up networks, and each is connected to the next stage through an inverter. In NORA, by contrast, the stages alternate between precharged pull-down networks and predischarged pull-up networks, and the inverters are omitted. See Figures 2.8 and 2.9.

Figure 2.8. Pseudo-NMOS PLA

Figure 2.9. Dynamic PLA

The limitations in implementing the above logic functions are the vital point to be considered.
For instance, one cannot obtain a simple dynamic implementation of a PLA from two NOR gates, in the manner of the static pseudo-NMOS version with its fast pull-down networks of parallel n-type transistors, because the precharge state of the first gate would discharge the dynamic nodes of the second before the first could be resolved.

2.3.2 Single-Phased Dynamic PLA

The typical single-phased dynamic CMOS circuit is implemented in domino logic and uses a NAND gate [5] to provide the AND plane, as shown in Figure 2.10. It is a pure dynamic circuit in which nodes p and x are precharged when clk is low. Because the output of the dynamic NAND gate is inverted, the input to the OR plane is low during precharge, so the OR plane does not require a discharge transistor gated by the clk signal. The primary bottleneck is speed, because evaluation of the AND plane depends on discharging the dynamic node through a potentially long series of transistors that form the NAND function.

Figure 2.10. Conventional Single-Phased Dynamic CMOS PLA

2.3.3 Blair's PLA

A newer single-phased dynamic design was introduced by Blair. The AND plane is implemented as a predischarging pseudo-NMOS NOR plane in order to shorten the series of NMOS transistors in the evaluation block, which is the significant bottleneck for speed (Figure 2.11). The ratioed logic of the pseudo-NMOS stage makes it hard to drive a large capacitive load and hence results in a long rise time. The advantage of Blair's PLA is that its major ac power consumption comes from the power factor of the OR plane alone. However, the cost of turning the AND plane into a pseudo-NMOS circuit is rather high: the dc power consumption of the AND-plane gates offsets the benefit gained from the reduced ac power consumption. This effect becomes more severe as the operating frequency is lowered.
Finally, the power factor of Blair's PLA is similar to that of the clock-delayed PLA.

Figure 2.11. Blair's PLA

2.3.4 Dhong's PLA

Dhong et al. proposed a PLA design approach that employs a precharged OR array and a charge-sharing AND array to help eliminate the ground switch. Because of the charge-sharing AND plane, the output voltage VoH can reach only approximately 3.0 V when Vdd is 5.0 V. Consequently, full voltage swing is not achieved, in addition to the low-noise-margin problem (Figure 2.12). A delayed clock is also required to prevent the racing problem. Apart from these issues, the design needs capacitors, which results in large area consumption.

Figure 2.12. Dhong's PLA

2.3.5 Wang's PLA

Another single-phase dynamic PLA, a high-speed, low-power circuit design, was proposed by Wang et al. It combines the pseudo-NMOS, dynamic, and domino logic design styles. The primary concept is to insert a buffering NAND gate between the two NOR planes to eliminate the ground switch and to shorten the duration of the dynamic power spikes, avoiding racing problems.

In the original Wang's PLA, the same clock signal drives the NOR gates in the AND plane and in the OR plane. Such a design leads to a racing problem and results in evaluation errors. To overcome this problem, the clock signal has to be delayed. The circuit uses two inverters to implement the delayed clock and is henceforth called the modified Wang's PLA (Figure 2.13). The primary design concept contributed by Wang's work is the AND inter-plane buffer. As a result, the switching activity of this plane is kept low and its power consumption is negligible. Hence, the major ac power consumption comes only from the AND and OR planes.

Figure 2.13. Wang's PLA

CHAPTER 3
PERFORMANCE OPTIMIZATION OF 3D PLAs

Programmable Logic Arrays (PLAs) are used extensively in the structured design of VLSI because of the ease with which any combinational or sequential logic function can be implemented. For large controllers, particularly when there are many inputs and outputs, a single PLA realization can rapidly become large and slow. For this reason, algorithms have been proposed to optimize and partition the logic realizing the controller. Several optimization steps are involved in this procedure, and they are commonly gathered into two distinct stages, Logic Design and Topological Design.

- Logic Design Optimization: the translation of the set of Boolean functions into a minimal set of two-level sum-of-products expressions.
- Topological Design Optimization: the determination of the optimal layout of the circuit with respect to the area occupied by the PLA, so as to meet the area specifications.

PLA folding [5, 6] is one of the most effective and widely used techniques for chip area optimization. The problem of finding the optimal PLA layout by means of this technique is known as the PLA folding problem. Folding models do not always describe the problem as accurately as required, and the cost of a folding is usually assumed to equal the area of the minimal rectangle containing the PLA. Even under these simplifying assumptions, the PLA folding problem is known to be difficult. Area reduction is therefore obtained by means of enumerative and heuristic techniques, such as branch and bound [8], local search [9], and simulated annealing [10].

PLA partitioning is a technique for breaking a large PLA into sub-PLAs in order to minimize the total area and the delay of the sub-PLAs. A lot of research has been done on the PLA partitioning problem. Kang [7] proposed a heuristic algorithm, which was later improved by Hennessy [4].
Shih-Ming Liu et al. [5] proposed performance-driven partitioning algorithms. All of these works focus on reducing the PLA area, the rationale being that smaller PLAs incur smaller delays. However, although smaller PLAs run faster than larger ones, simple partitioning for area often does not produce circuits that meet the speed requirements, because of the uneven sizes of the sub-PLAs. Moreover, the sub-PLAs become completely dependent on one another when they are merged.

In this work we present novel techniques both for partitioning PLAs and for merging the resulting sub-PLAs. The topological optimization techniques applied to the partitioned planes are largely independent from one plane to the other, which forms the critical part of this work. In past research, by contrast, the folding techniques depended on the positions of the inputs, the outputs, and the product terms [5]. Delay optimization is performed on both planes to reduce the length of the critical delay path, and area optimization follows by virtue of the third dimension. Power savings follow from the reduction of the node capacitance.

3.1 Architecture of 3D PLAs

In a real circuit, a large PLA tends to be quite wasteful or not fast enough to support the other parts of the system. In this case, we can split it into several smaller PLAs to reduce the chip area and/or improve the speed.

Because of its two-level structure, a PLA has some inherent redundancy from the classical point of view, which puts some constraints on its use. By exploiting the design methodology of its personality matrix, however, the PLA can be optimized. This optimization can be done in three ways: minimization, partitioning, and folding. This thesis concentrates on two of them, partitioning and folding. Performance optimization is mainly due to PLA partitioning.
The two novel approaches implemented for partitioning are:

- Horizontal Partitioning
- Vertical Partitioning

Figure 3.1 shows the general structure of the 2D PLA. Figure 3.2 illustrates vertical partitioning, in which AND and OR sub-PLAs are formed. Figure 3.3 illustrates horizontal partitioning, in which a top PLA and a bottom PLA are obtained.

Figure 3.1. General Structure of 2D PLA

Figure 3.2. AND and OR Sub-PLAs after Vertical Partitioning

Figure 3.3. Top and Bottom Sub-PLAs after Horizontal Partitioning

3.1.1 Horizontal Partition

The PLA input, which is provided in truth-table format, is partitioned into two symmetrical PLAs, with the partition made with respect to the x-axis. The symmetry is obtained by dividing the number of product terms by two and placing that number of product terms in each plane, while duplicating the inputs on each plane. The two planes hold an equal number of product terms if the total is even; if it is odd, one plane holds one product term more than the other.

The top and bottom sub-planes are merged through third-dimensional interconnects (inter-wafer vias) at the inputs. Each input signal is carried from one plane to the other through an inter-wafer via. The signal carried to the other (top) plane computes the product terms in the AND plane, as shown in Figure 3.1. The product terms are then carried to the OR plane, which computes the sum of the AND terms and forms one of the output functions of the top plane. The same procedure is followed in the bottom plane to compute its output function. The two output values thus obtained are wired-ORed to give the single final output.

In past research, partitioning was done by dividing the inputs and outputs between the sub-PLAs [11].
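The horizontal partition and the wired-OR merge described in Section 3.1.1 can be illustrated with a minimal Python sketch. It is not the MPLA implementation used in this thesis: the function names, the string encoding of the cubes ('1', '0', '-' per input), the sample PLA, and the choice to give the extra product term to the top plane when the count is odd are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# One product term: (AND-plane cube, OR-plane row). The encoding is an assumption:
# the cube uses '1' (true input), '0' (complemented input), '-' (not programmed);
# the OR row has '1' where the term feeds that output.
Term = Tuple[str, str]


def horizontal_partition(terms: Sequence[Term]) -> Tuple[List[Term], List[Term]]:
    """Split the product terms into top and bottom sub-PLAs.

    Both halves keep every input column (input duplication). When the number
    of terms is odd, the extra term goes to the top plane (an assumption).
    """
    half = (len(terms) + 1) // 2
    return list(terms[:half]), list(terms[half:])


def evaluate(terms: Sequence[Term], inputs: str, n_outputs: int) -> List[int]:
    """Evaluate a two-level AND/OR PLA for one input vector such as '101'."""
    outputs = [0] * n_outputs
    for cube, or_row in terms:
        # A product term fires when every programmed literal matches the input.
        if all(c == "-" or c == x for c, x in zip(cube, inputs)):
            for k, bit in enumerate(or_row):
                if bit == "1":
                    outputs[k] = 1
    return outputs


def evaluate_3d(top: Sequence[Term], bottom: Sequence[Term],
                inputs: str, n_outputs: int) -> List[int]:
    """Evaluate both planes on the same inputs and wired-OR their outputs."""
    t = evaluate(top, inputs, n_outputs)
    b = evaluate(bottom, inputs, n_outputs)
    return [x | y for x, y in zip(t, b)]


# Hypothetical 3-input, 2-output PLA with five product terms.
pla = [("11-", "10"), ("0-1", "10"), ("-10", "01"), ("00-", "01"), ("1-1", "11")]
top, bottom = horizontal_partition(pla)
```

Because every input is available on both planes, each sub-PLA evaluates its share of the product terms independently, and OR-ing the two output vectors reproduces the outputs of the unpartitioned PLA.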
In contrast to earlier schemes that divided the inputs and outputs between the sub-PLAs, the novel approach implemented here is input duplication: all the inputs are duplicated on both planes, as shown in Figure 3.4.

Figure 3.4. Top and Bottom Sub-PLAs After Partitioning a 2D PLA

3.1.1.1 PLA Optimization in Horizontal Partition

PLA design is easily automated because of the direct correspondence between the physical PLA layout and the personality matrix. The major disadvantage of the PLA is that most practical logic problems leave much of the PLA area unused. A straightforward physical design results in a significant waste of silicon area, which may be unacceptable. Speed and power also become critical parameters as the size of the PLA increases. The gate capacitances of the input signals carried by long polysilicon lines become the key factor in determining the timing (speed) performance. In moderate to large PLAs, the polysilicon resistance becomes a critical factor: the signal can be seriously degraded by the large resistance added to the line, no matter how large the drivers are. Further, if the PLA becomes large, the width of the power and ground lines must also be increased to avoid possible metal migration. Hence, optimization algorithms have been implemented to reduce the polysilicon lines and the metal lines.

The delay optimization obtained from the topological optimization of partitioning is further enhanced by the PLA optimization algorithm applied to the input truth table. The algorithm implemented here is a greedy algorithm. Although the algorithm ensures a reduction of the high-resistance polysilicon lines, the percentage of reduction also depends on the PLA personality matrix. The pseudocode for the algorithm HOR_PLA_Optimization( ) is shown in Figure 3.5.
procedure HOR_PLA_Optimization(I, O, P)
// I is the number of inputs, O is the number of outputs, and P is the number of product terms
1   for i ← 1 to P
        for j ← 1 to I
            AND(i,j) = input
        for k ← 1 to O
            OR(i,k) = input
2   for i ← 1 to P
        for j ← 1 to I
            if (AND(i,j) == 1 or AND(i,j) == 0)
                AND(i,j) = 5        // or any constant
            else
                AND(i,j) = 0
3       for k ← 1 to O
            if (OR(i,k) == 1)
                OR(i,k) = 5
            else
                OR(i,k) = 0
4   for i ← 1 to P
        for j ← 1 to I
            Sum(i) = Sum(i) + AND(i,j)
5   Quicksort(Sum( ))   // sorts the product terms with respect to the weight of each product term
6   for i ← 1 to O
        for j ← 1 to P
            Sum(i) = Sum(i) + OR(j,i)
    Quicksort(Sum( ))   // sorts the output columns with respect to the weight of each output column

Figure 3.5. Greedy Algorithm for PLA Optimization

The algorithm given in Figure 3.5 is explained with an illustration. Given a PLA with the personality matrix of Figure 3.6, the steps are as follows.

Figure 3.6. Personality Matrix Before Implementation of the Greedy Algorithm

- The input array and the output array are stored in separate AND and OR matrices (line 1).
- Each programmed input and output entry is replaced with a constant weight (5, or any other constant) according to line 2 of the algorithm, which gives Figure 3.7.

Figure 3.7. Illustration of Step 2 of the Greedy Algorithm

Note that in the AND array, 1 means that input I is programmed, 0 means that the inverted input is programmed, and - means that the input is not programmed. In the OR array, 1 means that the particular product term is programmed and 0 means that it is not programmed (see Figure 3.6).

- The sum of each row of the AND matrix is calculated, and the rows are sorted by quicksort with respect to these weights. When AND rows are exchanged, the corresponding OR rows are exchanged at the same time so that each product term stays aligned on both planes.
- Similarly, the sum of each column of the OR matrix is calculated, and the columns are sorted with quicksort.
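The steps of Figure 3.5 can be sketched in Python as follows. This is a minimal illustration, not the code integrated into MPLA: the function name, the string encoding of the personality matrix, the sample matrix, and the ascending sort direction are assumptions (the pseudocode does not state the direction), and Python's built-in stable sort stands in for the quicksort named in the figure.

```python
from typing import List, Sequence, Tuple


def greedy_reorder(and_rows: Sequence[str], or_rows: Sequence[str],
                   weight: int = 5, descending: bool = False
                   ) -> Tuple[List[str], List[str], List[int], List[int]]:
    """Reorder product terms and output columns by their programmed weight.

    and_rows[i] is the AND-plane row of product term i ('1', '0' or '-').
    or_rows[i] is the OR-plane row of product term i ('1' or '0').
    Returns the reordered AND rows, the reordered OR rows, and the row and
    column permutations that were applied.
    """
    # Line 2: every programmed entry gets the same constant weight.
    and_w = [[weight if c in "01" else 0 for c in row] for row in and_rows]
    or_w = [[weight if c == "1" else 0 for c in row] for row in or_rows]

    # Lines 4-5: sort the product terms by the weight of their AND row.
    row_order = sorted(range(len(and_rows)),
                       key=lambda i: sum(and_w[i]), reverse=descending)
    and_sorted = [and_rows[i] for i in row_order]
    # The OR rows move together with their AND rows.
    or_moved = [or_rows[i] for i in row_order]

    # Line 6: sort the output columns by the weight of each OR column.
    n_out = len(or_rows[0]) if or_rows else 0
    col_order = sorted(range(n_out),
                       key=lambda k: sum(r[k] for r in or_w), reverse=descending)
    or_sorted = ["".join(row[k] for k in col_order) for row in or_moved]
    return and_sorted, or_sorted, row_order, col_order


# Hypothetical 3-input, 2-output personality matrix with three product terms.
and_s, or_s, row_order, col_order = greedy_reorder(
    ["11-", "1--", "010"], ["10", "10", "01"])
```

The returned `row_order` and `col_order` record where each original product term and output ended up, which a caller would need in order to relabel the outputs after the columns move.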
The final personality matrix obtained after the algorithm is applied is shown in Figure 3.8.

Figure 3.8. Personality Matrix of PLA After Implementing the Greedy Algorithm

The topology of the PLA before the greedy algorithm is applied is shown in Figure 3.10. The red lines represent the polysilicon layer, and the blue lines are the metal lines that compute the product terms. The savings in polysilicon in both the AND and OR planes can be seen by comparing Figures 3.10 and 3.11. The number of polysilicon units in the AND plane is reduced from 31 units (Figure 3.10) to 22 units (Figure 3.11), and in the OR plane from 17 units (Figure 3.10) to 12 units (Figure 3.11). This amounts to noticeable savings of 29.03% and 29.4%, respectively, in the polysilicon layer.

Figure 3.10. Topology of PLA Before Implementing Algorithm

Figure 3.11. Topology of PLA After Implementing Algorithm

The PLA thus obtained is partitioned into two symmetrical PLAs, with an equal number of product terms if the total is even, or with one extra product term in one of the sub-PLAs if it is odd.

Figure 3.12. Top Sub-PLA Obtained After Horizontal Partition

Figure 3.13. Bottom Sub-PLA Obtained After Horizontal Partition

The two sub-PLAs are merged in the third dimension by VILICs. VILICs are placed on each of the inputs in one plane so as to connect to the respective input on the other plane. Each output function generated on one sub-PLA is wired-ORed with the output signal that lies directly above or below it. In Figure 3.14, for example, the output signals O3 and O4 are wired-ORed to give one of the three final outputs.

Figure 3.14. Merging of Top and Bottom PLAs in Third Dimension (VILIC: Vertical Inter-Layer Inter-Connections)

3.1.2 Vertical Partitioning

The PLA is partitioned into two planes with respect to the y-axis, as shown in Figure 3.2, where the partition is made vertically between the AND and OR planes. The AND and OR planes thus form two separate planes (see Figure 3.15). The two planes are merged by 3D interconnects (inter-wafer vias) placed on each of the product terms. The position of the interconnects depends on the optimization procedure followed, which is explained in Section 3.1.2.1.

Figure 3.15. AND and OR Planes Formed After Vertical Partition of the PLA

3.1.2.1 PLA Optimization in Vertical Partition

The optimization procedure for the sub-PLAs after the vertical partition is very similar to that of the horizontal partition. It differs by an additional final step that sorts the columns of the AND array, which saves metal by exploiting the partition made for the third dimension. The vertical partition between the AND plane and the OR plane allows us to shorten the metal lines of the product terms, which in the horizontal partition have to run all the way through to the OR plane. The pseudocode for the algorithm VER_PLA_Optimization( ) is shown in Figure 3.16.
procedure VER_PLA_Optimization(I, O, P)
// I is the number of inputs, O is the number of outputs, and P is the number of product terms
1   for i ← 1 to P
        for j ← 1 to I
            AND(i,j) = input
        for k ← 1 to O
            OR(i,k) = input
2   for i ← 1 to P
        for j ← 1 to I
            if (AND(i,j) == 1 or AND(i,j) == 0)
                AND(i,j) = 5        // or any constant
            else
                AND(i,j) = 0
        for k ← 1 to O
            if (OR(i,k) == 1)
                OR(i,k) = 5
            else
                OR(i,k) = 0
3   for i ← 1 to P
        for j ← 1 to I
            Sum(i) = Sum(i) + AND(i,j)
    Quicksort(Sum( ))   // sorts the product terms with respect to the weight of each product term
4   for i ← 1 to I
        for j ← 1 to P
            Sum(i) = Sum(i) + AND(j,i)
    Quicksort(Sum( ))   // sorts the input columns with respect to the weight of each input column
5   for i ← 1 to O
        for j ← 1 to P
            Sum(i) = Sum(i) + OR(j,i)
    Quicksort(Sum( ))   // sorts the output columns with respect to the weight of each output column

Figure 3.16. Greedy Algorithm for PLA Optimization

The topology of the PLA before the greedy algorithm is applied is shown in Figure 3.17.

Figure 3.17. Topology of PLA Before Implementation of Greedy Algorithm

The AND plane and OR plane obtained after the optimization procedure are shown in Figure 3.18 and Figure 3.19.

Figure 3.18. AND Plane After Optimization

Figure 3.19. OR Plane After Optimization

Figure 3.20 shows the merging of the AND and OR planes of the above example in the third dimension. The merging of the two planes falls into one of the following two conditions:

Overlap of AND and OR planes
This case arises when the number of outputs (O) is at most twice the number of inputs (I), i.e., O <= 2I; the AND plane has two columns, true and complemented, for each input. Priority is given to the polysilicon in the OR plane, so the position of the third-dimensional via is fixed by the first programmed place of a product term in the OR plane, and the via is placed at the same position in the AND plane.
Non-overlap of AND and OR planes
In this case the number of outputs is greater than twice the number of inputs, i.e., O > 2I. Two possibilities can be distinguished:

1. If every output in the OR plane is programmed for only one product term, merging in the third dimension becomes difficult. The inter-wafer vias, one per product term, are all placed on the first programmed place of that product term in the OR plane, so a via cannot be dropped directly from the OR plane to the AND plane, because the AND plane is comparatively smaller. In such cases, the OR plane can itself be partitioned so that the 3D IC uses three planes.
2. If only some of the product terms are single-programmed (i.e., programmed for only one product term) and the remaining ones are programmed more than once in the OR plane, a combination of the above procedures, i.e., case 1 of non-overlap and the overlap procedure, can be used to overcome the problem.

Figure 3.20. Merging of AND and OR Planes in Third Dimension (VILIC: Vertical Inter-Layer Inter-Connections)

3.1.3 Summary

PLAs provide a flexible and efficient way of synthesizing arbitrary combinational functions as well as sequential logic circuits. They are used in both LSI and VLSI technologies. The disadvantage of using PLAs is that most PLAs are very sparse, and this high sparsity results in a significant waste of silicon area. In this thesis, two novel partitioning techniques, horizontal partitioning and vertical partitioning, are introduced. Horizontal partitioning partitions the two-dimensional PLA with respect to the x-axis, and vertical partitioning partitions it with respect to the y-axis.
The topological minimization has been realized by employing a novel greedy algorithm.

CHAPTER 4
EXPERIMENTAL RESULTS

4.1 Design Flow

The design flow followed for the synthesis of the three-dimensional PLA is as follows:

- The scmos technology file provided with Magic version 7.1 is edited to introduce the new inter-wafer via.
- The PLA generation tool (MPLA) is provided with templates consisting of the tiles used for the automatic generation of PLAs. The templates are edited to accommodate the new inter-wafer via tile.
- MPLA, which is written in the high-level language C, is edited to adopt this new tile.
- MPLA is also edited to integrate the greedy optimization algorithm.
- Used with the new technology file on a set of benchmarks, the tool then produces two Magic layouts that are integrated through the inter-wafer vias.
- These layouts are extracted with the Magic layout editor in a 0.5 µm technology.
- The extracted files are converted to SPICE netlists and simulated with HSPICE to measure the power using 1000 random vectors.
- The SPICE netlists are also used by the timing analyzer PathMill to compute the delay of the critical paths.

Figure 4.1. Design Flow of the Experiment (flowchart: edit Magic techfile -> edit MPLA template -> edit MPLA code -> integrate algorithm -> generate two Magic layouts -> extract in 0.5 µm technology -> convert to SPICE netlists -> PathMill for delay calculation, HSPICE for power measurement)

4.2 Tools Used in the Experiment

The tools used in the experimental flow are:

MPLA (Magic Programming Logic Array): A Berkeley tool used for the automatic generation of two-dimensional PLA layouts.
This tool accepts the truth table of the PLA as input and produces a Magic layout of the two-dimensional PLA as output, using the scmos technology files provided in the Magic layout editor's database. MPLA uses the library provided with Magic to synthesize .mag files. It uses the regular Magic tiles in the templates (provided with the tool) to produce the regularly structured PLA. The tool is written in the high-level language C, which it uses to place the regular tiles, thereby accomplishing placement and routing.

Magic: A layout editor developed at the University of California, Berkeley, used for editing layouts with the .mag file extension. The scmos.tech27 technology file provided in the Magic database is edited to introduce the inter-wafer via, which is used for merging two two-dimensional ICs.

Awk: A specialized language used for processing text files into alternate formats and for acting on the content of those files. Like many other languages in the common UNIX utility suite, it is an interpreted scripting language. It is used to select particular records in a file and perform operations on them.

HSPICE: A Synopsys tool used for simulating the SPICE files to check the accuracy of the outputs and to compute the power of the ASIC design.

PathMill: A timing analyzer developed by Synopsys, used for measuring the delay of the critical paths.

4.3 Results Obtained

The following results were obtained from the synthesis of three-dimensional PLAs for the MCNC benchmark suite:

- The reduction in the polysilicon and metal layers obtained with the greedy algorithm.
- The worst-case delays obtained with the horizontal partition of the PLA. The PLAs generated with the horizontal partition were extracted and converted to HSPICE netlists.
The netlists were given as inputs to PathMill to generate the worst-case delays of both partitioned PLAs (i.e., the top and bottom PLAs). The worse of the delays of the two planes is taken, and the percentage of savings is computed.

- The delays obtained from the vertical partition of the PLAs. The worst-case delays of the AND plane and the OR plane are calculated. For each product term, the sum of its AND-plane and OR-plane delays is computed, and the worst of these sums is reported.
- The average power, computed with HSPICE over 1000 input vectors and reported for both the two-dimensional and three-dimensional PLAs, together with the savings.
- The footprint area of the horizontally partitioned PLA compared with the area of the 2D PLA.
- The footprint area of the vertically partitioned PLA compared with the area of the 2D PLA.

Table 4.1 Optimization Algorithm Implemented on a PLA Truth Table

MCNC        Poly reduction   Metal reduction   Poly reduction
Benchmark   in AND plane     in AND plane      in OR plane
Apex1       36.85%           42.81%            47.96%
Apex3       18.42%           69.03%            28.34%
Apex4       0.28%            0.05%             21.39%
Misex1      6.68%            1.32%             40.15%
Misex2      38.08%           37.08%            41.48%
Seq         9.79%            46.5%             42.81%
Rd84        0.006%           0.0%              60.59%
T481        1.3%             2.98%             0.0%
Con1        13.7%            13.33%            60.0%
Z5xp1       0.03%            0.0%              18.05%

Table 4.2 Worst Case Delay on Vertical Partitioning of the 2D PLAs

MCNC                      Delay in 3D PLA,          Delay in       % Change
Benchmark   I,O,P         vertical partition (ns)   2D PLA (ns)    in delay
Apex1       45,45,206     17.05                     17.216         1.06
Apex3       54,50,280     19.478                    19.949         2.3
Apex4       9,19,438      25.764                    26.289         1.9
Misex1      8,7,32        3.486                     3.4            2.5
Misex2      25,18,29      3.618                     3.669          1.3
Seq         41,35,1459    39.789                    45.106         11.7
Rd84        8,4,256       19.137                    20.16          5.07
T481        16,1,481      34.796                    37.312         6.74
Ex1010      10,10,1024    30.106                    31.827         5.4
Con1        7,2,9         2.157                     2.269          4.88
Z5xp1       7,10,128      11.042                    11.613         4.9

Table 4.3 Worst Case Delay on Horizontal Partitioning of the 2D PLAs

MCNC                      Delay in 3D PLA,            Delay in       % Change
Benchmark   I,O,P         horizontal partition (ns)   2D PLA (ns)    in delay
Apex1       45,45,206     11.53                       17.22          33.1
Apex3       54,50,280     13.57                       19.95          31.7
Apex4       9,19,438      15.28                       26.29          41.8
Misex1      8,7,32        2.87                        3.40           16.5
Misex2      25,18,29      3.17                        3.67           13.4
Seq         41,35,1459    40.21                       45.10          13.2
Rd84        8,4,256       11.84                       20.16          41.2
T481        16,1,481      22.28                       37.31          40.2
Ex1010      10,10,1024    30.55                       31.82          4.1
Con1        7,2,9         2.07                        2.27           8.29
Z5xp1       7,10,128      6.92                        11.61          40.4

Figure 4.2. Improvement in Delay (bar chart, horizontal partitioning: delay in ns per benchmark, 3D PLA vs. 2D PLA)

Table 4.4 Power Savings on Vertical Partitioning of the 2D PLAs

MCNC                      Power in 3D PLA,          Power in       % Change
Benchmark   I,O,P         vertical partition (mW)   2D PLA (mW)    in power
Apex1       45,45,206     83.72                     88.63          5.50
Apex3       54,50,280     111.67                    116.41         4.07
Apex4       9,19,438      174.78                    179.71         2.74
Misex1      8,7,32        12.84                     13.203         3.53
Misex2      25,18,29      11.42                     12.03          5.04
Seq         41,35,1459    570.35                    597.11         4.48
Rd84        8,4,256       101.13                    102.64         1.50
T481        16,1,481      193.76                    194.35         0.30
Ex1010      10,10,1024    404.02                    412.81         2.13
Con1        7,2,9         3.644                     3.829          4.82
Z5xp1       7,10,128      51.05                     52.94          3.57

Figure 4.3. Improvement in Power (bar chart, vertical partitioning: power in mW per benchmark, 3D PLA vs. 2D PLA)

Table 4.5 Area Savings with Horizontal Partitioning of the 2D PLAs

MCNC                      2D area       3D area, horizontal      % Change
Benchmark   I,O,P         (sq. units)   partition (sq. units)    in area
Apex1       45,45,206     2230357       1623891                  31.4
Apex3       54,50,280     3497244       2438580                  32.5
Apex4       9,19,438      1452487       1548800                  6.6
Misex1      8,7,32        126321        65436                    50
Misex2      25,18,29      217997.5      149548                   34.3
Seq         41,35,1459    8876116       8685410                  2.5
Rd84        8,4,256       615666        435912                   34.5
T481        16,1,481      1736682       1444716                  14.9
Ex1010      10,10,1024    3341559       2456268                  48.2
Con1        7,2,9         36240.5       28178                    38.8
Z5xp1       7,10,128      379002        330636                   30.2

Figure 4.4. Improvement in Area (bar chart, horizontal partitioning: area in sq. units per benchmark, 2D PLA vs. 3D PLA)

Table 4.6 Area Savings with Vertical Partitioning of the 2D PLAs

MCNC                      2D area       3D area, vertical        % Change
Benchmark   I,O,P         (sq. units)   partition (sq. units)    in area
Apex1       45,45,206     2230357       1528397                  27.2
Apex3       54,50,280     3497244       2358267                  30.2
Apex4       9,19,438      1452487       1113443.5                6.6
Misex1      8,7,32        126321        63160.5                  48.2
Misex2      25,18,29      217997.5      143223.5                 31.3
Seq         41,35,1459    8876116       8646416                  2.1
Rd84        8,4,256       615666        403186                   29.1
T481        16,1,481      1736682       1476964                  16.8
Ex1010      10,10,1024    3341559       1729381.5                26.4
Con1        7,2,9         36240.5       22151.5                  22.2
Z5xp1       7,10,128      379002        264390                   12.7

Figure 4.5. Improvement in Area (bar chart, vertical partitioning: area in sq. units per benchmark, 2D PLA vs. 3D PLA)

CHAPTER 5
CONCLUSIONS AND FUTURE RESEARCH

A novel technology, three-dimensional ICs (3D ICs), is introduced in this work and applied to programmable logic arrays (PLAs). Two novel approaches to partitioning the PLA, horizontal partitioning and vertical partitioning, are introduced. Although past research concentrated on the partitioning problem, the dependency between the sub-PLAs remained quite high. By virtue of the third dimension, the two planes can be optimized largely independently of each other, which contributes to the reduction in delay. A novel algorithm that exploits the sparsity of the classic PLA architecture is used, which enhances scalability.
Though area optimization was due to the virtue of third dimension, power savings were realized with the reduction in node capacitance. The related work was concentrated on either the area or the power optimization but th is work has effectively been able to look into area, power and performance factors. MPLA tool provided by Berkeley, generates 2D PLAs automatically, was edited to support 3D PLA s. This tool now generates two magic files which represent the two planes after partition w ith the interwafer vias (VILICs) introduced. Technology files of MAGIC were edited to support these vias. The results obtained for the MCNC benchmark suite are presented. The results obtained w ere compared with the ones obtained from 2D PLA. Horizontal partitioning results have shown a g ood reduction in delay of atleast 30% in most of the cases, which was a good reflection of t he decrease of the critical delay length by almost 50%. The results obtained from area and power sav ings affirm the employed method. There is a good scope for the future research in this work. 3D technology increases the scalability of the number of planes that a 2D PLA can be partit ioned. For example, in horizontal partitioning, each of the two subPLAs obtained can be verticall y partitioned to obtain the AND and OR planes, hence increasing the number of planes to 4 planes and real izing the advantages of both the horizontal and vertical partitioning techniques. PAGE 54 45 REFERENCES [1] Semiconductor Industry Association, The International Technology for Semiconductors, 2001 Edition. [2] Michael Kagan, Simcha Gochman, Doron Orenstein, and Derrick Lin. MMX Microarchitechture of Pentium Processors with MMX Technology and Pentium 2 Microprocessors. Intel Technology Journal Dec 1997. [3] Posluszny et al. Design methodology for a 1.0 GHz microprocessor. in Proc. Int. Conf. on Computer Design: VLSI in Computers and Processors, IC CD98 pp. 17 23. [4] J. Hennessy. Partitioning Programmable Logic Arrays. 
ICCAD-83, pp. 180-181, 1983.
[5] G. D. Hachtel, A. R. Newton, A. Sangiovanni-Vincentelli. Techniques for PLAs Folding. DAC-19, pp. 141-55, 1982.
[6] G. De Micheli, A. Sangiovanni-Vincentelli. Multiple Constrained Folding of PLAs: Theory and Applications. IEEE Transactions on CAD, Vol. CAD-6, No. 1, pp. 79-84, 1983.
[7] S. Kang, W. Vancleemput. Automatic PLA synthesis from a DDL-P description. DAC, pp. 391-397, 1981.
[8] J. L. Lewandowsky, C. L. Liu. A Branch and Bound Algorithm for Optimal PLA Folding. DAC-21, pp. 426-433, 1984.
[9] S. Y. Hwang, R. W. Dutton, T. Blank. A Best-First Search Algorithm for Optimal PLA Folding. IEEE Transactions on CAD, Vol. CAD-5, No. 3, pp. 433-442, 1986.
[10] D. F. Wong, H. W. Leong, C. L. Liu. PLA Folding by Simulated Annealing. IEEE Journal of Solid-State Circuits, Vol. SC-22, No. 2, pp. 208-215, 1987.
[11] S. Kang, W. M. Vancleemput. Automatic PLA synthesis from a DDL-P description. Proceedings of the 18th Conference on Design Automation, pp. 391-397, June 29-July 1, 1981, Nashville, Tennessee, United States.
[12] Shukri J. Souri, Kaustav Banerjee, Amit Mehrotra, Krishna Saraswat. 3D Heterogeneous ICs: A Technology for the Next Decade and Beyond. 5th IEEE Workshop on Signal Propagation on Interconnects, Venice, Italy, May 13-16, 2001.
[13] Baliga, J. Chips go vertical [3D IC interconnection]. IEEE Spectrum, Volume 41, Issue 3, March 2004, pp. 43-47.
[14] Arifur Rahman, Rafael Reif. System-Level Performance Evaluation of Three-Dimensional Integrated Circuits. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 8, No. 6, December 2000.
[15] Krishna C. Saraswat, K. Banerjee, P. Kapur, and S. J. Souri. 3D ICs: A Novel Chip Design for Improving Deep-Submicrometer Interconnect Performance and System-on-Chip Integration. Proceedings of the IEEE, Vol. 89, No. 5, May 2001.
[16] A. Rahman, A. Fan, J. Chang, and R. Reif. Wire-length distribution of three-dimensional integrated circuits. IEEE Int. Interconnect Technology Conf.
Proceedings, June 1999, San Francisco, pp. 233-235.
[17] S. J. Souri, K. Banerjee, A. Mehrotra, K. C. Saraswat. Multiple Si Layer ICs: Motivation, Performance Analysis, and Design Implications. Proc. 37th ACM Design Automation Conference (DAC), June 5-9, Los Angeles, CA, 2000, pp. 213-220.
[18] K. C. Saraswat, S. J. Souri, V. Subramanian, A. R. Joshi, and A. W. Wang. Novel 3D structures. Proc. 1999 IEEE Int. SOI Conf., pp. 54-55, October 4-7, 1999.
[19] Kaustav Banerjee, Shukri J. Souri, Pawan Kapur, and Krishna C. Saraswat. 3D Heterogeneous ICs: A Technology for the Next Decade and Beyond. 5th IEEE Workshop on Signal Propagation on Interconnects, Venice, Italy, May 13-16, 2001.
[20] Jinn-Shyan Wang, Ching-Rong Chang, and Chingwei Yeh. Analysis and Design of High-Speed and Low-Power CMOS PLAs. IEEE Journal of Solid-State Circuits, Vol. 36, No. 8, Aug. 2001.
[21] Egan, J. R., Liu, C. L. Bipartite Folding and Partitioning of a PLA. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume 3, Issue 3, July 1984, pp. 191-199.
[22] Hachtel, G. D., Newton, A. R., Sangiovanni-Vincentelli, A. L. An Algorithm for Optimal PLA Folding. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume 1, Issue 2, April 1982, pp. 63-77.
[23] Wong, D. F., Leong, H. W., Liu, C. L. PLA folding by simulated annealing. IEEE Journal of Solid-State Circuits, Volume 22, Issue 2, Apr. 1987, pp. 208-215.
[24] Sun Young Hwang, Dutton, R. W., Blank, T. A Best-First Search Algorithm for Optimal PLA Folding. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume 5, Issue 3, July 1986, pp. 433-442.
[25] Makarenko, D. D., Tartar, J. A Statistical Analysis of PLA Folding. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume 5, Issue 1, January 1986, pp. 39-51.
[26] Abe, K., Iijima, J., Nakashima, T., Wakatsuki, Y. Implementation of arithmetic algorithms using a PLA.
IEEE Transactions on Education, Volume 32, Issue 3, Aug. 1989, pp. 370-375.
[27] Dhong, Y. B., Tsang, C. P. High speed CMOS POS PLA using predischarged OR array and charge sharing AND array. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, Volume 39, Issue 8, Aug. 1992, pp. 557-564.
[28] Blair, G. M. PLA design for single-clock CMOS. IEEE Journal of Solid-State Circuits, Volume 27, Issue 8, Aug. 1992, pp. 1211-1213.
[29] Chun-Yeh Liu, Saluja, K. K. An efficient algorithm for bipartite PLA folding. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume 12, Issue 12, Dec. 1993, pp. 1839-1847.
[30] Chua-Chin Wang, Chi-Feng Wu, Rain-Ted Hwang, Chia-Hsiung Kao. A low-power and high-speed dynamic PLA circuit configuration for single-clock CMOS. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, Volume 46, Issue 7, July 1999, pp. 857-861.
[31] Fan Mo, Brayton, R. PLA-based regular structures and their synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Volume 22, Issue 6, June 2003, pp. 723-729.
[32] Yang, Y. Y., Kyung, C. M. Three-step heuristic algorithm for optimal PLA column folding. Electronics Letters, Volume 24, Issue 17, 18 Aug. 1988, pp. 1088-1090.
[33] Kwang-Il Oh, Lee-Sup Kim. A high performance low power dynamic PLA with conditional evaluation scheme. Proceedings of the 2004 International Symposium on Circuits and Systems (ISCAS '04), Volume 2, 23-26 May 2004, pp. II-881-884.
[34] Raahemifar, K., Ahmadi, M. An efficient 0-1 linear programming for optimal PLA folding. 8th IEEE International Conference on Electronics, Circuits and Systems (ICECS 2001), Volume 3, 2-5 Sept. 2001, pp. 1135-1138.
[35] Engeler, W. E., Lowy, M., Pedicone, J., Bloomer, J., Richotte, J., Chan, D. A high speed static CMOS PLA architecture. IEEE International Conference on Computer Design: VLSI in Computers and Processors, 1988.
ICCD '88, 3-5 Oct. 1988, pp. 348-351.