
Partial evaluation based triple modular redundancy for single event upset mitigation


Material Information

Partial evaluation based triple modular redundancy for single event upset mitigation
Physical Description:
Kakarla, Sujana
University of South Florida
Place of Publication:
Tampa, Fla.
Publication Date:


Subjects / Keywords:
Redundant gates
Dissertations, Academic -- Computer Engineering -- Masters -- USF   ( lcsh )
government publication (state, provincial, territorial, dependent)   ( marcgt )
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )


ABSTRACT: We present a design technique, called partial evaluation triple modular redundancy, for hardening combinational circuits against Single Event Upsets (SEU). The input environment is given in terms of signal probabilities of the lines. This information is used to determine the redundant gates of the given circuit. The basic ideas of partial redundancy and temporal triple modular redundancy (TMR) are used together to harden the circuit against SEUs. The concept of partial redundancy is used to eliminate the gates whose outputs can be determined in advance. This technique fails when the actual inputs to the circuit do not agree with the rounded logic values. In such cases the technique of temporal TMR is used. However, there is some overhead in this process because of the voter circuits and the need to choose between the outputs computed by the partially evaluated circuit and the circuit using temporal TMR.
Thesis (M.S.C.P.)--University of South Florida, 2005.
Includes bibliographical references.
System Details:
System requirements: World Wide Web browser and PDF reader.
System Details:
Mode of access: World Wide Web.
Statement of Responsibility:
by Sujana Kakarla.
General Note:
Title from PDF of title page.
General Note:
Document formatted into pages; contains 90 pages.

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001680974
oclc - 62474610
usfldc doi - E14-SFE0001146
usfldc handle - e14.1146
System ID:


Full Text


Partial Evaluation Based Triple Modular Redundancy For Single Event Upset Mitigation
by
Sujana Kakarla
A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Engineering
Department of Computer Science and Engineering
College of Engineering
University of South Florida
Major Professor: Srinivas Katkoori, Ph.D.
Nagarajan Ranganathan, Ph.D.
Soontae Kim, Ph.D.
Date of Approval: March 24, 2005
Keywords: radiation, hardening, temporal, redundant gates, simulator
Copyright 2005, Sujana Kakarla


DEDICATION
In memory of my loving grandparents.


ACKNOWLEDGEMENTS
My sincere thanks to Dr. Srinivas Katkoori, without whom this work would not have been possible. I am grateful to him for assigning me this interesting task and helping me throughout the project with his valuable suggestions. I would acknowledge the real debt of gratitude that I owe to my parents and brother for always being there for me. I also thank Dr. Ranganathan and Dr. Kim for being on the committee and giving me detailed review and helpful suggestions. My special thanks to my friends in the VCAPP group for providing a wonderful working environment in the lab. I would also like to thank all my friends for their support and encouragement.


TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1 INTRODUCTION
  1.1 Radiation Effects
    1.1.1 Types of SEUs
  1.2 CMOS Technology
  1.3 FPGAs
  1.4 Radiation Hardening by Design
  1.5 Partial Evaluation
  1.6 Temporal TMR
  1.7 Partial Evaluation Based Triple Modular Redundancy
  1.8 Results
  1.9 Organization of the thesis
CHAPTER 2 BACKGROUND AND RELATED WORK
  2.1 Radiation Effects
  2.2 Radiation Hardening by Shielding
  2.3 Radiation Hardening by Fabrication Techniques
  2.4 Radiation Hardening by Design
    2.4.1 System Level Design Hardening Techniques
      Coding Techniques
      Current Monitoring Techniques
    2.4.2 Resistive or Capacitive Hardening
    2.4.3 Circuit and Logic Design Techniques
      Ratioing
      Rockett Cell
    2.4.4 Redundancy
      Lockstep System
      Triple Modular Redundancy
      Dual Voting Double Redundancy
      Selective Triple Modular Redundancy
  2.5 Miscellaneous Techniques
  2.6 Partial Evaluation
  2.7 Temporal Latch
  2.8 Voter Circuits
  2.9 Summary
CHAPTER 3 PARTIAL EVALUATION REDUNDANCY
  3.1 Definitions
  3.2 Characterizing Input Environment
  3.3 Implementation of the Partial Evaluation Based TMR Technique
  3.4 Algorithms used in Partial Evaluation Based TMR
  3.5 Advantages and Disadvantages
    3.5.1 Advantages
    3.5.2 Disadvantages
  3.6 Illustrative Example
  3.7 Summary
CHAPTER 4 EXPERIMENTAL RESULTS
  4.1 Experimental Setup
  4.2 Results
CHAPTER 5 CONCLUSIONS
BIBLIOGRAPHY


LIST OF TABLES
Table 1.1 Summary of Single Event Effects on Space Electronics
Table 2.1 Error Detection and Correction Codes [2]
Table 3.1 Computation of Output Probability
Table 4.1 Results for circuits with positive gains and rounding range as 0.0 < p < 0.2 => logic ‘0’ and 0.9 < p < 1.0 => logic ‘1’
Table 4.2 Results for circuit with negative gains and rounding range as 0.0 < p < 0.2 => logic ‘0’ and 0.9 < p < 1.0 => logic ‘1’
Table 4.3 Results for circuits with positive gains and rounding range as 0.0 < p < 0.3 => logic ‘0’ and 0.8 < p < 1.0 => logic ‘1’
Table 4.4 Results for circuit with negative gains and rounding range as 0.0 < p < 0.3 => logic ‘0’ and 0.8 < p < 1.0 => logic ‘1’
Table 4.5 Results for circuit with positive gains and rounding range as 0.0 < p < 0.4 => logic ‘0’ and 0.7 < p < 1.0 => logic ‘1’
Table 4.6 Results for circuit with negative gains and rounding range as 0.0 < p < 0.4 => logic ‘0’ and 0.7 < p < 1.0 => logic ‘1’


LIST OF FIGURES
Figure 1.1 Plot of Variation of Critical Transient Width With Feature Size [14]
Figure 1.2 Occurrence of SET Errors in Sequential Circuits
Figure 1.3 Temporal Relationship for Latching a Data SET as an Error
Figure 1.4 Temporal Relationship for Latching a Clock SET as an Error [14]
Figure 1.5 Power vs Throughput Graph for SET-Mitigation Techniques [35]
Figure 1.6 Flow of PTMR Technique
Figure 2.1 Configuration Memory [15]
Figure 2.2 Resistive Hardened CMOS SRAM Cell Design [36]
Figure 2.3 Hardening by Use of Ratioing
Figure 2.4 Rockett Cell [1]
Figure 2.5 Triple Redundant FPGA Inputs [18]
Figure 2.6 Dual Voting Double Redundancy [15]
Figure 2.7 Effect of Transients With Low Pulse Widths
Figure 2.8 Effect of Transients With High Pulse Widths
Figure 2.9 Spatially Redundant Latch
Figure 2.10 Temporally Redundant Latch
Figure 2.11 Majority Vote Circuit [15]
Figure 2.12 Majority Vote Circuit Using LUTs [15]
Figure 2.13 Majority Vote Circuit Using BUFTs [15]
Figure 2.14 Majority Vote Circuit Using LUT and FF [15]
Figure 2.15 Implementation of Module Redundancy and Mitigation On Single Chip [15]
Figure 3.1 Implementation of Redundancy Using Partially Evaluated Circuit
Figure 3.2 Evaluation of Temporal TMR
Figure 3.3 Overall Implementation of the Technique
Figure 3.4 Algorithm to Generate the Reduced Circuit
Figure 3.5 Algorithm to Identify if a Gate is Redundant
Figure 3.6 Algorithm That Describes the Overall Implementation of the PTMR Technique
Figure 3.7 Comparison of Area/Delay Overhead For Different Redundancy Techniques
Figure 3.8 Original Circuit
Figure 3.9 The Rounded Values Are Propagated Over the Circuit
Figure 3.10 Reduced Circuit
Figure 3.11 Circuit That Computes Output From Partially Evaluated Circuit
Figure 3.12 Circuit That Computes Output From Temporal TMR Circuit
Figure 3.13 Circuit Showing The Overall Implementation
Figure 4.1 Experimental Flow
Figure 4.2 Resolution Function
Figure 4.3 Fault Insertion On Line “A” by SEU Simulator
Figure 4.4 Fault Insertion On Line “A” by SEU Simulator
Figure 4.5 Validation of the Technique


PARTIAL EVALUATION BASED TRIPLE MODULAR REDUNDANCY FOR SINGLE EVENT UPSET MITIGATION
Sujana Kakarla
ABSTRACT
We present a design technique, called partial evaluation triple modular redundancy, for hardening combinational circuits against Single Event Upsets (SEU). The input environment is given in terms of signal probabilities of the lines. This information is used to determine the redundant gates of the given circuit. The basic ideas of partial redundancy and temporal triple modular redundancy are used together to harden the circuit against SEUs. The concept of partial redundancy is used to eliminate the gates whose outputs can be determined in advance. This technique fails when the actual inputs to the circuit do not agree with the rounded logic values. In such cases the technique of temporal TMR is used. However, there is some overhead in this process because of the voter circuits and the need to choose between the outputs computed by the partially evaluated circuit and the circuit using temporal TMR.
For testing the circuit exhaustively against SEUs, a fault insertion simulator is used. This simulator introduces errors that represent SEUs into the circuits during simulation. The technique of partial evaluation redundancy is thoroughly tested on MCNC’91 benchmarks using the Cadence NCLaunch simulator. By employing this technique, in most cases we can reduce the area overhead of the hardened circuit when compared with traditional Triple Modular Redundancy (TMR). The improvement in area depends on the total number of gates and the actual number of outputs. For circuits with a large number of gates and few outputs, there are greater savings in area. In some cases, the area overhead of the proposed technique is greater than that of traditional TMR. This usually occurs in smaller circuits or in circuits with a larger number of outputs.


CHAPTER 1 INTRODUCTION
High performance, low power consumption, increased speed, and cost are the key factors in circuit design. One of the major obstacles for reliable space-based computational systems is the occurrence of SEUs in electronic circuitry. Upset of the control has more serious system level consequences [1]. Radiation in space occurs because of the fusion process occurring in the sun, which creates a constant stream of particles flowing through space. The earth’s atmosphere helps in filtering the ionizing radiation. Protons and heavy ions emitted by the sun, galactic cosmic rays, and particles trapped in the earth’s magnetic field are the main contributors to space radiation.
There are different types and levels of radiation around the earth. At regions 500 km above the surface of the earth, i.e., at Low Earth Orbit, radiation doses are the lowest. Only a few heavy ions penetrate the magnetic fields at this level. Van Allen belts increase the dosage levels at polar regions. More heavy ions are able to penetrate in these regions. At geosynchronous orbit, doses are still higher. The absence of geomagnetic shielding causes interplanetary space to have the highest dosage levels.
Continuous exposure to radiation causes degradation of the device. The level of degradation is based on the total dose and dose rate of irradiation. Radiation has long-term effects such as total ionizing dose, and single particle effects such as single-event latchup and single-event upset [38].


1.1 Radiation Effects
The behavior of electronic circuitry in outer space differs from its behavior in the normal environment because it is exposed to a flux of ionized particles [32]. The various effects of ionization are summarized below:
1. Penetration of ions through the spacecraft generates X-rays. These X-rays could cause the ionization of silicon and silicon dioxide layers. This results in temporary effects such as corruption of memory cell contents, or permanent effects when the ionization triggers latchup in the device. Electron-hole pairs created by the electrons and X-rays are usually accumulated at power supply nodes [38].
2. Data in the device changes because of charge collection at a circuit node [38].
3. Ionization changes the characteristics of a device by shifting transistor thresholds [38].
4. The major effect is seen as a change in the contents of a MOS memory cell, which occurs when the energy level of the charged particle is high and it passes through the diffusion region of a susceptible node [32].
5. The other effects ionization could have are change in leakage current, charge trapping, and generation of interface states [32].
When a highly energized particle passes through a sensitive device, it loses energy and ionizes the material. This forms a dense track of electron-hole pairs. Stopping power, usually termed Linear Energy Transfer (LET), is the rate at which the ion loses energy. Under the influence of the electric field, the electron-hole pairs drift in opposite directions and are collected at the respective voltage sources. This produces a current transient [14]. Feature size plays a role here, in that the critical charge collected at a sensitive node which is able to produce an upset decreases as the square of the feature size [14]. Thus, a decrease in feature size decreases radiation tolerance. Oxide purity and thickness also determine the amount of degradation.
The system function of a memory cell determines the effects an SEU would have on it.
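The quadratic dependence of critical charge on feature size can be illustrated with a short numerical sketch. The reference feature size, the reference critical charge, the units, and the function names below are all hypothetical values chosen for illustration; only the square-law scaling and the "collected charge exceeds critical charge" upset condition come from the text.

```python
def critical_charge(feature_size_um, ref_size_um=1.0, ref_qcrit_pc=1.0):
    """Scale a reference critical charge with the square of the feature size.

    ref_size_um and ref_qcrit_pc are illustrative placeholders, not measured data.
    """
    return ref_qcrit_pc * (feature_size_um / ref_size_um) ** 2


def causes_upset(collected_pc, qcrit_pc):
    """An upset occurs when the collected charge exceeds the critical charge."""
    return collected_pc > qcrit_pc


# The same hypothetical 0.1 pC strike is evaluated at three feature sizes.
for size in (1.0, 0.5, 0.25):
    q = critical_charge(size)
    print(f"{size:.2f} um: Qcrit = {q:.4f} pC, upset = {causes_upset(0.1, q)}")
```

Under these assumed numbers, halving the feature size quarters the critical charge, so a strike that is harmless at the largest size can upset the smallest one.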


Figure 1.1 shows the relation between feature size and critical transient pulse width. If the transient pulse width is smaller than the critical width, the transient is attenuated because of the inherent inertial delay of the gate. This causes the pulse to die out after it passes through some gates. However, if the pulse width is equal to or greater than the critical width, the transient propagates through the gate [14].
Figure 1.1 Plot of Variation of Critical Transient Width With Feature Size [14]
The main radiation effects in microelectronics are:
1. Long Term Ionizing Radiation Effects (total dose)
2. Transient Ionizing Radiation Effects (dose rate)
3. Single Event Effects
   (i) Single Event Upset (SEU)
   (ii) Single Event Transient (SET)
   (iii) Single Event Latchup (SEL)
   (iv) Single Event Functional Interrupt (SEFI)
   (v) Single Event Gate Rupture (SEGR)
   (vi) Single Event Burnout (SEB)
4. Displacement Damage


Total Ionizing Dose (TID): High energy protons and electrons cause TID. High-energy ionizing radiation generates electron-hole pairs within the oxide of a MOS device, which cause charge buildup. Charge buildup has many effects, such as change of threshold voltage, increase in leakage current, and change in timing of the MOS transistors. This could lead to functional failure of the device [27]. Leakage currents are generated at the edge of MOS transistors and neighboring N-type diffusions.
Single Event Effects result from a single energetic particle.
Single-Event Upset (SEU): Change in the state of a device or transient induced by a heavy ionizing particle such as a cosmic ray or proton. SEUs alter the logic state of a static memory element and cause transient pulses in combinational logic paths. These errors can be corrected by resetting or rewriting the device. Hence they are termed soft errors. These soft errors occur due to the change in state of a digital memory element because of the ionizing particle. SEUs occur when a high energy particle hits a storage node and generates a strong ionization track [3]. Charge transfer from one node to another occurs as the ionizing particle passes through the device. This lowers the voltage of a memory cell and changes its internal state. The stored logic state is reversed if the collected charge on the node is larger than the critical charge [3]. Digital, analog, and optical components are prone to SEUs.
Single event latchup (SEL) is a condition that causes loss of device functionality due to a single-event induced current state. Latchup within a CMOS device occurs because of a single charged particle. With sufficient energy, the charged particle triggers the parasitic npn-pnp circuit found within CMOS circuits. Because of latchup, high currents flow through the parasitic bipolar transistors and destroy the device. Device design characteristics such as material resistivity, device geometry, and layout and contact characteristics affect the SEL resistance of a device [27]. SELs are hard errors, and are potentially destructive (i.e., may cause permanent damage). An SEL is cleared by a power off-on reset or by completely removing power to the device.


Single event burnout (SEB) is a condition that can cause device destruction due to a high current state in a power transistor. SEB causes the device to fail permanently because of failures of power MOSFET transistors in high power applications.
Single-event gate rupture (SEGR) is the formation of a conducting path (i.e., localized dielectric breakdown) in the gate oxide, resulting in a destructive burnout.
Single Event Transient (SET): Effects (e.g., current spikes in operational amplifiers) of short time duration that may lead to other effects downstream of the affected site that are longer in duration. SETs cause soft errors in the user data when they are registered at the flip-flop inputs [39].
Displacement damage is due to nuclear interactions and causes lattice defects. It is mainly a long-term non-ionizing damage caused by protons, electrons, and neutrons. Collision between the incoming particle and a lattice atom subsequently displaces the atom from its original lattice position.
1.1.1 Types of SEUs
1. Single bit errors: A single ion passage causes a single bit to change its state.
2. Multiple bit errors: If the single ion causes multiple bits to flip in adjacent cells, then a multiple bit upset occurs. Multiple bit upsets require that the adjacent cells be mapped to the same logical address.
3. Control errors: SEUs causing operational difficulties in the device, such as reading or writing to the incorrect address or the occurrence of a functional halt [13].
System level effects of SEUs can be categorized as those that affect data responses and those that affect control of the device [2]. The effect of SEU transients on analog devices is slightly different from their effect on digital systems. Transients in analog devices propagate to the digital electronics of the surrounding circuitry. Specific circuit designs determine the effect these transients would have on the system. The definition of an analog SEU phenomenon is specific to the interface circuitry surrounding


the radiation-sensitive device. SEUs for the conventional analog-to-digital converters can be categorized as noise, offset, and control errors. Although noise and offset errors do not disturb the system performance, control errors affect device operation and hinder system performance [2]. Table 1.1 summarizes the list of errors occurring in space electronics [32].
Table 1.1 Summary of Single Event Effects on Space Electronics
Type of error | Description
SEU in the configuration memory | A bit flip in the configuration memory caused by a single particle strike (neutron or alpha).
SEFI in the target circuit | A permanent mismatch of the output of the target circuit. It is created by an SEU in the configuration memory that alters the Look-up Tables (LUT) or the routing of signals in the target circuit.
Configuration circuitry failure | A failure in the controlling circuitry of the FPGA. Configuration and read back operations fail.
Latchup | The activation of a parasitic structure in the silicon by a single neutron strike. The main consequences of the latchup effect are an increase of the current consumption and failures in the target circuit, the configuration memory, or the controlling circuitry of the FPGA.
Hard error | A permanent failure in the FPGA that cannot be recovered after switching the beam off, switching the power off/on, and reconfiguration.
Definition: SEU Mitigation: The process of applying design techniques to strengthen the functional integrity of the user design and protect it from the effect of any Single Event Upset [29].


Various techniques are implemented to evaluate the rate of functional bit failure. Systems using ASIC technology use a static upset rate, while those using Virtex series FPGAs define a dynamic upset rate [29].
SEEs affect sequential circuits as well. Figure 1.2 represents the circuit topology of sequential circuits. Data from latch U1 is released to the combinatorial logic on a clock edge. Before the next clock edge, the output of the combinatorial logic reaches the latch U2. At this clock edge, the latch stores the data present at its input. Consider a heavy ion striking within the combinatorial logic. For fast logic, the SET appears at the latch U2. Whether the SET is propagated through the circuit depends on the arrival time of the SET relative to the latching edge of the clock. Figure 1.3 illustrates this. Suppose the actual data to the circuit is low and a positive SET appears at the input of the latch. If the transient is high for the whole period from the setup time before the clock edge to the hold time after the clock edge, then the transient is interpreted as data and stored in the latch. Figure 1.3 shows four instances of time at which the SET can arrive. Conditions (a) and (d) satisfy a non-latching condition. Condition (b) represents the earliest arrival time for the latching condition and condition (c) represents the latest time. Similar errors occur from transients appearing on the clock line, as shown in Figure 1.4.
Figure 1.2 Occurrence of SET Errors in Sequential Circuits
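The latching condition described above can be written as a simple interval check. The following is a minimal sketch, assuming an idealized rectangular transient and arbitrary integer time units; the function name and the numeric values are hypothetical and are not taken from the thesis.

```python
def set_is_latched(t_start, width, t_clk, t_setup, t_hold):
    """Return True if the transient covers the whole setup/hold window.

    The transient is modeled as high on [t_start, t_start + width].
    It is latched only if it is already high at (t_clk - t_setup)
    and still high at (t_clk + t_hold).
    """
    t_end = t_start + width
    return t_start <= t_clk - t_setup and t_end >= t_clk + t_hold


# Illustrative numbers: clock edge at t=100, setup=2, hold=1, SET width=10.
T_CLK, T_SETUP, T_HOLD, WIDTH = 100, 2, 1, 10
cases = {
    "(a) too early": 80,        # transient ends before the window
    "(b) earliest latching": 91,  # transient ends exactly at t_clk + t_hold
    "(c) latest latching": 98,    # transient starts exactly at t_clk - t_setup
    "(d) too late": 99,         # transient starts inside the window
}
for label, t_start in cases.items():
    print(label, set_is_latched(t_start, WIDTH, T_CLK, T_SETUP, T_HOLD))
```

In this model the range of latching arrival times has length `width - (t_setup + t_hold)`, so a transient narrower than the setup-plus-hold window can never be latched.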


Figure 1.3 Temporal Relationship for Latching a Data SET as an Error
Figure 1.4 Temporal Relationship for Latching a Clock SET as an Error [14]


Upsets occur in sequential circuits when the clock is low and the latch is in hold state. Thus, latch SEU rates do not depend on clock frequency. SETs in combinatorial logic are stored if they occur at a clock edge. Thus the effect of SETs depends linearly on the frequency.
1.2 CMOS Technology
Most of the circuits in space applications use CMOS processes because of their desirable features such as high integration density, low power dissipation, and high noise immunity [7]. However, CMOS circuits are prone to the following failures:
1. Increase in standby supply current.
2. Degradation of input levels and internal noise immunity. This introduces functional failures.
3. Increase of rise time and decrease of fall time. This causes change in switching and dynamic parameters [33].
CMOS static RAMs have very high dynamic power consumption. There is negligible static leakage current [3]. CMOS devices are susceptible to single event effects (SEE). This includes cell error events such as single event upset (SEU), and destructive conditions such as single event latchup (SEL). SEE results may vary based on temperature [13].
1.3 FPGAs
FPGAs are devices comprising an array of cells that can implement a variety of logic functions, plus an interconnection network. Configuration information is downloaded onto the FPGA to set up the cells and interconnection network to realize a specific circuit [17]. FPGAs have resulted in higher speed, lower core voltages, improved integration, and lower power consumption [20]. They are used in applications where


speed, efficiency, and the ability to program hardware to perform any user-specified operation are important and cannot be achieved by the traditional programmable processors. This is possible because of the ability to customize the data path within an FPGA to an application-specific computation [16]. FPGAs find their importance when production volumes are too low to develop an ASIC [24]. The development of SRAM cells enables FPGAs to be reprogrammed during design or in space [38]. FPGAs differ from ASICs in that they can be configured after the spacecraft launch. This enables reuse of FPGA resources for multiple instruments [16]. The main advantages of using FPGAs in space circuits are:
1. High flexibility in achieving multiple requirements such as cost, performance, and turnaround time.
2. In-flight reconfiguration: SRAM-based FPGAs can change the implemented function on-site and hence are very convenient for space-based applications.
3. The decrease in the number of devices required reduces weight.
4. The decrease in the number of solder connections improves reliability.
5. Increased flexibility to make design changes after board layout is complete [40]. Hence changing or updating hardware is easy.
However, there are some problems associated with using FPGAs in the space environment:
1. FPGA-based applications are sensitive to heavy ion and proton induced SEUs (internal flip-flops). User design flip-flops, the FPGA configuration bit stream, FPGA registers, latches, and internal state are affected by SEUs.
2. SRAM-based FPGAs might have their configuration altered by radiation. The function implemented by a device can be permanently affected when SEUs alter the configuration memory of SRAM-based FPGAs. The general solutions to this problem are partial or total reconfiguration, and TMR.


3. FPGA logic may be sensitive to transient errors. Smart clocking strategies can reduce this problem [40].
The basic concepts underlying the radiation tolerant Virtex FPGAs are as follows:
1. Re-configurable Logic Devices: Logic devices that can be customized more than one time are called re-configurable logic devices. These devices use remote hardware changes and functional evolution for fast SEU detection and correction [15].
2. SEU Protection Design Techniques: These techniques employ SEU detection and correction strategies which do full design verification in a very short time and SEU correction without any functional interrupt. SEU mitigation is achieved by:
   (i) SEU resistant mitigation circuit
   (ii) Module and logic node redundancy and mitigation
   (iii) Logic partitioning for mitigation
   (iv) Dual and triple device redundancy and mitigation [15].
Errors in an FPGA can be corrected by fixing the incorrect design and reconfiguring the FPGA with an updated configuration bit stream. Also, custom circuit designs can be created to avoid FPGA resources that have failed during the course of the spacecraft mission [16].
1.4 Radiation Hardening by Design
In recent years, high performance ICs for radiation environments are being designed because of the rapid pace of design innovation for commercial IC applications. Decrease in feature size of the device helps increase the speed and density of IC designs [38]. The creation of autonomous spacecraft which rely on information processing on board the vehicle has made the radiation hardening of circuits a more crucial aspect [38]. The main goal of design hardening techniques is to manufacture SEU-immune circuits using standard CMOS processing, with no additional masks. The basic idea is to provide


memory elements with feedback [9]. Hardening commercial CMOS technologies has many advantages such as low power, low cost, higher speed, and higher density. This also facilitates the use of more advanced, deep sub-micron technologies [38].
Radiation hardening means the extra protective package that is provided to the chips to make them more resistant to ionizing radiation [38]. Hardness of a circuit can be achieved in three ways:
1. Shielding the circuit.
2. Process hardness: Altering the methods of manufacturing the chips.
3. Design or layout techniques: Changing the designs of the chip.
The principle of radiation hardening by design is the minimization of radiation impact by the use of layout techniques. Design of a radiation-resistant chip is a complicated process and involves deep understanding of various concepts such as:
1. Device physics of the transistors and other circuit elements.
2. Effect of various kinds of radiation on the circuit elements at the atomic level.
3. Effect of fabrication processes (such as patterning, oxide deposition, ion implantation, etc.) on device sensitivity when exposed to different radiation sources [41].
The conventional design and manufacturing techniques employed to speed up the circuit or lower the price could lead to adverse effects when exposed to ionizing radiation. Component radiation hardness depends on the orbit and time frame of the mission. Generally, two complementary approaches are used in the design of radiation-hardened circuits. They are:
1. Design radiation-hard characteristics into the chip.
2. Use special techniques that mitigate radiation effects at the processing phase [38].


Radiation hardening makes satellite design flexible. The development of radiation hardened SRAM cells enables the use of field programmable gate arrays which can be reprogrammed during design or in space. This simplifies circuitry, reduces cost, and eases space-based usage [38]. To design circuits which are radiation hardened, a set of hardened cells suitable for gate array or full custom designs is developed. Elements which are not radiation tolerant are replaced with the hardened elements. A general way of hardening the circuits is to make changes in technology parameters such as varying the Linear Energy Transfer (LET) threshold, power supply, and on-chip detection mechanisms such as use of parity bit checks to provide error notices (in which case SEEs are treated at system level by interruption handling) [33].
Radiation hardened versions have improved tolerance against long term total dose degradation and heavy ion induced effects such as latch-up. These versions for deep sub-micron technologies offer solutions to applications with total dose requirements above the 100 Krad (Si) range [33].
1.5 Partial Evaluation
Partial evaluation is an optimization technique commonly used in software applications to increase the efficiency of the code. Optimization of a design by having the knowledge of its properties and structure is generally termed partial evaluation. The process of specialization is done systematically: we start from a general circuit and some data known at run-time, and then, using this data, transform the general circuit into a specialized circuit [17]. The basic principle involved in this technique is that the known arguments to function calls are propagated throughout the definition of a function, yielding a new specialized function [24]. Thus partial evaluation can be thought of as specializing a program to its static inputs [17]. The specialized program is specific to a particular application and hence works on fewer cases. This program is more efficient because some computation has already been done during specialization. The program which performs partial evaluation is termed the partial evaluator. The resultant program obtained is called the residual program [26].
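As a software illustration of these terms, the sketch below specializes a general `power(x, n)` function to a statically known exponent. The example, the function names, and the use of generated source text are illustrative choices and do not come from the thesis; the generator plays the role of the partial evaluator and the emitted function is the residual program.

```python
def power(x, n):
    """General program: computes x**n with a loop over the exponent."""
    result = 1
    for _ in range(n):
        result *= x
    return result


def specialize_power(n):
    """Toy partial evaluator: n is the static input, x stays dynamic.

    The loop over n is unrolled at specialization time, so the residual
    program contains only the multiplications.
    Returns (source_text, compiled_function).
    """
    body = " * ".join(["x"] * n) if n > 0 else "1"
    source = f"def power_{n}(x):\n    return {body}\n"
    namespace = {}
    exec(source, namespace)  # compile the residual program
    return source, namespace[f"power_{n}"]


source, cube = specialize_power(3)
print(source)
print(cube(5) == power(5, 3))
```

The residual program handles fewer cases (only the fixed exponent) but does no loop bookkeeping at run time, which mirrors the trade-off described above.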


Logical inferences by unfolding predicate calls, propagation of instantiated values through the program, and evaluation of built-in predicates are the main techniques implemented by the partial evaluator [26]. Operations involving constant operands are eligible for reduction based on partial evaluation [28]. A very simple form of partial evaluation corresponds to run-time constant propagation. Since propagated values are known at run-time only, we call this dynamic synthesis of circuits [17]. The atomically synthesized circuit is determined at run-time, and the dynamic input data determines the circuit. Partial evaluation can be implemented ‘in place’ [24].
1.6 Temporal TMR
Temporal TMR has the main advantage of low power requirements. There are two ways of implementing Temporal TMR. In the first kind, three identical threads of computation followed by a voting circuit are used. This means that the same data value is applied on each of the threads at three successive clock cycles. The second kind of implementation uses the register filtering technique. This uses multiple registers for each combinational logic output. Each register is clocked by a separate delayed clock. Voting of the resulting sampled data values is done to determine the correct output [35]. Figure 1.5 shows the throughput versus power graph for the different techniques. Singlet is the unhardened case [35].
Figure 1.5 Power vs Throughput Graph for SET-Mitigation Techniques [35] (axes: Power, Throughput; plotted techniques: Singlet, TMR-in-hardware, TMR-in-time, Register Filtering)
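The register filtering idea described above, in which the same combinational output is sampled by three delayed clocks and the samples are voted, can be sketched as follows. This is a behavioral toy model under stated assumptions: the transient is an idealized inversion over a half-open time interval, time is in arbitrary integer units, and all names are hypothetical.

```python
def majority(a, b, c):
    """Bitwise majority vote of three single-bit values."""
    return (a & b) | (b & c) | (a & c)


def make_signal(true_value, set_start, set_width):
    """Signal that holds true_value, inverted during [set_start, set_start + set_width)."""
    def signal(t):
        hit = set_start <= t < set_start + set_width
        return true_value ^ 1 if hit else true_value
    return signal


def register_filter(signal, t_sample, clock_delay):
    """Sample the signal with three clocks spaced clock_delay apart, then vote."""
    samples = [signal(t_sample + k * clock_delay) for k in range(3)]
    return majority(*samples)


# A transient shorter than the clock separation corrupts at most one sample.
short_set = make_signal(true_value=0, set_start=10, set_width=3)
print(register_filter(short_set, t_sample=10, clock_delay=5))

# A transient longer than the clock separation can corrupt two samples.
long_set = make_signal(true_value=0, set_start=10, set_width=7)
print(register_filter(long_set, t_sample=10, clock_delay=5))
```

In this model the clock separation sets the longest transient that the vote is guaranteed to mask, because a transient narrower than the separation cannot overlap two sampling instants.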


The main disadvantage of the TMR-in-time technique is that the throughput is reduced to one third of the original value. However, the power requirements of this technique are low when compared to the standard TMR technique. This is because, in the absence of radiation-induced transients, data switching occurs only at the beginning of the first clock cycle and none occurs during the two successive clock cycles [35].

Register filtering exhibits a variable throughput that depends on the delay between clocks. If the delay is zero, the throughput and power requirements are equal to those of an unhardened design. An advantage of this technique is that the separation of clocks can be controlled by programming to set the maximum transient duration that can be filtered. This determines the energy of radiation that can be tolerated. Hence the chip can be hardened to the desired degree by programming [35].

1.7 Partial Evaluation Based Triple Modular Redundancy

In the proposed approach, the basic ideas of partial evaluation and triple modular redundancy are used together to devise a scheme for designing radiation tolerant circuits. However, the behavior of the input environment should be known in advance to implement this technique. The input environment is given in terms of signal probabilities of the lines. Knowing the signal probabilities, these values are propagated to the output of the circuit. The logic value of signals with probabilities within the range of 0.0 to 0.2 is set to ‘0’ and for those in the range of 0.9 to 1.0, it is set to ‘1’. These integer values are then propagated instead of the original probabilities. If any input to a logic gate is its controlling value, then the gate can be eliminated. For instance, the controlling value of “and” and “nand” gates is a logic ‘0’ and for “or” and “nor” gates it is a logic ‘1’. By eliminating all the redundant gates, a reduced circuit is obtained which is functionally equivalent to the original circuit whenever the actual inputs agree with the rounded logic values.
This reduced circuit is then duplicated. To get the final correct output, the outputs generated by the original circuit and the two duplicated circuits are voted on by a majority voter. The technique of partial evaluation fails in cases where the actual inputs to the circuit disagree with the rounded logic values. In such cases the output is evaluated by the technique of Temporal TMR. A multiplexer is


used to choose between the outputs determined from the partially evaluated circuit and from Temporal TMR. The overall flow of the Partial Evaluation based TMR technique is shown in Figure 1.6. In the whole process, we assume that the majority voter circuit and the multiplexer are radiation hardened. Efficiency of the circuit depends on the actual inputs to the circuit and the signal probabilities of the lines.

Figure 1.6 Flow of PTMR Technique (flowchart with labels Step 1 through Step 7; boxes: rounding probabilities; propagate probabilities; resolve logic on signals; obtain functionally equivalent reduced circuit; determine output from partially evaluated circuit; determine output from Temporal TMR circuit; validation; selection of output from the two sets of values)
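The rounding and gate elimination steps of this section can be sketched as follows. This is a simplified Python illustration, not the thesis implementation: it rounds only the primary input probabilities (using the 0.2 and 0.9 bounds quoted in the text) and then propagates the resulting constants, whereas the flow in Figure 1.6 also propagates probabilities through the circuit. The netlist format and the names `round_probability` and `partially_evaluate` are assumptions made for this example.

```python
# gate type -> (controlling input value, output forced by that value)
CONTROLLING = {"and": (0, 0), "nand": (0, 1), "or": (1, 1), "nor": (1, 0)}


def round_probability(p, low=0.2, high=0.9):
    """Round a signal probability to a logic value, or None if undecided."""
    if p <= low:
        return 0
    if p >= high:
        return 1
    return None


def evaluate(gate_type, values):
    """Evaluate a gate whose inputs are all known constants."""
    out = int(all(values)) if gate_type in ("and", "nand") else int(any(values))
    return 1 - out if gate_type in ("nand", "nor") else out


def partially_evaluate(gates, input_probs):
    """gates: list of (output, type, inputs) in topological order."""
    known = {}
    for name, p in input_probs.items():
        value = round_probability(p)
        if value is not None:
            known[name] = value
    residual = []
    for out, gate_type, ins in gates:
        ctrl, forced = CONTROLLING[gate_type]
        values = [known.get(s) for s in ins]
        if ctrl in values:            # a controlling input fixes the output
            known[out] = forced
        elif None not in values:      # all inputs are constants
            known[out] = evaluate(gate_type, values)
        else:                         # gate must stay in the reduced circuit
            residual.append((out, gate_type, ins))
    return known, residual


gates = [
    ("g1", "and", ["a", "b"]),
    ("g2", "or", ["c", "d"]),
    ("g3", "nand", ["b", "d"]),
    ("g4", "or", ["g1", "g3"]),
    ("out", "and", ["g2", "g4"]),
]
probs = {"a": 0.05, "b": 0.5, "c": 0.95, "d": 0.4}
known, residual = partially_evaluate(gates, probs)
print([g[0] for g in residual])   # ['g3', 'g4', 'out']
```

Here gates g1 and g2 are removed because one of their inputs rounds to the controlling value. The reduced circuit matches the original only while inputs a and c actually take their rounded values, which is why the text falls back on Temporal TMR otherwise.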


1.8 Results

By implementing the partial evaluation based redundancy technique, it was found that a good amount of area savings is achieved. The area savings are high for circuits with a large number of gates and few outputs. In cases where there are few gates and relatively many outputs, the area savings are very small. In the worst case, the overhead involved in implementing the technique can exceed the area savings. This occurs in circuits where the number of gates that can be eliminated is smaller than the overhead involved in the process.

1.9 Organization of the Thesis

The rest of the thesis is organized as follows: Chapter 2 deals with the background and related work. It describes in detail the various effects of radiation, techniques devised to provide radiation hardness, work done in partial evaluation, and the various kinds of majority voters developed. Chapter 3 describes in detail the partial evaluation based triple modular redundancy. Chapter 4 discusses the experimental setup, the results obtained, and analysis of the results. Chapter 5 gives the conclusions of the thesis.


CHAPTER 2

BACKGROUND AND RELATED WORK

One of the major concerns for reliable space based computation is the occurrence of SEUs. This chapter deals with the various radiation effects on space electronics and the techniques employed to mitigate these errors. The mitigation techniques can be classified into three categories, namely radiation hardening by shielding, radiation hardening by fabrication, and radiation hardening by design. Each of these techniques is explained in detail in the following sections. The concept of partial evaluation, which is an efficient optimization technique in software, is explained and its application in hardware is presented. The selection of a proper majority voter is a key point in implementing techniques using redundancy. The voter circuit should be highly radiation hardened to ensure that no errors are introduced by the voter itself. Thus different kinds of voter circuits are explained.

2.1 Radiation Effects

The earth is surrounded by belts of charged particles called Van Allen Belts. These belts mainly arise from the earth’s magnetosphere-field. They are:
1. Proton trapped belt which extends from 400 to 900 km above the earth’s surface. This belt consists of electrons and protons.
2. Electron trapped belt which extends up to 56,000 km. This belt is almost entirely composed of electrons [33].


However, the height of these belts is not fixed. It can vary depending on the concentration of electrons and protons. For instance, when solar flares occur, there is an intense burst in the number of high energy protons and heavy ions. This increases the Van Allen belt by a factor of 1000. Similarly, Galactic cosmic rays are composed of heavy ions in varied abundance. The heavy radiation is due to the influence of increase in the number of high-energy protons from heavy solar flares formed during the excursion of satellite [33].

Generic circuits work without any faults in their intended environments. However, when they experience harsh conditions in outer space such as ionizing radiation, they do not work properly. Electrical parameters of the chip change when ionizing radiation strikes the transistors in the circuit. This leads to the generation or flow of extra electrical currents which alter the operation of the circuit. Also, poor design techniques cause chemical weaknesses in the atomic structure of the transistors when they are exposed to ionizing radiation [41]. Cosmic rays are high energy ions, protons, and neutrons. Interaction of ICs with cosmic rays leads to SEE [33].

Conversion of primary radiation causes the generation of electromagnetic rays when electrons and protons interact with any kind of material that they encounter. Heavy particles are generated by nuclear reaction when protons directly interact with the material. These heavy particles can induce volume effects [33]. Electromagnetic rays and electrons result in electron-hole generation which creates ionization in SiO2. Basic device characteristics such as threshold voltage and mobility change due to combination effects and mobility differences. An increase in dose causes sub-threshold currents to increase and affects NMOS structures by changing parameters. These currents are induced by parasitic structures resulting from ionization [41].
Electron-hole pairs are created in device nodes and diffusions because of ionization. The latch-up effect causes the activation of parasitic SCR structures when the electron-hole pairs reach the P or N well. Thermal destruction of the component occurs because of high currents. Soft errors are created when the current pulse appears in a depleted zone (drain of the transistor in the off-state mode) because of collection


phenomenon. When CMOS technology is being used, NMOS is more sensitive than PMOS. Geometry of the device plays a role in that the electrical charge required for a bit flip decreases with scaling down [33].

Bipolar technologies are mainly affected by the degradation mechanisms of neutrons, which involve atomic dislocation, nuclear displacement, and reduction of minority carrier lifetime and mobility. Electromagnetic pulses (EMP) are another factor against which the circuits must be protected. High currents and voltages are generated in conductors and electronics because of EMP. Box and hardware shielding are the common precautionary measures for this kind of radiation [33].

Configuration memory defines the function of logic resources and the interconnections of these resources [4]. The configuration bit stream determines the function of the device. The design function can be changed by making changes to the bit stream. This has the advantage of adaptability. However, it is this property that makes the device susceptible to SEUs [19]. Upsets in configuration memory can be detected by comparing its contents with a known, good state and can then be corrected by refreshing the state of memory [4]. Static upsets in configuration memory do not affect functionality. Upsets need to be corrected only to ensure that errors do not accumulate [29]. Virtex Configuration Memory is composed of static latch memory cells as shown in Figure 2.1. It is divided into frames and each frame is uniquely addressable. Design functionality for Static RAM based Programmable Logic Devices is defined by configuration SRAM contents [15]. The original behavior of a design mapped onto an SRAM-based FPGA can be changed when a flux of highly energized particles hits its surface. Thus any transient fault changes the mapped circuit permanently when it hits the memory.
In addition to affecting memory, charged particles also change the logic function of the mapped circuit when they hit the on-chip configuration SRAM [18]. Single Event Upsets alter design functionality, which leads to transient upsets that induce undesired logical conditions [15]. Upsets in configuration memory cause a local high current because of driver contention when two inverter outputs of different states get connected. Errors in the configuration memory can be corrected with the process of non-intrusive scrubbing. In this process, partial re-configuration is used to correct the upsets once the errors are


detected without interfering with the operation of the loaded design [20]. Memory cells are anticipated to upset at a rate of 0.13 upsets/hour (3.2 upsets/day) in a normal sun environment and at a rate of 4.2 upsets/hour during the peak upset rate [16].

Figure 2.1 Configuration Memory [15]

Constant ‘0’ and ‘1’ logic values in Virtex FPGA designs are usually generated by half-latches. This avoids the use of expensive logic resources, such as look up tables. These half-latches are neither initialized nor controlled with programming data. Thus reading back the device’s programming data does not help in detecting half-latch inversions. Other techniques such as updating the FPGA’s configuration memory or partial configuration are also ineffective. Thus SEU effects in half-latches are not easily detectable and correctable [4]. The SEU mitigation techniques implemented for half-latches are discussed in [4].

SEUs affect software systems as well. Data and code of the application are affected by faults. SEUs may cause information corruption, leading to a change in program flow or causing a program to execute an infinite loop. Transient faults occur because of radiation, electromagnetic interference, and power glitches [21]. Pure software methods are required for control flow error detection in cases where the hardware design is fixed and cannot be changed [22]. A correct control flow is a fundamental requirement for correct execution of computer programs. Control flow


errors pose a threat to the dependability of computer systems [21]. The use of software techniques to detect and tolerate faults in the hardware is termed Software Implemented Hardware Fault Tolerance. This has the advantage of improving the availability of the system without introducing any hardware overhead [22].

Every circuit has some amount of inherent tolerance to mild radiation. A radiation tolerant IC exhibits some degree of radiation survivability. However, a radiation hardened circuit is specifically designed to withstand certain radiation levels. Radiation hardness is ensured by using Radiation Hardness Assured (RHA) devices for electronic circuitry. These devices are process monitored, designed, and layout controlled [42]. Hardness of a circuit is a measure of the total dose of radiation to which an IC can be subjected before critical parameters cross a predefined threshold [31]. Various hardening techniques have been proposed. The main concepts of SEU correction techniques are:
1. Re-configuration: This is a widely used technique for traditional FPGAs. It is capable of repairing all static upsets. However, there is a momentary loss of service.
2. Partial Configuration: This is used mainly for Virtex FPGAs. The main advantage of this technique is that it can repair single upsets in individual frames with no loss of service and no functional disruptions.
3. Partial Configuration Cycles: These act on single frames [15].

2.2 Radiation Hardening by Shielding

A simple technique of providing radiation hardness is shielding the circuits made from standard components with metals such as lead which attenuate radiation and electromagnetic pulses. It may seem obvious that increasing the thickness of the package increases the radiation tolerance. This however is not practically true, as there is a


limitation on the attenuation of the signals because of shielding [42]. Also, the increased packaging slows down the high-energy particles and gets charged, resulting in TID [36].

2.3 Radiation Hardening by Fabrication Techniques

A few general chip fabrication techniques are employed to make radiation hardened chips:
1. Change the method of manufacturing transistors in the circuit: The unwanted charge which changes the operating characteristics of the devices accumulates in the thin layers of oxide that are used to form the working and insulating parts of the devices and in the regions between the transistors. Thus reducing the thickness of these layers without compromising reliability decreases the sensitivity of transistors to ionizing radiation because it limits the unwanted charge. However, very thin oxide layers must be nearly perfect, since defects would otherwise cause an electrical short. Additional processing steps after growing the gate oxide layer onto the substrate must be carried out at lower temperatures, because higher temperatures would alter the gate oxide’s atomic structure and reduce its radiation resistance [41].
2. Change the way that the transistors are combined to form working circuits: Parameters affecting the radiation resistance of the circuit could be varied and set such that immunity of the circuit increases. For instance, increasing the width of interconnections helps in easy handling of radiation-induced currents, which makes the circuit more radiation tolerant [41].
3. Another technique is implementation of certain rules such as stacking fewer transistors in logic gates, circuit redundancy, and use of radiation tolerant parts [42].
4. Isolating a device from surrounding components ensures that charged ions cannot travel far in the components. This eliminates the possibility of latchup and SEU. Four


schemes are implemented based on this principle to make radiation-hardened devices. They are:
1. Junction Isolation (JI): This method is used for CMOS and other unhardened bipolar designs. It is an electrical method which isolates on-chip components by reverse biasing the junction. However, this technique is not efficient for circuits exposed to very high radiation levels as they would be susceptible to latchup due to their parasitic PNPN SCR structure.
2. Dielectric Isolation (DI): Component isolation is achieved by thermally growing a thick layer of silicon dioxide between adjacent devices. An oxidation mask is used to grow oxide on the wafer only in chosen places. Dielectric isolation is a better choice for stringent radiation hardness applications.
3. Silicon-on-Sapphire (SOS): SOS is a more complex form of dielectric isolation. A single-crystalline silicon film is grown over a sapphire substrate, which is a dielectric that has tolerance to radiation and protects the device from transient, neutron, and single event effects. A bipolar or FET transistor is made by doping silicon. In SOS, active devices can be packaged closer together, as the transistors are built on an insulating substrate because of which leakage current cannot flow between the devices. Latchup cannot occur as there are no parasitic transistors and capacitances.
4. Silicon-on-Insulator (SOI): SOI technology is similar to the process used for SOS devices, except for the substrate used. Silicon-on-Insulator devices can take several forms; one common technique is SIMOX, which is Separation by Implanted Oxygen. A heavy concentration of oxygen is deposited below the wafer’s surface by a high current ion implantation system. The wafer is then heated to form SiO2 from the oxygen and also to anneal the damage caused by the implant. This leaves a thin, high-quality layer of silicon on top of an insulating layer of SiO2 which is used for device fabrication.
Complete isolation of the device is achieved by replacing the silicon between active transistor areas with oxide.


This dielectric-isolation plane enables increased circuit speeds and radiation hardness [42]. To make the integrated circuits inherently radiation hardened, significant design changes are a better option than chip fabrication techniques [41].

2.4 Radiation Hardening by Design

Many techniques use hardware redundancy to reduce the probability of failure. The design hardening techniques can be categorized as:
1. System level design hardening techniques.
2. Resistive or capacitive hardening techniques.
3. Circuit design / Logic design techniques [8].

2.4.1 System Level Design Hardening Techniques

System level design hardening solutions include coding techniques and current monitoring techniques. Coding techniques for error detection and correction can be adopted in high capacity memory arrays [7].

Coding Techniques

These techniques use data word coding and additional error detecting and correcting (EDAC) circuitry. Table 2.1 gives the different EDAC codes that are commonly used. The EDAC processor corrects all single bit errors by periodically scrubbing the entire memory [3]. The technique of SEU scrubbing requires less overhead as all the data frames are reloaded at a chosen interval. This avoids the use of processes such as readback detection and data verification operations. Scrub rate is fixed such that any


SEU on the configuration memory is fixed before the next occurs. The time between the SEU occurrence and its subsequent correction should be reduced [29]. Maximum error latency is defined by the time interval between two successive accesses of the same memory word. Applications involving large memory systems have long error latencies, which increases the probability of having multiple upsets on a single word. This makes the EDAC codes ineffective [3]. The EDAC technique is implemented on memory systems and uses a checksum-based technique to detect and correct errors in the same processing cycle. System availability is improved by reducing transient error recovery latency. This technique has the advantages of lower size, weight, power, and cost, and improved reliability. However, it can be applied only to matrix multiply functions [23].

Table 2.1 Error Detection and Correction Codes [2]

EDAC code               Capability
Parity                  Single bit error detect
CRC code                Detects if any error occurred in a given data structure
Hamming code            Single bit correct, double bit detect
RS code                 Corrects consecutive and multiple bytes in error
Convolution encoding    Corrects isolated burst noise in communication stream
Overlying protocol      Specific to each system implementation

In the self-checking scheme, computations are performed on data belonging to two independent memories. This is to ensure that the errors are statistically independent as a


common error cannot be detected by the self-checking scheme. This technique has the advantage of 100% coverage of virtually any source of error. However, it can only detect an error, but cannot correct it. It also has the disadvantages of higher cost, weight, power, and size, and low reliability [23].
1. Use of parity checks: A single bit used to determine if the number of logic ‘1’s in the data structure is even or odd is termed parity. Thus, if an odd number of errors occurs in the structure then they can be detected by parity. This technique however cannot be used for mitigation.
2. Cyclic-redundancy check (CRC) coding: This scheme performs modulo-two arithmetic operations on a given data stream by considering N data bits as an N-1 order polynomial. The result is again interpreted as a polynomial and is the CRC character that could be appended to the data structure. For decoding, the generating polynomial divides a bit structure consisting of data and CRC bits.
3. Hamming code: This is a block error encoding scheme that encodes an entire block of data with a check code. This scheme can be used for single bit correction and double bit detection. The position of a single error is determined by the syndrome, represented by a Q-digit word when the parity-check matrix generates Q check bits. This is a commonly used scheme and is suitable for systems with low probabilities of multiple errors in a single data structure.
4. Reed-Solomon coding: This is a block error-correcting coding scheme which groups the check bits into separate words and adds them at the end of the data structure. Multiple consecutive errors in a data structure can be detected and corrected. A single chip can implement this R-S scheme.
5. Convolutional encoding: Convolutional encoding is a process of adding redundancy to a signal stream. This scheme is used to detect and correct multiple bit errors. It interleaves the check bits into the actual data stream and provides very good


immunity for mitigating isolated burst noise. The scheme is highly suitable for communication systems.
6. System-level protocol: Errors in this method are detected using parity checks and detection of a non-valid Manchester encoding of data. In case an error is detected, error correction is done via retransmission. The system-level protocol retransmits the transaction a maximum of three times [2].

Current Monitoring Techniques

SEU detection and correction can be achieved by using static RAM architectures which employ transient current sensing circuits. In time-critical applications where reduction in error latency is a key requirement, current monitoring techniques are used [7]. One such technique uses CMOS static RAM and implements error detection based on current monitoring. Small fluctuations in power supply current are noticed when a heavy ion strikes and causes upsets. Abnormal currents produced by upsets in RAM columns are detected by the built-in-current-sensors (BICS). The supply line of each memory column is provided with a BICS. An internal latch is set when an SEU error occurs in a memory column. The error correction sequence is started immediately by a logic signal output, thus ensuring zero fault latency. The affected word in the column is detected by a parity bit per word. This parity bit also helps in error correction. Correction is performed only if an erroneous parity is discovered after reading the memory words. The main advantage of this scheme is that upset detection is asynchronous, fully independent of, and concurrent with normal memory operation. A proper determination of the BICS detection threshold is very important. The BICS must be insensitive to the transient currents of active read or write cycles. They should not detect small transient currents induced by radiation which do not generate upsets. Estimation of storage node sensitivity to upsets helps in avoiding false alarms.
A low rate of false alarms poses no problem [8].
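The way a column sensor and a per-word parity bit can together locate and repair a single upset, as described above, is sketched below in Python. The model is an assumption made for illustration: a memory column is taken to be one bit position shared by all words, the BICS latch is represented simply by the argument `flagged_column`, and the class name `ParityMemory` is hypothetical.

```python
def parity(word):
    """Even-parity bit: 1 if the word has an odd number of 1 bits."""
    return bin(word).count("1") % 2


class ParityMemory:
    """Toy SRAM model: each word is stored together with one parity bit."""

    def __init__(self, words):
        self.words = list(words)
        self.parity_bits = [parity(w) for w in self.words]

    def upset(self, address, column):
        """Model an SEU that flips one stored data bit."""
        self.words[address] ^= 1 << column

    def correct(self, flagged_column):
        """Find the word with bad parity and flip the flagged column in it.

        Returns the repaired address, or None if no parity error is found
        (a false alarm from the current sensor).
        """
        for address, word in enumerate(self.words):
            if parity(word) != self.parity_bits[address]:
                self.words[address] = word ^ (1 << flagged_column)
                return address
        return None


mem = ParityMemory([0b1010, 0b0111, 0b0000])
mem.upset(address=1, column=2)             # SEU flips bit 2 of word 1
repaired = mem.correct(flagged_column=2)   # sensor on column 2 has latched
print(repaired, bin(mem.words[1]))         # 1 0b111
```

Parity alone only detects the error. In this sketch it becomes correctable because the sensor supplies the bit position and the parity check supplies the word address.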


2.4.2 Resistive or Capacitive Hardening

Resistive hardening involves the use of passive, polysilicon intracell decoupling resistors in the cross-coupling segments of each SRAM cell. The cell identifies an upset-causing voltage transient by using decoupling resistors that slow the regenerative feedback response of the cell. SEU hardness is provided by gated resistors, which are actively clocked, high resistance polysilicon resistors. They are built from two layers of polysilicon separated by a thin layer of thermal oxide. The high resistance of these gated resistors protects the stored cell data from SEUs. Figure 2.2 illustrates a resistively hardened CMOS SRAM cell schematic. Resistors R1 and R2 are the intra-cell decoupling resistors that improve the SEU hardness of the CMOS SRAM cell design.

Figure 2.2 Resistive Hardened CMOS SRAM Cell Design [36]

SEU hardening by use of gated resistors is achieved in two ways:
1. Providing adequate off-state channel resistance.
2. Maximizing transconductance to achieve maximum on-state channel conductance.
This technique has the advantage of less area overhead. However, the circuit response is slowed down because of the increase in switching time constants. The added intra-cell resistance increases the minimum write-cell time. Dense designs require larger


intra-cell resistances. Higher polysilicon resistivity increases the temperature coefficients of the resistors, thus the technique becomes inapplicable at higher temperatures. The technique cannot be used for designs with very small feature sizes. The use of resistive hardening for elements in critical signal paths is avoided, because the delays add up and affect the overall circuit response [10].

2.4.3 Circuit and Logic Design Techniques

Circuit design and processing techniques provide radiation hardness. This is achieved by making the stored information insensitive to the usual energy range of incident particles. However, there are certain disadvantages such as high cost, low performance, and high power dissipation [3].

Hardening techniques at circuit level ensure immunity against single node upsets and are fully compatible with standard CMOS technologies. They are based on storage latch duplication and use state-restoring feedback circuits which are more compact and add lower delays than the TMR circuits. However, because of high area overhead and power dissipation, they are inapplicable to high density circuit architectures [7].

Various SEU tolerant SRAM cells such as the Whitaker design, DICE design, HIT cells, Rockett cells, and Barry-Dooley design have been developed [36]. These cells have better electrical performance and consume less silicon area [9]. All these designs are based on three main principles:
1. Information can be stored at two places, thus providing a source of uncorrupted data.
2. When an upset is detected, the information from the uncorrupted part can be fed back to mitigate the error.
3. A p-transistor storing a logic ‘1’ cannot be upset to ‘0’ and an n-transistor storing a ‘0’ cannot be upset to a ‘1’ [36].


Ratioing

Figure 2.3 Hardening by Use of Ratioing

New logic configurations along with ratioing the strengths of transistors within the cell is a technique generally used. The device sizes are determined from the desired write time, read time, and recovery time. The SEU immunity is independent of processing parameters. The main advantage of this method is that there is a reduction of loading on the clock signal and transient faults will not propagate from the new flip flop. However, because of the body effects on the device threshold voltage, there is degradation in internal voltage levels which reduces the available noise margin and thus affects the static power. Also, the use of ratioed devices progressively loses the immunity to upset propagation due to the accumulated total dose effects [1]. Figure 2.3 illustrates this.

4.3.2 Rockett Cell

Another cell which uses circuit design techniques is the Rockett cell shown in Figure 2.4, in which no ratioed situation exists between devices. The Rockett cell includes redundant storage nodes that are driven by and connected to p+ diffusion regions. Hence upset due to an ionizing particle occurs only due to upsetting of a low voltage to high.


Since there is no n+ diffusion, logic ‘1’ cannot be upset to ‘0’. Thus the redundant storage is a source of uncorrupted ‘1’s. Radiation hardening is achieved by transistor sizing [12]. This causes the cell to consume no static power. Also, only a single phase clock is required. The SEU immunity is independent of processing tolerances, voltage supply tolerances, and temperature. However, there are several disadvantages of this technique, such as higher capacitance on clock inputs. The outputs of the Rockett cell will not swing rail to rail and, when configured as a flip flop, upsets can propagate from the cell into downstream logic and memory cells [1].

Figure 2.4 Rockett Cell [1]

2.4.4 Redundancy

The basic idea of redundancy is to implement multiple copies of the same circuit and compare the outputs of each of these circuits. Disparity in these outputs indicates the occurrence of an error. The redundancy technique can be implemented at various levels such as circuits, systems, etc. Autonomous or ground-controlled switching occurs from a prime system to a redundant spare system. This process of switching is simple when both the designs meet the system restrictions identically [2]. Logic paths in between the flip-flops are composed of hard-wired, non-reconfigurable gates. Hence they are immune to SEUs. However, they are vulnerable to SETs. Thus full module redundancy is essential to protect the device [30].


Lockstep System

This technique involves operation of two identical circuits with synchronized clocking. A difference in the output values of the processors indicates the occurrence of an SEU. Recovery actions such as reinitializing and switching to safe mode are implemented. False triggers could occur if the devices respond even slightly differently [2].

Triple Modular Redundancy

TMR is the most reliable safeguard against total device failure as it rapidly detects and corrects SEUs. In reconfigurable logic devices, user logic and logic paths are susceptible to SEUs. This makes triple modular redundancy an effective technique [20]. The basic concept of triple redundancy is that a sensitive circuit can be hardened to SEUs and SETs by implementing three copies of the same circuit and performing a bitwise “majority vote” on the output of the triplicate circuit. Implementation of TMR has the advantages of complete data retention and autonomous recovery [30]. It operates with the main aim of removing all single points of failure from the design. Each copy of the triplicated circuit has its own set of inputs as shown in Figure 2.5, to avoid errors occurring due to propagation of wrong inputs. However, it requires an external mitigation device. The sensitivity of the TMR design technique mainly depends on the characteristics of the adopted TMR architecture in terms of placing and routing. Xilinx builds its majority voters from the Output Buffer Three-state cell (OBUFT) provided by Xilinx library primitives. Fault-tolerant memory elements in sequential digital logic are usually implemented by the TMR technique [18]. A combination of TMR and scrubbing techniques offers an effective solution for SEU mitigation of SRAM-based FPGA designs. By this technique single errors in the user or path logic and static errors in the memory can be corrected before the next error occurs. The circuit is made immune to functional errors [20].
However, this method adds sy stem level overhead and increases the power dissipation [7]. TMR can be implemented in vari ous forms such as Simple TMR, TMR circuit with three voters, TMR circuit with three voters and clock, and feedback TMR [37].
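The bitwise majority vote at the heart of TMR can be illustrated with a minimal Python sketch. The 8-bit word, the name `majority_vote`, and the single-bit fault model are assumptions chosen for illustration; they are not taken from the cited designs.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three equally wide words."""
    return (a & b) | (b & c) | (a & c)


# Three copies of the same (hypothetical) 8-bit output word.
golden = 0b10110010
copies = [golden, golden, golden]

# Assumed fault model: one SEU flips a single bit in one copy.
copies[1] ^= 0b00001000

voted = majority_vote(*copies)
print(f"{voted:08b}")  # prints 10110010, the uncorrupted word
```

Because each output bit is voted independently, an upset in any one copy is masked as long as the other two copies agree on that bit.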


Figure 2.5 Triple Redundant FPGA Inputs [18]

Dual Voting Double Redundancy

This technique provides a reliable safeguard against total device failure and is used in designs that occupy less than half of the total device. The logic of the circuit is duplicated and the outputs are compared. An SEU or SEFI is said to occur if there is a difference in the outputs. This technique operates without an external mitigation device and has independent input-output mitigation. It has the advantage that, in case of total device failure, the redundant device continues processing. There is also no single point of susceptibility [15]. The disadvantage of this technique is that skew in the output transition times increases noise. Figure 2.6 shows the implementation.

Selective Triple Modular Redundancy

An effective technique employed for SEU mitigation is Selective Triple Modular Redundancy. This technique selectively applies TMR to the sensitive sub-circuits of a given combinational circuit. The sensitivity of a gate to an SEU is determined by the signal probabilities of its input lines. A gate is said to be sensitive if an SEU on any one of its inputs is likely to be propagated to the output of the gate. The advantage of this technique is lower area overhead [11].


Figure 2.6 Dual Voting Double Redundancy [15]

2.5 Miscellaneous Techniques

1. A new approach to designing radiation-tolerant circuits is the use of re-programmable FPGA technologies. There are two kinds: one uses a FLASH/EEPROM configuration switch while the other uses an SRAM switch. Each design is implemented as a configuration of many switches. Latch-type storage elements can be hardened by increasing the threshold. This can be done by increasing the critical charge through circuit design techniques, by reducing charge collection through technology changes such as SOI, or through wafer fabrication process changes such as thin epitaxial silicon or double well. SRAM-based re-programmable FPGAs for space applications are explained in [5].

2. Watchdog timers are operated by internal microprocessor timers and pass Health and Safety messages between spacecraft systems. They can be implemented in hardware or software and at many levels. If a safety message is not received within a certain amount of time, the system takes action on the device. There are many recovery actions, such as sending a reset pulse, removing power, sending a telemetry message to the ground, or placing the spacecraft in safe mode. Watchdog timers can be active or passive. A passive watchdog does not send health messages but monitors normal operating conditions [2].


3. Separation of p-type and n-type diffusion nodes within a memory circuit also helps in SEU mitigation [2].

The SEU failure rate is a function of the logic density. The size of the design and the resources it uses influence its failure rate. The circuit functionality of larger designs, which use more logic and routing resources, is defined by a larger portion of the configuration bit stream. Smaller designs that use fewer resources contain more “don’t care” configuration bits within the bit stream and can tolerate more configuration upsets [16].

Before the final launch of the object into space, a radiation risk assessment must be done.

Definition: Radiation Risk Assessment: A radiation risk assessment for any electronic device includes the determination of the total dose damage and SEE susceptibility of the device caused by the projected radiation environment of the spacecraft [38].

High-energy protons, electrons and secondary radiation cause total dose damage, while heavy ions contribute to SEE. System hardness for space devices is assured by analyzing all electronic parts susceptible to SEE. The radiation environment experienced by a device depends on orbital parameters, the solar activity level, the shielding provided by the spacecraft walls, and the materials inserted between the device and the outside environment [38].

A preliminary check on radiation-hardened devices includes evaluating the radiation environment that the device will be exposed to. This preliminary check should be made before assessing the total dose and SEE sensitivity of the devices selected for the system design. Exposure of the device to gamma rays from a Cobalt-60 source is used for total dose testing. Low dose rates are recommended for space applications, as most device types and technologies exhibit higher total dose tolerance at low dose rates. The dose rates used depend on the predicted device radiation sensitivity and the projected mission total dose [38].
There are two kinds of testing: static and dynamic.

Static testing: This involves quantifying upsets in the configuration memory elements without toggling the clock, inputs and outputs of a fully configured device during irradiation.

Dynamic testing: This requires observation of a functional design under irradiation to determine the sensitivity of the combinatorial logic as well as upsets during transient signal propagation [20].

The parameters to be considered for any fault injection experiment are the selection of the fault location, the injection time, the fault duration and the input stimuli for the application [22].

The natural space radiation environment is simulated by using ground-based radiation sources to study the behavior of upsets within the FPGA configuration memory. Total dose response and proton-induced single-event effects are tested with the help of electron linear accelerators and proton cyclotrons. This method, however, has several drawbacks:
1. Radiation testing is relatively expensive.
2. The number of radiation tests is limited by the availability of the testing facility and the need to travel.
3. Ground-based radiation tests insert high-energy particles into the device in a random, undirected manner. This does not allow targeted tests that evaluate the behavior of specific FPGA resources [16].

2.6 Partial Evaluation

Partial evaluation is an optimization technique commonly found in functional programming languages [24]. Unnecessary recalculation can be avoided in functional languages by making specialized versions of functions through partial evaluation. Partial evaluation simplifies code structure by eliminating statically determinable computations in the source code. This consists of symbolically executing the program, unrolling loops, unfolding subroutine calls, and performing static data manipulations based on the program input [25]. These processes remove the redundant elements and result in smaller, faster circuits which are better suited for implementation on an FPGA. Calling a function with too few parameters is called partial application. The IO and routing resources are limited in an FPGA.
The requirement for IO and routing resources can be reduced by partial application of inputs. The difference between partial evaluation and other existing simplification techniques is that partial evaluation performs simplification at run time while the others are implemented at compile time [24].

The timing upper bound should be lower for a specialized circuit than for the general circuit. Partial evaluation results in faster execution and better space utilization. The need to hold “constant” values is eliminated, and this reduces the complexity of the circuit. Although partial evaluation has proven to be quite effective for software problems, its use in hardware is highly restricted. The simplest form of hardware partial evaluation is Boolean optimization. Not much research has been done on using partial evaluation in hardware optimizations. The static nature of hardware makes partial evaluation an uninteresting technique: the circuit cannot be specialized dynamically at run time. However, dynamic reconfiguration of circuits is gaining importance with the advent of SRAM-based FPGAs [24].

The main techniques implemented by partial evaluation are:
1. Symbolic computation, which means computing with expressions that involve variables along with values.
2. Unfolding, which replaces a call with the instantiated body of the function.
3. Program point specialization, which creates a new version of a function specialized to some arguments. This technique can be thought of as a combination of two aspects, namely definition and folding. Definition means introducing a new definition or extending an existing one, while folding means replacing an expression with a function call.
4. Memoization, which stores the result of some computation and reproduces it when needed [34].

Advantages of implementing partial evaluation:
1. Speed up: Decreases the time to compute and the time to run the program.


2. Efficient and modular solution: Partial evaluation specializes a generic program to a specific instance [34].

Developments in partial evaluation have allowed the following improvements with respect to FPGA design:
1. Hardware description language design descriptions have been partially evaluated to generate run-time specialized software for Xilinx FPGA devices.
2. A similar technique has been applied to data encryption circuits.
3. A method was proposed for expressing dynamic reconfiguration of FPGAs by means of partial evaluation.
4. Partially evaluatable circuit generators have been developed for FPGA-based circuits.
5. Transistor-level partial evaluation has been developed for symbolic simulation.
6. A component reduction algorithm has been developed that partially evaluates a transistor-level library element to generate a reduced version of the library component when some operands are tied to constants [28].

Partial evaluators have been designed which generate efficient specialized programs for ray tracing, FFT, and circuit and planetary simulations. Partial evaluators are also used to compile by means of interpreters for programming languages and to generate compilers from interpreters [34].

2.7 Temporal Latch

Consider three delay paths: one with no delay, a second with a delay dT added to it, and a third with a delay 2dT added. In normal operation, a temporal latch rejects transients by setting a pre-determined delay between these three paths [6]. Any transient travels down the first path with no delay, the second path with a dT offset and the third with a 2dT offset. As long as the transient is shorter than dT, only one path can have an error and the majority voter determines the correct output by rejecting the transient. However, if the transient pulse width is greater than dT, then two paths of the temporal latch are corrupted and the majority voter outputs a wrong value, as the error passes through it. Thus, choosing the value of dT appropriately makes the temporal latch fully SET/SEU immune: the value of dT should be larger than any possible transient. This makes the temporal latch immune to all possible soft errors. Figure 2.7 shows the relation between the pulse width of the transient and the delay time dT [6].

Figure 2.7 Effect of Transients With Low Pulse Widths (blocks: Circuit, two Delay Units, Voter Circuit; annotations: transient pulse with low pulse width, delay caused by the delay unit, no change in the value of the bit)

Since the pulse width of the transient is smaller than the delay time dT, it is captured by the first path but rejected by the other two paths. Since two out of the three values are correct, the majority voter gives a correct output.

When the pulse width of the transient is greater than the delay time, it is highly likely that two out of the three paths allow the transient to pass through. Since two out of the three output values are then wrong, the majority voter circuit outputs a wrong value. This is illustrated in Figure 2.8.
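The dependence on dT described above can be sketched with a small Python model. The model is an assumption made for illustration: the transient is treated as an ideal pulse that inverts the signal during `[start, start + width)`, and the two delayed paths are modeled as showing, at the clock edge, the value the signal had dT and 2dT earlier. The function names are hypothetical.

```python
def value_at(correct: int, t: float, start: float, width: float) -> int:
    """Signal value at time t when a transient inverts it during [start, start + width)."""
    return correct ^ 1 if start <= t < start + width else correct


def temporal_latch(correct: int, clock_edge: float, d_t: float,
                   start: float, width: float) -> int:
    """Majority vote over the undelayed path and the paths delayed by dT and 2dT."""
    samples = [value_at(correct, clock_edge - k * d_t, start, width) for k in range(3)]
    return 1 if sum(samples) >= 2 else 0


# Transient shorter than dT: only the undelayed path is corrupted, the vote is correct.
short = temporal_latch(correct=1, clock_edge=10.0, d_t=2.0, start=9.5, width=1.0)

# Transient longer than dT: two paths are corrupted, the vote is wrong.
long_ = temporal_latch(correct=1, clock_edge=10.0, d_t=2.0, start=7.5, width=3.0)

print(short, long_)  # prints 1 0
```

In this model a transient wider than dT does not always corrupt two samples; whether it does depends on where it falls relative to the clock edge, which matches the "highly likely" wording above.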


Figure 2.8 Effect of Transients With High Pulse Widths (blocks: Circuit, two Delay Units, Voter Circuit; annotations: transient pulse with high pulse width, delay caused by the delay unit)

The following changes are observed in a conventional static latch when it experiences an SEU:
1. Junction collection in static latches and SRAMs.
2. Voltage transient effects magnified by circuit feedback.
3. Data state of the latch flips if the voltage crosses the switch point.
4. Critical charge decreases as the square of the feature size.
5. Error rates are independent of clock frequency [14].

The following changes are observed in a conventional static latch when it experiences an SET:
1. Junction collection in combinational logic circuitry.
2. Voltage transients are no longer attenuated in submicron devices.


3. Incorrect data gets latched if a transient arrives at a clock edge.
4. Critical width for unattenuated propagation decreases as the square of the feature size.
5. Error rates increase in proportion to clock frequency [14].

To overcome the above changes, temporal sampling latches are designed. This can be done in two ways: one is to make the circuit spatially redundant and the other is to make the clocking temporally redundant [14].

1. Spatially redundant circuit (as shown in Figure 2.9):
   1. This circuit replaces the conventional latches.
   2. It uses three parallel sampling latches.
   3. The resultant circuit is immune to latch SEUs, clock SETs and set/reset SETs.
   4. However, there is an error latency of one clock cycle.

Figure 2.9 Spatially Redundant Latch


2. Temporally redundant clocking (as shown in Figure 2.10):
   1. This replaces the conventional clock.
   2. It has three sampling clock phases.
   3. One data voting occurs for each clock phase.
   4. DICE latch with programmable separation delay (0.8 micron).
   5. Temporally redundant sampling latch for deep submicron [14].

Figure 2.10 Temporally Redundant Latch

2.8 Voter Circuits

Single event errors are mitigated through a majority vote circuit. Assuming the voter circuit itself will not upset, a majority vote circuit requires multiple upsets for a logical error to occur [15]. A simple implementation of a voter circuit using basic logic gates is shown in Figure 2.11. Voter circuits used in traditional SRAM FPGAs are implemented using LUTs, as shown in Figure 2.12. A faster circuit implementation can be achieved by building majority voters using LUTs. Majority voters of this kind are used in designs where logic resources are not a constraint but the fastest possible timing performance is required. Combinatorial logic is usually implemented using LUTs. This helps in decreasing the propagation delay. However, the immunity to SEUs is lower [30].

Figure 2.11 Majority Vote Circuit [15]

Figure 2.12 Majority Vote Circuit Using LUTs [15]
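To make the difference between the gate-level voter and the LUT-based voter concrete, the following Python sketch models a 3-input LUT as an 8-bit truth table and checks it against the AND-OR form. The constant `0xE8` and the input-to-index mapping are assumptions of this sketch (the majority truth table written as a bit mask); they are not values taken from [15] or [30].

```python
MAJORITY_INIT = 0xE8  # assumed encoding: bit k holds the output for input index k


def lut3(init: int, a: int, b: int, c: int) -> int:
    """Look up the output of a 3-input LUT whose contents are the 8-bit mask `init`."""
    index = (c << 2) | (b << 1) | a
    return (init >> index) & 1


def gate_voter(a: int, b: int, c: int) -> int:
    """Majority vote built from basic AND and OR gates."""
    return (a & b) | (b & c) | (a & c)


# Exhaustively compare the two voters over all eight input combinations.
agree = all(
    lut3(MAJORITY_INIT, a, b, c) == gate_voter(a, b, c)
    for a in (0, 1) for b in (0, 1) for c in (0, 1)
)
print(agree)  # prints True
```

The sketch also hints at why LUT voters are more exposed to SEUs: the voting function lives in configuration bits, so a single flipped bit of the mask changes the voter's behavior for one input combination.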


Virtex makes use of the abundant and free BUFTs for the voter to make the circuit immune. Virtex internal 3-state buffers are hard-wired OR-AND gates and are used to implement all Boolean functions in the user’s design. They do not use any CLB (logic) resources [15]. In designs where few logic resources are available, majority voters can be implemented using these active-low enabled 3-state buffers. However, the full functionality of the majority voter cannot be simulated. All inputs to the voter should be the same to ensure that the user’s design is free from SEUs [30]. The BUFTs are instantiated as shown in Figure 2.13.

A different kind of voter implementation uses a simple logic module in which combinations of 4-input Boolean functions implement a logic equation, as shown in Figure 2.14. The register stages are pipelined, and this enhances the performance. However, the disadvantage of this implementation is that LUTs and FFs are susceptible to SEUs and interconnects are susceptible to transient upsets [15].

Figure 2.13 Majority Vote Circuit Using BUFTs [15]

Figure 2.15 shows the implementation of module redundancy and mitigation on a single chip. Redundant instances of the entire module are replicated and the final outputs of the modules are mitigated. This is a simple method for implementing SEU mitigation in FPGA designs.


Figure 2.14 Majority Vote Circuit Using LUT and FF [15]

Figure 2.15 Implementation of Module Redundancy and Mitigation On Single Chip [15]


2.9 Summary

We find that among the various radiation effects, single event upsets are of major concern. Protecting space electronics from SEUs improves efficiency. There are various techniques to achieve this. Radiation hardening by fabrication is an effective technique but increases the cost of the circuit. Thus radiation hardening by design is a better option. One of the techniques used in hardening by design is redundancy. The development of SRAM-based FPGAs is encouraging the use of partial evaluation in hardware. By using partial evaluation, an area- and performance-efficient technique can be devised for hardening by design.


CHAPTER 3
PARTIAL EVALUATION REDUNDANCY

We present in detail the proposed Partial evaluation Triple Modular Redundancy (PTMR) method. The overall idea is as follows: Given the primary input signal probabilities, the gates that can be eliminated are determined. The circuit obtained after eliminating the redundant gates is duplicated, and the outputs obtained from the original circuit and the two duplicated circuits are voted. This technique can be used if the actual inputs to the circuit are in accordance with the assumptions made. If the actual inputs violate the assumptions, temporal triple modular redundancy is used instead. Thus, the efficiency of the technique depends on the nature of the input probabilities and the actual inputs to the circuit.

This chapter is organized as follows: Since the technique is based on the input environment information, we first discuss how to obtain such information. We then discuss the basics involved in identifying the gates that can be eliminated. Based on this, we proceed to describe the technique in detail. Finally, we illustrate the idea using a small example.

3.1 Definitions

Redundant Gate: A gate is said to be redundant if its output can be determined in advance based on the knowledge of its inputs.

Rounding the Input Probabilities: If the input probability p is such that $0 \le p \le 0.2$, we round the probability value to ‘0’, and if the probability is such that $0.9 \le p \le 1.0$, we round the value to ‘1’.


Input Value in Accordance With / Against the Assumed Probability: If the input probability is in the range $0 \le p \le 0.2$, it is rounded to 0.0. If the actual value on this input line is a logic ‘0’, the input value is said to be in accordance with the assumed probability, while a logic ‘1’ means that the input value is against the assumed probability. Analogous definitions hold for probabilities p such that $0.9 \le p \le 1.0$.

Controlling Value of a Boolean Gate: A value is a controlling value if its presence on at least one of the inputs of a gate forces the output to a known value. For instance, the controlling value for “AND” and “NAND” gates is ‘0’ while for “OR” and “NOR” gates it is ‘1’.

Sensitizing Value: It is the non-controlling value, i.e. the complement of the controlling value. For instance, the sensitizing value for “AND” and “NAND” gates is ‘1’ while for “OR” and “NOR” gates it is ‘0’.

3.2 Characterizing Input Environment

The input environment is characterized by a popular method called profiling. Software profiling techniques are widely used in software development to identify the frequently executed portions of the code. Profile data is gathered from representative benchmarks. Low-power systems are designed by profiling the hardware design. The profiled data can be summarized either in the form of input signal probabilities or in terms of a “representative” input sequence. In the latter case, a vector-compaction based scheme has been proposed to reduce the length of such sequences. A representative sequence can be reduced to input probabilities by simulating the circuit with the sequence. This justifies the assumption that the input environment information is available in the form of input signal probabilities.

A method of calculating the probability of an upset due to an SET on a given combinational circuit was proposed in the context of an SEU-hardening synthesis technique.


The probabilities are determined based on the radiation environment the circuit will be subjected to and the nature of the circuit.

In the context of ASIC testing, the detection probabilities of the lines can be determined from the signal probabilities. The detection probabilities of the lines are calculated by propagating the signal probabilities of the primary inputs through the circuit [36]. The signal probability at the output of a gate is determined as shown in Table 3.1.

Table 3.1 Computation of Output Probability

Gate Type    Output probability
AND          $\prod_i P_i$
NAND         $1 - \prod_i P_i$
OR           $\sum_i P_i - \prod_i P_i$
NOR          $1 - (\sum_i P_i - \prod_i P_i)$
XOR          $\sum_{i,j} P(i)\,(1 - P(j))$
XNOR         $1 - \sum_{i,j} P(i)\,(1 - P(j))$

3.3 Implementation of the Partial Evaluation Based TMR Technique

This technique is based on the input environment as well as the truth tables of the basic logic gates. If one of the inputs to a logic gate is a controlling value, then the gate is redundant, since its output can be determined in advance. Thus the gate is eliminated and its output value is set accordingly.

The input probabilities are rounded and then propagated to the output. Based on the above concepts, the redundant gates are identified and the output values are set accordingly. For instance, if one of the input probabilities of an AND gate is in the range 0.0 to 0.2, it is rounded to 0.0 and the gate is identified as redundant. The output net value is set to 0.

The circuit obtained after eliminating all the redundant gates is functionally equivalent to the original circuit but has fewer gates. This reduced circuit is then duplicated. The output value is determined by three circuits: the original circuit and the two copies of the reduced circuit. The correct output is obtained by taking the majority vote of these three outputs. The implementation of redundancy using the partially evaluated circuit is shown in Figure 3.1.

Figure 3.1 Implementation of Redundancy Using Partially Evaluated Circuit (blocks: Original Circuit, two Reduced Circuits, Majority Voter Circuit)

This technique works if the actual inputs to the circuit are in accordance with the rounded values in all cases. If any one of the inputs is against its rounded value, then the temporal TMR technique is used instead of partial evaluation. To determine the outputs using temporal TMR, the idea of delaying is used. First the outputs of the circuit are determined. These outputs are then passed through a series of two delay elements.


Figure 3.2 Evaluation of Temporal TMR (blocks: Original Circuit, two Delay units, Majority Voter Circuit)

Thus, we have the outputs of the circuit determined at three different time instances. A majority vote of these three outputs is taken in order to determine the correct output of the circuit. The implementation of temporal TMR is shown in Figure 3.2.

Figure 3.3 Overall Implementation of the Technique (blocks: Original Circuit, two Reduced Circuits, two Majority Voter Circuits, two delay units, Logic for Select line of Multiplexer, Final output)
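The arrangement in Figure 3.3 can be sketched in Python as follows. This is only an illustration under stated assumptions: `original` is a hypothetical three-input stand-in circuit (not the circuit used later in this chapter), `reduced` is its specialization for B rounded to 1 and C rounded to 0, the three time-shifted samples of temporal TMR are modeled fault-free, and the select line is assumed to be 1 whenever any actual input contradicts its rounded value.

```python
def majority(a: int, b: int, c: int) -> int:
    return (a & b) | (b & c) | (a & c)


def original(a: int, b: int, c: int) -> int:
    # Hypothetical stand-in circuit.
    return (a & b) | c


def reduced(a: int, b: int, c: int) -> int:
    # The same circuit specialized for B rounded to 1 and C rounded to 0.
    return a


ROUNDED = (None, 1, 0)  # assumed rounded values for (A, B, C); None means not rounded


def select_line(inputs) -> int:
    """1 when any actual input contradicts its rounded value, else 0."""
    return int(any(r is not None and v != r for v, r in zip(inputs, ROUNDED)))


def ptmr_output(inputs) -> int:
    partial = majority(original(*inputs), reduced(*inputs), reduced(*inputs))
    # Fault-free stand-in for the original circuit sampled at three time instances.
    temporal = majority(original(*inputs), original(*inputs), original(*inputs))
    return temporal if select_line(inputs) else partial


print(ptmr_output((1, 1, 0)))  # inputs agree with the rounding: partial path, prints 1
print(ptmr_output((1, 0, 0)))  # B contradicts its rounding: temporal path, prints 0
```

For the input (1, 0, 0) the two reduced copies would outvote the original circuit with a wrong value, which is why the multiplexer must switch to the temporal path in that case.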


 1. Algorithm Reduced_Circuit (gate G, input probability p)
 2. Inputs: A Boolean Gate G, and 0 < p < 1.
 3. Output: copies of reduced circuit
 4. begin
 5.   /* Levelize the circuit */
 6.   Maxlevel ← Levelize (C)
 7.   /* Resolve logic on signals at each level */
 8.   for each level l from 0 to Maxlevel do
 9.     for each gate g at level l do
10.       Compute the signal probability of g’s output
11.     end for
12.   end for
13.   /* At each level find if a gate can be eliminated */
14.   for each level l from 0 to Maxlevel do
15.     for each gate g at level l do
16.       switch (gate type of g):
17.         case: AND
18.           call Algorithm AND
19.         case: OR
20.           call Algorithm OR
21.         case: XOR
22.           call Algorithm XOR
23.         case: XNOR
24.           call Algorithm XNOR
25.       end switch
26.     end for
27.   end for
28.   for each gate g that can be eliminated
29.     C' ← C - gate g
30.     C'' ← C'
31.   end for
32. end Algorithm Reduced_Circuit

Figure 3.4 Algorithm to Generate the Reduced Circuit

Let there be n inputs with probabilities in the range $0 \le p \le 0.2$ and m inputs in the range $0.9 \le p \le 1.0$. Then the select line for the multiplexer is $\sum_{i=1}^{n} inp_i + \sum_{j=1}^{m} \overline{inp_j}$, where the sums denote logical OR, so that the select line is ‘1’ whenever an input is against its rounded value.

When the temporal TMR circuit is used, there are more issues to be considered. There is a delay in determining the output because the output is considered at three different time instances. The outputs are computed sequentially, hence the temporal TMR technique takes more than thrice the time taken by the sequential TMR technique. This means that the next set of inputs has to wait until the output is computed. This requires the use of latches to store the inputs. A signal from the output circuit acts as a clock to this latch. The overall implementation of the technique is shown in Figure 3.3.

3.4 Algorithms used in Partial Evaluation Based TMR

Figure 3.4 gives the algorithm to generate a reduced circuit. The circuit is first levelized. Gates having primary inputs are level 1 gates. Gates having the outputs of level n as their inputs are marked as level n+1. At each level from 1 to the maximum level, each gate at level l is considered. In lines 8 to 12, the logic value of the signals is resolved, i.e., for each gate, the probability of the output net is computed. Lines 14-27 call the necessary algorithm. The called algorithm determines whether the gate can be eliminated. Lines 29 and 30 generate a circuit that is functionally equivalent to the original circuit but has fewer gates.

 1. Algorithm Identify-eliminatable-gate (gate G, input probability p)
 2. Inputs: A Boolean Gate G, and 0 < p < 1.
 3. Output: sets the output probability q and returns true if the gate can be eliminated, else false
 4. begin
 5.   /* First check if the gate is already identified as redundant or not */
 6.   if return_value == FALSE
 7.     /* If the gate is not identified as redundant */
 8.     for each input i of gate G do
 9.       /* check if any one of the inputs is a control value */
10.       if (pi == control value) then
11.         /* identify the gate as redundant and set the output name appropriately */
12.         return_value ← TRUE
13.         output net name ← appropriate input net name
14.       endif
15.     end for
16.   endif
17. end Algorithm Identify-eliminatable-gate

Figure 3.5 Algorithm to Identify if a Gate is Redundant

Figure 3.5 gives an algorithm to identify whether a gate is redundant.
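The levelization of Figure 3.4 and the controlling-value check of Figure 3.5 can be sketched together in Python. The netlist representation (a dictionary from output net to gate type and input nets), the example circuit, and the choice to record a constant output value for each redundant gate are assumptions of this sketch; the thesis implementation, written in C, may differ.

```python
CONTROLLING = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}
FORCED_OUTPUT = {"AND": 0, "NAND": 1, "OR": 1, "NOR": 0}


def levelize(netlist):
    """Level 1 for gates fed only by primary inputs, else 1 + highest driving level."""
    levels = {}

    def level_of(net):
        if net not in netlist:  # primary input
            return 0
        if net not in levels:
            _, inputs = netlist[net]
            levels[net] = 1 + max(level_of(i) for i in inputs)
        return levels[net]

    for net in netlist:
        level_of(net)
    return levels


def find_redundant_gates(netlist, rounded_inputs):
    """Return {output net: constant} for gates that have a controlling input value."""
    known = dict(rounded_inputs)  # nets whose value is fixed by rounding
    levels = levelize(netlist)
    redundant = {}
    for net in sorted(netlist, key=levels.get):  # process gates level by level
        gate_type, inputs = netlist[net]
        control = CONTROLLING.get(gate_type)
        if control is not None and any(known.get(i) == control for i in inputs):
            redundant[net] = known[net] = FORCED_OUTPUT[gate_type]
    return redundant


# Hypothetical three-gate circuit with B rounded to 1 and C rounded to 0.
netlist = {
    "n1": ("AND", ["A", "C"]),
    "n2": ("OR", ["A", "n1"]),
    "out": ("NAND", ["n2", "B"]),
}
print(levelize(netlist))                                # {'n1': 1, 'n2': 2, 'out': 3}
print(find_redundant_gates(netlist, {"B": 1, "C": 0}))  # {'n1': 0}
```

Processing gates in level order matters: a constant produced by an eliminated gate is recorded in `known`, so gates at later levels can be recognized as redundant in the same pass.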


Line 6 checks whether the gate is already identified as redundant. If the gate is not identified as redundant, the algorithm is executed. Line 10 checks whether any one of the inputs is a control value. If so, lines 12 and 13 set the variables.

Figure 3.6 gives the algorithm for the overall implementation of the technique. Lines 4-11 determine the select line for the multiplexer. Line 15 determines the output from the technique of partial evaluation and line 16 determines the output of the temporal circuit. Line 17 selects the correct output from the outputs determined by the partially evaluated circuit and the temporal circuit, using the select line from line 12.

 1. Algorithm Partial_TMR (Circuit C)
 2. Begin
 3.   /* determination of select line for multiplexer */
 4.   for each input i do
 5.     if 0 < p(i) < 0.2 then
 6.       temp = temp + i
 7.     end if
 8.     if 0.9 < p(i) < 1.0 then
 9.       temp = temp + (NOT i)
10.     end if
11.   end for
12.   select line of multiplexer = temp
13.   /* selection of correct output from the two sets of output */
14.   for each output o of the circuit do
15.     partial (o) = majority voting from circuit C, C' and C''
16.     temporal (o) = majority voting of the outputs from temporal circuit
17.     final output (o) = mux (partial (o), temporal (o), select line, output (o))
18.   end for
19. end Algorithm Partial_TMR

Figure 3.6 Algorithm That Describes the Overall Implementation of the PTMR Technique

3.5 Advantages and Disadvantages

3.5.1 Advantages

1. Less area overhead: In full TMR, the entire circuit is triplicated. This increases the area overhead. In the case of partial evaluation, two copies of the reduced circuit and one copy of the original circuit are used. This decreases the area overhead.
2. High tolerance to SEUs: The resultant circuit exhibits 100% SEU immunity.

The efficiency of the technique is illustrated in Figure 3.7. The spatial TMR technique has high area overhead, since three copies of the circuit are implemented. However, there is no delay overhead, as the implementation is done in parallel. In the case of temporal TMR, outputs at three different instances of time are considered, hence there is high delay overhead. The area overhead is lower. The proposed technique has area overhead and delay overhead values lying between those of spatial and temporal TMR.

Figure 3.7 Comparison of Area/Delay Overhead For Different Redundancy Techniques (axes: area overhead, delay overhead; points: Spatial TMR, Temporal TMR, PE based TMR)


3.5.2 Disadvantages

1. In the worst case, no gates of the circuit can be eliminated. In this case, the area overhead is greater than that of full TMR.
2. In cases when the inputs to the circuit are against the assumed probabilities, temporal TMR has to be used. This gives rise to certain delay complications.

3.6 Illustrative Example

Consider the circuit given in Figure 3.8.

Figure 3.8 Original Circuit

The circuit has 3 inputs, 1 output and 13 gates. The input probabilities of the circuit are P(A) = 0.504, P(B) = 0.996 and P(C) = 0.174.


Since $0.9 \le P(B) \le 1.0$, we round the logic value of B to ‘1’, and since $0 \le P(C) \le 0.2$, we round the logic value of C to ‘0’. We now propagate the rounded values through the circuit as shown in Figure 3.9. The shaded gates are redundant.

Figure 3.9 The Rounded Values Are Propagated Over the Circuit

Based on the inputs to each gate and the type of gate, we identify some gates as eliminatable. On eliminating all the redundant gates we get a reduced circuit, as shown in Figure 3.10. We now have the output determined by the original circuit and the reduced circuit. The lightly shaded gates represent the gates of reduced circuit 1, while the darkly shaded gates represent the gates of reduced circuit 2. The correct output is obtained by taking the majority vote of these outputs, as shown in Figure 3.11.
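The rounding step of this example, together with the output probabilities of Table 3.1, can be reproduced with a short Python sketch. The function names are hypothetical, the gate formulas are written for two-input gates only, and the inputs are assumed to be statistically independent; these are assumptions of the sketch, not statements about the thesis implementation.

```python
def round_probability(p: float, low: float = 0.2, high: float = 0.9):
    """Round per Section 3.1: 0 if p <= low, 1 if p >= high, otherwise None."""
    if p <= low:
        return 0
    if p >= high:
        return 1
    return None


def output_probability(gate: str, p1: float, p2: float) -> float:
    """Two-input reading of Table 3.1, assuming independent inputs."""
    xor = p1 * (1 - p2) + p2 * (1 - p1)
    table = {
        "AND": p1 * p2,
        "NAND": 1 - p1 * p2,
        "OR": p1 + p2 - p1 * p2,
        "NOR": 1 - (p1 + p2 - p1 * p2),
        "XOR": xor,
        "XNOR": 1 - xor,
    }
    return table[gate]


probs = {"A": 0.504, "B": 0.996, "C": 0.174}
rounded = {name: round_probability(p) for name, p in probs.items()}
print(rounded)  # {'A': None, 'B': 1, 'C': 0}

# Output probability of a hypothetical AND gate fed by A and C (about 0.088).
p_and = output_probability("AND", probs["A"], probs["C"])
print(round_probability(p_and))  # prints 0
```

Input A stays unrounded because 0.504 lies in neither range, so only B and C contribute constants when the rounded values are propagated.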


Figure 3.10 Reduced Circuit

Figure 3.11 Circuit That Computes Output From Partially Evaluated Circuit


The temporal part of the circuit uses two delay units. Thus we have the output of the circuit determined at three instances of time. The correct output is obtained by taking the majority of these outputs. The technique of temporal TMR is illustrated in Figure 3.12.

To choose the correct output from the two sets of outputs, one computed using the partially evaluated circuit and the other using the temporal circuit, a multiplexer is used. If the output of the logic circuit driving the select line is ‘0’, then the output of the partially evaluated circuit is chosen, and if it is ‘1’, then the output of the temporal circuit is chosen. This is shown in Figure 3.13.

Figure 3.12 Circuit That Computes Output From Temporal TMR Circuit

3.7 Summary

We have presented a design technique for SEU mitigation. Partial evaluation based triple modular redundancy is an efficient technique that provides 100% SEU mitigation in addition to savings in area. Inputs to this technique are given in terms of probabilities. A brief description of characterizing the inputs is presented. The various algorithms implemented for the technique are described in detail. The advantages and disadvantages of the technique are discussed. The different steps involved are described in detail using an illustrative example.

Figure 3.13 Circuit Showing The Overall Implementation


CHAPTER 4
EXPERIMENTAL RESULTS

The experimental flow used to validate the proposed partial evaluation based redundancy is presented. An SEU simulator is used to insert faults representing SEUs. A functional testing procedure to assess the immunity of the circuit against SEUs is presented. Finally, the results of applying the technique to standard benchmarks are analyzed.

4.1 Experimental Setup

The experimental flow is shown in Figure 4.1.

1. Inputs to the circuit are in BLIF (Berkeley Logic Interchange Format). These netlists in BLIF format are converted to VHDL format. This allows the use of Xilinx Foundation tools 4.1i to map the designs onto Virtex FPGAs.

2. Insertion of Partial Evaluation Technique: Random numbers are generated using a random function. These random numbers are used as the input probabilities of the circuit. The input probabilities are propagated over the circuit using Table 3.1. Based on the algorithm described, which is coded in C, the gates that can be eliminated are identified. This gives a circuit which is functionally equivalent to the original circuit but has fewer gates. The VHDL netlist generated in step 1 is given as the input to the C program. A new netlist is generated which represents the overall technique. This netlist is passed through the SEU simulator. The results so obtained represent the behavior of the circuit before mapping.

Figure 4.1 Experimental Flow (flow: MCNC benchmarks in .blif format → conversion from .blif format to .vhdl format → insertion of the partial evaluation technique → insertion of faults using SEU simulator → estimation of faults inserted)

3. SEU Simulation: Faults due to SEUs are represented by an SEU simulator. It works on the following principles:
   i. The simulator randomly injects a fault on any one signal. This represents the situation that a fault due to an SEU can occur on any line of the circuit.
   ii. An SEU at a node temporarily inverts the value on that line. The duration of the SEU is controlled by the user.


64 iii. SEU introduction is a random pro cess and can be done at any instant of time. The SEU simulator generates outputs equal to the number of signa ls in the circuit and each output has value of logic ‘Z’ or logi c ‘1’. Thus the nets of the circuit are driven by two sources, one is the original value and the other is simulator output. The final value imposed on th e net is determined by the resolution function given in Figure 4.2. The resolution function is illustrated in Figures 4.3 and 4.4. function resolve_std_logic (v alues: in std_logic_vector) return std_logic is variable result: std_logic; begin if values(0) = values(1) result := not values(0); end if; if (values(0) = 0 && values(1) = 1) result = not values(0); end if; if (values(0) = 1 && values(1) = 0) result = values(0); end if; for index in values’range loop if values(0) = ‘Z’ then result := values(1); elsif values(1) = ‘Z’ then result := values(0); end if; end loop; return result; end resolve_std_logic; Figure 4.2 Resolution Function Thus it is clear from the Figure above that when the simulator output is a ‘Z’, the original value of the net is imposed and when the simulator output is a ‘1’, the original value of the net is inverted.


Figure 4.3 Fault Insertion On Line “A” by SEU Simulator (timing diagram: value on line A, SEU simulator output ‘Z’, resolved value on line A)

Figure 4.4 Fault Insertion On Line “A” by SEU Simulator (timing diagram: value on line A, SEU simulator output ‘1’, resolved value on line A)


4. Error Calculation: To check the SEU immunity of the reduced circuit, the functional operation of this circuit after introducing errors is compared against the original unfaulted circuit. As discussed above, the reduced circuit is faulted by introducing SEUs using the simulator. If the comparison reveals a disparity, the reduced circuit is unable to compute the correct output. Figure 4.5 shows the steps involved in error calculation.

Figure 4.5 Validation of the Technique (blocks: original circuit, implementation of partial evaluation redundancy, reduced circuit, SEU simulator, simulate faulted circuit, comparison, results)
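The fault model of step 3 and the comparison of step 4 can be illustrated with a small behavioral sketch in Python. This is not the SEU simulator used in the thesis. The three-gate circuit, the net names, and the helper functions are hypothetical, the circuit is deliberately unprotected so that disparities show up, and the fault is assumed to last for the whole evaluation.

```python
from itertools import product

Z = "Z"  # simulator output meaning "no fault on this net"


def resolve(original: int, simulator: str) -> int:
    """Two-driver resolution: 'Z' keeps the original value, '1' inverts it."""
    if simulator == Z:
        return original
    if simulator == "1":
        return 1 - original
    raise ValueError("simulator output must be 'Z' or '1'")


# Hypothetical circuit: y = (a AND b) OR (NOT c)
NETS = ("n_and", "n_not", "y")


def evaluate(a: int, b: int, c: int, faulted_net=None) -> int:
    """Evaluate the circuit, inverting the value on faulted_net if one is given."""
    def drive(net: str, value: int) -> int:
        return resolve(value, "1" if net == faulted_net else Z)

    n_and = drive("n_and", a & b)
    n_not = drive("n_not", 1 - c)
    return drive("y", n_and | n_not)


def count_disparities():
    """Compare every single-net fault against the unfaulted (golden) output."""
    total = mismatches = 0
    for a, b, c in product((0, 1), repeat=3):
        golden = evaluate(a, b, c)
        for net in NETS:
            total += 1
            if evaluate(a, b, c, faulted_net=net) != golden:
                mismatches += 1
    return mismatches, total
```

For a hardened circuit the same comparison would be run on the netlist that includes the voters and multiplexers, and a mismatch count of zero would correspond to the immunity the text describes.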


4.2 Results

The partial evaluation redundancy technique is tested on different benchmark circuits.

Table 4.1 Results for circuits with positive gains and rounding range 0.0 < p < 0.2 => logic ‘0’ and 0.9 < p < 1.0 => logic ‘1’

| Name of the circuit | Total number of gates | Number of redundant gates | Total number of outputs | % of redundant gates | (A_TMR - A_PE) / A_TMR (as percentage) |
|---|---|---|---|---|---|
| Count | 105 | 62 | 1 | 59.04 | 33.54 |
| Cm150a | 58 | 32 | 1 | 55.17 | 26.40 |
| 9symml | 166 | 73 | 1 | 43.97 | 25.69 |
| Alu2 | 335 | 157 | 5 | 46.86 | 24.68 |
| Alu4 | 674 | 270 | 6 | 40.05 | 22.87 |
| Too_large | 674 | 241 | 3 | 35.75 | 21.77 |
| I2 | 161 | 60 | 1 | 37.26 | 21.14 |
| Cm151a | 29 | 18 | 1 | 62.06 | 20.87 |
| Frg1 | 100 | 48 | 3 | 48 | 18.26 |
| Mux | 79 | 29 | 1 | 36.70 | 17.01 |
| C432 | 209 | 77 | 4 | 36.84 | 16.17 |
| term1 | 364 | 132 | 9 | 36.26 | 14.09 |
| Cordic | 64 | 28 | 2 | 43.75 | 14 |
| F51m | 123 | 65 | 7 | 52.84 | 11.83 |
| Vda | 805 | 362 | 39 | 44.96 | 11.24 |
| Cm152a | 23 | 12 | 1 | 52.17 | 9.58 |
| I4 | 154 | 59 | 6 | 38.31 | 9.46 |
| C1908 | 398 | 196 | 25 | 49.24 | 8.57 |
| I3 | 106 | 50 | 6 | 47.16 | 8.18 |
| C2670 | 677 | 272 | 34 | 40.17 | 7.56 |
| Z4ml | 52 | 30 | 4 | 57.69 | 5.81 |
| C880 | 317 | 159 | 24 | 50.15 | 4.58 |
| Ttt2 | 204 | 133 | 21 | 65.19 | 4.16 |
| Cmb | 43 | 28 | 4 | 65.11 | 4.13 |
| X1 | 307 | 177 | 28 | 57.65 | 3.87 |
| I9 | 757 | 396 | 63 | 52.31 | 3.68 |
| Cm85a | 32 | 20 | 3 | 62.5 | 0.92 |
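The two derived columns of the results tables can be written out as follows. The symbols N_red and N_total are notation introduced here for illustration. Reading A_TMR as the area of a conventional TMR implementation and A_PE as the area of the partial evaluation based implementation, including its voters, delay units and multiplexers, is an assumption based on the column header, not a definition quoted from the text.

```latex
\[
\%\ \text{redundant gates} = 100 \times \frac{N_{\mathrm{red}}}{N_{\mathrm{total}}},
\qquad
\text{area gain (\%)} = 100 \times \frac{A_{TMR} - A_{PE}}{A_{TMR}}
\]
\[
\text{Count: } 100 \times \frac{62}{105} \approx 59.05
\]
```

The table lists 59.04 for Count, which suggests the reported percentages are truncated rather than rounded. Under the reading above, a negative area gain corresponds to A_PE being larger than A_TMR.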


Table 4.2 Results for circuits with negative gains and rounding range 0.0 < p < 0.2 => logic ‘0’ and 0.9 < p < 1.0 => logic ‘1’

| Name of the circuit | Total number of gates | Number of redundant gates | Total number of outputs | % of redundant gates | (A_TMR - A_PE) / A_TMR (as percentage) |
|---|---|---|---|---|---|
| c8 | 162 | 89 | 17 | 54.93 | -2.70 |
| My_adder | 146 | 84 | 17 | 57.53 | -4.94 |
| cm163a | 43 | 26 | 5 | 60.46 | -6.04 |
| Parity | 15 | 7 | 1 | 46.66 | -6.12 |
| cm162a | 44 | 24 | 5 | 54.54 | -8.55 |
| X3 | 706 | 296 | 99 | 41.92 | -20.00 |
| i7 | 552 | 153 | 67 | 27.71 | -22.71 |
| x4 | 439 | 158 | 65 | 35.99 | -25.68 |


Table 4.3 Results for circuits with positive gains and rounding range 0.0 < p < 0.3 => logic ‘0’ and 0.8 < p < 1.0 => logic ‘1’

| Name of the circuit | Total number of gates | Number of redundant gates | Total number of outputs | % of redundant gates | (A_TMR - A_PE) / A_TMR (as percentage) |
|---|---|---|---|---|---|
| Count | 105 | 68 | 1 | 64.76 | 36.36 |
| 9symml | 166 | 90 | 1 | 54.21 | 32.47 |
| alu4 | 674 | 344 | 6 | 51.03 | 29.96 |
| Alu2 | 335 | 180 | 5 | 53.73 | 28.87 |
| Too_large | 674 | 311 | 3 | 46.14 | 28.51 |
| Cm150a | 58 | 35 | 1 | 60.34 | 28.08 |
| I2 | 161 | 72 | 1 | 44.72 | 25.46 |
| Frg1 | 100 | 56 | 3 | 56 | 23.39 |
| C432 | 209 | 95 | 4 | 45.45 | 21.30 |
| term1 | 364 | 168 | 9 | 46.15 | 20.47 |
| Vda | 805 | 478 | 39 | 59.37 | 20.26 |
| Mux | 79 | 34 | 1 | 43.03 | 19.91 |
| Cm151a | 29 | 19 | 1 | 65.51 | 19.78 |
| Cordic | 64 | 33 | 2 | 51.56 | 17.5 |
| F51m | 123 | 76 | 7 | 61.78 | 17.38 |
| C2670 | 677 | 354 | 34 | 52.28 | 14.99 |
| C1908 | 398 | 229 | 25 | 57.53 | 13.67 |
| I3 | 106 | 59 | 6 | 55.66 | 12.57 |
| I4 | 154 | 68 | 6 | 44.15 | 12.55 |
| Cm152a | 23 | 14 | 1 | 60.86 | 10.95 |
| I9 | 757 | 486 | 63 | 64.20 | 10.70 |
| C880 | 317 | 188 | 24 | 59.30 | 9.83 |
| X1 | 307 | 198 | 28 | 64.49 | 7.64 |
| Ttt2 | 204 | 144 | 21 | 70.58 | 6.89 |
| Z4ml | 52 | 32 | 4 | 61.53 | 6.39 |
| Cmb | 43 | 30 | 4 | 69.76 | 4.82 |
| Cm85a | 32 | 22 | 3 | 68.75 | 1.85 |


Table 4.4 Results for circuits with negative gains and rounding range 0.0 < p < 0.3 => logic ‘0’ and 0.8 < p < 1.0 => logic ‘1’

| Name of the circuit | Total number of gates | Number of redundant gates | Total number of outputs | % of redundant gates | (A_TMR - A_PE) / A_TMR (as percentage) |
|---|---|---|---|---|---|
| c8 | 162 | 96 | 17 | 59.25 | -0.72 |
| My_adder | 146 | 88 | 17 | 60.27 | -3.95 |
| cm163a | 43 | 27 | 5 | 62.79 | -6.71 |
| cm162a | 44 | 27 | 5 | 61.36 | -6.57 |
| X3 | 706 | 364 | 99 | 51.55 | -14.71 |
| i7 | 552 | 213 | 67 | 38.58 | -16.63 |
| x4 | 439 | 208 | 65 | 47.38 | -19.53 |


Table 4.5 Results for circuits with positive gains and rounding range 0.0 < p < 0.4 => logic ‘0’ and 0.7 < p < 1.0 => logic ‘1’

| Name of the circuit | Total number of gates | Number of redundant gates | Total number of outputs | % of redundant gates | (A_TMR - A_PE) / A_TMR (as percentage) |
|---|---|---|---|---|---|
| Count | 105 | 82 | 1 | 78.09 | 39.18 |
| 9symml | 166 | 108 | 1 | 65.06 | 38.44 |
| Too_large | 674 | 398 | 3 | 59.05 | 37.06 |
| alu4 | 674 | 403 | 6 | 59.79 | 35.72 |
| Alu2 | 335 | 206 | 5 | 61.49 | 33.95 |
| I2 | 161 | 85 | 1 | 52.79 | 30.80 |
| Frg1 | 100 | 69 | 3 | 69 | 29.80 |
| Vda | 805 | 600 | 39 | 74.53 | 29.52 |
| Cm150a | 58 | 36 | 1 | 62.06 | 29.21 |
| term1 | 364 | 215 | 9 | 59.06 | 28.28 |
| C432 | 209 | 117 | 4 | 55.98 | 28.14 |
| Mux | 79 | 43 | 1 | 54.43 | 27.38 |
| Cordic | 64 | 41 | 2 | 64.06 | 25.5 |
| F51m | 123 | 94 | 7 | 76.42 | 24.93 |
| C2670 | 677 | 450 | 34 | 66.46 | 23.85 |
| C1908 | 398 | 288 | 25 | 72.36 | 22.33 |
| Cm151a | 29 | 20 | 1 | 68.96 | 21.97 |
| I4 | 154 | 80 | 6 | 51.94 | 17.48 |
| I3 | 106 | 66 | 6 | 62.26 | 16.66 |
| I9 | 757 | 561 | 63 | 74.10 | 16.64 |
| C880 | 317 | 220 | 24 | 69.40 | 15.95 |
| Ttt2 | 204 | 169 | 21 | 82.84 | 14.08 |
| Cm152a | 23 | 15 | 1 | 65.21 | 13.69 |
| X1 | 307 | 226 | 28 | 73.61 | 13.06 |
| Z4ml | 52 | 36 | 4 | 69.23 | 11.04 |
| Cmb | 43 | 33 | 4 | 76.74 | 8.96 |
| Cm85a | 32 | 24 | 3 | 75 | 5.55 |
| c8 | 162 | 110 | 17 | 67.90 | 4.33 |
| My_adder | 146 | 99 | 17 | 67.80 | 0.397 |


Table 4.6 Results for circuits with negative gains and rounding range 0.0 < p < 0.4 => logic ‘0’ and 0.7 < p < 1.0 => logic ‘1’

| Name of the circuit | Total number of gates | Number of redundant gates | Total number of outputs | % of redundant gates | (A_TMR - A_PE) / A_TMR (as percentage) |
|---|---|---|---|---|---|
| cm163a | 43 | 29 | 5 | 67.44 | -4.02 |
| Parity | 15 | 9 | 1 | 60 | -4.08 |
| cm162a | 44 | 28 | 5 | 63.63 | -5.26 |
| i7 | 552 | 318 | 67 | 57.60 | -5.71 |
| X3 | 706 | 441 | 99 | 62.46 | -8.59 |
| x4 | 439 | 259 | 65 | 58.99 | -13.06 |

Tables 4.1, 4.3 and 4.5 show circuits that have positive savings in area, while Tables 4.2, 4.4 and 4.6 show circuits with negative gains. A negative gain means that the overhead involved in the technique is greater than the number of gates that can be eliminated. It is observed that, in general, circuits with a large number of gates or a small number of outputs have positive gains. The number of delay units, majority voters, and multiplexers required increases with the number of outputs. Thus, the area overhead involved in the technique increases with the number of outputs, and the efficiency of the technique in terms of area decreases as the number of outputs increases.

By increasing the rounding range, more inputs can be given integer values. This increases the number of gates having integer inputs, and thus the number of redundant gates increases. Hence, higher savings in area are obtained. However, the disadvantage of increasing the rounding range is that the need to use the temporal TMR technique increases. This increases the delay overhead. Based on the area and delay requirements, the rounding range can be set to different values. For circuits with a small number of gates or a small number of inputs, the variation in rounding range has less impact.
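The effect of the rounding range on the number of redundant gates can be illustrated with a toy Python model. It is a simplified sketch and not the C implementation used in the experiments: the netlist, the input probabilities and all function names are hypothetical, only the primary inputs are rounded, and a gate is counted as redundant when its output becomes a known constant after constant propagation.

```python
def round_probability(p, low, high):
    """Return 0 if p < low, 1 if p > high, else None (value not known in advance)."""
    if p < low:
        return 0
    if p > high:
        return 1
    return None


def and_gate(x, y):
    if x == 0 or y == 0:       # a controlling 0 fixes the output
        return 0
    if x == 1 and y == 1:
        return 1
    return None


def or_gate(x, y):
    if x == 1 or y == 1:       # a controlling 1 fixes the output
        return 1
    if x == 0 and y == 0:
        return 0
    return None


def not_gate(x):
    return None if x is None else 1 - x


GATES = {"AND": and_gate, "OR": or_gate, "NOT": not_gate}

# Hypothetical netlist in topological order: (output, gate type, inputs)
NETLIST = [
    ("g1", "AND", ("a", "b")),
    ("g2", "OR", ("c", "d")),
    ("g3", "NOT", ("e",)),
    ("g4", "AND", ("g1", "g3")),
    ("g5", "OR", ("g2", "g4")),
]
PROBS = {"a": 0.15, "b": 0.5, "c": 0.85, "d": 0.5, "e": 0.35}


def count_redundant(netlist, input_probs, low, high):
    """Count gates whose output is fixed once the inputs are rounded."""
    values = {n: round_probability(p, low, high) for n, p in input_probs.items()}
    redundant = 0
    for out, kind, ins in netlist:
        values[out] = GATES[kind](*(values[i] for i in ins))
        if values[out] is not None:
            redundant += 1
    return redundant
```

With these made-up probabilities, the three rounding ranges used in the tables (thresholds 0.2/0.9, 0.3/0.8 and 0.4/0.7) mark 2, 4 and 5 of the 5 gates as redundant, mirroring the trend in the tables. The model says nothing about how often the actual inputs disagree with the rounded values, which is the cost side of a wider range.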


CHAPTER 5 CONCLUSIONS

Based on the above results, it can be concluded that the proposed PTMR technique is effective in hardening FPGA circuits against SEUs. Along with achieving good area savings, the technique is capable of providing 100% SEU immunity. Thus high tolerance can be achieved with less area overhead.

One limitation of this technique is that it addresses combinational circuits only. Assuming that a sequential circuit can be modeled as a synchronous state machine with a feedback path consisting of state registers, this technique can be extended to sequential circuits as well.

Further work can be done in this area to improve the efficiency of the technique. For instance, a study can be made to determine the subset of inputs whose rounding causes more gates to be eliminated. The delay involved in the circuit and the power consumed by it can be evaluated to improve the efficiency of the technique in terms of power and delay.



Adviser: Dr. Srinivas Katkoori.