Reliability-centric probabilistic analysis of VLSI circuits

Material Information

Title:
Reliability-centric probabilistic analysis of VLSI circuits
Physical Description:
Book
Language:
English
Creator:
Rejimon, Thara
Publisher:
University of South Florida
Place of Publication:
Tampa, Fla
Publication Date:

Subjects

Subjects / Keywords:
Single-event-upsets
Soft errors
Dynamic errors
Error modeling
Bayesian networks
Dissertations, Academic -- Electrical Engineering -- Doctoral -- USF
Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Abstract:
ABSTRACT: Reliability is one of the most serious issues confronted by the microelectronics industry as feature sizes scale down from deep submicron to sub-100-nanometer and nanometer regimes. Due to processing defects and increased noise effects, it is almost impractical to come up with error-free circuits. As we move beyond 22nm, devices will be operating very close to their thermal limit, making the gates error-prone, and every gate will have a finite propensity of providing erroneous outputs. Additional factors increasing the erroneous behaviors are low operating voltages and extremely high frequencies. These types of errors are not captured by current defect and fault tolerant mechanisms as they might not be present during testing and reconfiguration. Hence a reliability-centric CAD analysis tool is becoming more essential, not only to combat defects and hard faults but also errors that are transient and probabilistic in nature. In this dissertation, we address three broad categories of errors. First, we focus on random pattern testability of logic circuits with respect to hard or permanent faults. Second, we model the effect of a single-event-upset (SEU) at an internal node on the primary outputs. We capture the temporal nature of SEUs by adding timing information to our model. Finally, we model the dynamic error in nano-domain computing, where reliable computation has to be achieved with "systemic" unreliable devices, thus making the entire computation process probabilistic rather than deterministic in nature. Our central theoretical scheme relies on Bayesian belief networks, which are compact, efficient models representing the joint probability distribution in a minimal graphical structure that not only uses conditional independencies to model the underlying probabilistic dependence but also uses them for computational advantage. We used both exact and approximate inference, which has let us achieve order-of-magnitude improvements in both accuracy and speed and has enabled us to study larger benchmarks than the state of the art. We are also able to study error sensitivities, explore the design space, characterize the input space with respect to errors and, finally, evaluate the effect of redundancy schemes.
Thesis:
Dissertation (Ph.D.)--University of South Florida, 2006.
Bibliography:
Includes bibliographical references.
System Details:
System requirements: World Wide Web browser and PDF reader.
System Details:
Mode of access: World Wide Web.
Statement of Responsibility:
by Thara Rejimon.
General Note:
Title from PDF of title page.
General Note:
Document formatted into pages; contains 99 pages.
General Note:
Includes vita.

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001910670
oclc - 173483336
usfldc doi - E14-SFE0001707
usfldc handle - e14.1707
System ID:
SFS0026025:00001




Full Text

Reliability-centric Probabilistic Analysis of VLSI Circuits

by

Thara Rejimon

A dissertation submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Department of Electrical Engineering
College of Engineering
University of South Florida

Major Professor: Sanjukta Bhanja, Ph.D.
Wilfrido Moreno, Ph.D.
Murali Varanasi, Ph.D.
Srinivas Katkoori, Ph.D.
Manish Agrawal, Ph.D.

Date of Approval: July 5, 2006

Keywords: Single-Event-Upsets, Soft Errors, Dynamic Errors, Error Modeling, Bayesian Networks

© Copyright 2006, Thara Rejimon

DEDICATION

To God Almighty

ACKNOWLEDGEMENTS

I would like to express my sincere gratitude to my major professor, Dr. Sanjukta Bhanja, for her continuous guidance and support throughout my doctoral degree program. Her encouragement and moral support were a strong motivating factor for the successful completion of my PhD.

I would also like to thank the members of my PhD committee, Dr. Murali Varanasi, Dr. Srinivas Katkoori, Dr. Wilfrido Moreno and Dr. Manish Agrawal, for their timely help and suggestions, which improved the quality of my dissertation significantly.

My sincere thanks to Dr. Sudeep Sarkar for his valuable suggestions at different stages of this research. I would also like to express my thanks to Dr. Ranganathan for his suggestions and help. Thanks to all members (present and past) of the VLSI design lab in Electrical Engineering for their help and cooperation.

I am thankful to my loving parents, whose continuous prayers helped me to complete this challenging task. Thanks to my husband Reji, whose constant encouragement was a tremendous help throughout the course of my studies. Special thanks to his parents for their support. Thank you Juvan for your love and patience.

Above all, I thank GOD for His abundant Blessings.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER 1 INTRODUCTION
  1.1 Permanent Faults
  1.2 Single-Event-Transients
  1.3 Dynamic Errors
  1.4 Central Modeling Theme
    1.4.1 Permanent Fault Modeling
    1.4.2 Single-Event-Transient Modeling
    1.4.3 Dynamic Error Modeling
  1.5 Bayesian Network Representation of Fault/Error Detection Logic
    1.5.1 The Bayesian Network Models
  1.6 Bayesian Inference
  1.7 Important Results
  1.8 Contribution of this Dissertation
  1.9 Organization

CHAPTER 2 BACKGROUND
  2.1 Fault Detection Probability
  2.2 Transient Faults: Single-Event-Upsets
  2.3 Dynamic Errors

CHAPTER 3 MODELING BASED ON BAYESIAN NETWORKS
  3.1 Bayesian Inference
  3.2 Cluster Based Inference
  3.3 Stochastic Inference
    3.3.1 Probabilistic Logic Sampling (PLS)
    3.3.2 Evidence Pre-Propagated Importance Sampling

CHAPTER 4 LIFE-DAG: AN ACCURATE PROBABILISTIC MODEL FOR ERROR DETECTION PROBABILITIES
  4.1 Motivation
  4.2 LIFE-BN: Fault/Error Model
  4.3 Experimental Results

CHAPTER 5 A TIMING-AWARE PROBABILISTIC MODEL FOR SEU SENSITIVITIES
  5.1 The Proposed Model
    5.1.1 Timing Issues
    5.1.2 Delay Modeling Based on Logical Effort
    5.1.3 TALI: Timing-Aware-Logic-Induced SEU Sensitivity Model
  5.2 Experimental Results
    5.2.1 Exact Inference
      5.2.1.1 Input Space Characterization
    5.2.2 Larger Benchmarks
    5.2.3 Results with Delay Model Based on Logical Effort

CHAPTER 6 PROBABILISTIC ERROR MODELING OF NANO-DOMAIN LOGIC CIRCUITS
  6.1 Probabilistic Error Model
    6.1.1 The Bayesian Network Structure
    6.1.2 Bayesian Network Quantification
  6.2 Experimental Results

CHAPTER 7 CONCLUSION AND FUTURE WORK

REFERENCES

ABOUT THE AUTHOR

LIST OF TABLES

Table 1.1. Probabilistic representation of the "truth-table" of (a) a two-input error-free AND gate and (b) an AND gate with dynamic error probability p
Table 4.1. ISCAS benchmarks and the number of faults
Table 4.2. (a) Fault detection probability estimation errors and time for 1000 samples (b) Fault detection probability estimation errors and time for 3000 samples
Table 4.3. Comparison with the state of the art
Table 5.1. Gate delays based on logical effort
Table 5.2. Size of original and time-expanded ISCAS circuits for fanout-dependent delay model
Table 5.3. Estimated P(T j i) values of nodes in benchmark c17 from exact inference
Table 5.4. SEU sensitivity estimation errors and time for 9999 samples
Table 5.5. Size of TALI model and estimation time for logical-effort based delay model
Table 6.1. Probabilistic representation of the "truth-table" of a two-input, error-free AND gate and of an AND gate with dynamic errors
Table 6.2. Output error probabilities [from exact inference]
Table 6.3. Comparison of modeling using Bayesian networks and probabilistic transfer matrix
Table 6.4. Output error probabilities for ISCAS'85 circuits - PLS with 1000 samples
Table 6.5. Comparison of Bayesian network modeling and logic simulation
Table 6.6. Error sensitivity for c17
Table 6.7. c17 - Selective redundancy
Table 6.8. c432 - Selective redundancy

LIST OF FIGURES

Figure 1.1. Three major classes of errors/faults in logic circuits
Figure 1.2. Stuck-at-fault model
Figure 1.3. SET sensitivity model
Figure 1.4. Dynamic error model
Figure 1.5. (a) Conditional probability table of an AND gate (b) A small Bayesian network
Figure 2.1. Related works on FDP analysis
Figure 2.2. SEU propagation
Figure 2.3. Recent works on SET modeling
Figure 2.4. Background works on dynamic errors
Figure 2.5. Modeling capacities for different probabilistic dependency models
Figure 3.1. A small Bayesian network
Figure 3.2. (a) The benchmark circuit c17 (b) Bayesian network representation of c17
Figure 3.3. (a) Moral graph (b) Chordal graph
Figure 3.4. Junction tree
Figure 4.1. (a) An illustrative fault detection logic (b) LIFE-DAG model of (a)
Figure 4.2. (a) Estimation time vs. number of samples in the detection logic for benchmark c880 (b) Estimation time vs. number of nodes in the detection logic for benchmark c880
Figure 4.3. Detection probability as SET sensitivity for different nodes in c880
Figure 5.1. SEU propagation
Figure 5.2. Time-space transformed circuit of benchmark c17, modeling all SEUs
Figure 5.3. Modified time-space transformed circuit of benchmark c17, modeling only the possibly sensitized SEUs
Figure 5.4. Time-space transformed circuit of benchmark c17 with logical effort based delay model
Figure 5.5. (a) An illustrative SEU sensitivity logic for a subset of c17 (b) Timing-aware logic-induced DAG model of the SEU sensitivity logic in (a)
Figure 5.6. Input probabilities for achieving zero output errors (at nodes 22 and 23) in presence of SEUs: (a) SEU 0 at node 19 (b) SEU 1 at node 19 (c) SEU 0 at node 11 (d) SEU 1 at node 11 for c17 benchmark
Figure 5.7. (a) SEU list - fanout dependent delay model (b) SEU sensitivity range - fanout dependent delay model, with delta = 1; input bias = 0.5
Figure 5.8. (a) SEU list - logical effort delay model (b) SEU sensitivity range - logical effort delay model with delta = 1 and input bias = 0.5
Figure 6.1. (a) Conceptual circuit representation of the logic used to detect errors involving the error-free logic and the unreliable logic components (b) The corresponding Bayesian network representation
Figure 6.2. (a) Dynamic error simulation circuit (b) Truth table of a NAND gate in the erroneous logic block
Figure 6.3. (a) Sensitivity of output error probability with respect to individual gate errors for c880 (b) Output error profiles of two alternative logic implementations (c499 and c1355)
Figure 6.4. c17 - Input space characterization by likelihood ratio for (a) zero error at output node 22 (b) zero error at output node 23
Figure 6.5. Input space characterization by likelihood ratio for benchmarks (a) c432 (b) c1908


CHAPTER 1

INTRODUCTION

Technology scaling and increased power densities are the major factors affecting the reliability of future high performance microprocessors. Reduction in device sizes comes with shrinking of channel length and dielectric thickness, which increases current density and hence the operating temperature. Transistors in deep submicron and future nanometer technologies will also have high leakage currents, which rise exponentially with temperature. Since devices will operate near thermal limits, the intrinsic error rates will increase, resulting in less reliable circuit operation. The fact that supply voltage and threshold voltage are not scaling appropriately with the scaling of device sizes results in increased power density, which causes further reliability issues. As a result, according to the International Technology Roadmap for Semiconductors (ITRS) [1], the reliability of nanometer circuits is greatly affected by more transient and permanent failures of signals, logic values, devices and interconnects. Signal integrity issues such as crosstalk and power grid noise will be intensified by manufacturing defects, process variations and temperature fluctuations, and hence it is becoming harder to achieve the desired level of noise immunity while maintaining the improvement trends in performance and energy-efficiency [2, 3]. Multiple noise sources and operational variations can dynamically interact with each other to further aggravate error effects [4]. With reduced feature sizes, smaller supply voltages and higher transistor densities, digital logic gates become inherently noisy due to the above mentioned compound noise effects, which we refer to as "dynamic errors". The problem of realizing reliable Boolean functions using noisy gates was first introduced by Von Neumann [5] in 1956. Error bounds for achieving reliable computation have been estimated in [5] and [6].

Another major reliability challenge is soft error tolerance. Soft errors result from Single-Event-Upsets caused by radiation induced particle bombardment. When high energy neutrons and alpha particles present in atmospheric radiation bombard the active semiconductor regions of CMOS circuits, electron-hole pairs are generated, resulting in transients known as single-event-upsets at the gate outputs. In fact, the existence of radiation effects on microchips has been known and addressed since the 1970s. Space programs reported on-orbit occurrence of single event upsets in the 70s; IBM reported the use of error correction codes (ECC) in the mid-60s; use of ECC to mask the occurrence of soft errors in SDRAM was the focus in the 1980s [7]. Reduction of soft error rates (SER) in memories was an important research focus in the 1990s. Combinational logic circuits have an inherent tendency to mask the effect of single-event-transients, due to three masking phenomena, namely, (1) logical, (2) electrical and (3) latching window masking. Due to logical masking, a generated glitch might not propagate to the circuit output. Due to electrical masking, a generated glitch might get attenuated before reaching a primary output, and due to latching window masking, a glitch might not be captured by the latch if it reaches the latch input outside the latching window. However, with reduction in circuit size, increased device density and reduction in supply voltages, the critical charge per node will reduce, which will reduce the electrical masking effect [8]. Again, due to the decreasing number of gates in a pipeline stage, logical masking as well as electrical masking effects are decreasing in future technologies [9]. The latching window masking effect is also reducing because increasing operating frequencies reduce the time period in which latches do not accept data. This will result in unacceptable soft error rates in future technologies. Hence it is important to understand the occurrence and detection of these transients and explore effective mitigation schemes not only for memories but for logic circuits as well.

Due to processing defects, compound noise effects and increased soft error rates, it is almost impractical to come up with error-free circuits in sub-100-nanometer and nanometer technologies. Hence the best feasible solution to this problem is to incorporate efficient error correction schemes in future high performance microprocessors operating under sub-100-nanometer and nanometer technologies. In order to achieve reliable circuit operation, it is important to investigate the various types of errors predominant in logic networks with the scaling down of technology and to implement circuit designs with appropriate error mitigation schemes. Application of redundancy is one of the most commonly used error mitigation techniques. However, applying redundancy to all gates in a circuit will result in very high area overhead and excess power dissipation in addition to increased cost. Hence it is essential to identify gates that are highly sensitive to errors and apply selective redundancy measures to achieve a trade-off between redundancy, reliability, area overhead and cost.

In this dissertation, we investigate three major classes of errors using probabilistic graphical representations for various error/fault models that are relevant in present and future nano-domain CMOS circuits. Specifically, we look into three broad categories of faults/errors: (1) permanent hard faults, (2) single-event-transients: errors caused by radiation and particle bombardment, and (3) dynamic errors: errors due to random noise, which are predominant in nano-scale computing. Fig. 1.1 depicts the above three types of errors/faults that we analyze in this dissertation. If the output line of gate G1 is shorted to ground due to a fabrication error, it results in a stuck-at-0 fault, which is a permanent and localized fault. A particle bombardment on the active semiconductor area of a transistor within the gate G2 can cause an unwanted 0→1→0 transition, resulting in a single-event upset as shown in Fig. 1.1. Note that this will cause a logical error at the gate output only if both inputs of the NAND gate G2 are at logic 1. Hence the generation and propagation of single event upsets through a logic gate depends on the gate type and its input signal values. Like stuck-at faults, single event upsets are also localized events since their occurrence depends on the location of the particle hit. The third type of error, the dynamic error, is illustrated by a signal error at the output line of gate G3, which is assumed to have a dynamic error probability p due to compound noise effects caused by technology scaling factors as discussed before. This type of error can be modeled by means of probabilistic truth tables that capture the gate error probabilities. The truth table of an erroneous AND gate is shown in Figure 1.1. Like single event upsets, dynamic errors are also transient in nature since they are caused by temporary device malfunctions, not permanent failures. However, unlike stuck-at faults and single-event-upsets, they are not localized events. Every gate in a logic circuit has a finite error probability.

The fundamental theme of this error modeling is a dependency-preserving probabilistic structure. The analysis of these errors is captured in (1) fault/error detection probability, (2) SEU sensitivity and (3) overall output error estimation of logic circuits having unreliable components. We also classify gates in terms of error sensitivity and analyze the effect of selective redundancy application on circuit reliability.

Figure 1.1. Three major classes of errors/faults in logic circuits
(Panels: stuck-at-0 fault at gate G1, localized; single-event-transient caused by particle bombardment at gate G2, localized and transient in nature; gate G3 with dynamic error probability p, inputs x1 and x2, output y, shown with its probabilistic truth table.)

1.1 Permanent Faults

Some of the major causes of permanent faults/defects in VLSI chips are processing defects, material defects and packaging failures. Permanent faults can be classified into stuck-at faults, transistor open and short faults, bridging faults, functional faults, delay faults, etc. Most of these faults can be covered by single-stuck-at fault tests [10]. We model stuck-at faults in logic circuits for measuring random pattern testability.

Fault Detection Probability (FDP) is an important testability measure that is useful not only for generating test patterns, but also for vetting designs for random-input testability. The Fault Detection Probability (FDP) of a stuck-at fault f ∈ F, where F denotes the set of all faults, in a combinational circuit C is the probability that f is detected by a randomly chosen, equally likely input pattern. Signal probabilities, as well as FDP, are affected by spatial correlations induced by circuit re-convergence.

Most of the analysis of FDP was performed in the 1980s (as proposed by Seth et al. [11], Wunderlich [12], etc.). Due to the high computational complexity involved in computing signal and fault detection probabilities, these works resort to several approximation strategies. In this work, we revisit this old problem for the following two reasons: (1) State-of-the-art algorithms for computation of exact fault detection probabilities, based on Binary Decision Diagrams [13], do not scale well, in terms of time and space requirements, with circuit size. They usually require various heuristic-based approximations of the pure model. In a later development (1988), Bayesian networks were introduced for probabilistic reasoning and belief propagation, and they have been applied in artificial intelligence and image analysis. We propose an exact model based on Bayesian networks for estimation of FDP and use smart and efficient stochastic inference schemes which have an excellent accuracy-time trade-off. (2) Traditionally, FDP has been used for test point insertion; however, it can also be used as an upper bound on random single-event-transient (SET) sensitivity, which is important for characterizing the impact of soft errors in logic blocks. Thus we provide a fresher look into an old problem by adopting a novel and efficient scheme.

1.2 Single-Event-Transients

High-energy neutrons present in cosmic radiation and alpha particles from packaging materials give rise to single event upsets (SEUs), resulting in soft errors in logic circuits. When particles hit a semiconductor material, electron-hole pairs are generated, which may be collected by a P-N junction, resulting in a short current pulse that causes a logic upset, or Single Event Upset (SEU), in the signal value. An SEU may occur at an internal node of a combinational circuit and propagate to an output latch. When a latch captures the SEU, it may cause a bit flip, which can alter the state of the system, resulting in a soft error. So far, soft errors have been of serious concern in memories, whereas in logic circuits the soft error rate is comparatively low due to logical, electrical and temporal masking effects. However, as process technology scales below 100 nanometers and operating frequencies increase, the above masking barriers diminish due to low supply voltages, shrinking device geometry and small noise margins. This will result in unacceptable soft error failure rates in logic circuits even for mainstream applications [14].

The soft error sensitivity of a node in a logic block depends on the following four factors: (1) the particle bombardment rate on a chip, which is fairly uniform in space and time; (2) the electrical masking probability: the probability that a particle hit at the node causes a single-event-transient of finite duration (it depends on V_dd and V_th, and also on temperature); (3) the SET sensitization probability: the probability that an SET generated at the node is propagated to an output node; and (4) the latching probability, which is a function of latch characteristics and the switching frequency.

In this work, we explore the SET sensitivity of individual gates in a circuit by accurately considering the effect of (1) SEU duration, (2) gate delays, (3) re-convergence in the circuit structure and, most importantly, (4) inputs. Several works on soft error analysis estimate the overall output signal errors due to SEUs at the internal nodes [4, 15, 16, 17, 18]. Note that our focus is on identifying the SEU locations that cause soft errors at the output(s) with high probabilities, and not on the overall soft error rates. Knowledge of the relative contribution of individual nodes to output error will help designers to apply selective radiation hardening techniques. This model can easily be fused with the modeling of the latches [17, 19], considering parameters such as latching window, setup and hold time, V_th and V_dd [15, 16, 17], for a comprehensive model capturing processing, electrical and logical effects.

1.3 Dynamic Errors

The ITRS roadmap predicts CMOS device dimensions to reach close to the design limit of 50 nm by 2020. Circuits built with such nano-dimensional devices will face design challenges that have not been much of an issue so far. One such challenge involves dynamic errors in the interconnects and gates.

What is a dynamic error? These errors will be transient in nature, occurring anywhere in the circuit, but hard to detect by regular testing methodologies (since they are not permanent damage). They can be characterized only probabilistically. Each device (logic gate/interconnect) will have a certain, non-zero propensity for an output line error. These error probabilities could be as high as 10% [20]. Traditional error masking by extra logic might not be possible since this extra logic will itself be error-prone. What complicates the picture is that this propensity for errors will, intrinsically, exist at each gate.

Hence, in future, reliable computation has to be achieved with "systemic" unreliable devices [5], thus making the entire computation process probabilistic rather than deterministic in nature. For instance, given inputs 1 and 0, an AND gate will output the state 0 only with probability 1-p, where p is the gate error probability. Thus, traditional, deterministic, truth-table based logic representation will not suffice. Instead, the output needs to be specified in terms of probabilities, conditioned on the states of the inputs. Table 1.1 shows such a specification for (a) a 2-input error-free AND gate used in current CMOS designs and (b) a 2-input AND gate with dynamic errors for next-generation technologies.
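As a minimal illustration of the probabilistic truth-table idea described above, the following Python sketch builds the conditional distribution P(X_o | X_i1, X_i2) of a two-input AND gate. The function name and_gate_cpt and the dictionary layout are assumptions made for this example only; they are not taken from the dissertation. Setting p = 0 recovers the ideal gate.

```python
from itertools import product


def and_gate_cpt(p=0.0):
    """Return P(X_o | X_i1, X_i2) for a two-input AND gate.

    p is the dynamic error probability (an illustrative parameter):
    with probability p the gate outputs the complement of the ideal
    AND value. p = 0 gives the ideal, deterministic gate.
    """
    cpt = {}
    for x1, x2 in product((0, 1), repeat=2):
        ideal = x1 & x2
        # Mass 1 - p on the ideal output value, mass p on its complement.
        cpt[(x1, x2)] = {ideal: 1.0 - p, 1 - ideal: p}
    return cpt


if __name__ == "__main__":
    for inputs, dist in and_gate_cpt(p=0.1).items():
        print(inputs, "P(X_o=0) =", dist[0], " P(X_o=1) =", dist[1])
```

Each row of the returned table sums to one, so the same structure could serve as a conditional probability table once the gate is embedded in a larger probabilistic model.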

Table 1.1. Probabilistic representation of the "truth-table" of (a) a two-input error-free AND gate and (b) an AND gate with dynamic error probability p

(a) Ideal AND gate

X_i1  X_i2  P(X_o=0 | X_i1, X_i2)  P(X_o=1 | X_i1, X_i2)
0     0     1                      0
0     1     1                      0
1     0     1                      0
1     1     0                      1

(b) AND gate with dynamic error

X_i1  X_i2  P(X_o=0 | X_i1, X_i2)  P(X_o=1 | X_i1, X_i2)
0     0     1-p                    p
0     1     1-p                    p
1     0     1-p                    p
1     1     p                      1-p

There will be a need for formalisms to compare, evaluate, and vet circuit designs with these dynamic-error-prone gates. Note that these nano-domain dynamic errors are unlike the traditional permanent hard faults in CMOS circuits due to run-time device failure or manufacturing defects, nor are they similar to the conventional soft errors that can arise due to radiation and device noise, which are localized, purely random, and can be reduced by external means.

The sources of dynamic errors can be different for various emerging technologies. The error probability p (see Table 1.1) will be dependent on the technology. For instance,

1. In nano-CMOS, dynamic errors will arise due to the use of ultra low voltage design, resulting in supply to ground bounce, leakage and coupling noise, and due to small device geometry operating with only a handful of electrons.

2. In carbon nanotubes (CNT) [21, 22] and resonant tunnel diodes (RTD) [23], dynamic errors will arise due to their operating conditions near the thermal limit. In fact, predictions also suggest that increased clock speed and increased computational requirements would bring switching transition energy limits to within a few orders of magnitude of kT [20], where T is the temperature in Kelvin and k is Boltzmann's constant.

3. In quantum-dot cellular automata (QCA) based circuits, errors can be classified as static (decay) errors, switching (thermal) errors, dynamic errors, and errors due to background charge fluctuations [24]. Decay errors occur when information stored in a latch is lost before the end of a clock cycle. This can happen when an electron locked at the top dot tunnels out of it into the bottom dot, or vice versa, while the latch is in its locked state. Switching errors occur when a gate switches into the wrong logic state when the clock is applied, because of external thermal excitation. Dynamic errors occur when the clock frequency approaches the electron tunneling rate in the device. In addition to dynamic errors, permanent hard faults can occur in QCA designs due to cell displacement, cell misalignment and cell omission [25].

The error probability p of a gate depends on many factors such as switching at that node, leakage, temperature, V_th, etc. For example, gates with high switching activity may produce wrong signals when they operate near their thermal limits due to increased power dissipation. A device under dynamic error at one time instant may operate correctly after a short period. Thus the probability of dynamic error in individual devices needs extensive characterization at the quantum level and has to be obtained from industrial data. Given the p values of the gates, our model accurately estimates the overall output error probabilities. In this work, we assume that all gates have the same dynamic error probability, p, in order to compare our work with the state of the art. However, the model could easily be extended to a more device-friendly, diverse probabilistic model where each gate would have its own error probability.

1.4 Central Modeling Theme

We model the errors/faults in a logic circuit by comparing the outputs from the ideal logic with those from a corresponding error/fault encoded logic. We compare the ideal outputs with the corresponding error/fault encoded outputs by passing these output signals through XOR gates. An XOR output of one indicates that the error/fault is sensitized to the output. Hence the probability of the XOR output being equal to one gives the error/fault detection probability. Our model therefore has three distinct units: (1) the ideal logic block, (2) the error encoded logic block and (3) the detection unit. The ideal logic block is the same for all of our fault/error models, whereas the error encoded blocks are unique to each model. The structure of the error encoded logic block depends on the nature of the error/fault and its effect on the primary outputs. The detection unit consists of a set of comparator nodes, the cardinality of which depends on the number of primary outputs and the number of sensitization paths from each fault/error location.
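The comparison scheme described above (ideal logic, error/fault encoded logic, XOR comparator) can be sketched on a toy example. The three-input circuit z = (a AND b) OR c, the stuck-at fault on its internal node, and the function names below are all hypothetical choices made for illustration; exhaustive enumeration of input patterns is used here only because the example is tiny, and it is not the inference method of the dissertation.

```python
from itertools import product


def circuit(a, b, c, stuck=None):
    """Toy circuit z = (a AND b) OR c (an assumed example).

    If `stuck` is 0 or 1, the internal node n = a AND b is forced to
    that value, which plays the role of the fault encoded logic block.
    """
    n = a & b
    if stuck is not None:
        n = stuck
    return n | c


def detection_probability(stuck):
    """Probability that the XOR of ideal and faulty outputs equals 1,
    assuming all input patterns are equally likely."""
    patterns = list(product((0, 1), repeat=3))
    # The XOR comparator outputs 1 exactly when the fault reaches the output.
    detected = sum(circuit(*x) ^ circuit(*x, stuck=stuck) for x in patterns)
    return detected / len(patterns)


if __name__ == "__main__":
    print("stuck-at-0 on n:", detection_probability(0))
    print("stuck-at-1 on n:", detection_probability(1))
```

In this toy circuit the stuck-at-0 fault is visible only for a = b = 1, c = 0, so it is detected by one of the eight patterns, while the stuck-at-1 fault is detected by three of them.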

Figure 1.2. Stuck-at-fault model
(Blocks: original circuit (ideal logic) with primary inputs I1 to I6, internal signals X1 to X7 and outputs Z1, Z2; fault sensitization logic with injected stuck-at faults X6 s@1 and X7 s@0; detection logic.)

1.4.1 Permanent Fault Modeling

For modeling the permanent stuck-at faults, we use partial duplication of the original circuit for the fault detection logic. This is shown in Fig. 1.2. Here we duplicate only the sensitized paths from the fault location to the primary output(s). To model each fault, the fault detection logic will have one distinct fault encoded logic block. Hence, given the circuit structure and the fault list, we are able to model all the faults in the fault list by means of a set of fault encoded logic blocks. To simulate the effect of stuck-at faults, we use an additional input for each fault encoded logic block that injects the respective fault at the fault location, thus making the faulty node permanently stuck at 1 or 0. We then propagate this fault through the fault encoded logic block until a primary output is reached.

1.4.2 Single-Event-Transient Modeling

The major difference between permanent fault modeling and SEU modeling is that for permanent fault modeling we model the logical masking effect, whereas for SEU modeling we model both logical and temporal masking effects. Due to the temporal nature of SEUs, only those SEUs which reach the primary output during the latching window are captured by the output latch, which can cause a bit flip at the output and hence a soft error. Hence, to model single-event-transients, we need to take into account timing aspects such as gate delays, path delays and the width of the SEU. We incorporate these timing features by performing a time-space transformation of the original circuit. This is done by replicating gates in the circuit several times depending on the time frames at which new gate output signals are evaluated. To model SEUs, we need to consider only those gates in the fan-in cones of the primary outputs that are evaluated during the latching window. Hence from the expanded (time-space transformed) circuit, we derive a sub-circuit that propagates SEUs to output latches. From this circuit, we identify those SEUs that might possibly be sensitized to at least one primary output and derive a reduced SEU list. From the time-space transformed circuit and the reduced SEU list, we construct the SEU detection logic, shown in Fig. 1.3, in exactly the same manner as the stuck-at-fault detection logic. In this figure, we represent signals as subscripted variables, where the subscript denotes the time frame at which signals are evaluated. The effect of an SEU is modeled by injecting a faulty input signal. Each SEU has a unique error encoded logic block to propagate the SEU effect to the primary output(s). Thus we model all SEUs in the reduced SEU list simultaneously by means of a single comprehensive model.

Figure 1.3. SET sensitivity model
(Blocks: ideal logic (time-space transformed circuit) fed by the primary inputs; error sensitization logic with an injected SEU; detection logic, where a comparator output of 1 indicates that the SEU is propagated to the output.)


In this work, we use two types of gate delay models: (1) a fanout-dependent delay model, where the delay of a gate is assumed to be equal to its number of fanouts, and (2) a logical-effort-based delay model, where gate delays depend not only on fan-out but also on input capacitance as well as parasitic capacitance.

Figure 1.4. Dynamic error model (blocks: primary inputs I1 to I3; ideal logic with outputs Y1, Y2; error encoded logic (gates with dynamic error) with outputs Y1e, Y2e; detection logic with outputs E1, E2)

1.4.3 Dynamic Error Modeling

In nano-domain computing, gates in logic networks have an intrinsic propensity to operate erroneously. Hence gates are to be represented by probabilistic truth tables similar to that shown in Table 1.1. In this work, we assume that the dynamic error probability of a gate is p. To model these errors, we duplicate the whole circuit. Gates in the ideal logic block are represented by the ideal truth table, and the corresponding gates in the error encoded logic block are represented by the respective probabilistic truth table. Both ideal and error encoded blocks are fed by the same primary inputs. As in the other two models discussed above, the detection logic compares output signals from the ideal block and the error encoded block. Fig. 1.4 shows the dynamic error detection logic. The dynamic error model is to be distinguished from the other two models in that here we duplicate the whole circuit to capture the overall effect of individual gate errors on the circuit output.
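As a minimal sketch of this duplication idea, the following enumerates a tiny three-NAND circuit exactly. It is assumed here, for illustration only, that every gate independently inverts its ideal output with probability p regardless of its inputs and that primary inputs are uniformly distributed; the circuit and all names are hypothetical and are not taken from the benchmarks used in this work.

```python
from itertools import product


def nand(a, b):
    """Ideal 2-input NAND on 0/1 values."""
    return 1 - (a & b)


def output_error_probability(p):
    """Exact P(erroneous output != ideal output) for Z = NAND(NAND(a,b), NAND(b,c)).

    Assumptions (illustrative): uniform inputs, three gates, each gate flips
    its output independently with probability p.
    """
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        ideal = nand(nand(a, b), nand(b, c))          # ideal logic block
        for flips in product((0, 1), repeat=3):       # which gates misbehave
            weight = 1.0
            for flipped in flips:
                weight *= p if flipped else (1.0 - p)
            y1 = nand(a, b) ^ flips[0]                # error encoded block
            y2 = nand(b, c) ^ flips[1]
            z = nand(y1, y2) ^ flips[2]
            if z != ideal:                            # comparator (detection logic)
                total += weight / 8.0                 # 8 equally likely inputs
    return total
```

Brute-force enumeration of this kind grows exponentially with the number of inputs and gates, which is why it serves only as a check on very small examples.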


  Input1 (X1)          Input2 (X2)          Output (Xo)
  P(X1=0)  P(X1=1)     P(X2=0)  P(X2=1)     P(Xo=0)  P(Xo=1)
     1        0           1        0           1        0
     1        0           0        1           1        0
     0        1           1        0           1        0
     0        1           0        1           0        1

Figure 1.5. (a) Conditional probability table of an AND gate (b) A small Bayesian network (nodes X1 to X6)

1.5 Bayesian Network Representation of Fault/Error Detection Logic

Bayesian Networks [26] are graphical probabilistic models representing the joint probability function over a set of random variables. A Bayesian Network is a directed acyclic graph (DAG) whose nodes describe random variables and whose node-to-node arcs denote direct causal dependencies. A directed link captures the direct cause-and-effect relationship between two random variables. Each node has a conditional probability table (CPT), except the root nodes; each root node has a prior probability table. A logic circuit can be transformed into an equivalent Bayesian network by representing the state of a line (logic zero or one) with a node and the conditional probability table of each gate by directed edges that quantify the conditional probability of the state of a node given the states of its parents, or its direct causes. Figure 1.5(a) gives the CPT of an AND gate derived from its truth table, and Figure 1.5(b) is a small Bayesian network structure. In a Bayesian network, the state of a node is independent of the states of its non-descendant nodes, given the states of its parent nodes. This property of Bayesian networks is termed conditional independence. Exploiting this conditional independence among the variables in a Bayesian network, the complete joint probability distribution over the entire network can be simplified into a minimal factored representation.

The attractive feature of this graphical representation of the joint probability distribution is that not only does it make conditional independency relationships among the nodes explicit, but it also serves as a computational mechanism for efficient probabilistic updating. Probabilistic Bayesian Networks can be used not only to infer effects due to known causes (the predictive problem) but also to infer possible causes for known effects (the backtracking or diagnosis problem).
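The predictive and diagnostic uses of the factored joint distribution can be illustrated on the single AND gate of Figure 1.5(a). The sketch below is not the inference engine used in this work; it simply multiplies the two root priors by the gate CPT and sums over the joint. The prior values p1 and p2 and all function names are hypothetical.

```python
from itertools import product

# CPT of an ideal AND gate: AND_CPT[(x1, x2)][xo] = P(Xo = xo | X1 = x1, X2 = x2)
AND_CPT = {
    (x1, x2): {0: 1.0 - (x1 & x2), 1: float(x1 & x2)}
    for x1, x2 in product((0, 1), repeat=2)
}


def joint(x1, x2, xo, p1, p2):
    """Factored joint P(x1) * P(x2) * P(xo | x1, x2); p1, p2 are P(X=1) priors."""
    prior1 = p1 if x1 else 1.0 - p1
    prior2 = p2 if x2 else 1.0 - p2
    return prior1 * prior2 * AND_CPT[(x1, x2)][xo]


def prob_output_one(p1, p2):
    """Predictive query: P(Xo = 1) from the input priors."""
    return sum(joint(x1, x2, 1, p1, p2) for x1, x2 in product((0, 1), repeat=2))


def prob_x1_one_given_output_zero(p1, p2):
    """Diagnostic query: P(X1 = 1 | Xo = 0) by Bayes' rule on the joint."""
    numerator = sum(joint(1, x2, 0, p1, p2) for x2 in (0, 1))
    evidence = sum(joint(x1, x2, 0, p1, p2) for x1, x2 in product((0, 1), repeat=2))
    return numerator / evidence
```

With equally likely inputs, the predictive query gives an output-one probability of one quarter, while observing a zero at the output lowers the belief that X1 was one to one third.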


1.5.1 The Bayesian Network Models

Logic Induced Fault Encoded Directed Acyclic Graph (LIFE-DAG): We model single stuck-at faults (errors) in large combinational circuits using a Logic Induced Fault Encoded Directed Acyclic Graph (LIFE-DAG) structure, obtained by transforming the fault detection logic into a Bayesian Network as discussed in the previous section. The LIFE-DAG consists of nodes representing (1) primary input signals, (2) injected faults, (3) internal signals in the ideal logic, (4) corresponding internal signals in the fault encoded logic, (5) ideal output signals from the ideal logic, (6) corresponding faulty output signals from the fault encoded logic and (7) fault detection signals, which are outputs from the detection logic. Note that only those nodes descending from the fault location are replicated in the fault encoded logic. Some of the nodes in the fault encoded logic derive all of their parent signals from the same fault encoded logic, whereas some nodes have parents both from the ideal circuit and from the fault encoded logic block. The size of the LIFE-DAG structure is proportional to (1) the number of nodes in the ideal logic block, determined by the circuit size, (2) the number of hard faults for which the detection probabilities are to be estimated, (3) the number of nodes in the fault encoded logic block, which depends on the length of the sensitization paths from each fault location and (4) the number of detection nodes (comparator nodes), the cardinality of which is determined by the number of testable outputs from each fault location.
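The quantity that the LIFE-DAG detection nodes encode can be checked by brute force on a toy example. The sketch below is an illustration under stated assumptions, not the LIFE-DAG inference itself: it uses a hypothetical two-gate circuit z = (a AND b) OR c, injects a stuck-at fault on the internal line x, and counts the equally likely input patterns for which the ideal and faulty outputs differ.

```python
from itertools import product


def circuit(a, b, c, stuck=None):
    """Toy circuit z = (a AND b) OR c; `stuck` forces internal line x to 0 or 1."""
    x = a & b
    if stuck is not None:
        x = stuck            # injected stuck-at fault at the fault location
    return x | c


def fault_detection_probability(stuck_value):
    """Fraction of equally likely input patterns that detect x stuck-at stuck_value."""
    patterns = list(product((0, 1), repeat=3))
    detected = sum(
        circuit(a, b, c) ^ circuit(a, b, c, stuck=stuck_value)  # comparator
        for a, b, c in patterns
    )
    return detected / len(patterns)
```

In this toy circuit the stuck-at-0 fault is detected only by the single pattern a = b = 1, c = 0, whereas stuck-at-1 is detected by the three patterns with c = 0 and a AND b = 0, so the second fault is considerably easier to detect with random patterns.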
Timing-Aware Logic Induced SET Sensitivity Model (TALI-SES): We model Single Event Transients in large combinational circuits using a Timing-Aware Logic-Induced Soft Error Sensitivity model (TALI-SES), which is a complete joint probability model represented as a Bayesian Network. This network is derived from the SET detection logic that we discussed previously. In TALI-SES, the graph structure corresponding to the ideal logic consists of signals in the time-space transformed circuit. Here nodes represent signals that carry timing information in addition to the signal location. Signals are represented in the form of composite random variables; for example, (i, t) is the signal evaluated at the output of gate i at time t. The effect of each SEU in the SEU list is modeled by a sub-graph that contains nodes representing the injected SEU and nodes representing signals descending from the SEU location. Signals in the error encoded logic are also in the form of composite random variables carrying both time and space information. The size of the TALI-SES Bayesian network structure is proportional to (1) the number of nodes in the ideal logic block, determined by the size of the time-space transformed circuit, (2) the number of SEUs in the SEU list, (3) the number of nodes in the error encoded logic block, which depends on the length of the sensitization paths from each SEU location to the primary output(s) that are evaluated during the latching window and (4) the number of detection nodes (comparator nodes), the cardinality of which is determined by the number of testable outputs (outputs evaluated during the latching period) from each SEU location. The size of the time-space transformed circuit (ideal logic block) is determined by the number of gates in the fan-in cones of the circuit outputs evaluated during the latching window. Hence the size of the ideal logic is determined by the gate and path delays and the SEU duration. Assuming the fanout-dependent delay model and an SEU duration of one time unit, the size of the ideal logic (time-space transformed circuit) is double that of the original circuit in the worst case. We explain this in detail in Chapter 5.

Logic Induced Probabilistic Error Model (LIPEM): We model the effect of dynamic errors in nano devices by estimating the overall error probability at the output of a logic block where individual logic elements are subject to error with a finite probability, p. We construct the overall BN representation based on logic-level specifications by coupling gate-level representations. Each gate is modeled using a conditional probability table (CPT), which models the probability of the gate output signal being at a logic state given its input signal states. An ideal logic gate with no dynamic error has its CPT derived from the gate truth table, whereas the CPT of a gate with error is derived from its truth table and the error probabilities, which can be input dependent. The overall joint probability model (LIPEM) is constructed by networking these individual gate CPTs. This model captures all signal dependencies in the circuit and is a minimal representation, which is important from a scalability perspective. In LIPEM, the original circuit is duplicated to model the effect of dynamic errors. Hence both ideal and error encoded logic have the same size. The size of the detection logic is determined by the number of primary outputs. Hence the size of the LIPEM structure is a function of the circuit size and the number of primary outputs. From Fig. 1.4 it can be seen that the same primary inputs feed both the ideal and the error encoded logic blocks, and their signals re-converge at the comparator gates in the detection unit. Hence the LIPEM DAG structure has more re-convergence compared to the other two models discussed before, and the computational complexity associated with Bayesian inference in this case is higher than that of LIFE-DAG and TALI-SES. However, the complexity of the other two models also depends on the number of faults to be modeled and on their locations. Hence the overall computational complexity for each model will vary depending on the circuit structure, the number of outputs, the number of faults to be detected and their locations.

1.6 Bayesian Inference

Bayesian inference is the process of updating signal probabilities based on the network structure and the knowledge of some evidence. In predictive problems, the evidence is the set of input probabilities, whereas in diagnostic problems it can be the probabilities of any set of nodes in the network. In this work, we explore three inference schemes: (1) the exact inference scheme, also known as the clustering technique [27], where the original DAG is transformed into a special tree of cliques such that message passing between cliques updates the overall probability of the system. The computational complexity of the exact method is exponential in the number of variables in the largest clique. Since exact inference is expensive in terms of time, it is suitable only for small circuits. To handle larger circuits, we explore two stochastic sampling algorithms, namely (2) Probabilistic Logic Sampling (PLS) and (3) Evidence Pre-propagated Importance Sampling (EPIS) [28]. These algorithms are scalable, pattern-insensitive, anytime methods with an excellent accuracy-time trade-off.

The space requirement of the Bayesian network representation is determined by the space required to store the conditional probability tables at each node. For a node with $n_p$ parents, the size of the table is $2^{n_p+1}$. The number of parents of a node is equal to the fan-in at the node. In the LIFE-BN model we break all fan-ins greater than 2 into a logically equivalent, hierarchical structure of 2-fan-in gates. For nodes with fan-in $f$, we would need $\lceil f/2 \rceil$ extra Bayesian network nodes, each requiring only a $2^3$-sized conditional probability table.

1.7 Important Results

FDP: We developed an accurate and efficient technique for estimating fault detection probabilities using LIFE-DAG, which is an exact model, unlike several other existing techniques for FDP analysis. We outperform the state-of-the-art BDD-based technique in both time and space complexity. Moreover, our model handles all benchmarks uniformly, whereas the BDD-based model requires special heuristics and decomposition techniques to handle individual circuits.
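As a small numerical check of the table-size formula given in Section 1.6, the following sketch tabulates the number of CPT entries for a binary node with a given number of binary parents. The function name and the chosen fan-in values are illustrative only.

```python
def cpt_entries(num_parents):
    """Entries in the CPT of a binary node with `num_parents` binary parents.

    There are 2**num_parents parent configurations and 2 output states,
    hence 2**(num_parents + 1) entries.
    """
    return 2 ** (num_parents + 1)


# Table size as a function of fan-in (hypothetical sample values).
growth = {fan_in: cpt_entries(fan_in) for fan_in in (2, 4, 8, 16)}
```

A two-input gate needs only 8 entries, while a single node with fan-in 16 would already need 131072, which illustrates why breaking large fan-ins into two-input gates keeps the storage per node constant.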


SET Modeling and Analysis: We use both exact and approximate inference schemes for updating signal probabilities. For small circuits, we use exact inference to predict the effect of the inputs and of an SEU at a node on the outputs. We also exploit the unique backtracking (diagnostic) feature of Bayesian networks to answer queries like: “What input behavior will make an SEU at node j definitely cause a bit-flip at the circuit outputs?” or “What input behavior will be more conducive to no error at the output given that there is an SEU at node j?” For handling larger circuits, we use an approximate Bayesian inference scheme, namely Probabilistic Logic Sampling (PLS), and compute the SEU sensitivities of individual gates in the circuit. We classify gates in the order of their SEU sensitivity values. These results can be used for determining the gates to be chosen for selective redundancy application (or any other soft error hardening technique). Note that our model is the only non-simulative model that captures logical and temporal masking effects. We compare our estimation results with logic simulation and find that our modeling technique is highly accurate and has lower time and space complexity than the simulative method.

Dynamic Error Modeling and Analysis: We use the exact inference scheme to compute the overall output error probabilities of small benchmarks and show that our model is extremely time and space efficient by comparing it with the state-of-the-art technique, which is a Probabilistic Transfer Matrix (PTM) based model. In the absence of medium-sized benchmarks for nano-domain circuits, we use ISCAS'85 benchmarks to demonstrate the scalability and accuracy of our modeling scheme. The PTM technique can handle only small circuits, due to extremely high computational complexity, whereas ours is the unique probabilistic model that can handle mid-size circuits. For handling these larger circuits, we use two approximate inference schemes, Evidence Pre-propagated Importance Sampling (EPIS) and Probabilistic Logic Sampling (PLS). Our model can be used for ranking equivalent designs in terms of propensity for dynamic errors. We use the diagnostic aspects of Bayesian networks to characterize the input space which results in a desired output behavior even in the presence of dynamic errors. Finally, our modeling tool can be used for choosing the gates to be selected for application of error-tolerant techniques based on their dynamic error sensitivity, and also for determining the amount of such techniques required for achieving a desired error tolerance.
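Probabilistic Logic Sampling, mentioned above as one of the approximate schemes, amounts to forward sampling: root nodes are drawn from their priors and every other node is drawn from its CPT given the already sampled parents, in topological order. The sketch below applies this to a hypothetical two-gate circuit with a per-gate error probability p; it is a minimal illustration without evidence handling, not the tool used in this work, and the circuit, sample count and seed are assumptions.

```python
import random


def sample_gate(ideal_value, p, rng):
    """Draw an erroneous gate output: the wrong value with probability p."""
    return ideal_value ^ 1 if rng.random() < p else ideal_value


def pls_output_error(p, num_samples=20000, seed=0):
    """Forward-sampling estimate of P(ideal output != erroneous output).

    Toy circuit (hypothetical): y = a AND b, z = y OR c, uniform inputs.
    """
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(num_samples):
        # Root nodes: sample primary inputs from their priors.
        a, b, c = (rng.randint(0, 1) for _ in range(3))
        # Ideal logic block (deterministic CPTs).
        z_ideal = (a & b) | c
        # Error encoded block: sample each gate given its sampled parents.
        y_err = sample_gate(a & b, p, rng)
        z_err = sample_gate(y_err | c, p, rng)
        # Detection node: comparator of ideal and erroneous outputs.
        mismatches += z_ideal ^ z_err
    return mismatches / num_samples
```

The estimate improves with the number of samples and can be stopped at any time, which is the sense in which such sampling schemes are called anytime methods.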


1.8 Contribution of this Dissertation

The major contributions of this research can be summarized as follows:

1. An accurate and efficient non-simulative probabilistic model (LIFE-DAG) for estimation of fault detection probabilities, which are a measure of random pattern testability. We use two close-to-exact inference schemes. Our model advances the state-of-the-art BDD-based techniques in terms of time and space complexity and in providing a uniform model.

2. Timing-Aware Logic-Induced Single-Event-Upset Sensitivity Model (TALI-SES): A comprehensive model for the underlying error framework, which is a graphical probabilistic model that is causal, minimal and exact. Using this model we are able to accurately estimate the SEU sensitivities of individual nodes in a circuit, capturing spatial and temporal signal correlations, the effect of inputs, gate delay, SEU duration and re-convergence in the circuit. We use this model (1) to infer error probabilities using an exact inference algorithm that transforms the graph into a special junction tree and relies on a local message passing scheme, and smart stochastic non-simulative inference algorithms that have the features of any-time estimates and excellent accuracy-time trade-off for larger circuits, (2) to classify nodes with respect to SEU sensitivities for application of selective redundancy schemes for soft error hardening and (3) to generate the input space for which an SEU occurring at an input node has no impact on the output, by exploiting the unique backward reasoning property of Bayesian networks.

3. A Logic-Induced Probabilistic Error Model (LIPEM), which is an exact probabilistic representation of the dynamic error model that captures the circuit topology and individual gate error probabilities for estimating the overall output error probabilities of logic circuits given dynamic error probabilities in each gate. We use both exact and stochastic inference schemes for propagation of probabilities and show that exact inference is faster than the state-of-the-art PTM technique. To handle larger circuits, we use two stochastic sampling algorithms and demonstrate the efficiency and accuracy of these schemes by comparing with simulation results.


4. We also use the LIPEM model (a) to compute dynamic error sensitivities of individual gates in a circuit, which is useful for selective redundancy application, (b) to compare and vet equivalent designs with respect to dynamic errors, (c) to characterize the input space for achieving a desired output behavior in terms of error tolerance by utilizing the unique backtracking feature of Bayesian Networks and (d) to classify nodes with respect to dynamic error sensitivity and analyze the effect of selective redundancy techniques on circuit reliability.

1.9 Organization

The remainder of this dissertation is organized as follows. Chapter 2 is a summary of the prior work done on FDP modeling, on soft error modeling and analysis and on inherent errors. In Chapter 3, Bayesian Network fundamentals and different types of inference schemes are discussed. Chapter 4 describes the modeling of stuck-at faults using the Logic Induced Fault Encoded Directed Acyclic Graph (LIFE-DAG) model. In Chapter 5 we discuss the modeling of Single-Event Upsets incorporating timing issues. Here we describe how the Timing-Aware Logic-Induced Soft Error Sensitivity (TALI-SES) model can be used to estimate the SEU sensitivities of individual nodes in logic circuits. Chapter 6 deals with the modeling of inherent/dynamic errors in nano-domain logic circuits. The proposed LIPEM (Logic-Induced Probabilistic Error Model) can be used for (a) estimation of overall output error probabilities due to dynamic errors, (b) estimation of the error sensitivity of individual gates in logic circuits, (c) input space characterization and (d) application of selective redundancy measures. In Chapter 7 we discuss results and future research directions.


CHAPTER 2
BACKGROUND

With the increase in the complexity of digital VLSI circuits, design errors and/or manufacturing defects can occur in a chip. We broadly classify errors/faults into three categories: permanent faults, caused by manufacturing defects, material defects and packaging failures (examples of such faults are stuck-at faults, open and short faults, bridging faults, etc.); transient errors caused by atmospheric radiation, known as soft errors; and dynamic errors, which are inherent in nano-domain devices due to technology scaling and devices operating near thermal limits.

2.1 Fault Detection Probability

Fault Detection Probability (FDP) is an important testability measure that is useful not only for generating test patterns but also for vetting designs for random-input testability. Most of the permanent faults can be covered by single stuck-at-fault tests [10]. In this dissertation, we model single stuck-at faults by estimating fault detection probability. Traditionally, FDP has been used for test point insertion; however, it can also be used as a random single-event-transient (SET) sensitivity measure, which is important for characterizing the impact of soft errors in logic blocks.

The Fault Detection Probability (FDP) of a stuck-at fault $f \in F$, where $F$ denotes the set of all faults in a combinational circuit $C$, is the probability that $f$ is detected by a randomly chosen, equally likely input pattern. Signal probabilities, as well as FDP, are affected by spatial correlations induced by circuit re-convergence.

Figure 2.1 lists some of the research works done on FDP analysis. Due to the high computational complexity involved in computing signal and fault detection probabilities, several approximation strategies have been developed in the past [12, 29, 30, 31]. The cutting algorithm [31] computes lower bounds of fault detection probabilities by propagating signal probability values. This algorithm delivers loose bounds, which may lead to unacceptable test lengths. Also, the computing complexity of this algorithm is $O(n^2)$. Lower bounds of fault detection probability were also derived from controllability and observability measures [30]. This method does not account for the component of fault detection probability due to multiple path sensitizations. The above-mentioned methods are satisfactory only for faults that have a single sensitizing path for fault propagation to an output and hence will not give good results for highly re-convergent fan-out circuits that have multiple path sensitizations.

Figure 2.1. Related works on FDP analysis (categories: approximate models, FDP bounds, exact models, testability analysis)

PREDICT [11] is a probabilistic graphical method to estimate circuit testability by computing node controllability and observability using Shannon's expansion. The time complexity of exact analysis by this method is exponential in the circuit size. PROTEST [12], which is a tool for probabilistic testability analysis, calculates fault detection probabilities and optimum input signal probabilities for random test patterns by modeling the signal flow. Fault detection probabilities, which are computed from signal probability values, are underestimated because the algorithm does not take into account multiple path sensitization. Another method (CACOP) [29] is a compromise between the full-range cutting algorithm and linear-time testability analysis, like the controllability and observability program. This is also an approximate scheme.


The method of [32] uses supergate decomposition to compute exact fault detection probabilities of large circuits. PLATO (Probabilistic Logic Analyzing Tool) [13] is a tool to compute exact fault detection probabilities using reduced ordered binary decision diagrams (ROBDDs). The space requirement for constructing the ROBDD of large circuits is very large. Shannon decomposition and divide-and-conquer strategies are used to reduce large circuits into small sub-circuits. The computing complexity of these decomposition methods is quite high. Another BDD-based algorithm [33] computes exact random pattern detection probabilities. However, this is not scalable for large circuits.

State-of-the-art algorithms for computation of exact fault detection probabilities, based on Binary Decision Diagrams [13], do not scale well with circuit size in terms of time and space requirements. They usually require various heuristic-based approximations of the pure model.

Most of the analysis of FDP was performed in the 1980s (as proposed by Seth et al. [11], Wunderlich [12], etc.). In a later development (1988), Bayesian networks were introduced for probabilistic reasoning and belief propagation, and they have been applied in artificial intelligence and image analysis. Recently, in [34, 35], switching probabilities and signal arrivals in VLSI circuits have been modeled using a Bayesian Network; however, their use in estimation of error detection probabilities in digital logic is new. In this dissertation, we provide a fresh look at an old problem by adopting a novel and efficient scheme. In addition, this probabilistic FDP analysis technique can be applied to measure the Single Event Transient (SET) sensitivity and soft error susceptibility [14] of logic circuits, which are future nanotechnology challenges.

2.2 Transient Faults: Single-Event-Upsets

High-energy neutrons present in cosmic radiation and alpha particles from packaging materials give rise to single event upsets (SEUs), resulting in soft errors in logic circuits. When particles hit a semiconductor material, electron-hole pairs are generated, which may be collected by a P-N junction, resulting in a short current pulse that causes a logic upset or Single Event Upset (SEU) in the signal value.
An SEU may occur at an internal node of a combinational circuit and propagate to an output latch. When a latch captures the SEU, it may cause a bit flip, which can alter the state of the system, resulting in a soft error. In current technology, soft errors are of serious concern in memories, whereas in logic circuits the soft error rate is comparatively low due to logical, electrical and temporal masking effects. However, as process technology scales below 100 nanometers and operating frequencies increase, the above masking barriers diminish due to low supply voltages, shrinking device geometry and small noise margins. This will result in unacceptable soft error failure rates in logic circuits even for mainstream applications [14].

Figure 2.2. SEU propagation (inputs $I_1 \ldots I_N$ feed combinational logic with gate delays $d_1 \ldots d_n$, where $d_n$ is the delay associated with the n-th gate; a particle hit at node j, with rate $R_H$, generates an SEU of a given width with probability $P(SEU_j)$; the SEU is propagated to the i-th output with probability $P(T_{j\_i})$ and causes a bit-flip at latch output $Q_L$ with probability $P(Q_L \mid T_{j\_i})$)

There have been several analytical and experimental works on soft error rate analysis and error hardening techniques at different levels of abstraction, such as the architectural level, RTL, logic and circuit level. Use of the Laser Fault Injection (LFI) technique for soft-error-tolerant design verification on FPGAs and ASICs has been proposed in works by Wiley et al. [36] and Falquez et al. [37]. The research by Wiley et al. [36] demonstrates that LFI can repeatedly induce single event upsets and latch-ups in FPGA designs without causing permanent damage to the chip.

Mitra et al. [38] give a comparison of several system-level soft error mitigation schemes in terms of soft error rate reduction, power consumption, area overhead, performance and cost. The method presented in this dissertation models logical and temporal masking at the gate level, and it can be integrated with the electrical and latching-window masking models existing in the current literature to get a comprehensive model that captures the effect of supply voltage, threshold voltage, setup and hold time, etc.

The soft error susceptibility of a node j with respect to a latch L, $SES_{jQ_L}$, is the soft error rate at the latch output $Q_L$ contributed by node j. The propagation of an SEU generated due to a particle hit at an internal node j to an output i, which causes a bit flip at the output of a latch L, is depicted in Fig. 2.2.


Figure 2.3. Recent works on SET modeling (works grouped by the factor modeled, $R_H$, $P(SEU_j)$, $P(T_{j\_i})$ and $P(Q_L \mid T_{j\_i})$, and by approach: simulative, probabilistic, other)

We model the SEU propagation as follows. Let $T_{j\_i}$ be a Boolean variable which takes logic value 1 if an SEU at a node j causes an error at an output node i. Then $P(T_{j\_i})$ (measured as the probability of $T_{j\_i}$ being equal to 1) is the conditional probability of occurrence of an error at output node i given an SEU at node j. Let $P(SEU_j)$ be the probability that a particle hit at node j generates an SEU of sufficient strength, and let $P(Q_L \mid T_{j\_i})$ be the probability that an error at output node i causes an erroneous signal at latch output $Q_L$. Mathematically, $SES_{jQ_L}$ is expressed by Eq. 2.1.

$$SES_{jQ_L} = R_H \, P(SEU_j) \, P(T_{j\_i}) \, P(Q_L \mid T_{j\_i}) \qquad (2.1)$$

where $R_H$ is the particle hit rate on a chip, which is fairly uniform in space and time. $P(SEU_j)$ depends on $V_{dd}$ and $V_{th}$, and also on temperature. $P(Q_L \mid T_{j\_i})$ is a function of latch characteristics and the switching frequency.
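Eq. 2.1 is a plain product of a rate and three probabilities, which the following sketch evaluates. The function and argument names are illustrative, and the numbers in the usage note are made up for the arithmetic only; they are not measured values.

```python
def soft_error_susceptibility(hit_rate, p_seu, p_sensitized, p_latched):
    """Evaluate Eq. 2.1: SES = R_H * P(SEU_j) * P(T_j_i) * P(Q_L | T_j_i).

    hit_rate      -- particle hit rate R_H (any consistent rate unit)
    p_seu         -- P(SEU_j), hit produces an SEU of sufficient strength
    p_sensitized  -- P(T_j_i), SEU at node j causes an error at output i
    p_latched     -- P(Q_L | T_j_i), output error is captured by the latch
    """
    for name, prob in (("p_seu", p_seu), ("p_sensitized", p_sensitized),
                       ("p_latched", p_latched)):
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {prob}")
    return hit_rate * p_seu * p_sensitized * p_latched
```

For instance, with a hypothetical hit rate of 2.0 per unit time and probabilities 0.5, 0.25 and 0.5, the susceptibility evaluates to 0.125 in the same rate unit. Because the factors multiply, driving any one of them to zero (for example complete logical masking) removes the contribution of node j entirely.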
Figure 2.3 gives a list of the recent works done on soft error analysis. An estimation method for soft error failure rates resulting from Single Event Upsets proposed in [14] computes the soft error susceptibility of a node based on the rate at which a Single Event Upset (SEU) occurs at the node ($R_{SEU}$), the probability that it is sensitized to an output ($P_{sensitized}$) and the probability that it is captured by a latch ($P_{latched}$). In [39], Alexandrescu et al. present a SET fault simulation technique to evaluate the soft error probability caused by transient pulses. A model that captures the effects of technology trends on the Soft Error failure Rates (SER), considering different types of masking phenomena such as electrical masking, latching-window masking and logical masking, is presented in [40]. Another model to analyze Single Event Upsets with zero-delay logic simulation, which is accurate and faster than timing simulators, is presented in [41]. As discussed in the previous section, this model uses a circuit expansion algorithm to incorporate gate delays and a fault list generation algorithm to get a reduced list of SETs. All of the above methods use simulation techniques, which are highly input-pattern dependent.

Zhao et al. propose a methodology to evaluate the softness or vulnerability of nodes in a circuit due to compound noise effects by considering the effects of electrical, logical and timing masking [4]. They use a probabilistic approach to estimate the effect of logical masking. However, this method cannot be used for circuits with re-convergent paths and will not be able to handle larger circuits. Also, this method does not capture the effect of gate delays.

A selective triple modular redundancy (STMR) technique for achieving radiation tolerance in FPGA designs is discussed in [43]. With the method presented in their work, they achieve a reduction in area overhead by 2/3rd of the area required by ordinary Triple Modular Redundancy (TMR). Another cost-effective radiation hardening technique based on gate (transistor) sizing has been discussed in [42].

Karnik et al. suggest that soft error rate should be considered as a design parameter along with power, performance and area due to its increasing impact on circuits and systems with the scaling of process technology [15]. They propose a methodology to quantify the impact of supply voltage, transistor size, circuit topology, doping as well as circuit structure on the Soft Error Rate (SER) of a chip. The effect of threshold voltage on the SER of memories and combinational logic has been studied in [16].

Zhang et al. in [17] proposed a composite soft error rate analysis method (SERA) to capture the effect of supply voltage, clock period, latching window, logic depth, circuit topology and input vector on soft error rate. However, they resort to logic and circuit-level simulation to capture these probabilities. This method uses a conditional-probability-based parameter extraction technique obtained from device and logic simulation. In their work, combinational circuits are assumed to have unbalanced re-convergent paths. However, other design considerations usually drive optimal circuit design to have balanced paths by adding buffers wherever re-convergence is necessary. For circuits with balanced paths, soft error analysis based on the approximations given in [17] might not be the best choice.

Dhillon et al. [9] have proposed a MATLAB-based model for soft error tolerance analysis and optimization of nanometer circuits by studying the glitch generation and propagation characteristics of gates in logic blocks. Another mathematical model to estimate the possible propagation of glitches due to transient faults has been presented in [44].

Seifert et al. discuss the importance of latch design on the soft error rate (SER) of core logic [18]. They also analyze the impact of technology scaling on SER at the device, circuit and chip level. The relation between soft error rate and technology feature size based on device-level simulation has been studied in [45].

Since all the state-of-the-art techniques have resorted to simulation for logical and device-level effects (known to be expensive and pattern-sensitive, especially for low-probability events), we felt the need to explore the input-data-driven uncertainty in a comprehensive manner through a probabilistic model that captures the effect of primary inputs, the effect of gate delay and the effect of SEU duration on the logical masking. There is future scope for these kinds of models to be fused with other models [15, 16, 17, 18, 45] for capturing device effects such as electrical masking, threshold voltage and supply voltage. Our model can be used to identify gates in a logic circuit that are highly sensitive to single-event upsets and to classify gates with respect to SEU sensitivity. Application of selective redundancy techniques to these gates will enhance circuit reliability. A study of the reliability/redundancy trade-off can then be done using our TALI-SES model.

2.3 Dynamic Errors

As mentioned in Section 1.1, this type of error will be predominant in nano-domain computing.
They are caused by temporary device malfunction due to reduced feature size and operation near thermal limits. Hence one of the major challenges will be reliable computation with unreliable devices. This issue has been addressed in much previous research. Fig. 2.4 is a list of related works done on dynamic error modeling and reliability estimation. Indeed, the basic methods of Triple Modular Redundancy (TMR) and NAND Multiplexing were proposed in the 1950s [5, 46]. These ideas have been adapted for nano-computing architectures, such as N-tuple Modular Redundancy [47] or NAND Multiplexing and reconfiguration [48]. A recent comparative study of these methods [49] indicates that a 1000-fold redundancy would be required for a device error (or failure) rate of 0.01. (Note that this does not mean that 1 out of 100 devices will fail; it indicates that the devices will generate erroneous output 1 out of 100 times.) Probabilistic model checking for reliability-redundancy trade-off was introduced in [50]. These techniques for providing error and fault tolerance rarely utilize the topology of a circuit and the dependency of errors on the circuit topology.

Figure 2.4. Background works on dynamic errors (categories: reliability enhancement; reliability estimation, subdivided into error bounds, exact models and approximate models)

The analysis technique based on elementary bifurcation theory proposed in [51] for computing the exact error threshold values for noisy gates is applicable for building future fault-tolerant nano-domain networks. H. J. Gao et al. in [52] give a comparison of different redundancy schemes, such as Von Neumann's multiplexing logic, N-tuple modular redundancy and interwoven redundancy, in terms of degree of redundancy and reliability, using Markov chain models and bifurcation analysis. The reliability of NAND-multiplexed nanoelectronic systems is characterized using Markov chains and probabilistic computation schemes in [53].

In a recent work, S. Krishnaswamy et al. [54] provided a probabilistic transfer matrix-based (PTM) formalism for modeling dynamic errors at the logic level. Since the computational complexity of this method is very high, it is useful for reliability estimation of very small circuits only. Moreover, the time and space complexity associated with this modeling scheme varies with changes in gate error probabilities.

In [55], Bhaduri et al. propose a MATLAB-based tool to evaluate and compare the reliability of various fault-tolerant designs employing TMR redundancy, cascaded TMR, etc. In this probabilistic modeling tool, Boolean logic is replaced by an energy distribution function based on the Gibbs distribution. Probabilities of energy levels at gate inputs and outputs are propagated to the circuit output through belief propagation. This model is used for analysis of reliability-redundancy trade-offs for alternative defect-tolerant architectures in the presence of signal noise. However, results for known benchmark circuits are not given in this paper.

Han et al. in [56] discuss an approximate method based on the Probabilistic Gate Model (PGM) for reliability evaluation, where gate input/output signals are assumed to be statistically independent, which is not true for circuits having re-convergent fanouts. However, this model has lower time and space complexity than the PTM method, and the reliability estimates obtained from the PGM-based model were found to be very close to the PTM estimates.

Figure 2.5. Modeling capacities for different probabilistic dependency models (regions: probabilistic dependencies; DAGs (causal models); undirected graphs (Markov fields); chordal graphs (decomposable models))

Bayesian Networks and Markov Random Fields: Belief propagation for nano-computing was introduced by Bahar et al. [20] using Markov Random Fields (MRF), but demonstrated only for simple circuits. Bayesian networks (BN) and Markov Random Fields (MRF) are two probabilistic modeling tools. MRFs model dependencies using an undirected graph structure, which might be suitable for non-causal logical models such as in molecular computing. Bayesian networks model dependencies using directed graphical structures, which are minimal for causal models. Traditional computing is, and future computing based on RTDs, CNTs or clocked QCAs will be, causal, i.e. there is a flow of information from input to output. MRFs are, however, inefficient for modeling such causal computing. Specifically, two conceptually important dependencies exhibited by causal models, i.e. induced and non-transitive dependencies, are not captured by MRFs [26]. Figure 2.5 (taken from [26], chapter 3) shows the limitations of different probabilistic models; not all causal dependencies can be represented by Markov networks. Another reason for our choice of Bayesian network based models is that although the MRF computing model has the ability to incorporate dynamic (signal) errors, the errors are only implicitly captured in the choice of the polynomial coefficients of the energy potential expressions. For certain computing elements, such as those based on molecules, the errors in these coefficients might be directly related to physical phenomena. However, for others, such as RTD, QCA or nano-CMOS, the coefficients will be complex functions of the underlying error causes, making it harder to specify them directly. For instance, uniform errors across gates will not translate into uniform errors over the coefficients.

The model proposed in this dissertation is the first step towards understanding the errors specific to a circuit. We know that error probability inference is NP-hard; however, using conditional independence and smart approximation, reasonable estimates can be found. In this work, we present an exact algorithm for error probability computation of small circuits and an approximate algorithm for the mid-sized ISCAS benchmarks. Note that a simulative approach is not selected, as it is not only driven by input pattern dependence but also requires separate simulation and input trace generation schemes to be adopted for biased inputs.
In the following chapter we discuss the fundamentals of Bayesian networks, the graphical probabilistic modeling tool we used for modeling stuck-at-faults, single-event-upsets and dynamic errors. We also discuss the computational mechanisms used for efficient updating of signal probabilities in a Bayesian network.

CHAPTER 3
MODELING BASED ON BAYESIAN NETWORKS

A Bayesian network [26] is a Directed Acyclic Graph (DAG) in which the nodes of the network represent random variables and a set of directed links connect pairs of nodes. The links represent causal dependencies among the variables. Each node has a conditional probability table (CPT), except the root nodes. Each root node has a prior probability table. The CPT quantifies the effect the parents have on the node. Bayesian networks compute the joint probability distribution over all the variables in the network, based on the conditional probabilities and the observed evidence about a set of nodes.

Fig. 3.1 illustrates a small Bayesian network. The exact joint probability distribution over the variables in this network is given by Eq. 3.1.

$$P(x_6, x_5, x_4, x_3, x_2, x_1) = P(x_6 \mid x_5, x_4, x_3, x_2, x_1)\, P(x_5 \mid x_4, x_3, x_2, x_1)\, P(x_4 \mid x_3, x_2, x_1)\, P(x_3)\, P(x_2)\, P(x_1) \tag{3.1}$$

[Figure 3.1. A small Bayesian network, with nodes $X_1$ to $X_6$.]
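As a rough illustration of how a Bayesian network stores a joint distribution through local tables, the following sketch evaluates the joint probability of the six-node example as a product of per-node conditionals. The parent sets are an assumption read off the factored form in Eq. 3.3, and every numeric entry in `P_ONE` is made up for illustration only.

```python
from itertools import product

# Parent sets assumed from the factorization in Eq. 3.3 (our reading of Fig. 3.1).
PARENTS = {
    "x1": (), "x2": (), "x3": (),
    "x4": ("x1", "x2"),
    "x5": ("x2", "x3"),
    "x6": ("x4", "x5"),
}
ORDER = ["x1", "x2", "x3", "x4", "x5", "x6"]  # a topological order of the DAG

# Hypothetical numbers: P(node = 1 | parent values). Roots use the empty tuple as key.
P_ONE = {
    "x1": {(): 0.5},
    "x2": {(): 0.3},
    "x3": {(): 0.8},
    "x4": {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9},
    "x5": {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.95},
    "x6": {(0, 0): 0.05, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.99},
}


def joint(assignment):
    """Joint probability of one full assignment as a product of local conditionals."""
    p = 1.0
    for node in ORDER:
        key = tuple(assignment[parent] for parent in PARENTS[node])
        p_one = P_ONE[node][key]
        p *= p_one if assignment[node] == 1 else 1.0 - p_one
    return p


def total_probability():
    """Sum of the joint over all 2^6 assignments; should be 1 for a valid model."""
    return sum(
        joint(dict(zip(ORDER, bits)))
        for bits in product((0, 1), repeat=len(ORDER))
    )


def parameter_counts():
    """(numbers stored by the factored form, numbers needed by a full joint table)."""
    factored = sum(2 ** len(PARENTS[node]) for node in ORDER)
    full = 2 ** len(ORDER) - 1
    return factored, full
```

Under these assumptions the factored form needs one number per parent configuration of each node, which is far fewer than a full joint table over six binary variables; `parameter_counts()` makes that comparison explicit.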

In this BN, the random variable $X_6$ is independent of $X_1$, $X_2$ and $X_3$ given the states of its parent nodes, $X_4$ and $X_5$. This conditional independence can be expressed by Eq. 3.2.

$$P(x_6 \mid x_5, x_4, x_3, x_2, x_1) = P(x_6 \mid x_5, x_4) \tag{3.2}$$

Mathematically, this is denoted as $I(X_6, \{X_4, X_5\}, \{X_1, X_2, X_3\})$. In general, in a Bayesian network, a node $n$ is independent of all nodes that are not its descendants, given the parents of $n$. Let $U$ be the set of all random variables in a network. Using conditional independencies such as the one in Eq. 3.2, we arrive at the minimal factored representation shown in Eq. 3.3.

$$P(x_6, x_5, x_4, x_3, x_2, x_1) = P(x_6 \mid x_5, x_4)\, P(x_5 \mid x_3, x_2)\, P(x_4 \mid x_2, x_1)\, P(x_3)\, P(x_2)\, P(x_1) \tag{3.3}$$

In general, if $x_i$ denotes some value of the variable $X_i$ and $pa(x_i)$ denotes some set of values for the parents of $X_i$, the minimal factored representation of the exact joint probability distribution over $m$ random variables can be expressed as in Eq. 3.4.

$$P(X) = \prod_{k=1}^{m} P(x_k \mid pa(x_k)) \tag{3.4}$$

In Fig. 3.1, it can be seen that nodes $X_4$ and $X_5$ are dependent since they have a common parent. Even though this dependency is not shown in the initial Bayesian network graph structure, it is taken care of during the Bayesian inference process by a step known as moralization, in which each pair of unconnected nodes having a common child node is connected by an undirected edge, making every parent-child set a complete subgraph. We explain this aspect of Bayesian inference in detail under Section 3.1.

3.1 Bayesian Inference

We explore three methods for computing probabilities in Bayesian networks. The first is an exact scheme that uses local message passing, but places high demands on memory for circuits with many loops in the underlying undirected graph structure; we call this the cluster based inference scheme. The second is an approximate algorithm based on importance sampling. The third is also an approximate algorithm based on sampling, but exploits loopy, local message passing. These approximate schemes can be proven to converge to the correct probability estimates [28, 57], and they are scalable any-time algorithms with an excellent accuracy-time trade-off. An elaborate discussion of this is given in [58].

3.2 Cluster Based Inference

We demonstrate this inference scheme using a small example, the c17 benchmark circuit (see Fig. 3.2). The combinational circuit is shown in Fig. 3.2(a) and its corresponding Bayesian network representation is shown in Fig. 3.2(b).

The first step of the exact inference process is to create an undirected graph structure called the moral graph (denoted by $D^m$) from the Bayesian network DAG structure (denoted here by $D$). The moral graph represents the Markov structure of the underlying joint function [59]. The dependencies that are preserved in the original DAG are also preserved in the moral graph [59]. From a DAG, which has the structure of a Bayesian network, a moral graph is obtained by adding undirected edges between the parents of a common child node and dropping the directions of the links. Fig. 3.3(a) shows the undirected moral graph; the dashed edges are the ones added at this stage. This step ensures that every parent-child set is a complete subgraph. The moral graph is undirected and, due to the added links, some of the independencies displayed in the DAG are no longer graphically visible in the moral graph. Losing these independencies affects only the computational efficiency, not the accuracy [59]. The independencies that are graphically visible in the moral graph are used in the inference process to ensure local message passing.
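The moralization step described above can be sketched in a few lines. The parent sets below assume the standard c17 netlist of six two-input NAND gates, which agrees with the parent-child families that appear as leaf cliques in Fig. 3.4; the function name `moralize` is ours and is not taken from HUGIN or any other tool.

```python
from itertools import combinations

# c17 as a DAG: each gate output line maps to its two input lines (its parents).
C17_PARENTS = {
    10: (1, 3),
    11: (3, 6),
    16: (2, 11),
    19: (11, 7),
    22: (10, 16),
    23: (16, 19),
}


def moralize(parents):
    """Return (edges of the moral graph, edges added by 'marrying' co-parents)."""
    # Drop the directions of the existing parent -> child links.
    undirected = {
        frozenset((parent, child))
        for child, node_parents in parents.items()
        for parent in node_parents
    }
    # Connect every pair of parents that share a child and are not yet linked.
    married = set()
    for node_parents in parents.values():
        for a, b in combinations(node_parents, 2):
            edge = frozenset((a, b))
            if edge not in undirected:
                married.add(edge)
    return undirected | married, married


moral_edges, added_edges = moralize(C17_PARENTS)
```

For this netlist every gate contributes exactly one added edge between its two inputs, so `added_edges` corresponds to the dashed edges of the moral graph; triangulation (adding the chord between lines 10 and 11) is a separate step that this sketch does not perform.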
The moral graph is said to be triangulated if it is chordal. An undirected graph $G$ is called chordal or triangulated if every one of its cycles of length greater than or equal to 4 possesses a chord [59]; that is, we add additional links to the moral graph so that cycles longer than three nodes are broken into cycles of three nodes. The chordal graph derived from the moral graph is shown in Fig. 3.3(b). Here we can see that an additional link between nodes $X_{10}$ and $X_{11}$ has been added so that the cycle consisting of nodes $X_3$, $X_{10}$, $X_{16}$ and $X_{11}$ is broken down into two cycles of three nodes each.

[Figure 3.2. (a) The benchmark circuit c17 (b) Bayesian network representation of c17]

[Figure 3.3. (a) Moral graph (b) chordal graph]

[Figure 3.4. Junction tree. Cliques: C1 = ($X_{10}$, $X_{16}$, $X_{22}$), C2 = ($X_{16}$, $X_{19}$, $X_{23}$), C3 = ($X_{10}$, $X_{11}$, $X_{16}$), C4 = ($X_3$, $X_{10}$, $X_{11}$), C5 = ($X_2$, $X_{11}$, $X_{16}$), C6 = ($X_{11}$, $X_{16}$, $X_{19}$), C7 = ($X_7$, $X_{11}$, $X_{19}$), C8 = ($X_3$, $X_6$, $X_{11}$), C9 = ($X_1$, $X_3$, $X_{10}$). Example separators: $\{X_3, X_{10}\}$, $\{X_{10}, X_{16}\}$.]

Once the chordal graph is constructed, the next step is the creation of the junction tree. The junction tree is defined as a tree whose nodes represent the cliques (collections of completely connected nodes) of the chordal graph, and in which there is a unique path between any two cliques. The junction tree possesses a property called running intersection, which ensures that if two cliques share a common variable, that variable is present in all the cliques that lie on the unique path between them. Fig. 3.4 shows the junction tree. Note that every clique in the chordal graph is a node in the junction tree (example: C1 = [$X_{10}$, $X_{16}$, $X_{22}$]). Also, C7 and C8 have node $X_{11}$ in common; hence $X_{11}$ is present in all the cliques, namely C4, C3 and C6, that lie on the unique path between C7 and C8. This property of the junction tree is utilized for probabilistic inference, so that local operations between neighboring cliques guarantee global probabilistic consistency.

Chordal graphs are essential as they guarantee the existence of at least one junction tree. Hence chordalization is a necessary step. There are many algorithms to obtain a junction tree from a chordal graph; we use the tool HUGIN [27], which uses minimum-fill-in heuristics to obtain a minimal chordal graph and junction tree structure.

Since every child-parent team is present together in one of the cliques, we initialize the clique joint probabilities by the original joint probability of a child-parent team. We then use a message passing scheme to obtain consistent probabilities. Suppose we have two leaf cliques in the junction tree, say C8 and C9 in our example in Fig. 3.4. Both cliques are initialized based on their child-parent team (C8 by nodes 3, 6 and 11, and C9 by nodes 1, 3 and 10). Similarly, the other leaf nodes C1, C2, C5 and C7 are initialized. The initial clique probability of clique $C_i$ is denoted $\phi_{C_i}$ and is also called the potential of the clique.

Let us now consider two neighboring cliques to understand the key feature of the Bayesian updating scheme. Let two cliques $C_l$ and $C_m$ have probability potentials $\phi_{C_l}$ and $\phi_{C_m}$, respectively. Let $S$ be the set of nodes that separates cliques $C_l$ and $C_m$ (example: $S = \{X_{10}, X_{16}\}$ between cliques C1 and C3 in Fig. 3.4). The two neighboring cliques have to agree on the probabilities of the node set $S$, which is their separator. To achieve this, we first compute the marginal probability of $S$ from the probability potential of clique $C_l$ and then use it to scale the probability potential of $C_m$. The transmission of this scaling factor, which is needed in updating, is referred to as message passing. New evidence is absorbed into the network by passing such local messages. The pattern of the messages is such that the process is multi-threadable and partially parallelizable. Because the junction tree has no cycles, messages along each branch can be treated independently of the others.

Note that since the junction tree has no cycles and is not directional, we can introduce evidence at any node in any clique and then propagate it in any direction. This is in sharp contrast with simulative approaches, where information always propagates from the inputs to the outputs. Thus, we would be able to use the junction tree for input space characterization for achieving zero output error due to SEUs. We would instantiate a desired observation at an output node (say, zero error) and backtrack to the inputs that can create such a situation. If the input trace has a large distance from the characterized input space, we can conclude that zero error is reasonably unlikely. Note that this aspect of probabilistic backtracking is already used in medical diagnosis but is new in the context of input space modeling for errors in VLSI circuits.

Complexity: The computational complexity of the exact method is exponential in the number of variables in the largest clique. The space complexity of exact inference is $n \cdot 2^{|C_{max}|}$ [60], where $n$ is the number of nodes in the Bayesian network and $|C_{max}|$ is the number of variables in the largest clique. The time complexity is given by $p \cdot 2^{|C_{max}|}$ [60], where $p$ is the number of cliques.
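The separator-based update between two neighboring cliques can be illustrated on a deliberately tiny case. The sketch below assumes a hypothetical three-variable chain A -> B -> C with made-up probabilities, giving two cliques (A, B) and (B, C) with separator {B}; it is not the HUGIN implementation, only the marginalize-then-scale step described above.

```python
from itertools import product

# Hypothetical binary chain A -> B -> C (illustrative numbers only).
P_A = {0: 0.6, 1: 0.4}
P_B_GIVEN_A = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}  # key: (a, b)
P_C_GIVEN_B = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.5}  # key: (b, c)

# Clique potentials initialised from the child-parent families.
phi_ab = {(a, b): P_A[a] * P_B_GIVEN_A[(a, b)] for a, b in product((0, 1), repeat=2)}
phi_bc = dict(P_C_GIVEN_B)
phi_sep = {0: 1.0, 1: 1.0}  # separator S = {B}, initially uninformative


def pass_message(src, src_axis, sep, dst, dst_axis):
    """Marginalise src onto the separator, then scale dst by new/old separator values."""
    new_sep = {state: 0.0 for state in sep}
    for key, value in src.items():
        new_sep[key[src_axis]] += value
    for key in dst:
        state = key[dst_axis]
        if sep[state] > 0:
            dst[key] *= new_sep[state] / sep[state]
    sep.update(new_sep)


# Message from clique (A, B) to clique (B, C) through separator {B}.
pass_message(phi_ab, 1, phi_sep, phi_bc, 0)


def marginal_c(c):
    """P(C = c) read from the updated (B, C) clique potential."""
    return sum(value for (b, cc), value in phi_bc.items() if cc == c)


def brute_force_c(c):
    """P(C = c) by summing the full joint, used only as a cross-check."""
    return sum(
        P_A[a] * P_B_GIVEN_A[(a, b)] * P_C_GIVEN_B[(b, c)]
        for a, b in product((0, 1), repeat=2)
    )
```

After the single message, the (B, C) potential holds the joint of B and C, so its marginal on C agrees with the brute-force sum; with the stated complexity formula, a table over a clique of $k$ binary variables has $2^k$ entries, which is what limits the exact scheme to small circuits.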
Due to the extremely high space complexity, the exact inference scheme can handle only small circuits. Hence we explore two approximate inference schemes based on stochastic sampling, namely Probabilistic Logic Sampling (PLS) and Evidence Pre-Propagated Importance Sampling (EPIS). These schemes are proven to be any-time, scalable sampling techniques with an excellent accuracy-time trade-off.

3.3 Stochastic Inference

Stochastic sampling algorithms are the prominent subclass of approximate inference schemes. They generate randomly selected instantiations of the network variables in topological order according to the probabilities in the model, and then calculate the frequencies of the instantiations of interest to obtain the approximate inference. These algorithms use a sampling technique known as Importance Sampling. Given the states of certain nodes in the network (evidence) and the conditional probability table of each node, the algorithm generates sample instantiations of the network (i.e., it defines the state of each node in the network). From these sample instantiations, it can find approximate probabilities of each state for each node in the network. This sampling is done according to an optimum importance function $g(x)$ that closely resembles the underlying joint probability distribution $P(x)$ and for which the variance of $\hat{I}_N$ is zero, where $\hat{I}_N = \frac{1}{N} \sum_{i=1}^{N} \frac{P(s_i)}{g(s_i)}$ for $N$ samples $s_i$, $i = 1, \ldots, N$, obtained by sampling the importance function $g(x)$ [58]. It is proven that in a Bayesian network, the product of the conditional probability functions at all nodes forms the optimal importance function [57, 58]. Let $X = \{X_1, X_2, \ldots, X_m\}$ be the set of variables in a Bayesian network, $Pa(X_k)$ be the parents of $X_k$, and $E$ be the evidence set. Then, the optimal importance function is given by

$$P(X \mid E) = \prod_{k=1}^{m} P(x_k \mid Pa(x_k), E) \tag{3.5}$$

3.3.1 Probabilistic Logic Sampling (PLS)

Probabilistic logic sampling is the earliest and the simplest stochastic sampling algorithm proposed for Bayesian networks [28]. Probabilities are inferred from a complete set of samples or instantiations that are generated for each node in the network according to the local conditional probabilities stored at each node. The advantages of this inference scheme are that: (1) its complexity scales linearly with network size, (2) it is an any-time algorithm, providing an adequate accuracy-time trade-off, and (3) the samples are not based on inputs and the approach is input pattern insensitive. The salient aspects of the algorithm are as follows.

1. Each sampling iteration stochastically instantiates all the nodes, guided by the link structure, to create a network instantiation.
2. At each node $x_k$, generate a random sample of its state based on the conditional probability $P(x_k \mid Pa(x_k))$, where $Pa(x_k)$ represents the states of the parent nodes. This is the local importance sampling function.
3. The probabilities of all the query nodes are estimated by the relative frequencies of the states in the stochastic sampling trace.
4. If the states of some of the nodes are known (evidence), such as in diagnostic backtracking, network instantiations that are incompatible with the evidence set are disregarded.
5. Repeat steps 1, 2, 3 and 4 until the probabilities converge.

In predictive inference, logic sampling generates precise values for all the query nodes based on their frequency of occurrence. With diagnostic reasoning, however, this scheme fails to provide accurate estimates because of the large variance between the optimal importance function and the actual underlying joint probability distribution function. The disadvantage of this approach is that, in the case of unlikely evidence, most samples are disregarded and thus the performance of the logic sampling approach deteriorates.

3.3.2 Evidence Pre-Propagated Importance Sampling

To overcome the problem of backtracking in the light of our on-going BN based input characterization problem, we explore another stochastic algorithm, which uses Loopy Belief Propagation to update the optimal importance function such that the variance of $\hat{I}_N$ is extremely small. This method scales well with circuit size and is proven to converge to correct estimates. Let $X = \{X_1, X_2, \ldots, X_m\}$ be the set of variables in a Bayesian network, $Pa(X_k)$ be the parents of $X_k$, and $E$ be the evidence set. Then, the optimal importance function given by Eq. 3.5 can be approximated as

$$P(X \mid E) = \prod_{k=1}^{m} \alpha(Pa(X_k))\, P(x_k \mid Pa(X_k))\, \lambda(X_k) \tag{3.6}$$

Yuan et al. have proved that in poly-trees the $\lambda(X_k)$ terms can be calculated directly. The EPIS algorithm uses loopy belief propagation [61, 62] to pre-compute the approximate importance function and then uses $\epsilon$-cutoff heuristics to adjust small probabilities in the importance function.

Loopy Belief Propagation: The belief propagation algorithm computes the posterior belief of each node $X$ in a Bayesian network as $BEL(x) = P(X = x \mid E)$, where $E$ denotes the set of evidence. This is done by combining messages from its children and messages from its parents. $E = \{E^+, E^-\}$, where $E^+$ is the set of evidence connected to the parents of $X$ and $E^-$ is the set of evidence connected to its child nodes. Any node $X$ d-separates $E$ into the two subsets $E^+$ and $E^-$; in other words, given the state of $X$, the two subsets $E^+$ and $E^-$ are independent. The message from $E^+$ is defined as

$$\pi(x) = P(x \mid E^+) \tag{3.7}$$

and the message from $E^-$ is defined as

$$\lambda(x) = P(E^- \mid x) \tag{3.8}$$

The posterior belief of node $X$ is computed as

$$BEL(x) = \alpha\, \lambda(x)\, \pi(x), \tag{3.9}$$

where $\alpha$ is a normalizing constant. Based on the messages received from its neighboring nodes, each node passes new messages back to its parent and child nodes. This way, messages are propagated throughout the network. In [61], Murphy et al. explain how each node computes new $\lambda$ and $\pi$ messages based on the received messages. The loopy belief propagation algorithm normalizes both the $\lambda$ and $\pi$ messages after each iteration. This normalization does not make any difference to the final beliefs, but avoids numerical underflow.
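As a small worked instance of Eq. 3.9, the sketch below combines a $\pi$ message and a $\lambda$ message for one binary node and normalizes the result. The message values are invented for illustration, and the helper names are ours; the full message update rules of [61] are not reproduced here.

```python
def posterior_belief(pi, lam):
    """BEL(x) = alpha * lambda(x) * pi(x), with alpha chosen so the beliefs sum to 1."""
    unnormalised = [l * p for l, p in zip(lam, pi)]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("evidence has zero probability under these messages")
    return [u / total for u in unnormalised]


def normalise(message):
    """Rescale a message to sum to 1, as done after each loopy-BP iteration."""
    total = sum(message)
    return [m / total for m in message]


# Hypothetical messages for a binary node X, indexed by state (0, 1).
pi_x = [0.7, 0.3]    # pi(x) = P(x | E+)
lam_x = [0.2, 0.9]   # lambda(x) = P(E- | x)

bel_x = posterior_belief(pi_x, lam_x)
bel_from_normalised = posterior_belief(normalise(pi_x), normalise(lam_x))
```

Because $\alpha$ absorbs any constant factor, rescaling the incoming messages leaves the belief unchanged, which is why per-iteration normalization can be used purely to keep the numbers in a safe floating-point range.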

In networks with loops, getting exact $\lambda$ messages for all variables is an NP-hard problem. Loopy belief propagation provides good approximate $\lambda$ messages, which can be used to obtain a good approximate importance function.

This stochastic sampling strategy works because, in a Bayesian network, the product of the conditional probability functions for all nodes is the optimal importance function. Because of this optimality, the demand on samples is low. We have found that just a thousand samples are sufficient to arrive at good estimates for the ISCAS'85 benchmark circuits.

In the next chapter we discuss Bayesian network based modeling of single stuck-at-faults, which is useful for the estimation of fault detection probabilities in combinational circuits. We use both Probabilistic Logic Sampling and Evidence Pre-propagated Importance Sampling for Bayesian inference. Results from our model show that BN-based modeling is computationally more efficient than the state-of-the-art BDD based technique.
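The logic sampling procedure of Section 3.3.1 (instantiate every node in topological order, discard instantiations that contradict the evidence, estimate by relative frequency) can be sketched on a toy network. The example below assumes a single two-input AND gate with independent inputs; the function names, the seed and the sample count are illustrative choices, not the settings used in the experiments.

```python
import random


def sample_and_gate(rng, p_a=0.5, p_b=0.5):
    """One network instantiation in topological order: inputs first, then the output."""
    a = 1 if rng.random() < p_a else 0
    b = 1 if rng.random() < p_b else 0
    y = a & b  # deterministic CPT of an AND gate
    return {"A": a, "B": b, "Y": y}


def logic_sampling(query, evidence=None, n_samples=20000, seed=0):
    """Estimate P(query = 1 | evidence); returns (estimate, number of samples kept)."""
    evidence = evidence or {}
    rng = random.Random(seed)
    kept = 0
    hits = 0
    for _ in range(n_samples):
        sample = sample_and_gate(rng)
        # Step 4 of PLS: instantiations incompatible with the evidence are disregarded.
        if any(sample[name] != value for name, value in evidence.items()):
            continue
        kept += 1
        hits += sample[query]
    estimate = hits / kept if kept else float("nan")
    return estimate, kept


p_y, kept_predictive = logic_sampling("Y")
p_a_given_y, kept_diagnostic = logic_sampling("A", evidence={"Y": 1})
```

The predictive query keeps every sample, whereas the diagnostic query with evidence on the output keeps only the samples in which the output happens to be 1. The rarer the evidence, the more samples are wasted, which is the weakness that motivates EPIS.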

CHAPTER 4
LIFE-DAG: AN ACCURATE PROBABILISTIC MODEL FOR ERROR DETECTION PROBABILITIES

Permanent faults such as stuck-at faults, stuck-open faults and bridging faults can occur in a logic circuit due to fabrication errors or on-line device failures. Fault modeling is important for random pattern testability analysis. Most of the permanent faults can be covered by single stuck-at-fault tests [10]. In this chapter, we describe the modeling of stuck-at-faults in combinational circuits for the estimation of fault detection probability, which is an important random pattern testability measure.

We model single stuck-at-faults (errors) in large combinational circuits using a Logic Induced Fault Encoded Directed Acyclic Graph (LIFE-DAG) structure. We prove that such a DAG is a Bayesian network. Bayesian networks are graphical probabilistic models representing the joint probability function over a set of random variables. A Bayesian network is a directed acyclic graphical structure (DAG) whose nodes describe random variables and whose node-to-node arcs denote direct causal dependencies. A directed link captures the direct cause and effect relationship between two random variables. In a LIFE-DAG, each node describes the state of a line, and a directed edge quantifies the conditional probability of the state of a node given the states of its parents, i.e., its direct causes. Figure 1.5(a) gives the conditional probability table of an AND gate. The attractive feature of this graphical representation of the joint probability distribution is that it not only makes the conditional dependency relationships among the nodes explicit but also serves as a computational mechanism for efficient probabilistic updating. Probabilistic Bayesian networks can be used not only to infer effects due to known causes (the predictive problem) but also to infer possible causes for known effects (the backtracking or diagnosis problem). The diagnostic aspects of BNs make them suitable for further use in test-pattern generators. We used two stochastic algorithms based on importance sampling to compute the fault/error detection probability using the LIFE-DAG model. An importance sampling algorithm generates sample instantiations of the whole DAG network, i.e., for all lines in our case. These samples are then used to form the final estimates. At each node, this sampling is done according to an importance function that is chosen to be close to the actual joint probability function. Specifically, we explore two stochastic inference strategies: Probabilistic Logic Sampling (PLS) [28] and the Evidence Pre-propagated Importance Sampling (EPIS) algorithm [57], which are discussed in section 3.1. It is worth pointing out that, unlike simulative approaches that sample the inputs, importance sampling based procedures generate instantiations for the whole network, not just for the inputs.

We advance the state-of-the-art in fault analysis in terms of space and time requirements and in providing a uniform model. A detailed discussion of time and space complexity is given in section 3.1.

4.1 Motivation

When high-energy neutrons present in cosmic radiation and alpha particles from impurities in packaging materials hit a semiconductor device, they generate electron-hole pairs and cause a current pulse of very short duration, termed a Single Event Transient (SET). The effect of these SETs may be propagated to an output latch and cause a bit flip in the latch, resulting in a soft error. As the stored charge in each logical node decreases with decreased dimensions and decreased voltages in nanometer technology, even weak radiation can cause a disturbance in the signals, which results in an increased soft error failure rate.
The SET sensitivity of a node depends on the probability that there is a functionally sensitized path from the node to an output latch (which is the same as the fault detection probability of the node), the probability that the generated SET is propagated to the latch, and the probability that the latch captures the transitions arriving at its input [14]. If the last two probabilities are assumed to be one, the fault detection probability (FDP) of a node is an accurate measure of its SET sensitivity. In nano-domain circuits, these probabilities are very close to one due to very high operating frequencies, reduced voltages, diminishing dimensions and reduced propagation delays. Hence FDP is a tight upper bound on SET sensitivity in nano-domain circuits. To reduce soft error failure rates, designers can selectively apply corrective measures to the nodes with higher FDP rather than to those with lower FDP.

[Figure 4.1. (a) An illustrative fault detection logic (b) LIFE-DAG model of (a). The example uses lines 10, 16, 19 and 20 to 23, with fault set F = {16@1, 19@1}.]

4.2 LIFE-BN: Fault/Error Model

We first discuss the basics of fault/error detection probabilities for random-pattern testability analysis. Note that the probabilistic modeling does not require any assumption on the input patterns and can be extended to biased target workload patterns. Next, we sketch the concept of the Logic-induced Fault-Encoded (LIFE) Directed Acyclic Graph that represents the underlying fault/error model. In this probabilistic framework, we use partial duplication of the original circuit for the fault detection logic. Only the sensitized paths from the fault/error are duplicated. A set of comparator nodes compares the ideal and the error-sensitized logic. A logic one at such a comparator output indicates the occurrence of an error. The fault/error detection probability of a fault/error that affects multiple outputs is the maximum probability over the sensitized comparator outputs. These detection probabilities depend on the structural dependence within the circuit, the inputs, and the dependencies amongst the inputs. In this work, however, we assume random inputs for validation and experimentation of our model.

We follow our discussion by proving that such a Logic-induced Fault-Encoded (LIFE) Directed Acyclic Graph is indeed the minimal I-map of the fault model and, hence, is a Bayesian network.
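The comparator idea behind the fault detection logic can be checked by brute force on a small circuit. The sketch below assumes the standard c17 netlist, uniform random inputs and exhaustive enumeration; for simplicity it re-simulates the whole faulty circuit instead of duplicating only the sensitized paths, and all function names are ours.

```python
from itertools import product


def nand(a, b):
    return 1 - (a & b)


# Assumed c17 netlist: (output line, input line, input line), in topological order.
C17 = [(10, 1, 3), (11, 3, 6), (16, 2, 11), (19, 11, 7), (22, 10, 16), (23, 16, 19)]
INPUTS = (1, 2, 3, 6, 7)
OUTPUTS = (22, 23)


def simulate(inputs, stuck_at=None):
    """Evaluate the netlist; stuck_at = (line, value) forces that line to a fixed value."""
    values = dict(inputs)
    if stuck_at and stuck_at[0] in values:
        values[stuck_at[0]] = stuck_at[1]
    for out, a, b in C17:
        values[out] = nand(values[a], values[b])
        if stuck_at and stuck_at[0] == out:
            values[out] = stuck_at[1]
    return values


def detection_probabilities(stuck_at):
    """Per-output probability that the fault-free and faulty outputs differ."""
    counts = {out: 0 for out in OUTPUTS}
    patterns = list(product((0, 1), repeat=len(INPUTS)))
    for bits in patterns:
        inputs = dict(zip(INPUTS, bits))
        good = simulate(inputs)
        faulty = simulate(inputs, stuck_at)
        for out in OUTPUTS:
            counts[out] += good[out] ^ faulty[out]  # comparator (XOR) node
    return {out: count / len(patterns) for out, count in counts.items()}


def fault_detection_probability(stuck_at):
    """Maximum over the comparator outputs, following the definition in the text."""
    return max(detection_probabilities(stuck_at).values())
```

Exhaustive enumeration is only feasible here because c17 has five inputs; it serves as a reference against which a sampled estimate on the Bayesian network model could be compared, not as a replacement for it.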

Definition: The Logic Induced Fault Encoded Directed Acyclic Graph (LIFE-DAG) corresponding to the fault detection model $\langle C, F \rangle$, where $C$ represents the combinational circuit and $F$ represents the fault/error set, can be constructed as follows. (We illustrate the ideas with the help of a small fault detection circuit and the LIFE-DAG constructed from it, as shown in Figure 4.1.)

The model consists of the combinational circuit $C$ (fault/error-free) and a set of fault/error sensitized logic blocks, $S_f\ \forall f \in F$, which propagate the effect of the faults.

Nodes of the LIFE-DAG are random variables with two possible values, 0 or 1. There are 7 types of nodes.

1. $\{Z\}$: Primary inputs.
2. $\{F\}$: Nodes representing injected faults.
3. $\{X\}$: Internal nodes in the fault-free circuit $C$. Example: $X_{21}$
4. $\{X^f\ \forall f \in F\}$: Corresponding internal nodes in the duplicated circuit (descendants of $\{F\}$). Example: $X^{f_1}_{21}$
5. $\{Y\}$: Primary outputs of the fault-free circuit $C$. Example: $Y_{22}$
6. $\{Y^f\ \forall f \in F\}$: Corresponding faulty outputs. Example: $Y^{f_1}_{22}$
7. $\{E^f_o\ \forall o \in \text{Output set},\ \forall f \in F\}$: Set of all the detection nodes (comparator outputs). The cardinality of this set is determined by the number of testable outputs for each fault in the fault set. Example: $E^{f_1}_1$

Edges of the LIFE-DAG are directed and denoted as ordered pairs $(u \to v)$, denoting that $u$ causes $v$. The edge set can be classified as follows:

1. $\{Z \to X\}$: Edges between primary inputs and gates in $C$ that are directly fed by the primary inputs. These edges are part of the original circuit $C$.
2. $\{F \to X^f\}$: Edges from the induced faulty inputs to the duplicate gates in the fault sensitized logic block $S_f$.
3. $\{Z \to X^f\}$: Edges from the primary inputs to the duplicate gates in $S_f$. Note that such an edge indicates that there must be at least one parent of $X^f$ that is in $\{F\}$ or in $\{X^f\}$.

4. $\{X \to X\}$, $\{X^f \to X^f\}$: Edges between the nodes representing the internal signals (edges from the input of a gate to the corresponding output).
5. $\{X \to X^f\}$, $\{X \to Y^f\}$: Edges such that the child node $v$ is in $S_f$ (in $\{X^f\}$ or in $\{Y^f\}$) and the parent $u$ is in the original circuit $C$. These edges are called bridge edges. Note that such an edge indicates that there must be at least one parent of $v$ that is in $\{X^f\}$ or in $\{F\}$. Example: $X_{21} \to Y^{f_2}_{23}$
6. $\{X \to Y\}$, $\{X^f \to Y^f\}$: Edges where the parent node $u$ is an internal signal feeding an output gate and the child node $v$ represents the corresponding output signal. Example: $X_{21} \to Y_{23}$ and $X^{f_1}_{21} \to Y^{f_1}_{23}$
7. $\{Y \to E^f_o\}$ and corresponding $\{Y^f \to E^f_o\}$: Edges where $u$ is the output node in $Y$ (or $Y^f$) and $v$ is the detection node $E^f_o$. These edges are quantified by an XOR gate. Example: $Y_{22} \to E^{f_1}_1$ and $Y^{f_1}_{22} \to E^{f_1}_1$

Theorem: The LIFE-DAG structure corresponding to the combinational circuit $C$ and a fault set $F$ is a minimal I-map of the underlying dependency model and hence is a Bayesian network.

Proof: The Markov boundary of a variable $v$ in a probabilistic framework is the minimal set of variables that makes the variable $v$ conditionally independent of all the remaining variables in the network.
Let us order the random variables in the node set, $\{\{Z\}, \{F\}, \{X\}, \{X^f\}, \{Y\}, \{Y^f\}, \{E^f_o\}\}$, such that for every edge $(u, v)$ in the LIFE-DAG, $u$ appears before $v$. With respect to this ordering, the Markov boundary of any node $v \in \{\{Z\}, \{F\}, \{X\}, \{X^f\}, \{Y\}, \{Y^f\}, \{E^f_o\}\}$ is given as follows. If $v$ represents an input signal line, then its Markov boundary is the null set. If $v$ is a fault node $f$ in the fault set $F$, then its Markov boundary is also the null set (since this node is treated as a primary input with a particular value). And, since the logic value of an output line depends only on the inputs of the corresponding gate (whether $v$ is in $\{X\}$, $\{X^f\}$, $\{Y\}$, $\{Y^f\}$ or $\{E^f_o\}$), the Markov boundary of a variable representing an output line consists of just those variables that represent the inputs to that gate. Thus, in the LIFE structure the parents of each node are its Markov boundary elements; hence LIFE is a boundary DAG. Note that LIFE is a boundary DAG because of the causal relationship between the inputs and the outputs of a gate that is induced by logic. It has been proved in [26] that if a graph structure is a boundary DAG $D$ of a dependency model $M$, then $D$ is a minimal I-map of $M$. This theorem, along with the definitions of conditional dependencies in [26] (we omit the details), specifies the structure of the Bayesian network. Thus LIFE is a minimal I-map and therefore a Bayesian network (BN).

Quantification of LIFE-BN: LIFE-BN consists of nodes that are random variables of the underlying probabilistic model and edges that denote direct dependencies. All the edges are quantified with the conditional probabilities making up the joint probability function over all the random variables of the LIFE-DAG. The overall joint probability function that is modeled by a Bayesian network can be expressed as the product of the conditional probabilities. If $X' = \{X'_1, X'_2, \ldots, X'_m\}$ is the node set of the LIFE-DAG, then

$$P(X') = \prod_{k=1}^{m} P(x'_k \mid Pa(X'_k)) \tag{4.1}$$

where $Pa(X'_k)$ is the set of nodes that have directed edges to $X'_k$. A complete specification of the conditional probability of a two-input AND gate output has $2^3$ entries, with 2 states for each variable. These conditional probability specifications are determined by the gate type. By specifying the appropriate conditional probabilities, we ensure that the spatial dependencies among sets of nodes (not limited to just pair-wise dependencies) are effectively modeled.

4.3 Experimental Results

We demonstrate the ideas using ISCAS benchmark circuits. The logical relationship between the inputs and the output of a gate determines the conditional probability of a child node, given the states of its parents, in the LIFE-BN. Gates with more than two inputs are reduced to two-input gates by introducing additional dummy nodes, without changing the logic structure or the accuracy.

The LIFE-BN based model is capable of detecting stuck-at-faults as well as the soft errors caused by single-event-transients. In this work, we present results for a set of stuck-at-faults (hard faults), which show the efficiency of our model, in time and space requirements, compared to the BDD based model. First, we determine the hard faults using 1024 random input vectors [13]. Faults that are not detected by these vectors are hard faults. We tabulate the hard faults in Table 4.1 for all the benchmark circuits. Accurate detection probabilities are needed for these hard faults.
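The gate-level quantification described above can be sketched as follows: a two-input gate yields a conditional probability table with $2^3$ entries, and a wider gate is rewritten as a chain of two-input gates through dummy nodes. The helper names and the dummy-node naming scheme are hypothetical, and the chaining shown is only valid for associative gate types such as AND, OR and XOR.

```python
from itertools import product


def gate_cpt(gate):
    """CPT of a two-input gate: maps (in1, in2, out) to P(out | in1, in2)."""
    cpt = {}
    for a, b in product((0, 1), repeat=2):
        out = gate(a, b)
        cpt[(a, b, out)] = 1.0      # the logically implied output state
        cpt[(a, b, 1 - out)] = 0.0  # the complementary state
    return cpt


AND_CPT = gate_cpt(lambda a, b: a & b)
XOR_CPT = gate_cpt(lambda a, b: a ^ b)  # comparator used for the detection nodes


def chain_two_input(output, inputs):
    """Rewrite an n-input associative gate as two-input gates using dummy nodes.

    Returns a list of (node, parent1, parent2) triples in topological order.
    """
    if len(inputs) < 2:
        raise ValueError("need at least two inputs")
    gates = []
    previous = inputs[0]
    for index, nxt in enumerate(inputs[1:], start=1):
        is_last = index == len(inputs) - 1
        node = output if is_last else f"{output}_d{index}"
        gates.append((node, previous, nxt))
        previous = node
    return gates
```

For example, `chain_two_input("y", ["a", "b", "c"])` introduces a single dummy node `y_d1`, so every node in the resulting network has at most two parents and each table stays at eight entries.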
Table 4.1. ISCAS benchmarks and the number of faults

Circuit   Faults   Red. flt   Hard FLT
c432      524      4          5
c499      758      8          9
c880      942      0          32
c1355     1574     8          32
c1908     1879     9          114
c2670     2747     117        435
c3540     3428     137        218
c5315     5350     59         78
c6288     7744     34         34
c7552     7550     131        586

We performed experiments using both the PLS and EPIS inference schemes. Recall that the Probabilistic Logic Sampling (PLS) scheme is simple and time-efficient, but cannot handle diagnostic evidence efficiently. The Evidence Pre-propagated Importance Sampling (EPIS) algorithm [57], in contrast, can handle diagnostic evidence efficiently, but takes more time when there is evidence. We performed an in-house logic simulation with 500,000 random vectors to determine the fault detection probability of all the faults, and used these probabilities as the reference to check the accuracy of our model.

The detection probabilities computed by Probabilistic Logic Sampling (PLS) [28] and Evidence Pre-propagated Importance Sampling (EPIS) [57] are shown in Table 4.2. Results are reported for 1000 samples and 3000 samples. In this table, E is the accuracy of modeling in terms of the average error in FDP over all the non-redundant hard faults compared to the simulation results. T(s) is the total elapsed time, including memory and I/O access. This time is obtained by the ftime command in the WINDOWS environment on a Pentium 4 2.0 GHz PC. We report E and T(s) for EPIS in columns 2 and 3, respectively, and E and T(s) for PLS in columns 4 and 5, respectively.

We partitioned the faults in circuits c3540, c6288 and c7552 into three subsets and determined the detection probabilities in each set by running the circuits in parallel for all the fault sets.

We empirically demonstrate the linear dependence of the estimation time on the number of samples in Figure 4.2(a) for benchmark c880. We also present empirical evidence of the linear dependence of the FDP estimation time on the number of nodes in the fault detection logic for the c880 benchmark in Figure 4.2(b). We considered different subsets of faults in the fault list of the circuit. Figure 4.3 shows the FDP, which can also be used to characterize single-event-transient (SET) error sensitivity, for the nodes in c880.

Table 4.2. (a) Fault detection probability estimation errors and time for 1000 samples (b) Fault detection probability estimation errors and time for 3000 samples

(a) Bayesian Networks (1000 samples)

Circuit   EPIS E     EPIS T(s)   PLS E      PLS T(s)
c432      0.00012    1.09        0.00032    0.24
c499      0.00064    5.11        0.00131    1.00
c880      0.00042    16.70       0.00070    2.00
c1355     0.00059    27.28       0.00093    3.00
c1908     0.00071    61.39       0.00079    5.00
c2670     0.00003    145.19      0.00029    28.00
c3540     0.00034    807.77      0.00215    34.00
c5315     0.00024    186.88      0.00036    12.00
c6288     0.00000    1779.93     0.00929    65.00
c7552     0.00034    863.73      0.00069    37.24

(b) Bayesian Networks (3000 samples)

Circuit   EPIS E     EPIS T(s)   PLS E      PLS T(s)
c432      0.00012    2.17        0.00039    1.26
c499      0.00012    7.52        0.00094    3.48
c880      0.00033    21.04       0.00051    6.58
c1355     0.00032    32.76       0.00083    8.50
c1908     0.00052    71.28       0.00048    13.60
c2670     0.00001    615.88      0.00026    52.12
c3540     0.00020    848.69      0.00163    64.69
c5315     0.00013    205.42      0.00031    27.00
c6288     0.00000    1823.04     0.00874    109.87
c7552     0.00030    902.24      0.00081    69.14

Table 4.3. Comparison with the state of the art

          BDD [13], T_CPU (s) (t1) (Y93)   BN, T_total (s) (t2) (Y04)   Ratio
Circuit   PSG        SG                    PLS                          R (t1/t2)   R/16
c432      139        71                    0.24                         295.83      18.49
c499      80         44                    1.00                         44.00       2.75
c880      328        1132                  2.00                         164.00      10.25
c1355     157        79                    3.00                         26.33       1.65
c1908     686        288                   5.00                         57.60       3.6
c2670     2051       -                     28.00                        73.25       4.58
c3540     23630      1732                  34.00                        50.94       3.18
c5315     82         31                    12.00                        2.58        0.16
c6288     -          -                     65.00                        -           -
c7552     1281       -                     37.24                        34.34       2.15
Average                                                                 83.22       5.20

[Figure 4.2. (a) Estimation time vs. number of samples in the detection logic for benchmark c880 (b) Estimation time vs. number of nodes in the detection logic for benchmark c880. Axes: estimation time (sec) against number of samples in (a) and against number of nodes in the BN in (b).]

Note that some nodes have high detection probabilities and should have been captured by the initial simulation, yet went undetected. Hence, characterization based on simulation suffers from the pattern dependence of simulative methods.

In Table 4.3, we compare LIFE-BN fault modeling with the performance of approaches based on the Binary Decision Diagram (BDD) model, as reported by Krieger et al. [13] for these same circuits. They reported results using four types of fault partitioning. We compare our time (column 4) with the time taken by their two best methods, namely PSG (column 2) and Supergate SG (column 3). In column 5, we report the ratio between the minimum time taken by the BDD based methods and the time taken by our Bayesian network based approach. The average improvement is about 83 times. To take into account the improvement in processor speed since 1993, we assumed a scaling factor of 16 and report the scaled time performance ratio (actual ratio/16) in column 6 of Table 4.3.

Even though [13] explains the BDD based method of FDP computation, the complete algorithm for the different types of decomposition techniques could not be obtained from that paper. Hence we could not give a direct comparison of CPU time between the BDD based model and the LIFE-DAG model by re-implementing their algorithm on the same computer that we used for experimenting with our model.

Using LIFE-DAG, we model the logical masking effect, capturing the circuit structure, re-convergence and the inputs. In the following chapter, we discuss another class of errors, known as Single-Event-Upsets. To model this type of error, we need to take into account their temporal nature. In Chapter 5, we describe the modeling of Single-Event-Upsets by capturing logical and temporal masking effects.

[Figure 4.3. Detection probability as SET sensitivity for different nodes in c880. Axes: detection probability against faults.]


CHAPTER 5
A TIMING-AWARE PROBABILISTIC MODEL FOR SEU SENSITIVITIES

Alpha particles and heavy neutrons present in cosmic radiation cause soft errors in microelectronic chips. Due to the reduction in device sizes and the scaling down of operating voltages, the stored charge per node in logic circuits will reduce considerably in future technologies. Hence even weak radiation can cause a single-event-upset (SEU) within a circuit. An SEU that is propagated to a circuit output and captured by an output latch during the latching window causes a soft error. Hence in deep submicron and future nanometer technologies, the soft error failure rate will be a major reliability challenge not only in embedded memories but also in logic blocks. It is therefore important to analyze the effect of single-event-upsets on the circuit output and to identify gates that are highly susceptible to soft errors. By doing so, designers can apply selective mitigation schemes targeting only those gates which are highly sensitive to single-event-upsets. In this chapter, we describe a probabilistic model for the estimation of single-event-upset sensitivities of gates in large combinational circuits.

The soft error susceptibility of a node $j$ with respect to a latch $L$, $SES_{j\_Q_L}$, is the soft error rate at the latch output $Q_L$ contributed by node $j$. The propagation of an SEU generated by a particle hit at an internal node $j$ to an output $i$, which causes a bit-flip at the output of a latch $L$, is depicted in Fig. 5.1. We model the SEU propagation as follows. Let $T_{j\_i}$ be a Boolean variable which takes logic value 1 if an SEU at a node $j$ causes an error at an output node $i$. Then $P(T_{j\_i})$ (measured as the probability of $T_{j\_i}$ being equal to 1) is the conditional probability of occurrence of an error at output node $i$ given an SEU at node $j$. Let $P(SEU_j)$ be the probability that a particle hit at node $j$ generates an SEU of sufficient strength, and let $P(Q_L \mid T_{j\_i})$ be the probability that an error at output node $i$ causes an erroneous signal at latch output $Q_L$. Mathematically, $SES_{j\_Q_L}$ is expressed by Eq. 5.1:

$$SES_{j\_Q_L} = R_H \, P(SEU_j) \, P(T_{j\_i}) \, P(Q_L \mid T_{j\_i}) \qquad (5.1)$$

where $R_H$ is the particle hit rate on a chip, which is fairly uniform in space and time. $P(SEU_j)$ depends on $V_{dd}$ and $V_{th}$ and also on temperature. $P(Q_L \mid T_{j\_i})$ is a function of latch characteristics and the switching frequency.

Figure 5.1. SEU propagation. A particle hit at node $j$ (rate $R_H$) in the combinational logic between the inputs $I_1, \ldots, I_N$ and the latches produces an SEU with probability $P(SEU_j)$; the SEU is propagated to the $i$-th output with probability $P(T_{j\_i})$ and causes a bit-flip at latch output $Q_L$ with probability $P(Q_L \mid T_{j\_i})$. $d_n$ is the delay associated with the $n$-th gate.

In this work, we explore $P(T_{j\_i})$ by accurately considering the effect of (1) SEU duration, (2) gate delay, (3) timing, (4) re-convergence in the circuit structure and, most importantly, (5) inputs. Several works on soft error analysis estimate the overall output signal errors due to SEUs at the internal nodes [15,16,4,17,18]. Note that our focus is to identify the SEU locations that cause soft errors at the output(s) with high probabilities and not the overall soft error rates. Knowledge of the relative contribution of individual nodes to output error will help designers apply selective radiation hardening techniques. This model can easily be fused with the modeling of the latches [17,19], considering parameters such as latching window, setup and hold time, $V_{th}$ and $V_{dd}$ [15,16,17], for a comprehensive model capturing processing, electrical and logical effects.

We model the internal dependency of the signals, taking timing issues into consideration, so that the SEU sensitization probability ($P(T_{j\_i})$) captures the effect of circuit structure, circuit path delay and also the input space. We use a circuit expansion algorithm similar to that presented in [41,63] to embed time-related information in the circuit topology without affecting its original functionality. A fan-out dependent delay model is assumed, where the gate delay of each node is equal to its fan-out. We also use a logical effort based delay model, where gate delays depend not only on fan-out but also on input capacitance as well as parasitic capacitance. Due to the temporal nature of SEUs, not all of the SEUs


cause soft errors. From the expanded circuit, we generate a list of SEUs (the possible SEU list) that are possibly sensitized to the circuit outputs at the time frame when output signals are latched. From the expanded circuit and the possible SEU list, we construct an error detection circuit and model SEUs in large combinational circuits using a Timing-Aware Logic-Induced Soft Error Sensitivity model (TALI-SES), which is a complete joint probability model, represented as a Bayesian Network.

We model the effect of single-event-upsets produced at an internal node of a circuit on the circuit output by computing the joint probability distribution described by Eq. 5.2:

$$P(T_{j\_i}) = \sum_{j,\,\{I_l\},\,X_k,\,k \neq j} P(T_{j\_i}, I_0, \ldots, I_l, \ldots, I_N, X_1, \ldots, X_k, \ldots, X_M) \qquad (5.2)$$

where $P(T_{j\_i})$ is the probability that an SEU generated at an internal node $j$ causes an erroneous signal at output $i$. $T_{j\_i}$ is a test signal which compares the ideal output signal at the $i$-th output with the corresponding error-sensitized output caused by an SEU at the $j$-th node. If $T_{j\_i} = 1$, it indicates the occurrence of an error at output $i$ due to an SEU at $j$. $P(T_{j\_i})$ depends on the $N$ input signals $I_0, \ldots, I_N$, the $M$ internal signals $X_1, \ldots, X_M$, and the type of SEU at $j$ ($SEU_1$, caused by a 0-1-0 transition, or $SEU_0$, caused by a 1-0-1 transition). Ideally, the real effect of an SEU at the $i$-th output is the product of $P(T_{j\_i})$ and $P(SEU_j)$, where $P(SEU_j)$ is the probability that a particle hit at a node $j$ produces an SEU at that node. It depends on process parameters such as $V_{dd}$ and $V_{th}$ and also depends dynamically on temperature. With reduced supply voltages and diminishing dimensions, this probability will be very close to one. In this work, we assume that a particle hit occurring at a node generates an SEU, and hence $P(SEU_j) = 1$. In Eq. 5.2, the probability $P(T_{j\_i})$ does not consider the transient nature of the SEU. For example, the SEU effect may reach the output for a short time span, but the output signal can be reinforced to its correct value before it is sampled by the latch. SEU propagation depends on the gate delays and the SEU duration.

Let $t_h$ be the time when an SEU originates at a node, $\delta$ be the SEU duration, $t_s$ be the time when outputs are sampled and $P$ be the set of propagation delays ($t_d$) of sensitized paths from the node to the circuit outputs. Nodes satisfying the following conditions do not cause soft errors [41]: $t_h + \delta + t_d$

Even though the above empirical formula does not take into account the setup and hold time requirements which affect latching window masking, we use this equation for our modeling because it is quite accurate as far as the logical masking effect, circuit structure and gate delays are concerned.

To capture the effect of gate delays and SEU duration, we perform a time-space transformation of the original circuit by means of a circuit expansion algorithm similar to that presented in [41]. Our model captures not only the effect of gate delays, but also the effect of differences in path delays (arrival times) between the input signals of gates, assuming gate delay is equal to fan-out. In the expanded circuit, each gate is replicated several times, corresponding to the time instants at which the gate output is evaluated. The circuit outputs are also replicated.

Thus each of the random variables in Eq. 5.2 represents a set of variables at different time frames. $I_i = \{I_{i,0}, I_{i,1}\}$, where $I_{i,0}$ is the value of input signal $I_i$ at the time instant just before the occurrence of a clock cycle and $I_{i,1}$ is the new input signal after the clock pulse is applied. Signal $I_{i,1}$ remains the same throughout the clock cycle. $X_i = \{X_{i,t_k}\}\ \forall t_k$, where $t_k$ is the signal evaluation time.

Only the final output values (output signals arriving at the latching window) are captured by the latch. An SEU is effect-less if it does not cause a bit-flip in the final outputs. We arrive at a reduced SEU list by considering only those SEUs whose effect reaches the final outputs, that is, the outputs at the sampling time frames. We also modify the expanded circuit by removing parts of the circuit which do not generate and propagate soft-error-causing SEUs (discussed in Section 5.1.1).

TALI-SES is a Directed Acyclic Graph we build from the expanded circuit and the reduced SEU list to capture the effect of each SEU at a node on the output. This model consists of the ideal time-space transformed circuit without any SEUs and a set of duplicate logic blocks to propagate the SEU effects. Outputs from the SEU-sensitized duplicate blocks are compared with the corresponding outputs of the ideal circuit. If those signal values are not the same, it indicates that the SEU causes an error at the output. We discuss TALI-SES construction in Section 5.1.3. The salient features of modeling SEUs by a Bayesian Network are as follows.

1. We provide a comprehensive model for the underlying error framework using a graphical probabilistic Bayesian Network based model, TALI-SES, that is causal, minimal and exact.


2. We can model the effect of timing and the transient nature of SEUs along with the accurate modeling of re-convergence in the circuit.

3. This model captures the data-driven uncertainty in the modeling of soft errors. It can be used where exact input patterns are not known a priori, and, in case data traces are available, a probabilistic model can be built from them by a learning algorithm [65,66].

4. We infer error probabilities by (1) exact inference that transforms the graph into a special junction tree structure and relies on a local message passing scheme, and also by (2) smart stochastic non-simulative inference algorithms that have the feature of any-time estimates and generate an excellent accuracy-time trade-off for larger circuits.

5. Bayesian Networks are a unique tool in which the effect of an observation at a child node can be used to obtain a probability space of the parents. This is called backward reasoning. Our model can be used to generate the input space for which an SEU occurring at a particular node $j$ might have no impact on the outputs. Note that in such a case, hardening techniques will not be needed for node $j$. Similarly, we can find the input space for which an SEU at a node $j$ causes high error probability at the outputs. If the data trace is similar to the second type of input space, extensive hardening techniques need to be applied to $j$.

5.1 The Proposed Model

In this section, we first focus on handling the timing-aware feature of our probabilistic model, followed by the fault list construction. We conclude the section with a discussion of the model itself, given the timing-aware graph and the fault list.

5.1.1 Timing Issues

We first expand the circuit by a time-space transformation of the original circuit, without changing its functionality. The approach is similar to the method discussed in [41,63]. Fig. 5.2 is the expanded circuit of benchmark c17. A gate in the original circuit $C$ will have many replicate gates in the expanded circuit $C'$, corresponding to the different time frames at which the gate output is evaluated. The output evaluation time $\{T\}$ of each gate in the circuit is calculated based on a variable delay model. We assume that the delay associated with a gate is equal to its fan-out. For each gate $g$ whose output is evaluated at time $t \in \{T\}$, a replicate node $(g,t)$ is constructed. In addition to these replicate gates, we insert some duplicate gates (shown by filled gate symbols in Fig. 5.2). We explain the reasons for adding these duplicate gates later in this section.

Figure 5.2. Time-space transformed circuit of benchmark c17, modeling all SEUs (nodes are labeled "gate, time" over time frames t = 0 to t = 6)

The inputs of $(g,t)$ are the replicate nodes of the gates, which are the inputs of $g$ in the original circuit and belong to the time-frames $t'$

The steps for constructing the timing-aware expanded circuit, based on the fan-out dependent delay model, are the following:

1. Arrange the gates in the order of levels, with the level of the input gates equal to zero.

2. Include all gates that are present in the original circuit. The output signals of these gates represent the steady-state signal values at $t = 0$, before the application of new inputs.

3. Add additional input nodes representing the new input signal values at $t = 1$.

4. For each level of the circuit starting from level $l_i = 1$, repeat the following step: for each gate $g$ in level $l_i$, create replicate gates at time frame $t = t_p + f_g$, where $t_p$ is the maximum time frame of the previously inserted parent gates of $g$ and $f_g$ is the fan-out of gate $g$. Update the time frames of gate $g$.

Output signals of a circuit are sampled at $t = t_s$, where $t_s$ is the maximum of the latest signal arrival times of the output signals. SEUs which do not satisfy Eq. 5.3 affect circuit outputs, resulting in soft errors. These SEUs are the upsets generated at the outputs of gates which are in the fan-in cones of the final outputs (outputs evaluated at time $t_s$). SEUs occurring at certain other gates, which are not in the fan-in cones of the final outputs, may also affect circuit outputs. These nodes arise due to the SEU duration time $\delta$. For example, in Fig. 5.2 we see that the final outputs are generated at time instant $t = 6$. If an SEU occurs at signal 19 at 4 ns and lasts for one time unit, it will essentially be capable of tampering with the value of node 23 at 6 ns. Note that we assume that $\delta$ is one time unit. The fault list will be different if we change the value of $\delta$. Thus we can see that SEUs which are sensitized to outputs at time frames between $t_s$ and $t_s - \delta$ may cause soft errors, depending on the input signals and circuit structure.

Considering the above factors, we modify the expanded circuit by including only those gates that propagate SEUs to the outputs between time instants $t_s$ and $t_s - \delta$. Thus we get a considerable reduction in the circuit size. Fig. 5.3 is the modified expanded circuit of c17, which models all SEUs possibly sensitized to a final output.
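The time frames produced by the expansion steps above can be sketched in a few lines of Python. This is only an illustrative reading of the procedure, not the implementation used in this work: it assumes that every new arrival at any fan-in of a gate triggers a re-evaluation one gate delay later, that a primary output drives a single load, and it leaves out the duplicate gates. The names `C17`, `fanout` and `time_frames` are hypothetical.

```python
from collections import defaultdict

# c17 netlist: gate -> its two fan-in signals (every gate is a 2-input NAND).
C17 = {10: (1, 3), 11: (3, 6), 16: (2, 11), 19: (11, 7), 22: (10, 16), 23: (16, 19)}
INPUTS = (1, 2, 3, 6, 7)
OUTPUTS = (22, 23)


def fanout(netlist, outputs):
    """Number of gates driven by each signal (assumption: a primary output drives one load)."""
    count = defaultdict(int)
    for fanins in netlist.values():
        for signal in fanins:
            count[signal] += 1
    for out in outputs:
        count[out] = max(count[out], 1)
    return count


def time_frames(netlist, inputs, outputs):
    """Evaluation times of every gate under the delay = fan-out model."""
    fo = fanout(netlist, outputs)
    frames = {i: {1} for i in inputs}      # new input values are applied at t = 1
    for gate in sorted(netlist):           # c17 gate ids are already in level order
        arrivals = set().union(*(frames[s] for s in netlist[gate]))
        frames[gate] = {t + fo[gate] for t in arrivals}
    return frames


frames = time_frames(C17, INPUTS, OUTPUTS)
t_s = max(max(frames[o]) for o in OUTPUTS)  # sampling time
print(frames[16], frames[23], t_s)
```

Under these assumptions gate 16 is evaluated at t = 3 and t = 5, output 23 at t = 3, 4, 5 and 6, and the sampling time is t = 6, which agrees with the replicate nodes and the final time frame shown in Fig. 5.2.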
Figure 5.3. Modified time-space transformed circuit of benchmark c17, modeling only the possibly sensitized SEUs

Next, we discuss how to generate a list of possible SEUs affecting the circuit outputs. Not all gates in Fig. 5.3 are SEU sensitive. As discussed above, a duplicate node introduces an additional delay of at least one time unit. If the delay introduced by a duplicate gate is greater than or equal to $\delta$, the SEU duration time, the effect of SEUs originating at any of the gates in the fan-in cones of the duplicate gate is nullified and the correct signal value is restored at the output of the duplicate gate; hence those SEUs are effect-less. Thus we create a reduced list of SEUs by traversing the modified expanded circuit from each of the circuit outputs at time instants between $t_s$ and $t_s - \delta$, until a duplicate gate or an input node is reached.

5.1.2 Delay Modeling Based on Logical Effort

We extend this work by using a logical effort based model, which depends on fan-out, input capacitance as well as parasitic delay. In this section we explain how gate delays are calculated based on logical effort [64]. The delay of a logic gate can be expressed as the sum of two components, effort delay and parasitic delay. Effort delay is the product of logical effort and electrical effort, where logical effort is defined as the relative ability of a gate topology to deliver current and electrical effort is the ratio of output capacitance to input capacitance. Electrical effort is sometimes called fan-out. Mathematically, gate delay is expressed as $d = f + p = gh + p$, where $f$ is the effort delay, $p$ is the parasitic delay, $g$ is the logical effort and $h$ is the electrical effort. Logical effort is defined to be 1 for an inverter. Hence logical effort is the ratio of the input capacitance of a gate to the input capacitance of an inverter delivering the same output current. It can be estimated by counting capacitance in units of transistor width. Parasitic delay represents the delay of a gate driving no load, and it depends on diffusion capacitance. The parasitic delay of an inverter is $P_{inv} \approx 1$. From the above considerations, we compute basic CMOS gate delays and use these delay values in our model. Table 5.1 shows the delay expressions for basic gates.

Table 5.1. Gate delays based on logical effort

Gate type        Delay
Inverter         fanout + P_inv
n-input NAND     ((n + 2)/3) fanout + n P_inv
n-input NOR      ((2n + 1)/3) fanout + n P_inv
2-input XOR      4 fanout + 4 P_inv
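As a quick numerical check of $d = gh + p$, the sketch below evaluates the delay expressions for the basic gates. It is a minimal illustration, assuming $P_{inv} = 1$; the function name `gate_delay` and its parameters are hypothetical, and the NOR and XOR parasitic terms use the textbook logical-effort values, which is an assumption here.

```python
def gate_delay(gate, fanout, n=2, p_inv=1.0):
    """Delay d = g*h + p, with the electrical effort h taken as the fan-out."""
    if gate == "inv":
        g, p = 1.0, p_inv
    elif gate == "nand":
        g, p = (n + 2) / 3, n * p_inv
    elif gate == "nor":
        g, p = (2 * n + 1) / 3, n * p_inv   # assumption: textbook parasitic term n * p_inv
    elif gate == "xor2":
        g, p = 4.0, 4 * p_inv               # assumption: textbook parasitic term 4 * p_inv
    else:
        raise ValueError(f"unknown gate type: {gate}")
    return g * fanout + p


nand_fo1 = gate_delay("nand", fanout=1)     # 2-input NAND driving one gate
nand_fo2 = gate_delay("nand", fanout=2)     # 2-input NAND driving two gates
print(round(nand_fo1, 2), round(nand_fo2, 2))
```

With these assumptions a 2-input NAND gate comes out at about 3.33 time units for one fan-out and 4.67 time units for two fan-outs, the values used for the expansion of c17 in this chapter.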


Circuit expansion is performed in a similar way as explained in the above section. Each gate is replicated several times, corresponding to the time frames at which new gate output signals are evaluated. Here, the gate output evaluation time is based on delay values calculated as above. This is illustrated in Figure 5.4, which shows how benchmark circuit c17 is expanded with the logical effort based gate delay model. The delay of a 2-input NAND gate with one fan-out is calculated as 3.33 time units and that of a 2-input NAND gate with two fan-outs is 4.67 time units. The final output is evaluated at time unit $T_s = 13.67$. From this expanded circuit, we arrive at a reduced circuit by traversing backward from the outputs evaluated at $T_s$ and $T_s - \delta$ until a duplicate gate or an input is reached, thereby modeling only the possibly sensitized SEUs.

Figure 5.4. Time-space transformed circuit of benchmark c17 with the logical effort based delay model (time frames from t = 0.0 to t = 13.67)

5.1.3 TALI: Timing-Aware-Logic-Induced SEU Sensitivity Model

In this section, we first describe the proposed Bayesian network based model, which can be used to estimate the soft error sensitivity of logic blocks. This model captures the dependence of SEU sensitivity on the input pattern, the circuit structure and the gate delays. Note that this probabilistic modeling does not require any assumptions on the inputs and can be used with any biased workload patterns. The proposed model, the Timing-Aware-Logic-Induced-Soft-Error-Sensitivity (TALI-SES) model, is a Directed Acyclic Graph (DAG) representing the time-space transformed, SEU-encoded combinational circuit $\langle C', J \rangle$, where $C'$ is the expanded circuit created by the time-space transformation discussed in Section 5.1.1 and $J$ is the set of possible SEUs (also discussed in Section 5.1.1). The error detection circuit consists of the expanded circuit $C'$, an error sensitization logic for each SEU and a detection unit $T$ consisting of several comparator gates. We explain it with the help of a small example shown in Fig. 5.5(a), which is the error detection circuit for a small portion of benchmark c17. The error sensitization logic for an SEU at node $j$ consists of the duplicate descendant nodes of $j$. In Fig. 5.5(a), the block within the dotted square is the sensitization logic for $(16,5)^1_s$ [an $SEU_1$ at node 16 at time $t = 5$]. It consists of nodes $(22,6)_s$ and $(23,6)_s$ descending from node $(16,5)$ of the time-space transformed circuit. For simplicity, we show the modeling of only one SEU in this example. Our model can handle any number of SEUs simultaneously. Each SEU sensitization logic has an additional input to model the SEU, for example the input $SEU^1_{16,5}$. This input signal value is set to logic one in order to model the effect of a 0-1-0 SEU occurring at node 16 at time frame 5.

Figure 5.5. (a) An illustrative SEU sensitivity logic for a subset of c17 (b) Timing-aware-logic-induced DAG model of the SEU sensitivity logic in (a)

As discussed previously in Section 5.1.1, an SEU lasting for a duration $\delta$ can cause an erroneous output if its effect reaches the output at any instant between the sampling time $t_s$ and time frame $t_s - \delta$. In this work we assume $\delta$ to be one. Hence we get error-sensitized outputs at time frame $t_s$ and, for some SEUs, at $t_s - 1$ also, if there exist re-convergent paths between the SEU location and an output. We need to compare the SEU-free output signals evaluated at the sampling time $t_s$ with the corresponding SEU-sensitized output signals arriving at $t_s - 1$ and $t_s$. Hence these signals are sent to a detection unit $T$. The comparators in the detection unit compare the error-sensitized outputs with the corresponding error-free outputs and generate test signals. For example, the test signals for an SEU at node $j$ at time $t$ are $T_{(j,t)\_(i,t_s)}$ and $T_{(j,t)\_(i,t_s-1)}$. If any of these test signal values is 1, it indicates the occurrence of an error. The probability $P(T_{(j,t)\_i})$, which is a measure of the effect of SEU $(j,t)_s$ on the output node $i$, is computed as a joint probability, which is explained below.

Let $A$ be the event that an SEU at node $j$ causes a bit-flip at output $i$ at time $t_s$ and let $B$ be the event that an SEU at node $j$ causes a bit-flip at output $i$ at time $t_s - 1$. $P(A=1)$ is the probability of occurrence of an error at time $t_s$ and $P(A=0)$ is the probability that the SEU does not cause an error at $t_s$. $P(B)$ can be explained in a similar way. The error probability due to an SEU at node $j$ at time $t$ w.r.t. output $i$ is the


joint probability

$$P(A \cup B) = P(A=1, B=0) + P(A=0, B=1) + P(A=1, B=1) \qquad (5.4)$$

which is expressed as:

$$P(T_{(j,t)\_i}) = P\left(T_{(j,t)\_(i,t_s)} \cup T_{(j,t)\_(i,t_s-1)}\right). \qquad (5.5)$$

An SEU can have an effect on more than one output. The overall effect of an SEU $(j,t)_s$ on the outputs is computed as $P(T_{(j,t)}) = \max_{\forall i}\{P(T_{(j,t)\_i})\}$. In the example, the SEU $(16,5)_s$ is sensitized to outputs (22,6) and (23,6). Hence the two test signals for this SEU are $T_{(16,5)\_(22,6)}$ and $T_{(16,5)\_(23,6)}$.

An SEU occurring at node $j$ at time $t$, which is either $SEU_1$ or $SEU_0$ (but not both), can cause a bit-flip at the output with probability $P(T^1_{(j,t)})$ or $P(T^0_{(j,t)})$. In order to compute the SEU sensitivity of a node, we take the worst-case probability, which is the maximum of the above two probabilities:

$$P(T_{(j,t)}) = \max\{P(T^1_{(j,t)}),\, P(T^0_{(j,t)})\}$$

More than one SEU can originate at a node at different time frames. Considering the effect of SEUs at node $j$ at all time frames, we compute the worst-case output error probability due to node $j$ as $P(T_j) = \max_{\forall t}\{P(T_{(j,t)})\}$, which is the maximum probability over all time frames.

These detection probabilities depend on the circuit structure, the inputs, dependencies amongst the inputs, gate delays and the SEU duration. In this work we assume random inputs for experimentation and validation of our model.

We construct the TALI-SES Bayesian Network of the SEU detection circuit from nodes which are random variables representing signal values of the SEU detection circuit. A signal $i$ in the detection circuit is represented by the random variable $X_i$ in the Bayesian Network.

In the TALI-SES DAG structure, the parents of each node are its Markov boundary elements. Hence the TALI-SES is a boundary DAG. For the definitions of Markov boundary and boundary DAG, please refer to [26]. Note that TALI-SES is a boundary DAG because of the causal relationship between the inputs and the outputs of a gate that is induced by logic. It has been proven in [26] that if a graph structure is a boundary DAG $D$ of a dependency model $M$, then $D$ is a minimal I-map of $M$ ([26]). This theorem


along with the definitions of conditional independencies in [26] (we omit the details) specifies the structure of the Bayesian network. Thus the TALI-SES DAG is a minimal I-map and thus a Bayesian network (BN).

Quantification of TALI-SES-BN: TALI-SES-BN consists of nodes that are random variables of the underlying probabilistic model, and edges denote direct dependencies. All the edges are quantified with the conditional probabilities making up the joint probability function over all the random variables. The overall joint probability function that is modeled by a Bayesian network can be expressed as the product of the conditional probabilities. If $X' = \{X'_1, X'_2, \ldots, X'_m\}$ is the node set in the TALI-SES Bayesian Network, then we can say

$$P(X') = \prod_{k=1}^{m} P(x'_k \mid Pa(X'_k)) \qquad (5.6)$$

where $Pa(X'_k)$ is the set of nodes that have directed edges to $X'_k$. A complete specification of the conditional probability of a two-input AND gate output will have $2^3$ entries, with 2 states for each variable. These conditional probability specifications are determined by the gate type. By specifying the appropriate conditional probability we ensure that the spatial dependencies among sets of nodes (not only limited to just pair-wise) are effectively modeled.

We demonstrate the modeling of SEUs based on TALI-SES using ISCAS benchmark circuits. The logical relationship between the inputs and the output of a gate determines the conditional probability of a child node, given the states of its parents, in the TALI-DAG.

5.2 Experimental Results

In Table 5.2 we report the total number of gates in the actual circuit (column 2), the total number of gates in the modified expanded circuit (column 3), and the total number of nodes in the resulting TALI-SES (column 4). Column 5 lists the maximum time frames of the circuits.

We compute the SEU sensitivity of an individual node, $P(T_j)$, in a circuit as follows:

1. Compute the output error probability at output node $i$ due to an SEU at node $j$ at time $t$ by taking the joint probabilities as discussed in Section 5.1.3:

$$P(T_{(j,t)\_i}) = P\left(T_{(j,t)\_(i,t_s)} \cup T_{(j,t)\_(i,t_s-1)}\right) \qquad (5.7)$$


2. Considering the effect of all SEUs at node $j$ at all possible time frames, compute the probability of occurrence of an error at the $i$-th output due to SEUs at node $j$ by Eq. 5.8:

$$P(T_{j\_i}) = \max_{\forall t}\{P(T_{(j,t)\_i})\} \qquad (5.8)$$

3. Compute the worst-case SEU sensitivity of a node $j$ due to an $SEU_1$ and an $SEU_0$ and for all outputs by Eq. 5.9:

$$P(T_j) = \max_{\forall i}\{P(T^1_{j\_i}),\, P(T^0_{j\_i})\} \qquad (5.9)$$

Table 5.2. Size of original and time-expanded ISCAS circuits for the fanout-dependent delay model

Circuit    Gates    Gates expanded    # of nodes (TALI)    Time frames
c432       196      476               1989                 55
c499       243      464               1596                 30
c880       443      729               2552                 51
c1355      587      1440              3388                 55
c1908      913      1524              18118                79
c2670      1426     2584              4097                 81
c3540      1719     3795              15670                93
c5315      2485     4887              13228                90
c6288      2448     30113             31157                263
c7552      3719     10006             45907                88

Table 5.3. Estimated $P(T_{j\_i})$ values of nodes in benchmark c17 from exact inference

           SEU_1                      SEU_0
node j     P(T_j_22)    P(T_j_23)     P(T_j_22)    P(T_j_23)
10         0.2813       0             0.4375       0
11         0.0625       0.2344        0.3125       0.6563
16         0.3125       0.1875        0.4375       0.4375
19         0            0.375         0            0.4375
22         0.4375       0             0.5625       0
23         0            0.4375        0            0.5625
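The aggregation in Eqs. 5.7 to 5.9 can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the c17 entries are copied from Table 5.3, the joint probabilities passed to `union_probability` are made-up numbers chosen only to exercise Eq. 5.4, and all function and variable names are hypothetical.

```python
# P(T_j_i) for c17 from Table 5.3: node -> SEU type -> output -> probability
TABLE_5_3 = {
    10: {"SEU1": {22: 0.2813, 23: 0.0}, "SEU0": {22: 0.4375, 23: 0.0}},
    11: {"SEU1": {22: 0.0625, 23: 0.2344}, "SEU0": {22: 0.3125, 23: 0.6563}},
    16: {"SEU1": {22: 0.3125, 23: 0.1875}, "SEU0": {22: 0.4375, 23: 0.4375}},
    19: {"SEU1": {22: 0.0, 23: 0.375}, "SEU0": {22: 0.0, 23: 0.4375}},
    22: {"SEU1": {22: 0.4375, 23: 0.0}, "SEU0": {22: 0.5625, 23: 0.0}},
    23: {"SEU1": {22: 0.0, 23: 0.4375}, "SEU0": {22: 0.0, 23: 0.5625}},
}


def union_probability(joint):
    """Eq. 5.4: P(A or B) from a joint table {(a, b): probability}."""
    return joint[(1, 0)] + joint[(0, 1)] + joint[(1, 1)]


def worst_case_sensitivity(per_type):
    """Eq. 5.9: maximum over both SEU types and all outputs."""
    return max(p for per_output in per_type.values() for p in per_output.values())


# Hypothetical joint distribution of the test signals at t_s (A) and t_s - 1 (B).
example_joint = {(0, 0): 0.5, (1, 0): 0.25, (0, 1): 0.125, (1, 1): 0.125}
p_error = union_probability(example_joint)

sensitivity = {node: worst_case_sensitivity(v) for node, v in TABLE_5_3.items()}
print(p_error, sensitivity)
```

For the values of Table 5.3 this gives a worst-case sensitivity of 0.6563 for node 11, 0.5625 for the output nodes 22 and 23, and 0.4375 for nodes 10, 16 and 19.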


5.2.1 Exact Inference

In this section, we explore a small circuit, c17, with exact inference, where we transform the original graph into a junction tree and compute probabilities by local message passing between the neighboring cliques of the junction tree as outlined in section 3.13.2. Note that this inference is proven to be exact [26,59] (zero estimation error).

Table 5.3 tabulates the results of the TALI-SES of benchmark c17 using exact inference. In this table, we report the probabilities of error at output nodes 22 and 23 due to an SEU at each node $j$ (column 1), namely nodes 10, 11, 16, 19, 22 and 23. Columns 2 and 3 of Table 5.3 give the error probabilities due to $SEU_1$ (0-1-0 transition) at output nodes 22 and 23 respectively. Similarly, columns 4 and 5 give the error probabilities due to $SEU_0$ (1-0-1 transition) at output nodes 22 and 23 respectively. We compare the error-free outputs at 22 and 23 at sampling time $t_s$ with the corresponding error-sensitized outputs arriving at time frames $t_s - 1$ and $t_s$ due to SEUs generated at a node at all possible time frames (as discussed in Section 5.1.3). Columns 2, 3, 4 and 5 of Table 5.3 report the maximum of the error probabilities due to SEUs originating at individual nodes over all time frames. From this table it can be seen that for this benchmark circuit $SEU_0$s have a higher impact on the output error probabilities than $SEU_1$s. The error probability at output node 22 due to an $SEU_1$ at node 11 is very low (0.0625), whereas the error probability at output 22 due to an $SEU_0$ at node 11 is 0.3125. The table also shows that the effect of SEUs is not the same over all outputs. For example, an $SEU_1$ at node 19 causes no error at output 22, whereas the error probability due to this SEU at output node 23 is 0.4375. Note that nodes 22 and 23 are the output nodes. SEUs occurring at these nodes at sampling time $t_s$ or time $t_s - 1$ will be latched by an output latch, and are expected to cause very high error probability. However, from Table 5.3 it is observed that the probability of occurrence of an error due to an $SEU_1$ at node 23 is only 0.4375. Similarly, the probability of occurrence of an error due to an $SEU_1$ at node 22 is also 0.4375. This is due to the type of input pattern. In this work, we assume random inputs. This result shows the dependence of $P(T_{j\_i})$ on the input pattern.

5.2.1.1 Input Space Characterization

In this section, we describe the input space characterization for a particular observation, exploring the diagnostic (backtracking) feature of the TALI-SES model. Note that this feature makes it really
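The input dependence of the output-node entries can be checked by brute force. The sketch below is an illustration under a simplifying assumption that is not stated in the text: an $SEU_1$ (0-1-0) at an output node in the sampling frame is counted as an error exactly when the fault-free value of that node is 0, and an $SEU_0$ exactly when it is 1. All 32 input vectors of c17 are taken as equally likely; the function names are hypothetical.

```python
from itertools import product


def nand(a, b):
    return 1 - (a & b)


def c17(i1, i2, i3, i6, i7):
    """Fault-free signal values of the six NAND gates of c17."""
    n10 = nand(i1, i3)
    n11 = nand(i3, i6)
    n16 = nand(i2, n11)
    n19 = nand(n11, i7)
    n22 = nand(n10, n16)
    n23 = nand(n16, n19)
    return {10: n10, 11: n11, 16: n16, 19: n19, 22: n22, 23: n23}


def signal_probability(node, value):
    """P(node == value) under uniformly random inputs, by exhaustive enumeration."""
    vectors = list(product((0, 1), repeat=5))
    hits = sum(c17(*v)[node] == value for v in vectors)
    return hits / len(vectors)


p_seu1_22 = signal_probability(22, 0)   # an SEU_1 needs a fault-free 0 at the node
p_seu1_23 = signal_probability(23, 0)
p_seu0_22 = signal_probability(22, 1)   # an SEU_0 needs a fault-free 1 at the node
print(p_seu1_22, p_seu1_23, p_seu0_22)
```

Under this assumption the enumeration gives 0.4375 for an $SEU_1$ at either output node and 0.5625 for an $SEU_0$, matching the entries for nodes 22 and 23 in Table 5.3; a biased input distribution would shift these values.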


unique: instead of predicting the effect of the inputs and of an SEU at a node on the outputs, we try to answer queries like "What input behavior will make an SEU at node j definitely cause a bit-flip at the circuit outputs?" or "What input behavior will be more conducive to no error at the output given that there is an SEU at node j?" Resolving queries like these aids the designer in observing the input space and helps perform input clustering or modeling.

Let us take the example of the c17 benchmark. We explore the input space for studying the effect of $SEU_0$ and $SEU_1$ at node 19 on errors at both the outputs (22 and 23). One can characterize the input space for any one of the outputs (or, in general, the effect of an SEU at any node on any other subset of nodes). Fig. 5.6(a) characterizes the input space for an $SEU_0$ at node 19 such that no bit-flip occurs at the outputs. This is done by setting the output error probability to zero (by giving "evidence" to the detection nodes in the Bayesian Network) and then back-propagating the probabilities. We plot the probabilities of each of the inputs 1, 2, 3, 6 and 7 that give no output error for an $SEU_0$ at node 19. Each column in the plot represents an input. The lighter color represents the probability of that input being 0 and the black color represents the probability of the input being 1 (the sum of these two parts should always be one). One can see that for obtaining zero output error with an $SEU_0$ at node 19, input 1 can be random, inputs 2 and 7 have a 65% probability of being at logic one and inputs 3 and 6 have a probability of 30% for logic 1. Note that the input space is nearly random (p(1) = p(0) = 0.5) when an $SEU_1$ at node 19 produces zero output error at both the outputs. Similar characteristics are shown in Fig. 5.6(c) and 5.6(d), characterizing the input space with respect to output errors while an $SEU_0$ or $SEU_1$ occurs at node 11. Once again it can be seen that zero output error is more likely under random inputs for $SEU_1$ than for $SEU_0$.

5.2.2 Larger Benchmarks

We use approximate inference for larger circuits using Probabilistic Logic Sampling [28], which is pattern-independent random Markov chain sampling and has shown good results in many large industry-size applications.
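The backward-reasoning query of Section 5.2.1.1 can be mimicked on c17 by conditioning on the evidence "no output error" and enumerating the input vectors consistent with it. The sketch below is a deliberately simplified stand-in for the Bayesian Network query: it is untimed, it treats the upset as a persistent complement of the node value (so it does not distinguish $SEU_0$ from $SEU_1$), and it assumes uniform inputs. Its numbers are therefore not expected to reproduce Fig. 5.6. All names are hypothetical.

```python
from itertools import product

INPUT_NAMES = (1, 2, 3, 6, 7)


def c17_outputs(vector, flip=None):
    """Outputs (22, 23) of c17; the gate named `flip`, if any, has its value complemented."""
    i1, i2, i3, i6, i7 = vector

    def nand(name, a, b):
        value = 1 - (a & b)
        return value ^ 1 if name == flip else value

    n10 = nand(10, i1, i3)
    n11 = nand(11, i3, i6)
    n16 = nand(16, i2, n11)
    n19 = nand(19, n11, i7)
    return nand(22, n10, n16), nand(23, n16, n19)


def input_posterior(flip):
    """P(input = 1 | the upset at `flip` leaves both outputs unchanged)."""
    vectors = list(product((0, 1), repeat=5))
    masked = [v for v in vectors if c17_outputs(v, flip) == c17_outputs(v)]
    if not masked:
        raise ValueError("the upset is never masked; the evidence has zero probability")
    return {name: sum(v[k] for v in masked) / len(masked)
            for k, name in enumerate(INPUT_NAMES)}


posterior = input_posterior(19)
print(posterior)
```

In this simplified setting an upset at node 19 is masked only when gate 16 outputs 0, so the evidence forces input 2 to logic one, pulls inputs 3 and 6 down to a probability of one third, and leaves inputs 1 and 7 at 0.5. The qualitative picture (some inputs unconstrained, others strongly biased) is the kind of information the input space characterization provides.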
Figure 5.6. Input probabilities (P(in) = 1 and P(in) = 0 for inputs 1, 2, 3, 6 and 7) for achieving zero output errors at nodes 22 and 23 in the presence of SEUs: (a) $SEU_0$ (1-0-1) at node 19 (b) $SEU_1$ (0-1-0) at node 19 (c) $SEU_0$ (1-0-1) at node 11 (d) $SEU_1$ (0-1-0) at node 11, for the c17 benchmark

In Fig. 5.7(a), we plot the number of gates and the number of possibly sensitized SEUs for the ISCAS benchmarks. This reduced SEU list was created based on the fanout-dependent delay model, assuming an SEU duration $\delta$ equal to one time unit. We get a considerable reduction in the number of listed SEUs compared to the number of gates in a circuit. This is because the reduced SEU list is generated by traversing backward from the final outputs evaluated at sampling time $t_s$ and $t_s - 1$, and only those gates that lie between the final outputs and duplicate gates need to be considered for SEU sensitivity analysis. Depending on the input pattern and the circuit structure, only a few of these SEUs actually cause soft errors. Based on the estimated SEU sensitivity $P(T_j)$ calculated as in Eq. 5.9, we classify the SEU-sensitive gates in a circuit into three categories: gates where $P(T_j)$ is (i) less than or equal to 0.3, (ii) between 0.3 and 0.6 and (iii) above 0.6. This is plotted in Fig. 5.7(b). These results are helpful for applying selective redundancy measures or for modifying $P(SEU_j)$ (by changing device features), giving higher priority to nodes in the high sensitivity range than to those in the lower sensitivity ranges. From Fig. 5.7(b), it can be seen that the SEU-sensitive nodes of circuit c432 are equally distributed within the three probability ranges (i), (ii) and (iii), whereas all the SEU-sensitive nodes in circuit c1355 lie within the middle range, where $P(T_j)$ is between 0.3 and 0.6. The results for c7552 show that $P(T_j)$ of most of the SEU-sensitive nodes is in the lowest range (less than or equal to 0.3), which indicates that gates in this circuit do not require extensive hardening techniques, whereas the majority of SEU-sensitive gates in c2670 require extensive hardening techniques since $P(T_j)$ is very high (above 0.6) for these nodes.

Figure 5.7. (a) SEU list (number of gates and listed SEUs per ISCAS benchmark), fanout-dependent delay model (b) SEU sensitivity range (number of gates per ISCAS benchmark), fanout-dependent delay model, with delta = 1; input bias = 0.5

Table 5.4. SEU sensitivity estimation errors and time for 9999 samples

Circuit    E_mean    E_max     T_bn (sec)
c432       0.0031    0.0069    18.57
c499       0.0024    0.0198    13.43
c880       0.0027    0.0090    27.58
c1355      0.0027    0.0120    28.84
c1908      0.0028    0.0120    176.63
c2670      0.0034    0.0130    34.70
c3540      0.0023    0.0101    148.07
c5315      0.0045    0.0112    121.62
c7552      0.0035    0.0100    513.05

We implemented the SEU simulator based on the work done in [41] with a fanout-dependent delay model for the ground truth. We performed the simulation with 500,000 random vectors obtained by


changingseedafterevery50000vectorstogettheground-tr uthSEUprobabilities.Forourprobabilistic framework,weuseProbabilisticLogicSampling[28]infere ncescheme.WecomputetheSEUsensitivities P j ofgatesinISCASbenchmarkcircuitsusingProbabilisticLo gicSampling(PLS)[28]with9999 samplesandcompareourresultswithground-truthsimulati onresults.Table5.4.givestheaverage estimationerror E mean [incolumn2]andmaximumestimationerrors E max [incolumn3].Here E mean ofacircuitistheaverageofdifferencebetweenthe SEU detectionprobabilities(or SEU sensitivities) obtainedfromsimulationandestimatedprobabilitiesfrom PLSsamplingoverallpossibleSEUsensitivenodesinthecircuit.Similarly E max ofacircuitisthemaximumofdifferencebetweenthe SEU sensitivitiesobtainedfromsimulationandestimated SEU sensitivitiesfromPLSsamplingoverallpossibleSEUsensitivenodesinthecircuit.Estimationtime, T bn [column4]isthetimetakenbythePLS schemeforbeliefpropagation.WeestimatedtheSEUsensiti vitiesalltheISCAS'85benchmarkswith anaveragebeliefpropagationtimeof140.49sec,whereasth eaveragetimetakenforlogicsimulationof thesecircuitsis33hours.Estimationerroroverallbenchm arksisbelow0.0034whichshowsexcellent accuracy-timetrade-off. 
$T_{bn}$ is the total elapsed time, including memory and I/O access. This time is obtained by the ftime command in the Windows environment on a Pentium 4 2.0 GHz PC. It is evident from the results that, using a graph-based, causal, compact probabilistic framework (the Bayesian network), we are able to accurately model the single-event-upset (SEU) sensitivities of logic circuit signals, accounting for temporal and spatial dependencies. The exciting feature of this stimulus-free approach is that it uses conditional independencies in modeling spatial correlations and a time-space transformation for capturing temporal dependencies.

5.2.3 Results with Delay Model Based on Logical Effort

In this section we give estimation results from our model with logical-effort-based gate delay modeling. In Table 5.5, we list the number of nodes in the TALI Bayesian network and the estimation time in seconds for some of the ISCAS benchmarks. The number of TALI nodes depends on the SEU list as well as the circuit size, whereas the estimation time directly depends on the number of nodes and the number of samples. We show results for Probabilistic Logic Sampling (PLS) with 9999 samples.

Table 5.5. Size of TALI model and estimation time for logical-effort-based delay model

Circuit   # of nodes (TALI)   Estimation time (s)
c432        2390               22.32
c499        7814               65.75
c880        1097               12.49
c1355       1773               15.092
c1908       2279               22.22
c3540      14370              135.79

Figure 5.8. (a) SEU list - logical effort delay model (b) SEU sensitivity range - logical effort delay model, with delta = 1 and input bias = 0.5
[Plots not recoverable from the extracted text. Panel (a): number of gates and listed SEUs per ISCAS benchmark. Panel (b): number of gates per sensitivity range.]

Figure 5.8(a) shows the number of possibly sensitized SEUs vs. the number of gates in ISCAS benchmarks. From this graph, it can be seen that the number of SEUs in the reduced SEU list is low compared to the fanout-dependent delay model. This is due to the high gate delay values with logical-effort-based delay modeling, since we take into account the input capacitance as well as the parasitic delay in addition to fanout. Due to the increased gate delays, the relative effect of an SEU at an internal gate on a primary output during the latching period is smaller, since most of the signals get enough time to restore to their ideal values. Figure 5.8(b) shows the SEU sensitivity ranges of gates in the circuits, with an input bias of 0.5 and SEU width equal to one time unit. As with fanout-dependent delay modeling, here also we classify the SEU-sensitive gates in a circuit into three categories: gates with estimated sensitivity values (1) less than 0.3, (2) between 0.3 and 0.6, and (3) above 0.6. Given any delay library for a logic circuit, our model can be used to classify the gates in the circuit in the order of their SEU sensitivity values, capturing the logical masking effect, circuit structure, input pattern and SEU duration.

Please note that the above estimated probability values are relatively high when we consider the overall soft error susceptibility of individual gates. To get a comprehensive model, the electrical masking effect, the latching-window masking effect and also the SEU generation and propagation characteristics of individual gates are to be incorporated into our model. Modeling the electrical masking effect needs circuit-level simulation techniques, which we are trying to integrate with our current approach as a future direction.
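
The three-way classification and the ordering of gates by sensitivity described above can be sketched as follows. The function names and the example gate names and values are hypothetical, and assigning the boundary values 0.3 and 0.6 to the lower of the two adjacent ranges is an assumption.

```python
def sensitivity_range(p):
    """Map an estimated SEU sensitivity to range 1, 2 or 3.

    Assumption: 0.3 belongs to the low range and 0.6 to the middle range.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("sensitivity must be a probability")
    if p <= 0.3:
        return 1
    if p <= 0.6:
        return 2
    return 3


def gates_by_priority(sensitivities):
    """Order gate names from most to least SEU-sensitive."""
    return sorted(sensitivities, key=sensitivities.get, reverse=True)


# Hypothetical gate names and values, for illustration only.
example = {"g1": 0.12, "g2": 0.45, "g3": 0.81}
ranges = {g: sensitivity_range(p) for g, p in example.items()}
order = gates_by_priority(example)
```

Gates at the front of `order` would be the first candidates for hardening.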
We are able to effectively model single-event upsets in logic circuits (ISCAS benchmarks) to estimate the SEU sensitivity of individual nodes in a circuit, capturing spatial and temporal signal correlations and especially emphasizing the effect of inputs, gate delay, SEU duration and circuit structure. We show results with exact and approximate inference. Using exact inference, we characterize the input space that gives zero output error even in the presence of some SEUs. Results from approximate inference show excellent accuracy-time trade-offs. Future effort includes modeling with biased input patterns, with different SEU widths $\delta$, and also with other delay models, to study the effect of these factors on SEU sensitivities. We are also investigating the effect of threshold voltage and supply voltage on the electrical masking of transient pulses caused by particle bombardment.

In the following chapter, we discuss the modeling and analysis of the third class of errors, known as dynamic errors, which will be predominant in nanometer technologies. Like single-event upsets, dynamic errors are transient in nature. They are caused by temporary malfunction of devices that are forced to operate near their thermal limits. However, dynamic errors are not localized events like SEUs. They exist everywhere in the circuit, since every gate will operate with a finite error probability. Hence, in future technologies, computing will become probabilistic rather than deterministic. In Chapter 6, we discuss the modeling of dynamic errors using Bayesian networks and analyze the effect of individual gate errors on the overall circuit outputs.

CHAPTER 6
PROBABILISTIC ERROR MODELING OF NANO-DOMAIN LOGIC CIRCUITS

As technology scales down from deep submicron to nanometer levels, circuit reliability is greatly affected by various noise sources. Devices will operate erroneously due to supply voltage bounds, high subthreshold leakage currents, high switching, high operating temperatures, etc. We refer to these compound noise effects as "dynamic errors". Every gate in a logic circuit will have a finite dynamic error probability. We propose a novel technique to model the effect of dynamic errors in scaled nanometer circuits and to estimate the dynamic error sensitivity of individual gates in the circuit.

We estimate the overall error probability at the output of a logic block where individual logic elements are subject to error with a finite probability p. We construct the overall BN representation based on logic-level specifications by coupling gate-level representations. Each gate is modeled using a conditional probability table (CPT), which gives the probability of the gate output signal being at a logic state, given its input signal states. An ideal logic gate with no dynamic error has its CPT derived from the gate truth table, whereas the CPT of a gate with error is derived from its truth table and the error probabilities, which can be input dependent. The overall joint probability model is constructed by networking these individual gate CPTs. This model captures all signal dependencies in the circuit and is a minimal representation, which is important from a scalability perspective.

Table 6.1. Probabilistic representation of the "truth table" of a two-input, error-free AND gate and of an AND gate with dynamic errors

(a) Ideal AND gate
X_i1   X_i2   P(X_o=0 | X_i1, X_i2)   P(X_o=1 | X_i1, X_i2)
0      0      1                       0
0      1      1                       0
1      0      1                       0
1      1      0                       1

(b) AND gate with dynamic error
X_i1   X_i2   P(X_o=0 | X_i1, X_i2)   P(X_o=1 | X_i1, X_i2)
0      0      1-p                     p
0      1      1-p                     p
1      0      1-p                     p
1      1      p                       1-p
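
The construction of Table 6.1 from a truth table can be sketched as below. The helper name `noisy_cpt` and the dictionary layout are assumptions made for illustration, and the sketch uses a single error probability p for every input combination, although the text notes that error probabilities can be input dependent.

```python
from itertools import product


def noisy_cpt(gate_fn, n_inputs, p):
    """CPT of a gate whose output is wrong with probability p.

    Returns {input_bits: (P(Xo=0 | inputs), P(Xo=1 | inputs))}.
    """
    cpt = {}
    for bits in product((0, 1), repeat=n_inputs):
        ideal = gate_fn(*bits)
        row = [p, p]
        row[ideal] = 1.0 - p       # the correct output keeps probability 1 - p
        cpt[bits] = tuple(row)
    return cpt


def and_gate(a, b):
    return a & b


ideal_and = noisy_cpt(and_gate, 2, 0.0)   # rows of Table 6.1(a)
noisy_and = noisy_cpt(and_gate, 2, 0.1)   # rows of Table 6.1(b) with p = 0.1
```

Setting p = 0 recovers the deterministic truth table, which is why the ideal and the error-encoded gate can share one construction.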

We measure the error with respect to the ideal logical representation. Suppose we have a logic block with inputs $Z_1, \dots, Z_N$, internal signals $X_1, \dots, X_M$, and outputs $Y_1, \dots, Y_K$. Let the corresponding versions of the internal lines and the outputs with dynamic errors be denoted by $X^e_1, \dots, X^e_M$ and $Y^e_1, \dots, Y^e_K$, respectively. Thus, the error at the $i$-th output can be mathematically represented as the XOR of the error-free output and the output with error:

$$E_i = Y^e_i \oplus Y_i \qquad (6.1)$$

We propose the output error probability $P(E_i = 1) = P(Y^e_i \oplus Y_i = 1)$ as a design metric, in addition to traditional ones such as area or delay, to vet different designs in the nano-domain. Note that the probability of output error depends on the individual gate error probability p and also on the internal dependencies among the signals, which might enhance or reduce the overall output error depending on the circuit structure. In this work, we prove that for causal logical circuits, these dependencies can be modeled by a Bayesian network, which is known to be the exact, minimal probabilistic model for the underlying joint probability density function (pdf). Probabilistic belief propagation on these Bayesian networks can then be used to estimate this probability.

Contributions of this work are summarized as follows:
1. We propose an exact probabilistic representation of the error model that captures the circuit topology, for estimating the overall exact output error probabilities for small circuits.
2. We use scalable, pattern-insensitive stochastic inference schemes for estimation of approximate output error probabilities of medium-sized benchmarks (ISCAS'85). These inference schemes give an excellent accuracy-time trade-off.
3. We compute sensitivities of the overall output error of a circuit to individual gate errors, which is useful for selective redundancy application.
4. We use the estimated overall output error probabilities as a design metric for comparing equivalent designs.
5. We characterize the input space for achieving a desired output behavior in terms of error tolerance by utilizing the unique backtracking feature of Bayesian networks.
6. We enhance reliability by applying selective redundancy techniques and evaluate the reliability/redundancy trade-off.

Rationale for using the Bayesian model: We propose to use Bayesian modeling and inference for the following reasons.

Conventional logic computing is inherently causal. For example, the outputs of a gate are influenced by the inputs, whereas the inputs are not changed by the outputs. This paradigm of conventional computing translates into a causal probabilistic model where the causality forces the probabilistic model into a directed graph structure. Nano-CMOS, CNT, RTD and clocked QCA do exhibit causal flow in processing information.

Certain dependencies observed in causal networks, namely induced and non-transitive dependencies (more in the theoretical formalism section), are captured most effectively by Bayesian networks [26], which makes them a superior model for causal networks over MRFs and other non-graph-based algorithms.

Although inference in all these probabilistic models is NP-hard, the average-case complexity of the Bayesian network is the least, because it not only models the conditional independencies but also exploits them to its advantage in probabilistic inference. More interestingly, approximate BN inference schemes converge faster and more easily and have the potential to handle realistic network sizes.

Last but not least, Bayesian networks are uniquely capable of solving the inverse problem and of answering questions like "What input probability profile guarantees zero error at the output?". This makes them the most dominant probabilistic tool for diagnosis.

6.1 Probabilistic Error Model

We compute the error probability $P(e_i) = P(Y^e_i \oplus Y_i = 1)$ by marginalizing the joint probability function over the inputs, internal lines, and the outputs. (In this work, $P(x)$ denotes the probability of the event $X = x$, i.e. $P(X = x)$.)

$$P(e_i) = \sum_{z_1,\dots,z_N} P(e_i \mid z_1,\dots,z_N)\, P(z_1)\cdots P(z_N) \qquad (6.2)$$
$$= P(z_1)\cdots P(z_N) \sum_{z} \sum_{x,\,x^e} P(e_i \mid z_1,\dots,z_N) \qquad (6.3)$$
$$= P(z_1)\cdots P(z_N) \sum_{\forall z} \sum_{\forall x,\,\forall x^e} P(e_i, y_i, y^e_i, z_1,\dots,z_N, x_1,\dots,x_M, x^e_1,\dots,x^e_M) \qquad (6.4)$$

where Eq. 6.4 shows the joint density function that is necessary to compute the dynamic error exactly. Summing over all possible values of all the involved variables is computationally expensive (NP-hard); hence we require a graphical model that uses the causal structure and conditional independences to arrive at the minimal, optimally factorized representation of this joint probability function as a Bayesian network.

Figure 6.1. (a) Conceptual circuit representation of the logic used to detect errors, involving the error-free logic and the unreliable logic components (b) The corresponding Bayesian network representation
[Diagrams not recoverable from the extracted text.]

6.1.1 The Bayesian Network Structure

We model both the error-free logic and the logic with dynamic errors as Directed Acyclic Graphs (DAGs). These two models, which we will refer to as the ideal logic model and the error-encoded model, are then connected at the outputs by comparators. The comparator output is the variable $E_i = Y^e_i \oplus Y_i$ in Eq. 6.1. A comparator output of logic 1 indicates that the ideal logic model and the error-encoded model outputs are different. The probability of the comparator output being in state "1" provides the overall circuit error probability, $P(E_i = 1)$.
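
For a very small circuit, the comparator probability $P(E_i = 1)$ can be obtained by direct enumeration, which makes the cost of the summation concrete. The sketch below is a brute-force illustration and not the Bayesian network inference used in this work. Its assumptions: the netlist is the standard ISCAS'85 c17 circuit written out by hand, primary inputs are independent and equiprobable, each gate output is inverted with its own error probability (the model of Table 6.1), and all function and variable names are hypothetical.

```python
from itertools import product

# c17: gate output line -> its two fan-in lines; every gate is a 2-input NAND.
C17 = {10: (1, 3), 11: (3, 6), 16: (2, 11), 19: (11, 7), 22: (10, 16), 23: (16, 19)}
INPUTS = (1, 2, 3, 6, 7)
OUTPUTS = (22, 23)


def simulate(values, flips):
    """Evaluate the netlist; flips[g] = 1 means gate g outputs the wrong value."""
    v = dict(values)
    for g, (a, b) in C17.items():          # dict order is a topological order
        v[g] = (1 - (v[a] & v[b])) ^ flips.get(g, 0)
    return v


def output_error_prob(p_gate, p_in=0.5):
    """Exact P(E_i = 1) per output; p_gate maps gate -> error probability."""
    gates = list(C17)
    err = {o: 0.0 for o in OUTPUTS}
    for zs in product((0, 1), repeat=len(INPUTS)):
        w_in = 1.0
        for z in zs:
            w_in *= p_in if z else 1.0 - p_in
        inputs = dict(zip(INPUTS, zs))
        ideal = simulate(inputs, {})
        for fs in product((0, 1), repeat=len(gates)):
            w = w_in
            for g, f in zip(gates, fs):
                w *= p_gate[g] if f else 1.0 - p_gate[g]
            if w == 0.0:
                continue                   # impossible error pattern
            noisy = simulate(inputs, dict(zip(gates, fs)))
            for o in OUTPUTS:
                if noisy[o] != ideal[o]:   # comparator output E = 1
                    err[o] += w
    return err


err = output_error_prob({g: 0.05 for g in C17})
```

The loop visits 2^5 input vectors times 2^6 error patterns for this six-gate circuit, and the count doubles with every added input or gate, which is the growth that the factorized Bayesian network representation is meant to avoid.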

Figure 6.1(a) shows the (conceptual) representation of an error detection circuit for a simple logic involving two NAND gates, represented by block C. The other block involves the same logic, but built with unreliable components. These gates are assumed to have a gate error probability of p. The inputs to both the blocks are the same. The two outputs are connected to two comparators. The output of each comparator represents an error in computation. Note that this is just a conceptual representation; we do not actually propose synthesizing the circuit. From the conceptual circuit design, we can construct the Bayesian network representation, which we call the LIPEM-DAG model. Each node in the LIPEM is a line in the circuit and the links denote a connection between the lines via gates. Figure 6.1(b) shows the LIPEM corresponding to the circuit in Figure 6.1(a). In the rest of this section, we present a more formal definition and prove that the representation thus obtained is minimal.

Definition: The Logic Induced Probabilistic Error Model (LIPEM) corresponding to a combinational circuit, where C is the circuit and {p} are the individual gate error probabilities, can be constructed as follows. Nodes are random variables with two values: 0 or 1. There are 6 types of nodes.
{Z}: Primary inputs.
{X^e}: Internal signals with error probability p.
{X}: Internal signals under ideal logical condition.
{Y^e}: Erroneous output signals.
{Y}: Primary outputs under ideal logical condition.
{E}: Error nodes representing the error at each output.
Edges of the LIPEM are directed and denoted by the ordered pair (u → v), where u causes v. The edge set can be classified as follows:
(Z → X^e): Edges between nodes representing primary inputs and outputs of erroneous gates that are directly fed by the primary inputs.
(Z → X): Edges between nodes representing the same primary inputs and outputs of the corresponding ideal logic gates.
(X^e_i → X^e_j), (X_i → X_j): Edges between random variables representing internal signals (edges from the input of a gate to the corresponding output). If there is an edge (X_i → X_j), then there must be a mirror edge (X^e_i → X^e_j). These two edges differ in the conditional probability, discussed later during the quantification of the LIPEM.
(X → Y) and (X^e → Y^e): Edges from internal signals to the outputs.
(Y → E) and the corresponding (Y^e → E): Edges from the outputs to the error nodes.

Theorem: The LIPEM-DAG structure corresponding to the combinational circuit C is a minimal I-map of the underlying dependency model and hence is a Bayesian network.

Proof: The Markov boundary of a random variable v in a probabilistic framework is the minimal set of variables that make the variable v conditionally independent of all the remaining variables in the probabilistic network.

Let us order the random variables in the node set such that for every edge (u, v) in the LIPEM-DAG, u appears before v. With respect to this ordering, the Markov boundary of any node v ∈ {{Z}, {X}, {Y}, {X^e}, {Y^e}, {E}} is given as follows. If v represents a primary input signal line, then its Markov boundary is the null set. And, since the logic value of an output line depends only on the inputs of the corresponding gate (whether v is in {X}, {X^e}, {Y}, {Y^e}, or {E}), the Markov boundary of a variable representing an output line consists of just those variables that represent the inputs to that gate. Thus, in the LIPEM structure, the parents of each node are its Markov boundary elements. Hence the LIPEM is a boundary DAG. Note that the LIPEM is a boundary DAG because of the causal relationship between the inputs and the outputs of a gate that is induced by logic. It has been proven in [26] that if a graph structure is a boundary DAG D of a dependency model M, then D is a minimal I-map of M. This theorem, along with the definitions of conditional independencies in [26] (we omit the details), specifies the structure of the Bayesian network. Thus the LIPEM is a minimal I-map and thus a Bayesian network (BN).
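
The node and edge classes of the definition can be made concrete with a small structural sketch. It assumes the standard ISCAS'85 c17 netlist (written out by hand) and a hypothetical node labelling `(kind, line)`; it builds only the DAG structure, with no probabilities attached.

```python
# c17: gate output line -> its two fan-in lines (all gates are NANDs).
C17 = {10: (1, 3), 11: (3, 6), 16: (2, 11), 19: (11, 7), 22: (10, 16), 23: (16, 19)}
INPUTS = {1, 2, 3, 6, 7}
OUTPUTS = {22, 23}


def build_lipem(netlist, inputs, outputs):
    """Return {node: tuple_of_parents} for the LIPEM DAG.

    Node labels are (kind, line) with kind in Z, X, Xe, Y, Ye, E.
    """
    def ideal(line):
        if line in inputs:
            return ("Z", line)
        return ("Y" if line in outputs else "X", line)

    def noisy(line):
        if line in inputs:
            return ("Z", line)          # both copies share the primary inputs
        return ("Ye" if line in outputs else "Xe", line)

    parents = {("Z", z): () for z in sorted(inputs)}
    for out, fanin in netlist.items():
        parents[ideal(out)] = tuple(ideal(f) for f in fanin)
        parents[noisy(out)] = tuple(noisy(f) for f in fanin)   # mirror edges
    for y in sorted(outputs):
        parents[("E", y)] = (("Y", y), ("Ye", y))              # comparator node
    return parents


lipem = build_lipem(C17, INPUTS, OUTPUTS)
```

For c17 this yields 5 input nodes, 6 ideal gate nodes, 6 error-encoded gate nodes and 2 error nodes, 19 in total, which agrees with the number of BN nodes listed for c17 in Table 6.2. Each node's parent tuple is exactly the set of gate inputs, matching the Markov boundary argument in the proof.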

6.1.2 Bayesian Network Quantification

The LIPEM-BN thus constructed consists of nodes that are random variables of the underlying probabilistic model, and its edges denote direct dependencies. All the edges are quantified with the corresponding conditional probabilities of the form $p(x_{v_i} \mid x_{parent(v_i)})$, where $parent(v_i)$ is the set of nodes that have directed edges to $v_i$. These conditional probability specifications are determined by the gate type. A complete specification of the conditional probability of a two-input AND gate output will have $2^3$ entries, since each variable has 2 states. The edges in the error-encoded part would be quantified by the logic function, allowing for some randomness. The conditional probabilities of an error-free AND gate and of an unreliable AND gate with gate error probability p are shown in Table 6.1(a) and (b), respectively.

6.2 Experimental Results

We demonstrate the experimental results using LGSynth'93 and ISCAS'85 benchmark circuits. Even though these benchmarks are for nano-CMOS, logical structures, namely re-convergence and input-output behaviors, are fundamental to most nano devices. Also, we do not have medium-size standard designs to show the efficiency of our model for most of the emerging devices. Hence, to study the scalability and time-accuracy trade-off, we choose these benchmark circuits, which are used in many previous studies and are well understood.

We use an exact inference scheme to estimate output error probabilities of small circuits from the LGSynth'93 benchmarks and show that the run time and memory requirement of our model are orders of magnitude less than those of the PTM-based model [54]. The approximate computation of the LIPEM using Bayesian networks was done with a tool named "GeNIe" [67]. The tests were performed on a Pentium IV, 2.00 GHz, Windows XP computer.
Small Circuits: In Table 6.2, we report the maximum output error probabilities of benchmark circuits for different gate error probabilities. The table gives the maximum output error probabilities when the gate error probabilities are 0.005, 0.05 and 0.1, together with the elapsed time for the error estimation. From this table, we can see that the decoder circuit has the lowest output error probability and that circuits like xor5, malu4 and parity have the worst error characteristics. For the
xor5 circuit, even a gate error probability of p = 0.05 is not acceptable, because with this gate error probability the output error probability is 0.4336, which is the highest over all circuits.

Table 6.2. Output error probabilities (from exact inference)
Maximum output error probability for individual gate error probability p

Circuit    Nodes in BN   p=0.005   p=0.05   p=0.1    Time (s)
c17          19          0.0148    0.1342   0.2398   0.0
parity       47          0.0699    0.3971   0.4824   0.0001
pcle        116          0.0179    0.1560   0.2702   0.07
decod       129          0.0068    0.0654   0.1251   0.14
cu          210          0.0232    0.1969   0.3327   0.56
pm1         225          0.0331    0.2627   0.4141   0.44
xor5        118          0.0925    0.4336   0.4900   0.26
alu4        222          0.0676    0.3906   0.4816   1.87
b9          392          0.0315    0.2475   0.3867   2.49
comp        415          0.0733    0.3828   0.4683   0.66
count       404          0.0203    0.1613   0.2632   1.14
malu4       280          0.0845    0.4253   0.4903   1.94
max_flat     70          0.0296    0.2151   0.3234   0.02
pc          261          0.0377    0.2794   0.4161   0.41
voter       134          0.0299    0.2178   0.3294   0.08

In Table 6.3, we compare the time and space complexity of our model with those of the Probabilistic Transfer Matrix (PTM) based method proposed in [54]. Columns 2 and 3 of this table give the run time and memory requirement of the PTM model, where simulations were performed using a 3 GHz Pentium 4 processor. Columns 4 and 5 give the run time and memory requirement of our model, for which we used a 2 GHz Pentium 4 processor. These results show the effectiveness of our model in terms of estimation time and memory usage. The time and space complexity of the BN-based model does not depend on the gate error probability values, whereas experimental results from [54] show that the run time and memory requirement of PTM-based modeling are not the same for different gate error probabilities.

Table 6.3. Comparison of modeling using Bayesian networks and probabilistic transfer matrix

           PTM [54]                   BN
Circuit    time (s)   memory (MB)    time (s)   memory (MB)
c17           0.076     0.003        0.0        0.096
parity        0.35      0.144        0.0001     0.14
pcle         74.9      24.2          0.07       0.34
decod        56.9      11.8          0.14       0.42
cu           93.87     10            0.56       2.28
pm1        7169       160            0.44       1.38
xor5       1337        57.3          0.26       2.48

Han et al. [56] have given a comparison between their PGM model (Probabilistic Gate Model) and the PTM model. The PGM model, which is an approximate method for reliability modeling in nano circuits, gives reliability estimates very close to the results from the exact PTM model. The computational complexity analysis given in [56] shows that the time and space complexity of PTM models is exponential with respect to the number of inputs and outputs, whereas the complexity of the PGM method is exponential only in the number of circuit inputs and grows linearly with the number of outputs.

Scalability of the Error Model: We show that the error model and the associated computations scale extremely well with circuit size by showing results for circuits of varying sizes. Table 6.4 shows the error probabilities for various ISCAS'85 benchmark circuits for three different gate error probabilities, p = 0.01, 0.001 and 0.0001. The reported results were obtained from PLS with 1000 samples. As expected, all circuits exhibit higher overall error as the individual gate error increases. It can be observed that c499 has lower error growth than all the other circuits. For many of the benchmark circuits (namely c1908, c2670, c6288, c7552), the output error exceeds 0.2 for gate error probability 0.01. This indicates that 0.01 is an unacceptable gate error probability for these circuits. Additional gate-level redundancy may be required for these circuits.
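
Probabilistic logic sampling is forward sampling: root nodes are drawn from their priors and every other node is drawn from its CPT given the already-sampled parents. The sketch below applies that idea to the hand-written c17 netlist. It is an illustration under stated assumptions (equiprobable inputs, output-inversion error model of Table 6.1, hypothetical function names) and not the GeNIe implementation used for the reported results.

```python
import random

# c17: gate output line -> its two fan-in lines (all gates are NANDs).
C17 = {10: (1, 3), 11: (3, 6), 16: (2, 11), 19: (11, 7), 22: (10, 16), 23: (16, 19)}
INPUTS = (1, 2, 3, 6, 7)
OUTPUTS = (22, 23)


def sample_output_error(p_gate, n_samples, seed=0):
    """Estimate P(E = 1) at each output by forward sampling."""
    rng = random.Random(seed)
    hits = dict.fromkeys(OUTPUTS, 0)
    for _ in range(n_samples):
        ideal = {z: rng.randint(0, 1) for z in INPUTS}   # sample the root nodes
        noisy = dict(ideal)                              # both copies share inputs
        for g, (a, b) in C17.items():                    # parents before children
            ideal[g] = 1 - (ideal[a] & ideal[b])
            flip = 1 if rng.random() < p_gate[g] else 0  # draw from the noisy CPT
            noisy[g] = (1 - (noisy[a] & noisy[b])) ^ flip
        for o in OUTPUTS:
            hits[o] += ideal[o] != noisy[o]              # comparator node E
    return {o: hits[o] / n_samples for o in OUTPUTS}


estimate = sample_output_error(dict.fromkeys(C17, 0.01), n_samples=1000)
```

The work per sample is linear in the number of nodes, so the run time grows with circuit size and sample count rather than with the size of the joint state space.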
To validate our model, we performed an in-house logic simulation of the benchmarks with 500,000 random vectors, obtained by changing the seed after every 50,000 vectors, and obtained the average output error probabilities for different gate error probabilities. Simulation is done by comparing the outputs from the ideal logic block with the corresponding outputs from an erroneous logic block. Both the ideal and the erroneous logic blocks are fed by the same primary inputs. To simulate the effect of erroneous operation, we inject an additional input, $I_f$, into each gate in the faulty logic block. If this input to a gate in the erroneous logic block holds logic value one, the gate outputs a wrong signal. If $I_f$ to a gate is logic zero, it behaves as an ideal gate. The probability of $I_f$ being at logic one is the gate error probability p. Note that in a single simulation run, the input $I_f$ to any gate in the erroneous logic block can be either zero
or one, depending on the gate error probability p and the output of the random number generator which determines the state of this input. Fig. 6.2(a) is a small dynamic error simulation circuit. Fig. 6.2(b) is the truth table of a NAND gate in the erroneous logic block of the simulation model.

Table 6.4. Output error probabilities for ISCAS'85 circuits - PLS with 1000 samples
Maximum output error probability for individual gate error probability p

Circuit   No. of gates   p=0.01   p=0.001   p=0.0001
c432        160          0.149    0.021     0.004
c499        202          0.04     0.006     0.002
c880        383          0.205    0.03      0.003
c1355       546          0.077    0.014     0.003
c1908       880          0.331    0.14      0.15
c2670      1193          0.395    0.27      0.237
c3540      1669          0.503    0.502     0.535
c5315      2307          0.389    0.643     0.011
c6288      2416          0.523    0.202     0.037
c7552      3512          0.5      0.15      0.031

Table 6.5. Comparison of Bayesian network modeling and logic simulation

                         1000 samples        10000 samples
Circuit   Nodes in BN    e        T1 (s)     e        T2 (s)
c432        475          0.0020    0.234     0.0020    2.140
c499        565          0.0010    0.344     0.0003    3.260
c880        956          0.0010    0.844     0.0004    7.876
c1355      1253          0.0020    1.141     0.0007   10.454
c1908      2172          0.0070    2.187     0.0060   18.984
c2670      3097          0.0104    3.205     0.0009   26.457
c3540      4038          0.0700    4.547     0.0700   36.249
c5315      6247          0.0077    7.844     0.0160   56.971
c6288      4896          0.0065    5.672     0.0022   43.062
c7552      8398          0.0650   11.64      0.0640   78.688

In Table 6.5, we compare simulation results with Bayesian network results. We report the accuracy of our model in terms of the average error, e, between the exact output error probabilities and the estimated output error probabilities obtained from PLS with 1000 and 10,000 samples. Column 2 of this table gives the number of nodes in the Bayesian network. We list e and the elapsed time for PLS with 1000 samples in columns 3 and 4, respectively. Columns 5 and 6 of this table give e and the elapsed time with 10,000 samples. Column 7 gives the logic simulation time. The mean estimation error with 1000 samples is 0.017
and with 10,000 samples is 0.016. The average estimation times with 1000 samples and 10,000 samples are 3.76 seconds and 28.41 seconds, respectively, whereas the average simulation time is 161.43 seconds. These results show the effectiveness of our model in terms of accuracy and estimation time. We see that the estimation time varies linearly with the number of samples, whereas the average estimation error is almost the same with 1000 and 10,000 samples for most of the circuits. According to Table 6.5, with 1000 PLS samples circuits c499 and c880 gave the lowest estimation error (0.001) and c3540 gave the highest (0.07). With 10,000 PLS samples, c499 gave the lowest e (0.0003) and c3540 again gave the highest e (0.07).

Figure 6.2. (a) Dynamic error simulation circuit (b) Truth table of a NAND gate in the erroneous logic block
[Diagram and truth table not recoverable from the extracted text.]

Error Sensitivity: We compute the sensitivity of the output error to a particular gate with a $\Delta p$ change in its gate error, the rest of the gates having zero gate error probability. This is useful for determining the application of redundancy in achieving nano-domain fault tolerance. In Table 6.6, we tabulate the output error probability at gates 22 and 23 of benchmark c17 when the error probabilities of gates 10, 11, 16 and 19 change by 0.1. It is clear that the output error at gate 22 is most sensitive to gate 16. We also plot the sensitivity of a few individual gates for benchmark c880 in Figure 6.3(a). As it can be seen
that the output error at node 880 is sensitive only to gate 853 out of the three intermediate gates shown here, and the output error at node 863 is reasonably sensitive to both node 651 and node 750.

Table 6.6. Error sensitivity for c17

                     Error probability at outputs
Gate Δp              22        23
Δp(X10) = 0.1        0.0625    0
Δp(X11) = 0.1        0.0375    0.075
Δp(X16) = 0.1        0.075     0.0625
Δp(X19) = 0.1        0         0.0625

Figure 6.3. (a) Sensitivity of output error probability with respect to individual gate errors for c880 (b) Output error profiles of two alternative logic implementations (c499 and c1355)
[Plots not recoverable from the extracted text. Panel (a): error probability at output nodes X850 to X880 for Δp(X853) = 0.1, Δp(X651) = 0.1 and Δp(X750) = 0.1. Panel (b): output error probability vs. gate error probability p for c499 and c1355.]

Design Space Exploration: Figure 6.3(b) shows the variation in output error with gate error p for two ISCAS benchmarks, c499 and c1355, that are logically equivalent. We see that c499 is clearly a better design of the logic for the nano-domain in terms of resistance to dynamic errors. The circuit c1355 is more sensitive to dynamic error for almost all individual gate error probabilities; in particular, when p = 0.1, its output error probability becomes 0.4, as opposed to 0.25 for c499. The expected output dynamic error can be used, along with other design measures such as power and area, for nano-domain circuits.

Input Space Exploration: In this section, we explore the input characteristics for a desired output behavior: namely, what the input probabilities, entropy, likelihood, etc. should be for such a desired behavior.

Figure 6.4. c17 - Input space characterization by likelihood ratio for (a) zero error at output node 22 (b) zero error at output node 23
[Plots not recoverable from the extracted text. Both panels: input likelihood ratio Prob(X_i=1)/Prob(X_i=0) for primary inputs 1, 2, 3, 6 and 7, with p = 0.01, p = 0.1 and p = 0.3.]

Many of the results in the previous subsections were obtained under random inputs. However, the inputs themselves might aggravate or eliminate some of the serious problems that we face. There are two key questions for such a characterization. How do we propagate probabilistically from a desired output characteristic (which can be any node, really)? Which input characteristics would be beneficial for the designers? The answer lies in the unique back-tracking capability of Bayesian-network-based models. In fact, BN models are useful not only for predictive inference; their main application is in diagnosis. We show that an exact inference scheme can be used for input space characterization of small circuits and an approximate inference scheme for medium-sized benchmark circuits. Logic sampling provides inaccurate estimates for backtracking, as discussed in Chapter 3.1. Hence we resort to EPIS-based inference for probabilistic diagnosis. We use the likelihood of logic one as an input behavior. Designers are free to use any other metrics to characterize the inputs.

Input Space - Exact Inference

We characterize the input space of the c17 benchmark for obtaining some specific output behavior, given the dynamic error probability of individual gates. We set some specific output error probabilities as evidence in the BN model and back-propagate these probabilities to obtain the corresponding input probabilities. We use the Hugin tool for exact inference. In Fig. 6.4(a) and (b), we plot the input space in
terms of the likelihood ratio Prob(X_i = 1)/Prob(X_i = 0) for primary inputs 1, 2, 3, 6 and 7 by setting (a) the error probability of output 22 to zero and (b) the error probability of output 23 to zero, respectively. We plot the graphs for three different gate error probabilities, p = 0.01, p = 0.1, and p = 0.3. The likelihood ratio is equal to one if the input probability is 0.5, which indicates randomness of the input. From Fig. 6.4(a) and (b), it can be seen that, with individual gate error probability p = 0.01, all the inputs have a likelihood ratio equal to, or close to, one. In other words, random inputs are likely to produce zero output error probability if the individual gate error probability is p = 0.01. Thus it can be concluded that if we have no information about the inputs, i.e. if the actual input space of this circuit is random, we don't have to apply any dynamic error mitigation technique to the gates in the circuit, since the circuit has in-built error tolerance. As the gate error probability increases, we can see that the input likelihood moves further away from one, indicating that the circuit requires some error mitigation schemes if the actual input pattern is different from the pattern shown in the graph. For example, with gate error probability p = 0.3, in order to have no error at output 22, input 1 should have more ones than zeros. If the actual input pattern has more zeros than ones for this input, we should definitely apply additional redundancy techniques to make the circuit error-free. From Fig. 6.4(a), it is evident that inputs 1 and 2 follow almost the same pattern as the gate error probability increases, whereas input 6 follows an opposite pattern. The likelihood of input 6 being at logic 0 increases as the gate error probability increases from 0.01 to 0.3 in order to get zero output error at node 22. Hence, in the actual input space of the circuit, if input 6 has more 1s than 0s, the circuit requires more stringent redundancy techniques, and vice versa. From Fig. 6.4(b), it can be seen that the likelihoods of inputs 2 and 7 being at logic one increase with increasing gate error probability, whereas inputs 3 and 6 exhibit the opposite behavior. Input 1 has a likelihood ratio equal to one for all gate error probabilities, which shows that the logic level of this input does not influence the output error
probability at node 23 for any gate error probability. Since c17 is a very small benchmark circuit, the effect of inputs on the overall output error is not very important. To demonstrate the usefulness of our model, we perform input space characterization on mid-size benchmarks, as explained below.

Input Space - Approximate Inference

To handle medium-sized benchmark circuits, we use an approximate inference scheme, namely EPIS, for input space characterization. In the following three sets of experiments on c432 and c1908, we
set the gate error probability of all gates in the circuit to 0.01 and force all the output error probabilities to zero (also termed evidence); then we back-track and find the input probabilities that will give the desired output characteristic of zero error. These input probabilities are then used to calculate input likelihood ratios. In Figure 6.5(a), we plot the likelihood ratios Prob(X_i = 1)/Prob(X_i = 0) of selected inputs of benchmark c432. It can be seen that for this circuit, with gate error probability p = 0.01, the likelihood is close to one for most of the inputs. This implies that, to have no output error, the primary inputs would have equal probability of being at logic 0 or at logic 1 if the gate error probability is p = 0.01. We then repeat this by increasing the individual gate error probability values to p = 0.05 and then to p = 0.1.

Figure 6.5. Input space characterization by likelihood ratio for benchmarks (a) c432 (b) c1908
[Plots not recoverable from the extracted text. Panel (a): input likelihood ratio for selected primary inputs of c432 with p = 0.01, 0.05 and 0.1. Panel (b): the same for c1908 with p = 0.01 and 0.05, with the note "No input space possible for error-free operation if p = 0.1. Hence p = 0.1 is a hard upper bound".]
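
The back-tracking step (fix the evidence "zero output error", then read off the posterior input probabilities and their likelihood ratios) can be illustrated on c17, which is small enough to enumerate. This sketch replaces the Hugin and EPIS inference used in this work with plain brute-force application of Bayes' rule. Its assumptions: the hand-written c17 netlist, equiprobable input priors, the output-inversion error model of Table 6.1, and a hypothetical function name.

```python
from itertools import product

# c17: gate output line -> its two fan-in lines (all gates are NANDs).
C17 = {10: (1, 3), 11: (3, 6), 16: (2, 11), 19: (11, 7), 22: (10, 16), 23: (16, 19)}
INPUTS = (1, 2, 3, 6, 7)


def evaluate(inputs, flips):
    """Evaluate the netlist; flips[g] = 1 inverts the output of gate g."""
    v = dict(inputs)
    for g, (a, b) in C17.items():
        v[g] = (1 - (v[a] & v[b])) ^ flips[g]
    return v


def likelihood_ratios(p, output):
    """P(Z=1 | E_output=0) / P(Z=0 | E_output=0) for every primary input Z."""
    gates = list(C17)
    no_flip = dict.fromkeys(gates, 0)
    mass = {z: [0.0, 0.0] for z in INPUTS}        # joint P(Z=b, E_output=0)
    for zs in product((0, 1), repeat=len(INPUTS)):
        inputs = dict(zip(INPUTS, zs))
        ideal = evaluate(inputs, no_flip)[output]
        for fs in product((0, 1), repeat=len(gates)):
            if evaluate(inputs, dict(zip(gates, fs)))[output] != ideal:
                continue                          # violates the evidence E = 0
            w = 0.5 ** len(INPUTS)                # uniform input prior
            for f in fs:
                w *= p if f else 1.0 - p
            for z, b in inputs.items():
                mass[z][b] += w
    return {z: m[1] / m[0] for z, m in mass.items()}


ratios_22 = likelihood_ratios(0.3, 22)
ratios_23 = likelihood_ratios(0.3, 23)
```

In this sketch input 1 has ratio one for output 23 at every p, because input 1 reaches only gate 10, which does not feed output 23; this is the same structural reason behind the flat curve for input 1 reported for c17.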
It is clear that the input likelihoods are further away from one, in both directions, for gate error probability p = 0.1. For example, inputs 6 and 10 have likelihood ratios close to 1.5 for achieving zero output error with a gate error probability p = 0.1, indicating that the probability of these inputs being at logic one should be higher than that of being at logic zero. Hence it is clear that if these inputs have more 0s than 1s in the actual input pattern, then the circuit is less resistant to dynamic errors and needs mitigation schemes to make it error-tolerant. However, inputs 4 and 12 have low likelihood ratios, close to 0.7, for gate error probability p = 0.1. Circuit behavior with respect to these inputs is exactly opposite to that with respect to inputs 6 and 10. We can also observe that some inputs, like 4, 8, 9, 11, 12, 20 and 21, have a likelihood ratio less than one for all three gate error probabilities, whereas some other inputs, like 6, 10 and 22, always have a likelihood ratio greater than one. From these observations, it can be concluded that
if inputs in the first category have more zeros than ones and inputs in the second category have more ones than zeros in the input pattern, this circuit possesses in-built error tolerance with respect to dynamic errors and doesn't require severe redundancy techniques. However, if the input pattern is different from the above, the circuit gives erroneous outputs and designers have to incorporate error-tolerant schemes for reliable operation.

These experiments are repeated for c1908, and we observe that most inputs have a likelihood of one for p = 0.01, as seen in Fig. 6.5(b). For p = 0.05, we have many inputs with likelihood much higher than one, indicating that logic one at those inputs is more likely to give zero output error. If the actual input patterns have more zeros, the circuit will give erroneous outputs and hence gates are to be targeted for redundancy application. We can see another important result from Fig. 6.5(b): with p = 0.1, no input vectors can account for a zero output error, which means that no input combination can give zero output error if the gate error probability reaches 0.1. Hence p = 0.1 is an upper bound on the gate error probability, beyond which reliable computation is impossible. We need to have effective error mitigation schemes for circuit c1908 to achieve reliable computing if the individual gate error probability is 0.1 or above.

Reliability Enhancement: The effects of dynamic errors in logic circuits can be mitigated by application of various error-tolerant techniques, thereby achieving a given level of reliability. Application of redundancy to all gates in a circuit will result in very high area overhead and excess power dissipation, in addition to increased cost. We need to choose gates that are highly sensitive to dynamic errors and apply selective redundancy measures to achieve a trade-off between redundancy, reliability, area overhead and cost.
We first estimate the dynamic error sensitivities of selected gates in a circuit. Gates that are logically closer to the primary outputs are considered to be more sensitive to dynamic errors than gates at higher logical depths from the primary outputs. Hence we choose gates that are within a specific logical depth from the primary outputs. We compute the error sensitivity of each of these selected gates and then compute the percentage reduction in overall output error probabilities obtained by applying selective redundancy to highly sensitive nodes. The procedure is as follows:

Step 1: Estimate the overall output error probabilities with a given dynamic error probability (say p = 0.01) for ALL gates in the circuit ($\{P_{E_1}, P_{E_2}, \ldots, P_{E_N}\}$).

Step 2: Give a higher error probability (say p = 0.05) to the gate whose error sensitivity is to be determined, keeping all other gate error probabilities the same as before (0.01), and estimate the overall output error probabilities ($\{P'_{E_1}, P'_{E_2}, \ldots, P'_{E_N}\}$).

Step 3: Compute the difference in the estimated output error probabilities from Step 1 and Step 2 ($\Delta P = \{|P_{E_1} - P'_{E_1}|, |P_{E_2} - P'_{E_2}|, \ldots, |P_{E_N} - P'_{E_N}|\}$).

Step 4: Determine the set of gates $\{S\}$ for which $\max(\Delta P) \geq \delta p_{th}$, where $\delta p_{th}$ is a threshold value. These gates are selected for application of redundancy. The value of $\delta p_{th}$ is chosen depending on the desired error tolerance.

Step 5: Estimate the overall output error probabilities by giving a low dynamic error probability (say p = 0.001) to the gates selected for redundancy (gates $\in \{S\}$) and the original error probability p = 0.01 to all other gates (gates $\notin \{S\}$) in the circuit. We compare these values with the output error probabilities for 100% redundancy (p = 0.001 for all gates) and compute the reliability/redundancy trade-off in terms of the percentage reduction in average output error probabilities.

Results for circuits c17 and c432 are given in Table 6.7 and Table 6.8, respectively. Column 1 of these tables gives the $\delta p_{th}$ values and column 2 gives the percentage of gates selected for redundancy for each $\delta p_{th}$. In column 3, we report the percentage reduction in output error probabilities obtained by applying selective redundancy. The number of gates selected for redundancy depends on the error threshold $\delta p_{th}$: as $\delta p_{th}$ decreases, more gates are selected for redundancy, thus reducing the output error probability. For example, Table 6.8 shows that by decreasing $\delta p_{th}$ from 0.020 to 0.001, the reliability improvement, measured as the reduction in output error probability, rises from 11% to 60%. Thus our modeling tool can be used both for choosing the gates to which error-tolerant techniques should be applied and for determining the amount of such techniques required to achieve a desired error tolerance.
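The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the estimation tool used in this work: it replaces Bayesian network inference with brute-force enumeration, assumes uniformly distributed and independent primary inputs, models a dynamic error as an independent inversion of a gate's output, considers every gate rather than only those within a chosen logical depth, and uses the commonly distributed six-NAND netlist of c17. The function names (`output_error_probs`, `select_gates`, `percent_reduction`) are hypothetical, and the sketch is not intended to reproduce the values in Table 6.7.

```python
from itertools import product

# c17 (ISCAS'85): six 2-input NAND gates in topological order.
# Each key is a gate's output net; the value is its pair of input nets.
C17_GATES = {
    "10": ("1", "3"),
    "11": ("3", "6"),
    "16": ("2", "11"),
    "19": ("11", "7"),
    "22": ("10", "16"),
    "23": ("16", "19"),
}
C17_INPUTS = ("1", "2", "3", "6", "7")
C17_OUTPUTS = ("22", "23")


def simulate(inputs, flips):
    """Evaluate c17; every gate named in `flips` outputs the inverted value."""
    net = dict(inputs)
    for gate, (a, b) in C17_GATES.items():
        ideal = 1 - (net[a] & net[b])            # error-free NAND
        net[gate] = ideal ^ int(gate in flips)   # assumed error model: inversion
    return tuple(net[o] for o in C17_OUTPUTS)


def output_error_probs(gate_p):
    """Exact P(output is wrong) per primary output, by full enumeration.

    Assumes uniform random inputs and independent gate errors with the
    per-gate probabilities given in `gate_p`.
    """
    gates = list(C17_GATES)
    err = [0.0] * len(C17_OUTPUTS)
    n_vectors = 2 ** len(C17_INPUTS)
    for bits in product((0, 1), repeat=len(C17_INPUTS)):
        inputs = dict(zip(C17_INPUTS, bits))
        ideal = simulate(inputs, frozenset())
        for pattern in product((0, 1), repeat=len(gates)):
            weight = 1.0 / n_vectors
            for g, e in zip(gates, pattern):
                weight *= gate_p[g] if e else 1.0 - gate_p[g]
            actual = simulate(inputs, {g for g, e in zip(gates, pattern) if e})
            for k in range(len(err)):
                if actual[k] != ideal[k]:
                    err[k] += weight
    return err


def select_gates(base_p=0.01, high_p=0.05, threshold=0.02):
    """Steps 1 to 4: return the set S of gates whose max(delta P) >= threshold."""
    base = output_error_probs({g: base_p for g in C17_GATES})       # Step 1
    selected = set()
    for g in C17_GATES:
        probs = {h: base_p for h in C17_GATES}
        probs[g] = high_p                                           # Step 2
        raised = output_error_probs(probs)
        delta = [abs(x - y) for x, y in zip(base, raised)]          # Step 3
        if max(delta) >= threshold:                                 # Step 4
            selected.add(g)
    return selected


def percent_reduction(selected, base_p=0.01, low_p=0.001):
    """Step 5: percentage reduction in the average output error probability."""
    def mean(xs):
        return sum(xs) / len(xs)

    before = mean(output_error_probs({g: base_p for g in C17_GATES}))
    after = mean(output_error_probs(
        {g: (low_p if g in selected else base_p) for g in C17_GATES}))
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    chosen = select_gates(threshold=0.02)
    print("selected gates:", sorted(chosen))
    print("selective redundancy:", percent_reduction(chosen))
    print("100% redundancy:", percent_reduction(set(C17_GATES)))
```

Lowering `threshold` can only enlarge the selected set, and a threshold of zero selects every gate, which corresponds to the 100% redundancy rows of the tables. Enumeration is feasible here only because c17 has five inputs and six gates; larger benchmarks need the inference schemes described in this chapter.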
Table 6.7. c17 - Selective redundancy

$\delta p_{th}$    % of Gates Selected    Percentage Reduction in Average
                   for Redundancy         Output Error Probability
0.025              60%                    69.91
0.000              100%                   89.80

Table 6.8. c432 - Selective redundancy

$\delta p_{th}$    % of Nodes Selected    Percentage Reduction in Average
                   for Redundancy         Output Error Probability
0.020              8%                     11%
0.015              14%                    17%
0.010              24%                    32%
0.005              37%                    50%
0.001              59%                    60%
0.000              100%                   88%

We presented an exact probability model, based on Bayesian networks, to capture the inter-dependent effects of dynamic errors at each gate. The dynamic error at each gate is modeled through the conditional probability specifications in the Bayesian network. The expected output error can be used to select between different implementations of the same logical function, so as to obtain reliable nano-level circuits.

We theoretically proved that the dependency model for probabilistic gate errors is a Bayesian network and hence is an exact, minimal representation. We used an exact inference scheme for small circuits and showed that our model is extremely time and space efficient compared to the existing PTM-based technique. To handle even larger benchmarks, we used two stochastic sampling algorithms, PLS and EPIS. Even though many of the futuristic technologies do not yet offer benchmarks, we demonstrated the scalability of our estimation tool by using the logic-level specifications of the ISCAS'85 benchmarks, which would be valid for uncertainty modeling in nano-CMOS. We studied the input characteristics of logic circuits for a desired output behavior by exploiting the unique back-tracking feature of Bayesian networks. By conducting a sensitivity analysis of the gates in the circuits, we were able to identify the gates that affect the overall circuit behavior due to dynamic errors, and we presented a technique to determine the amount of redundancy to be applied for achieving a desired improvement in circuit performance.

We are currently working on modeling dynamic-error-tolerant designs by applying TMR redundancy on selected nodes having high dynamic error sensitivities, based on their switching characteristics and other device features. We intend to pursue dynamic error modeling for QCA logic blocks and to model interconnect gate errors in the future. A comparative study of various technologies such as CNT and RTD with respect to dynamic errors would be an easy extension of this framework. The other, more difficult task would be to handle sequential logic in terms of dynamic errors.
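To make the back-tracking query mentioned in this summary concrete, the following Python sketch computes input likelihood ratios by brute force for a toy two-gate circuit, y = NAND(NAND(a, b), c). Everything in it is an assumption made for illustration: the circuit is hypothetical, each gate is assumed to invert its ideal output independently with probability p (one simple form the conditional probability specification of a gate could take), the prior over input vectors is assumed uniform, and the likelihood ratio of an input is taken to be P(input = 1 | no output error) / P(input = 0 | no output error). In this work such quantities are obtained by Bayesian network inference rather than by enumeration.

```python
from itertools import product

INPUTS = ("a", "b", "c")


def nand(x, y):
    return 1 - (x & y)


def p_no_output_error(a, b, c, p):
    """P(actual output == ideal output | inputs) for y = NAND(NAND(a, b), c).

    Each gate is assumed to invert its output independently with probability p.
    """
    ideal = nand(nand(a, b), c)
    total = 0.0
    for e1, e2 in product((0, 1), repeat=2):   # error indicators of the two gates
        weight = (p if e1 else 1 - p) * (p if e2 else 1 - p)
        g1 = nand(a, b) ^ e1                   # possibly erroneous first gate
        y = nand(g1, c) ^ e2                   # possibly erroneous output gate
        if y == ideal:
            total += weight
    return total


def likelihood_ratios(p):
    """P(input = 1 | no output error) / P(input = 0 | no output error) per input."""
    mass = {name: [0.0, 0.0] for name in INPUTS}
    for bits in product((0, 1), repeat=len(INPUTS)):
        ok = p_no_output_error(*bits, p)
        for name, bit in zip(INPUTS, bits):
            mass[name][bit] += ok              # the uniform prior cancels in the ratio
    return {name: m[1] / m[0] for name, m in mass.items()}


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.1):
        print(p, likelihood_ratios(p))
```

For this toy circuit the ratio for input c works out to (1 - 2p + 2p^2) / (1 - p), about 0.911 at p = 0.1, because c = 0 forces the output gate's ideal value to one and so masks any error in the first gate. Inputs a and b have a ratio of exactly one, since they do not influence whether an error propagates.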

PAGE 102

CHAPTER 7
CONCLUSION AND FUTURE WORK

In this dissertation, we model and analyze three major classes of faults/errors that are significant in logic networks in the deep-submicron, sub-100-nanometer and future nanometer regimes. We also study how reliability can be enhanced by the application of efficient error mitigation schemes. We developed probabilistic models based on Bayesian networks for the estimation and analysis of different types of errors. We summarize below the important results from this dissertation:

1. LIFE-DAG: Logic Induced Fault Encoded Directed Acyclic Graph, which is an accurate and efficient model for the estimation of fault detection probabilities.

2. TALI-SES: A Timing Aware Probabilistic Model for Single Event Upset Sensitivity, which is an exact probabilistic model for the estimation of single-event-upset sensitivities of individual gates in logic circuits that captures the effects of logical masking, inputs, gate delays, SEU duration and circuit re-convergence.

3. LIPEM: Logic Induced Probabilistic Error Model for accurate and efficient estimation of overall circuit output error due to dynamic errors/inherent errors in logic gates.

4. Input space characterization of logic circuits for a desired output behavior by exploiting the unique back-tracking feature of Bayesian networks.

5. Analysis of the effect of selective redundancy techniques on circuit reliability and calculation of the reliability/redundancy trade-off.

One future research direction is to develop a comprehensive model for soft error analysis by integrating the electrical masking effect, the latching window masking effect, and the SEU generation and propagation characteristics of individual gates with our current approach. Another is to model dynamic-error-tolerant designs by applying TMR redundancy on selected nodes having high dynamic error sensitivities, based on their switching characteristics and other device features. We intend to pursue dynamic error modeling for QCA logic blocks and to model interconnect gate errors in the future. A comparative study of various technologies such as CNT and RTD with respect to dynamic errors would be an easy extension of this framework. The other, more difficult task would be to handle sequential logic in terms of dynamic errors. There is future scope for developing a macromodel from our gate-level models that can be used at higher levels of abstraction. Another research direction is the estimation of error bounds that are circuit-specific.


ABOUT THE AUTHOR

Thara took her B.Tech in Electrical and Electronics Engineering from the University of Kerala, India in 1992 and her M.Tech in Power Electronics from the University of Calicut, Kerala, India in 1995. From 1995 to 1997, she worked as an instructor in the Electrical Engineering departments of two engineering colleges in India. From 1997 to 2000 she served as an Electrical Engineer in the Kerala State Electricity Board, India. In 2002 she received a Masters Degree in Electrical Engineering, majoring in VLSI circuit design, from the University of South Florida, after which she continued her studies towards a doctoral degree. She has co-authored two journal publications (IEEE Transactions on VLSI Systems and the IEE journal on Computers and Digital Techniques). Her work on dynamic error modeling and reliability estimation is under review in IEEE Transactions on Nanotechnology. She has co-authored four peer-reviewed IEEE conference publications and two other publications. Her paper on Single-Event-Upset modeling and analysis was nominated for the "Best Paper Award" and received the "Honorable Mention Award" at the International Conference on VLSI Design, 2006, where she presented the paper. She has also presented her work on SEU analysis in a symposium on VLSI design sponsored by NASA.


xml version 1.0 encoding UTF-8 standalone no
record xmlns http:www.loc.govMARC21slim xmlns:xsi http:www.w3.org2001XMLSchema-instance xsi:schemaLocation http:www.loc.govstandardsmarcxmlschemaMARC21slim.xsd
leader nam Ka
controlfield tag 001 001910670
003 fts
005 20070928112107.0
006 m||||e|||d||||||||
007 cr mnu|||uuuuu
008 070928s2006 flu sbm 000 0 eng d
datafield ind1 8 ind2 024
subfield code a E14-SFE0001707
040
FHM
c FHM
035
(OCoLC)173483336
049
FHMM
090
TK145 (ONLINE)
1 100
Rejimon, Thara.
0 245
Reliability-centric probabilistic analysis of VLSI circuits
h [electronic resource] /
by Thara Rejimon.
260
[Tampa, Fla] :
b University of South Florida,
2006.
3 520
ABSTRACT: Reliability is one of the most serious issues confronted by the microelectronics industry as feature sizes scale down from deep submicron to the sub-100-nanometer and nanometer regimes. Due to processing defects and increased noise effects, it is almost impractical to come up with error-free circuits. As we move beyond 22nm, devices will be operating very close to their thermal limit, making the gates error-prone, and every gate will have a finite propensity of providing erroneous outputs. Additional factors increasing the erroneous behaviors are low operating voltages and extremely high frequencies. These types of errors are not captured by current defect and fault tolerant mechanisms as they might not be present during testing and reconfiguration. Hence reliability-centric CAD analysis tools are becoming more essential, not only to combat defects and hard faults but also errors that are transient and probabilistic in nature. In this dissertation, we address three broad categories of errors. First, we focus on random pattern testability of logic circuits with respect to hard or permanent faults. Second, we model the effect of a single-event-upset (SEU) at an internal node on the primary outputs. We capture the temporal nature of SEUs by adding timing information to our model. Finally, we model the dynamic error in nano-domain computing, where reliable computation has to be achieved with "systemic" unreliable devices, thus making the entire computation process probabilistic rather than deterministic in nature. Our central theoretical scheme relies on Bayesian Belief networks, which are compact, efficient models representing the joint probability distribution in a minimal graphical structure that not only uses conditional independencies to model the underlying probabilistic dependence but also uses them for computational advantage.
We used both exact and approximate inference, which has let us achieve order-of-magnitude improvements in both accuracy and speed and has enabled us to study larger benchmarks than the state of the art. We are also able to study error sensitivities, explore the design space, characterize the input space with respect to errors and, finally, evaluate the effect of redundancy schemes.
502
Dissertation (Ph.D.)--University of South Florida, 2006.
504
Includes bibliographical references.
516
Text (Electronic dissertation) in PDF format.
538
System requirements: World Wide Web browser and PDF reader.
Mode of access: World Wide Web.
500
Title from PDF of title page.
Document formatted into pages; contains 99 pages.
Includes vita.
590
Adviser: Sanjukta Bhanja, Ph.D.
653
Single-event-upsets.
Soft errors.
Dynamic errors.
Error modeling.
Bayesian networks.
690
Dissertations, Academic
z USF
x Electrical Engineering
Doctoral.
773
t USF Electronic Theses and Dissertations.
4 856
u http://digital.lib.usf.edu/?e14.1707