USF Libraries
USF Digital Collections

Regression approach to software reliability models


Material Information

Title:
Regression approach to software reliability models
Physical Description:
Book
Language:
English
Creator:
Mostafa, Abdelelah M
Publisher:
University of South Florida
Place of Publication:
Tampa, Fla
Publication Date:

Subjects

Subjects / Keywords:
Linear regression
Non-homogenous poisson process
Power law process
Non-parametric
Monotone regression
Rank regression
Dissertations, Academic -- Mathematics -- Doctoral -- USF
Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Abstract:
Many software reliability growth models have been analyzed for measuring the growth of software reliability. In this dissertation, regression methods are explored to study software reliability models. First, two parametric linear models are proposed and analyzed: the simple linear regression and the transformed linear regression corresponding to a power law process. Some software failure data sets do not follow the linear pattern. Analysis of popular real life data showed that these contain outliers and leverage values. Linear regression methods based on least squares are sensitive to outliers and leverage values. Even though the parametric regression methods give good results in terms of error measurement criteria, these results may not be accurate due to violation of the parametric assumptions. To overcome these difficulties, nonparametric regression methods based on ranks are proposed as alternative techniques to build software reliability models. In particular, monotone regression and rank regression methods are used to evaluate the predictive capability of the models. These models are applied to real life data sets from various projects as well as to diverse simulated data sets. Both the monotone and the rank regression methods are robust procedures that are less sensitive to outliers and leverage values. In particular, the regression approach explains predictive properties of the mean time to failure for modeling the patterns of software failure times. In order to decide on model preference and to assess the predictive accuracy of the mean time between failures estimates for the defined data sets, the following error measurement criteria are used: the mean square error, mean absolute value difference, mean magnitude of relative error, mean magnitude of error relative to the estimate, median of the absolute residuals, and a measure of dispersion.
The methods proposed in this dissertation, when applied to real software failure data, give less error in terms of all the measurement criteria compared to other popular methods from the literature. Experimental results show that the regression approach offers a very promising technique in software reliability growth modeling and prediction.
Thesis:
Dissertation (Ph.D.)--University of South Florida, 2006.
Bibliography:
Includes bibliographical references.
System Details:
System requirements: World Wide Web browser and PDF reader.
System Details:
Mode of access: World Wide Web.
Statement of Responsibility:
by Abdelelah M. Mostafa.
General Note:
Title from PDF of title page.
General Note:
Document formatted into pages; contains 107 pages.
General Note:
Includes vita.

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001801595
oclc - 162168296
usfldc doi - E14-SFE0001648
usfldc handle - e14.1648
System ID:
SFS0025966:00001




Full Text

PAGE 1

by
Abdelelah M. Mostafa

A dissertation submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Department of Mathematics
College of Arts and Sciences
University of South Florida

Major Professor: K. M. Ramachandran, Ph.D.
Chair: Tapas K. Das, Ph.D.
Chris Tsokos, Ph.D.
Marcus McWaters, Ph.D.
Geoffrey Okogbaa, Ph.D.
Jogindar Ratti, Ph.D.

Date of Approval: June 21, 2006

Keywords: Linear Regression, Non-homogenous Poisson Process, Power Law Process, Non-parametric, Monotone Regression, Rank Regression

© Copyright 2006, Abdelelah M. Mostafa

PAGE 3

Abdelelah M. Mostafa

PAGE 4

List of Tables  v
Abstract  vii
1 Introduction and Review of Software Reliability Modeling  1
1.1 Introduction  1
1.2 Hardware and Software Reliability  3
1.2.1 Hardware Reliability  4
1.2.2 Software Reliability  4
1.3 Some Basic Definitions, Concepts, and Terminology  5
1.4 Fundamentals of Reliability  8
1.4.1 Failure Interval Description (FID)  9
1.5 Some Probability Distributions and Reliability Functions  13
1.5.1 Exponential Distribution  13
1.5.2 Weibull Distribution  15
1.5.3 Rayleigh Distribution  17
1.6 Literature Review  18
1.6.1 Software Reliability Growth Model  18
1.6.2 Littlewood-Verrall Model (LV)  19
1.7 Non Homogenous Poisson Process (NHPP) Models  20
1.7.1 Goel-Okumoto Model  22
1.7.2 Power Model  22

PAGE 5

1.7.4 Suresh and Robert Models  23
1.7.5 Other Models  24
1.8 Model Selection and Comparison  25
1.9 Error Measurements Prediction Criteria  30
1.10 Data Sets  32
1.11 Summary of Dissertation  34
2 Linear Regression Approaches to Software Reliability Models  36
2.1 Introduction  36
2.2 Background Results  36
2.2.1 The MLE Estimators of $\lambda$ and $\beta$  38
2.3 Regression Approach for Power Law Processes  39
2.3.1 Intensity Function for the Regression Model  40
2.4 Linear Regression Approach  44
2.5 Successive Prediction Using Regression  46
2.6 Model Validation  47
2.6.1 Quantile-Quantile Plot of TBF and Predicted Values  47
2.7 The Goodness-of-Fit Test for the PLP Process Model and the Regression Approach  49
2.8 Comparison of the MSE and MAVD Values  52
2.9 Conclusion  58
3 Non-Parametric Regression Models for Software Reliability  59
3.1 Introduction  59
3.2 Preliminaries  61
3.3 Detecting Outliers and Leverage Values  65
3.4 Monotone Regression  70
3.4.1 Algorithm of Obtaining the Estimate of E(Y|T) at a Particular Point  71
3.4.2 The Estimate of the Regression of Y on T  73

PAGE 6

3.5 Rank Regression Based on Robust Slope Estimation  78
3.5.1 Algorithm Criteria for Computing the Slope Estimator  84
3.6 Results and Comparisons  86
3.7 Conclusion  89
4 Conclusion and Future Research  97
4.1 Summary of Dissertation  97
4.1.1 Chapter One  97
4.1.2 Chapter Two  98
4.1.3 Chapter Three  98
4.2 Limitations  99
4.3 Future Directions  100
About the Author  End Page

PAGE 7

1.2 General Model Motivation Using Time between Failures  29
1.3 Model Motivation Using Software Failure Data Sets  34
2.1 Quantile-Quantile Plot of TBF and Predicted Values  48
2.2 MSE Comparison of Software Reliability Models  55
2.3 MAVD Comparison of Software Reliability Models  56
3.1 Analysis of Monotone Regression (Apollo 8, Complete Data)  76
3.2 Analysis of Monotone Regression (Apollo 8, No Leverage Values)  77
3.3 Analysis of Monotone Regression (Apollo 8, No Outliers)  77

PAGE 8

2.1 MSE and MAVD of the MTBFa-Regression Model  41
2.2 MSE and MAVD of the SRRM1 Model  45
2.3 MSE and MAVD of the Successive Recurrence Regression Model  47
2.4 Goodness-of-Fit Test Statistical Values  51
2.5 Comparing MSE and MAVD of MLE and Regression Methods  52
2.6 Comparing MSE for Different Models (System 40)  53
2.7 Percentage Reduction Achieved by SRRM1 (System 40)  54
2.8 Comparing Predictive Models by Using Various Error Criteria  57
3.1 The Effects of Outlier and Leverage Values on SRRM1 (Apollo 8)  67
3.2 The Effects of Outlier and Leverage Values on SRRM1 (System 40)  68
3.3 The Effects of Outlier and Leverage Values on SRRM1 (Project 1)  69
3.4 Analysis of Software Monotone Regression Method (Apollo 8)  74
3.5 Analysis of Software Monotone Regression Method (System 40)  74
3.6 Analysis of Software Monotone Regression Method (Project 1)  75
3.7 Analysis of Software Monotone Regression Method (Project 5)  76
3.8 Analysis of Software Rank Regression (Apollo 8)  86
3.9 Analysis of Software Rank Regression (System 40)  87
3.10 Analysis of Software Rank Regression (Project 1)  87
3.11 Analysis of Software Linear Regression Method (Apollo 8)  88
3.12 Analysis of Software Linear Regression Method (System 40)  88
3.13 Analysis of Software Linear Regression Method (Project 1)  89

PAGE 9

3.15 The % Effect of Outlier and Leverage Values of Apollo 8 (MSE, V)  89
3.16 The % Effect of Outlier and Leverage Values of Apollo 8 (MAVD)  90
3.17 The % Effect of Outlier and Leverage Values of System 40 (MSE, V)  90
3.18 The % Effect of Outlier and Leverage Values of System 40 (MAVD)  90
3.19 The % Effect of Outlier and Leverage Values of Project 1 (MSE, V)  90
3.20 The % Effect of Outlier and Leverage Values of Project 1 (MAVD)  91
3.21 The % Effect of Outlier and Leverage Values of Project 5 (MSE)  91
3.22 The % Effect of Outlier and Leverage Values of Project 5 (MAVD)  91
3.23 Analysis of SRRM1, Monotone, and Rank Regression (SL n=50)  92
3.24 Analysis of SRRM1, Monotone, and Rank Regression (SL n=44)  92
3.25 MSE and V Percentage Effect of Outlier Values (SL n=50)  93
3.26 Analysis of SRRM1, Monotone, and Rank Regression (TSL n=50)  93
3.27 Analysis of SRRM1, Monotone, and Rank Regression (TSL n=42)  94
3.28 MSE and V % Effect of Outlier Values (TSL n=50)  94
3.29 Comparison of Models Using Different Measurement's Criteria  95
3.30 MSE and V Percentage Effect of Outlier Values (n=20)  96
3.31 MSE and V Percentage Effect of Outlier Values (n=50)  96

PAGE 15

1. Failures are caused by deficiencies in design, production, and maintenance.
2. Failures are due to wear or other energy-related or parts-related phenomena, but one can get a warning ahead of time.
3. Preventive maintenance is available and makes the system more reliable.
4. Reliability is time related. Failure rates may be decreasing, increasing, or constant with respect to operating time.
5. Reliability is related to environmental conditions.
6. Reliability can be theoretically predicted from physical bases.
7. Reliability can be improved by redundancy.
8. Failure rates of the components of a system are predictable by analyzing the pattern of failure times.
9. Hardware interfaces are visual.
10. Hardware design uses standard components.

1.2.2 Software Reliability

1. Failures are primarily due to design faults in the software. Repairs are made by modifying the design to make it robust against conditions that could trigger a failure.

PAGE 16

2. There are no wear phenomena in software. Software errors occur without previous warning. Old code can exhibit an increasing failure rate as a function of errors induced while making upgrades.
3. Failures occur when the logic path that contains an error is executed. Reliability growth is observed as errors in the software are detected and corrected.
4. External environmental conditions do not affect software reliability, while internal environmental conditions, such as insufficient memory and inappropriate clock speeds, do affect it.
5. Knowledge of design, usage, and environmental stress factors are not factors in predicting the reliability.
6. We can improve reliability by debugging and increasing the read access memory.
7. Reliability can be improved by replicating the same error.
8. Reliability can be improved by diversity; in other words, by making the software work with different systems.
9. Software interfaces are not visual but are conceptual.
10. Software design does not use standard components; it depends on the qualifications of a programmer.

(Source: Adapted from Table 1, page 7, Keene, S. J., "Comparing Hardware and Software Reliability", ASQ Reliability Review, Vol. 14, Dec. 1994, and Table 1, page 4, Walker, E., "Bridging the Software/Hardware Reliability Gap", RAC Journal, Vol. 4, No. 2, 2Q96.)

1.3 Some Basic Definitions, Concepts, and Terminology

PAGE 17

All of the above reasons create the risk that software may fail when delivered to the end users, consequently increasing costs. To minimize the risk of software failure, one should continuously verify and validate the software through each stage of the software development process. The failure intensity (FI) and the mean time to failure (MTTF) are two alternative ways of expressing software reliability. The FI is the expected number of failures per unit time, while the MTTF is the expected value of the failure interval. The principal objective of a software reliability model is to forecast the failure behavior that will be experienced by the time the program is operational. This expected behavior changes rapidly and can be tracked during the periods in which the program is tested. In general, reliability improves with time as the failure intensity decreases. Previous research discusses various techniques used to determine software reliability. No matter how simple or complex a software program is, it is widely recognized that 100 percent reliability is impossible to obtain. There are three generally accepted approaches used to pursue highly reliable software:

1. Design software by using structured programming that relies on formal specification languages.
2. Design fault-tolerant software systems that are able to perform satisfactorily, even in the presence of faults.
3. Improve reliability by debugging.

Debugging is still the primary method for achieving reliability. However, this process can consume a significant percentage of the life span of the program. The question remains: how does one measure software reliability? Several metrics were proposed to provide an answer to this question. For repeated runs of a program, where the inputs are planned to cover the expected operation, if $N$ is the total number of runs, and $N_s$ is the number of runs completed with no errors, then $\lim_{N \to \infty} N_s$

PAGE 18

1. It increases the burden of testing effort.
2. It is difficult to uniformly seed errors in the program.
3. The idea that the total number of errors is representative of reliability is questionable.
4. Another general and more acceptable definition of software reliability can be given in terms of the time interval between failures. The average processing time between two successive failures is often taken as a significant reliability index. Therefore, the occurrence of a failure is akin to a random event governed by some probabilistic law. The time interval between failures as well as the cumulative number of failures experienced up to a given time are random variables with given statistical distributions. The analysis of software reliability merges perfectly into the mainstream of classical reliability theory.

The bottom line is that as long as the test proceeds and errors are removed from the program, the reliability is expected to increase.

Definition 1.3.1 A software system is a repairable system. While software is being developed and tested, programmers detect and correct failures. After the corrections are made, programmers check the software again until another failure is observed. They continue this process until they have a reliable system. Before they put the system in

PAGE 20

where $f(t)$ is the probability density function (pdf) of the failure time $T > 0$. The cumulative distribution function (cdf) of the random variable $T$ can be written in terms of $R(t)$ as follows:

$F(t) = \int_0^t f(x)\,dx = P(T \le t) = 1 - R(t)$    (1.4.2)

The reliability function is also called the survival function of $T$. $R(t)$ decreases from 1 to 0 as $t$ goes from $0$ to $\infty$. Hence, $f(t)$, $F(t)$, and $R(t)$ are equivalent representations of the random variable $T$.

An NHPP is described by the failure intensity function, which is denoted by $v(t)$.

Definition 1.4.1 t; (1.4.3)

PAGE 21

$v(t) = \frac{f(t)}{1 - F(t)} = -\frac{d}{dt}\left[\ln(1 - F(t))\right] = -\frac{d}{dt}\left[\ln R(t)\right]$

The probability density function $f(t)$, the cumulative distribution function $F(t)$, the reliability function $R(t)$, and the failure rate function $v(t)$ are closely related to each other. Under general conditions, any one of these can be determined from the others. Given the failure rate function $v(t)$, the reliability function and the density may be computed by:

$R(t) = e^{-\int_0^t v(x)\,dx}$    (1.4.4)

$f(t) = v(t)\, e^{-\int_0^t v(x)\,dx}$    (1.4.5)

If the intensity function increases, then the probability of failure over a specific interval of time becomes greater as time proceeds. This trend indicates that the software system deteriorates. On the contrary, if the intensity function decreases, this indicates that the software reliability is growing. Notice that in software systems it is reasonable to assume that the intensity function may change only when the program undergoes some modification in its code (addition of new code to the program, fault removal, and so on). The time interval under which the system software will be used is important. In order to achieve a required outcome, it is essential that the system software functions properly, with extremely high reliability, during a short time interval, which is usually shorter than other phases of software development. For software in commercial and industrial applications, the time interval for which software is designed is supposed to be much longer. There are three synthetic measures of reliability, namely:

PAGE 22

The mean-time-to-failure (MTTF) is defined by:

$MTTF = E(T) = \int_0^\infty t f(t)\,dt = \int_0^\infty R(t)\,dt.$    (1.4.6)

where MTTF is the average interval of time expected to the next failure time. In other words, given the reliability function $R(t)$, MTTF is thus a measure of the average time to failure for a software system with life distribution $F(t)$. The mean time between failures (MTBF) is the expected interval length from the current failure time, say $T_n = t_n$, to the next failure time $T_{n+1} = t_{n+1}$. Let $f(t \mid t_1, t_2, \ldots, t_n)$ denote the conditional distribution of failure time $T_{n+1}$ given $T_1 = t_1, T_2 = t_2, \ldots, T_n = t_n$; then the MTBF is defined by:

$MTBF = \int_{t_n}^\infty t\, f(t \mid t_1, t_2, \ldots, t_n)\,dt - t_n.$    (1.4.7)

The reciprocal of the intensity function $1/v(t)$

An alternative measure of reliability is the median of the random variable $T$. It is defined by:

$F(\tilde{t}) = R(\tilde{t}) = \frac{1}{2}$    (1.4.9)

The median is always well defined. However, there exist random variables $T$ whose

PAGE 23

Consider an increasing intensity function $v(t)$. This indicates that the chance of failure over a specified short interval of time becomes greater as time proceeds. This trend is appropriate whenever wear-out phenomena affect the operation of the system, such as in hardware products. If the hazard function is a decreasing function, then it is suitable for determining durability or the expected life span of the product due to improper design and manufacturing defects. In software systems, it is reasonable to assume that the hazard rate may change only when the program undergoes some modification such as fault removal or new code addition, where no physical deterioration effect occurs.

Since the failure intensity function $v(t)$ depends only on the cumulative failure time $t$ and not on the previous pattern of failure times, we can assume that a failed system is in exactly the same condition after a repair as it was just before the failure.

Definition 1.4.2 then the process is called the power law process (PLP), where $\theta$ is the scale parameter and $\beta$ is the shape parameter of PLP [51], [53], and [64].

The power law process is a special case of NHPP. The model $v(t)$ demonstrates whether a software system is improving or deteriorating when one chooses the appropriate parameters. When $\beta > 1$, the failure intensity increases (TBF becomes shorter) at an exponential rate with time, and the PLP models the reliability of a repairable system with rapid deterioration. If $\beta < 1$, the intensity function is strictly decreasing (TBF becomes larger). This corresponds to modeling the reliability of a repairable system with rapid improvement. For the PLP, when $\beta = 1$, the mean time between failures is equal to a constant value. The PLP has proved to be useful in reliability modeling for several reasons.

PAGE 24

1. It can be used to model deteriorating systems (TBF getting shorter) as well as to model improving systems (TBF getting larger).
2. Duane, Rigdon et al. [51] showed that the failure data of many systems used at General Electric fit a model that is closely related to PLP, and statistical inference procedures can be used easily and applied to PLP models.

There are two statistical descriptions, namely:

1. Time interval between failures.
2. Number of failures experienced in a given period.

1.5 Some Probability Distributions and Reliability Functions

The corresponding cumulative distribution function (c.d.f.), the reliability function (r.f.), and intensity function can be estimated as follows:

$F(t) = 1 - e^{-\lambda t}$

$R(t) = e^{-\lambda t}$

PAGE 25

Proof.Theintensityfunctionis:(t)=f(t)

PAGE 26

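The parameter $\lambda$ can be interpreted as the instantaneous failure rate, sometimes called failure intensity. It is independent of $t$, such that the conditional chance of failure in a specified time interval is the same regardless of how long the software system has been studied; thus by using the formula $MTTF = 1/\lambda$

If $T$ is a random variable with this cdf, then $T$ is distributed as Weibull($\beta$, $\lambda$). The cdf, pdf, and the intensity function are given by:

$F(t) = 1 - R(t) = 1 - e^{-\lambda t^{\beta}}, \quad t > 0$    (1.5.12)

$f(t) = F'(t) = \lambda \beta t^{\beta - 1} e^{-\lambda t^{\beta}}, \quad t > 0$    (1.5.13)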

PAGE 27

The probability density function (p.d.f.) is a two-parameter ($\beta > 0$, $\lambda > 0$) function. The parameters $\beta$ and $\lambda$ are referred to as the shape and scale parameters, respectively. The mean of the Weibull function can be expressed in terms of the gamma function.

Theorem 1.5.3

Therefore, the mean time until next failure is: MTTF = 1

PAGE 28


PAGE 29

2+(kt)1 2

PAGE 30

1. Model structure is selected.
2. The free parameters in the model are tuned on the basis of the experimental data.
3. A rule is given to use the estimated model for predictive purposes.

A software reliability model falls into one of two categories, depending on the operating domain. The most popular models are based on time. Their main feature is that reliability measures, such as the failure intensity, are derived as functions of time. The second kind of software reliability model takes a different approach: it uses operational inputs as its main feature and measures reliability as the ratio of successful runs to total runs. The second approach has some problems; for example, many systems have runs of large lengths with output measures that are incompatible with the time-based measures. Due to these problems, the work of this dissertation has been devoted to time-domain models. The time-domain model employs either the observed time between failures or the number of discovered failures per time period. Thus, two procedures were developed to estimate the model parameters, from either failure count data or time between failures. Therefore, software reliability modeling and estimation can be grouped into two categories of general applicability:

1. Failure counting description (FCD).
2. Failure interval description (FID).

1.6.2 Littlewood-Verrall Model (LV)

PAGE 31

The substantial difference between the JM and LV models is that in the JM model, a fix always leads to a reduction of the failure intensity, and the reduction always has the same value. In the LV model, the failure intensity follows a random pattern, so that the magnitude of its variations is not necessarily constant. Furthermore, the sign of the variation may vary; that is, a fix may not result in a reliability improvement.

1.7 Non Homogenous Poisson Process (NHPP) Models

where $m(t)$ is the mean residual time or expected cumulative number of failures in $[0, t)$. The assumptions of NHPP are:

PAGE 32

where $o(\Delta t)$ approaches zero for small $\Delta t$. The instantaneous failure intensity $\lambda(t)$ is denoted by:

$\lambda(t) = \lim_{\Delta t \to 0^+} \frac{P(N(t + \Delta t) - N(t) > 0)}{\Delta t}$    (1.7.19)

The mean value function is:

$m(t) = E(N(t)) = \int_0^t \lambda(x)\,dx$    (1.7.20)

If $m(t)$ is known, then the failure intensity $\lambda(t)$ is:

$\lambda(t) = \frac{dm(t)}{dt}$

If $\lambda(t)$ is constant, then we have a homogeneous Poisson process (HPP). In 1975, Schneidewind [59] was the first to suggest the NHPP model. However, in 1979, Goel and Okumoto [15] were the first to present a simple model for the software failure process. They assumed that the cumulative failure process is an NHPP with a simple mean value function. Later, the Goel-Okumoto (GO) model became very well known in the software reliability community [70]. Goel and Okumoto [15] proposed the time dependent failure rate model based on NHPP. Ohba [47] and Ohba-Yamada [48] proposed some particular NHPP models such as the delayed S-shaped software reliability model and the inflection S-shaped model. Musa and Okumoto [45] proposed the logarithmic Poisson execution time model. Musa [42] proposed the basic execution time model. Goel [13], [14] introduced the test quality parameter. Littlewood assumed a modification of the Duane model based on NHPP. Yamada et al. [71] assumed a model with two types of faults. In 1986, Yamada et al. [73] assumed a discrete time

PAGE 33

The model parameters are estimated with the maximum likelihood approach, and predictions can be obtained. The JM and GO models are conceptually indistinguishable on the basis of a single realization of the time to failure process. The only actual difference between the JM and GO models is the likelihood function, which is built differently in the two models. For this reason, the predictions supplied by the two models do not coincide. In 1984, Musa discussed the possibility of classifying the models in terms of different attributes. The time domain was used for these models, where the calendar time or the execution time was adopted. Only a few models assume the execution time as the underlying time measure. For any dynamic model, failure time can be incorporated, and the probabilistic nature of the model assumptions is not changed.

1.7.2 Power Model

PAGE 34

If<1,thenthesoftwarereliabilityimproves.1.7.3InversePolynomialModel Thefailureintensityis:(t;;)= whereQ1=3q 2Q2=3q 21.7.4SureshandRobertModels

PAGE 35

whereandcanbeestimatedbyusingthecollectedfailuredata.Therateofoccurrenceoffailures(ROCOF)attimetis(t)=dm(t)

PAGE 36

+1!>0;>0;>0 whererepresentsthenumberoffailurestobedetected.ThecorrespondingROCOFisdenotedby:(t)=m0(t)=(+t)1>0;>0;>0 Thelogisticgrowthmodelisdenotedby:m(t)= where;b,andareconstantparameterstobeestimatedbyttingthefailuredata.Also,istheexpectednumberoffailures.Notethatm(1)=.TheGompertzgrowthmodelisdenotedbym(t)=bt;>0;b<1;>0 whereistheexpectednumberoffailures,and,andbareconstantparameters.Wealsohavededucedthatm(1)=.1.8ModelSelectionandComparison ModelSelection

PAGE 37

If the time between failures tends to stay approximately the same, or, in other words, the intensity function remains constant over time and the graph shows a linear relationship, then a homogeneous Poisson process may be considered an appropriate model, provided the times between failures are independent. If, after removing bugs, the time between failures tends to get longer, then we can consider the system to be an improving system, meaning that the intensity function decreases. This can be employed as a reliability growth model because the graph shows a concave down curvature, which indicates reliability improvement, and hence is of interest to us. If instead the graph is concave upward, then it illustrates a deteriorating system. This happens because, after the removal of software bugs, the time between failures decreases; that is, the intensity function increases. If the graphs of $N(t_i)$ versus $t_i$ have significant curvatures, the software data may be modeled by a non-stationary process which is capable of describing the occurrence of failure events in time. The intensity function is defined by:

$v(t) = \frac{\beta}{\theta}\left(\frac{t}{\theta}\right)^{\beta - 1}$
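The link between the shape of the intensity function and system behavior can be illustrated with a short Python sketch. It assumes the common power law parametrization $v(t) = (\beta/\theta)(t/\theta)^{\beta-1}$ with shape $\beta$ and scale $\theta$; the function names `plp_intensity` and `plp_trend` are hypothetical and are not taken from the dissertation.

```python
def plp_intensity(t, beta, theta):
    """Power law intensity v(t) = (beta/theta) * (t/theta)**(beta - 1).

    Assumed parametrization: beta is the shape, theta is the scale.
    """
    if t <= 0 or beta <= 0 or theta <= 0:
        raise ValueError("t, beta and theta must all be positive")
    return (beta / theta) * (t / theta) ** (beta - 1)


def plp_trend(beta):
    """Classify the system from the shape parameter alone."""
    if beta < 1:
        return "improving"      # intensity decreases, TBF gets longer
    if beta > 1:
        return "deteriorating"  # intensity increases, TBF gets shorter
    return "constant"           # beta == 1: homogeneous Poisson process


# With beta < 1 the intensity at a later time is smaller than at an earlier one.
early = plp_intensity(1.0, beta=0.5, theta=1.0)
late = plp_intensity(4.0, beta=0.5, theta=1.0)
print(early, late, plp_trend(0.5))
```

For a decreasing intensity the expected number of failures per unit time falls as testing proceeds, which is the reliability growth case described above.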

PAGE 38

Failure no.   System A   System B   System C
 1              3          9         20
 2              5         20         45
 3              9         65         76
 4             20         88        113
 5             25        104        129
 6             41        107        152
 7             50        138        174
 8             69        143        193
 9             91        149        199
10            128        186        210
11            151        208        220
12            190        230        226
13            245        237        228
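The three columns can be read as cumulative failure times for three hypothetical systems (an assumption here, since the table header is incomplete in this text). Under that reading, the sketch below derives the times between failures and compares the average gap in the later half of the record with the earlier half. The function names and the tolerance of 1.5 are illustrative choices, not part of the dissertation.

```python
from statistics import mean

# Values copied from the table above, read as cumulative failure times.
cumulative_times = {
    "System A": [3, 5, 9, 20, 25, 41, 50, 69, 91, 128, 151, 190, 245],
    "System B": [9, 20, 65, 88, 104, 107, 138, 143, 149, 186, 208, 230, 237],
    "System C": [20, 45, 76, 113, 129, 152, 174, 193, 199, 210, 220, 226, 228],
}


def time_between_failures(times):
    """Convert cumulative failure times into successive gaps (TBF)."""
    gaps = []
    previous = 0
    for t in times:
        gaps.append(t - previous)
        previous = t
    return gaps


def classify_trend(times, tolerance=1.5):
    """Crude trend check: ratio of late mean gap to early mean gap."""
    gaps = time_between_failures(times)
    half = len(gaps) // 2
    ratio = mean(gaps[half:]) / mean(gaps[:half])
    if ratio > tolerance:
        return "improving"        # gaps grow, intensity decreases
    if ratio < 1 / tolerance:
        return "deteriorating"    # gaps shrink, intensity increases
    return "roughly constant"


for name, times in cumulative_times.items():
    print(name, classify_trend(times))
```

This is only a rough numerical counterpart to inspecting the curvature of $N(t_i)$ against $t_i$; a formal trend test would be needed for real data.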

PAGE 41

Software reliability models are the result of a trial and error process aimed at achieving a reasonable trade-off between the model's statistical characteristics, especially in terms of predictive validity. To test the predictive reliability of a model, the software reliability data up to a certain point, $i$, can be used to estimate the model parameters, and the estimated model can then be used to predict the future value at point $i + j$ of a variable of interest. A posteriori, the estimate can be compared with the observed "true" value taken by the variable at time $i + j$. This procedure can be repeated either by subsequently increasing $i$, so as to cover all available data, or by extending the prediction horizon $j$, while keeping the number $i$ of data points constant.

1.9 Error Measurements Prediction Criteria

PAGE 42

TBF    (1.9.24)

Mean Magnitude of Relative Error (MMRE)

It is the mean of MRE. Conte et al. [9] considered MMRE $\le 0.25$ to be an acceptable value for effort prediction models. There are advantages to this assessment:

1. Comparisons can be made easily across failure time data sets [7], [69].
2. The mean magnitude of relative error is independent of the units of the data.
3. Comparisons can be made across all types of prediction models [9].
4. MMRE is independent of scale; that is, the expected value of MRE does not vary with size.
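The error criteria named in this section can be sketched in Python as follows. The formulas use the usual textbook definitions (squared error, absolute difference, absolute difference relative to the observed value, and median absolute residual); they are assumed to match the dissertation's definitions, whose exact equations are incomplete in this text. All function names are illustrative.

```python
from statistics import mean, median


def mse(actual, predicted):
    """Mean square error."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))


def mavd(actual, predicted):
    """Mean absolute value difference."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def mmre(actual, predicted):
    """Mean magnitude of relative error, relative to the observed value."""
    return mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


def mdar(actual, predicted):
    """Median of the absolute residuals."""
    return median(abs(a - p) for a, p in zip(actual, predicted))


# Made-up observed and predicted times between failures, for illustration only.
observed = [10.0, 20.0, 40.0]
forecast = [12.0, 18.0, 40.0]
print(mse(observed, forecast), mavd(observed, forecast),
      mmre(observed, forecast), mdar(observed, forecast))
```

Because MMRE divides by the observed value, it is undefined when an observed time between failures is zero, which is one practical reason to clean such observations before evaluation.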

PAGE 43

Then, MdAR is the median of the values of AR. Also, they proposed MAR, which is the mean of AR. MAR is nothing but MAVD, which we have already calculated. In chapters two and three, we use these different methods of measurement (MSE, MAVD, MMRE, MdAR) for some of the software failure time data sets, such as System 40, Project 1, and Project 5 in [40].

1.10 Data Sets

PAGE 44

The time elapsed from the previous failure to the current failure. The time between failures in Projects 1, 5, and 40 is given in wall-clock seconds. More detailed information on the specific characteristics of each project is available at the Data and Analysis Center for Software [43]. The original data of Project 5 has 832 observations, but we consider 810 observations only, because the 832nd has a negative failure interval length, which is impossible, while another 21 observations have zero values of TBF. Note: since we are looking at TBF, not the number of failures, we remove these values. The Project 1 data set is documented in [17]; it is originally attributed to Musa in 1979. This data set has been used in the software reliability community for model comparison. The original data has 137 observations for the time between failures, and the last observation (the 137th) is negative. We dropped the zero and the negative observations; hence we consider only 133 observations. The following graphical method for displaying the given software failure data sets (Apollo 8, System 40, Project 1, and Project 5) can be used to gain more insight into the data.

Figure (1.3) shows that the graphs of Apollo 8, Project 1, and System 40 are curving downward. This indicates that the system is improving. The intensity function for these projects will also show this improvement when we study it in the next chapter. The graph of Project 5 fluctuates, curving up and down, which indicates that the intensity function is approximately constant over the failure time.
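The cleaning step described above, dropping zero and negative times between failures, can be sketched as a simple filter. The function name and the synthetic record below are hypothetical; only the counts (832 raw observations, 21 zeros, one negative value, 810 retained) come from the text.

```python
def clean_tbf(intervals):
    """Keep only strictly positive times between failures."""
    return [x for x in intervals if x > 0]


# Small made-up example: one zero and one negative interval are dropped.
print(clean_tbf([5, 0, 12, -3, 7]))

# Synthetic stand-in with the same counts as the Project 5 description:
# 810 positive values, 21 zeros, and a final negative observation.
synthetic = [1.0] * 810 + [0.0] * 21 + [-4.0]
print(len(synthetic), len(clean_tbf(synthetic)))
```

Dropping observations shortens the record, so any analysis indexed by failure number refers to the cleaned sequence rather than the original one.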

PAGE 48

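Let $f(t \mid t_1, t_2, \ldots, t_n)$ be the conditional distribution function of failure time $T_{n+1}$ given $T_1 = t_1, T_2 = t_2, \ldots, T_n = t_n$. The joint pdf of the observed failure times can be derived using the following expression,

$f(t_1, t_2, \ldots, t_n) = f_1(t_1)\, f_2(t_2 \mid t_1)\, f_3(t_3 \mid t_1, t_2) \cdots f_n(t_n \mid t_1, t_2, \ldots, t_{n-1}).$

As a consequence, we have

$f(t_k \mid t_{k-1}) = v(t_k) \exp\left(-\int_{t_{k-1}}^{t_k} v(x)\,dx\right), \quad t_k > t_{k-1}.$    (2.2.2)

We will use the following result from Rigdon et al. [52].

Theorem 2.2.1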

PAGE 49

Now, using the standard derivations of Qiao et al. [50] and Rigdon et al. [51], the maximum likelihood estimators (MLEs) of the parameters, $\lambda$ and $\beta$, are

$\hat{\lambda} = \frac{n}{t_n^{\hat{\beta}}}$    (2.2.6)

PAGE 50

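For the PLP,

$MTBF_n = \int_{t_n}^\infty t\, \lambda \beta t^{\beta - 1} \exp\left[-\lambda t^{\beta} + \lambda t_n^{\beta}\right] dt - t_n.$    (2.2.7)

If $\beta = 1$, then

$MTBF_n = \int_{t_n}^\infty \lambda t \exp(-\lambda t + \lambda t_n)\,dt - t_n = \frac{1}{\lambda}.$

Using the MLE, an estimator of the MTBF is given by

$\widehat{MTBF} = \frac{1}{\hat{v}(t)} = (\hat{\lambda}\hat{\beta})^{-1}\, t^{1 - \hat{\beta}}.$    (2.2.8)

In Table (2.5) of section (2.8), we will give the predictive errors, MSE and MAVD, resulting from using these estimators.

2.3 Regression Approach for Power Law Processes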

PAGE 51

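where $t$ is the failure time. Taking the natural logarithm, we get:

$\ln(MTBF) = -\ln(\lambda\beta) + (1 - \beta)\ln(t).$    (2.3.10)

By writing $Y = \ln(MTBF)$, $b = -\ln(\lambda\beta)$, $a = (1 - \beta)$, and $Z = \ln(t)$, we can rewrite (2.3.10) as a linear equation

$Y = b + aZ.$    (2.3.11)

Using the method of least squares for the linear regression model, Ryan [56], we can derive least squares estimators of $a$ and $b$ as $\hat{a}$ and $\hat{b}$, where

$\hat{a} = \frac{\sum Z_i Y_i - (\sum Z_i)(\sum Y_i)/n}{\sum Z_i^2 - (\sum Z_i)^2/n}$

and

$\hat{b} = \bar{Y} - \hat{a}\bar{Z}.$    (2.3.13)

Now, using the following derivation,

$a = (1 - \beta)$ implies $\beta = 1 - a$; $b = -\ln(\lambda\beta)$ implies $\lambda = \frac{1}{1 - a}\, e^{-b}$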

PAGE 52

and

$\hat{\lambda}_{reg} = \frac{1}{1 - \hat{a}}\, e^{-\hat{b}}$

Using these estimators, we can obtain an estimator of MTBF as:

$\widehat{MTBF}_{areg} = (\hat{\lambda}_{reg}\hat{\beta}_{reg})^{-1}\, t^{(1 - \hat{\beta}_{reg})}.$    (2.3.16)

We will call this the MTBFa-regression model.

We now present some numerical results for the MTBFa-regression model using the real software failure data of Musa [40]. The following table gives the MSE and MAVD calculated for three different software failure data sets: System 40, Project 1, and Project 5.

Table 2.1: MSE and MAVD of the MTBFa-Regression Model

        System 40               Project 1               Project 5
Data    log(t), t in seconds    log(t), t in seconds    log(t), t in seconds
MSE     0.0653                  0.1074                  0.0483
MAVD    0.1660                  0.2312                  0.1588

Instead of assuming a PLP model, we suppose the natural logarithm of the data, $\ln(t)$, follows a linear relationship, that is, $Y = \ln(MTTF) = a \ln t + b$. Then the question becomes what the corresponding intensity function is. To answer this question, we will introduce the following. Let $T$ be a continuous random variable

PAGE 53

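We use the following theorem to find the intensity function.

Theorem 2.3.1

$m(t) = \frac{1}{R(t)}\int_t^\infty R(u)\,du.$    (2.3.18)

From the assumed linear relationship of equation (2.3.11),

$Y = \ln(MTTF) = a \ln t + b,$    (2.3.19)

we get

$MTTF = e^{a \ln t + b} = e^{a \ln t}\, e^{b} = e^{b} t^{a}.$

$\frac{1}{R(t)}\int_t^\infty R(u)\,du = e^{b} t^{a}; \qquad e^{b} t^{a} R(t) = \int_t^\infty R(u)\,du$    (2.3.20)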

PAGE 54

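By differentiating (2.3.20) and using the following formula

$\frac{d}{dt}\int_t^\infty R(u)\,du = -\frac{d}{dt}\int_\infty^t R(u)\,du = -R(t),$

we get, with $k = e^{b}$,

$k t^{a} \frac{d}{dt}R(t) + a k t^{a-1} R(t) = -R(t)$

or

$\frac{d}{dt}R(t) + \frac{1 + a k t^{a-1}}{k t^{a}} R(t) = 0$

$\frac{d}{dt}R(t) + \left(k^{-1} t^{-a} + a t^{-1}\right) R(t) = 0$    (2.3.21)

Since $v(t) = f(t)/R(t)$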

PAGE 55

and $\hat{b} =$ where

For this simple linear model (we will call this the SRRM1 model), MSE and MAVD are given in Table (2.2).
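The least squares fit behind a simple linear model can be sketched as below, using the standard closed-form estimators for slope and intercept. The estimator formulas on this page are incomplete, so the textbook forms are assumed here; the function names, the choice of regressor, and the sample numbers are illustrative and not taken from the dissertation.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    denominator = sum_xx - sum_x * sum_x / n
    if denominator == 0:
        raise ValueError("all x values are identical")
    a = (sum_xy - sum_x * sum_y / n) / denominator  # slope
    b = sum_y / n - a * sum_x / n                   # intercept
    return a, b


def predict(a, b, x_new):
    """Evaluate the fitted line at a new point."""
    return a * x_new + b


# Made-up data lying exactly on y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept, predict(slope, intercept, 5))
```

A positive fitted slope corresponds to times between failures that lengthen as testing proceeds.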

PAGE 56

Table 2.2: MSE and MAVD of the SRRM1 Model

        System 40               Project 1               Project 5
Data    log(t), t in seconds    log(t), t in seconds    log(t), t in seconds
MSE     2.56                    1.80                    2.71
MAVD    1.24                    0.95                    1.33

As in section (2.3), one question that arises is what the intensity function corresponding to the simple linear regression model should be. For the linear model, we will derive the intensity function. We will once again use the mean residual time (MRT).

For a simple linear regression model,

$MTTF = at + b$    (2.4.26)

equating the MRT with MTTF in order to find the failure intensity function $v(t)$,

$MTTF = m(t); \qquad \frac{1}{R(t)}\int_t^\infty R(u)\,du = at + b;$    (2.4.27)

following the same steps as in section (2.3), we obtain the corresponding intensity function

PAGE 57

$v(t) = \frac{a + 1}{at + b}.$    (2.4.28)

It should be noted that in a simple linear regression model, $MTTF = at + b$:

(1) If the slope of the regression line is positive, then the MTBF increases. Let $k$ be the desired optimal time of satisfactory operation (specified by the design). The testing of the software will be terminated when $MTBF = k$.

(2) If the slope of the regression line remains negative for a certain software failure data set, then we have to discard the system.

2.5 Successive Prediction Using Regression

The successive prediction method works as follows: it predicts only the next failure time, one step ahead, by applying the linear regression model with the parameters estimated iteratively from past failure data. That is, we apply the linear regression line for computing the estimated parameters, then by using the parameters at every

PAGE 58

The predictive performance of this method is assessed by computing the mean squared error (MSE) and the mean absolute value difference (MAVD). We list the results of this model in Table (2.3). These results will be compared with other models in Table (2.8) at the end of this chapter.

Table 2.3: MSE and MAVD of the Successive Recurrence Regression Model

Data     System 40                Project 1                Project 5
         log(t), t in seconds     log(t), t in seconds     log(t), t in seconds
MSE      4.35                     2.30                     3.01
MAVD     1.69                     1.20                     1.40

After applying the Q-Q plot for the PLP and linear regression models using the System 40, Project 1, and Project 5 data sets, we notice that the actual TBF fits

PAGE 59

Reg models seem to fit the data better. Project 5 data: Fig. (7) and Fig. (8) are very close, while Fig. (9) fits most of the data except for the lowest quartiles. The quantile-quantile plot indicates that the sample failure data come from the PLP, PLP Reg, and linear regression distributions.

PAGE 60

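and the Regression Approach

The ratio power transformation is used for constructing the goodness-of-fit test. If $\Lambda(t)$ is the expected number of failures before time t, then

$R_i = \frac{\Lambda(t_i)}{\Lambda(t_n)}$, $i = 1, 2, \ldots, n$.

For the power law process of intensity function $v(t) = \lambda\beta t^{\beta-1}$,

$\Lambda(t) = \int_0^t v(x)\,dx = \int_0^t \lambda\beta x^{\beta-1}\,dx = \lambda t^{\beta}$,

so that

$R_i = \frac{\Lambda(t_i)}{\Lambda(t_n)} = \left(\frac{t_i}{t_n}\right)^{\beta}$.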

PAGE 61

$C_R^2 = \frac{1}{12(n-1)} + \sum_{i=1}^{n-1}\left(\hat{R}_i - E\hat{R}_i\right)^2$, where $E\hat{R}_i = \frac{2i-1}{2(n-1)}$. Thus

$C_R^2 = \frac{1}{12(n-1)} + \sum_{i=1}^{n-1}\left(\hat{R}_i - \frac{2i-1}{2(n-1)}\right)^2$. (2.7.30)

Large values of $C_R^2$ mean that there is evidence of a departure from the power law process (Rigdon et al. [52]). For given confidence levels, the critical values for the Cramér-von Mises goodness-of-fit test with m = (n-1) can be obtained from Table (A6), page 256, of Rigdon et al. [52]. For m > 100, we know the critical values will be smaller than the values given in this table. This information will be used in the decision making for the data sets we have.

Now, we test the adequacy of the linear regression model using the Cramér-von Mises goodness-of-fit test. Let

From equation (2.4.28), the intensity function for the simple linear regression model is given by

$v(t) = \frac{a+1}{at+b}$.

PAGE 62

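$R_i = \frac{\Lambda(t_i)}{\Lambda(t_n)} = \frac{\ln|(a t_i + b)/b|}{\ln|(a t_n + b)/b|}$.

Now, the ratio power transformation is:

$\hat{R}_i = \frac{\ln|(\hat{a} t_i + \hat{b})/\hat{b}|}{\ln|(\hat{a} t_n + \hat{b})/\hat{b}|}$.

Thus, the test statistic for the Cramér-von Mises test is:

$C_R^2 = \frac{1}{12(n-1)} + \sum_{i=1}^{n-1}\left(\frac{\ln|(\hat{a} t_i + \hat{b})/\hat{b}|}{\ln|(\hat{a} t_n + \hat{b})/\hat{b}|} - \frac{2i-1}{2(n-1)}\right)^2$. (2.7.32)

Table 2.4: Goodness-of-Fit Test Statistical Values

                           System 40         Project 1         Project 5
PLP                        0.04231272276     0.0121209429      0.008258820298
Simple Linear Regression   0.009400859828    0.007124003038    0.01730674729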
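The statistic in (2.7.30) and (2.7.32) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the computation used to produce Table 2.4: the function names are hypothetical, the ratios are assumed to be the n-1 transformed values for i = 1, ..., n-1, and the linear-model ratio assumes the form ln|(a t_i + b)/b| / ln|(a t_n + b)/b|.

```python
import math

def cramer_von_mises(ratios):
    """C_R^2 = 1/(12 m) + sum_{i=1}^{m} (R_i - (2i - 1)/(2m))^2, m = len(ratios)."""
    r = sorted(ratios)
    m = len(r)
    return 1.0 / (12 * m) + sum(
        (ri - (2 * i - 1) / (2 * m)) ** 2 for i, ri in enumerate(r, start=1)
    )

def linear_model_ratios(times, a, b):
    """Ratios R_i for the linear MTTF model, for i = 1..n-1 (assumed form)."""
    t = sorted(times)
    denom = math.log(abs((a * t[-1] + b) / b))
    return [math.log(abs((a * ti + b) / b)) / denom for ti in t[:-1]]

# Hypothetical failure times and fitted coefficients.
ratios = linear_model_ratios([1.0, 2.0, 3.0, 5.0, 8.0], a=0.5, b=1.0)
print(cramer_von_mises(ratios))
```

The smallest possible value, 1/(12m), is attained when every ratio sits exactly at (2i-1)/(2m); larger values indicate a poorer fit.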

PAGE 63

Table (2.5) gives the comparison between the MLE approach and the regression approach, assuming the logarithmic failure data of System 40, Project 1, and Project 5 follow a PLP, where we use the notation MTBFa mle for the PLP model using the MLE estimates and MTBFa reg for the PLP-based model fitted through regression.

Table 2.5: Comparing MSE and MAVD of MLE and Regression Methods

Error Measurement    System 40                Project 1                Project 5
                     log(t), t in seconds     log(t), t in seconds     log(t), t in seconds
MTBFa reg MSE        0.0653                   0.1074                   0.0483
MTBFa mle MSE        23.4327                  16.6756                  292.5728
MTBFa reg MAVD       0.1660                   0.2312                   0.1588
MTBFa mle MAVD       4.7957                   3.7663                   16.3511

We now give results corresponding to the simple linear regression (SRRM1) model, where we did not assume the PLP structure for the data. Given that there are virtually no distributional assumptions (except what is required in the least squares method of regression analysis), our results are competitive.

We will make comparisons using the following eight software models, which in-

PAGE 64

Model                                          Mean Square Error "MSE"    Mean Absolute Value Difference "MAVD"
Singpurwalla and Soyer Model I                 4.92                       Not Available (NA)
Singpurwalla and Soyer Model II                12.99                      NA
Singpurwalla and Soyer Model III               9.58                       NA
Singpurwalla and Soyer Model IV                16.2029                    NA
MTBFhs (Horigome, Singpurwalla, & Soyer)       5.19                       1.6865
MTBFa / SRGM / Suresh                          4.72                       1.85
MTBFa / Computed by Henry                      4.15                       NA
MTBFq / Henry                                  3.17                       1.5677
SRRM1 / (Linear Regression)                    2.57                       1.2412

PAGE 65

                                            % Reduction Achieved by    % Reduction Achieved by       % Reduction Achieved by
                                            SRGM / Suresh              MTBFq / Henry, Tsokos,        SRRM1 Model
Model                                       and Rao Model              and Rao Model
Singpurwalla and Soyer Model I              4.06                       35.57                         47.76
Singpurwalla and Soyer Model II             63.66                      75.60                         80.22
Singpurwalla and Soyer Model III            50.73                      66.91                         73.17
Singpurwalla and Soyer Model IV             70.87                      80.44                         84.14
MTBFhs (Horigome, Singpurwalla & Soyer)     9.06                       38.92                         50.48
MTBFa / SRGM / Suresh & Rao Model                                      32.84                         45.55
MTBFa / Computed by Henry                                              23.61                         38.07
MTBFq / Henry & Tsokos                                                                               18.93
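The percentage reductions above are consistent with the usual relative-reduction formula, 100 (MSE_old - MSE_new) / MSE_old, applied to the MSE values reported for the competing models. A short sketch (the function name `pct_reduction` is hypothetical):

```python
def pct_reduction(baseline, improved):
    """Percentage by which `improved` reduces the error `baseline`."""
    return 100.0 * (baseline - improved) / baseline

# MSE of Singpurwalla and Soyer Model I (4.92) against the SRGM (4.72),
# MTBFq (3.17), and SRRM1 (2.57) values quoted in the text.
for new_mse in (4.72, 3.17, 2.57):
    print(round(pct_reduction(4.92, new_mse), 2))
```

Small differences in the last digit relative to the table can arise because the reported MSE values are themselves rounded.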

PAGE 68

                   PLP mle     Simple Lin reg    PLP Reg    Successive Recurrence
System 40
  MSE              23.4327     2.5657            0.0653     4.3483
  MAVD             4.7957      1.2370            0.1660     1.6919
  MMRE             2.1000      0.1320            0.0762     0.1737
  MdAR             4.7961      1.0701            0.1375     1.3917
Project 1
  MSE              16.6756     1.7965            0.1074     2.3011
  MAVD             3.7663      0.9479            0.2313     1.2164
  MMRE             0.7424      0.2733            0.1704     0.3298
  MdAR             4.1110      0.6596            0.1628     1.0105
Project 5
  MSE              292.6700    2.7122            0.0483     3.0068
  MAVD             16.3511     1.3260            0.1588     1.4071
  MMRE             1.8385      0.1679            0.0778     0.1714
  MdAR             15.2034     1.1517            0.1343     1.2794

Reg, and the successive recurrence models used for prediction perform better in comparison to the MLE method for prediction. Overall, for the data sets considered, the PLP regression method outperforms all other models.
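For readers who want to reproduce comparisons of this kind, the error criteria named in the abstract can be sketched as follows. The definitions below are the common ones (relative error taken with respect to the actual value for MMRE and with respect to the estimate for MMER); they are an assumption about the exact forms used in this work, and the function names and sample values are hypothetical.

```python
from statistics import mean, median

def mse(actual, est):
    """Mean square error."""
    return mean((y - e) ** 2 for y, e in zip(actual, est))

def mavd(actual, est):
    """Mean absolute value difference."""
    return mean(abs(y - e) for y, e in zip(actual, est))

def mmre(actual, est):
    """Mean magnitude of relative error (relative to the actual value)."""
    return mean(abs(y - e) / abs(y) for y, e in zip(actual, est))

def mmer(actual, est):
    """Mean magnitude of error relative to the estimate."""
    return mean(abs(y - e) / abs(e) for y, e in zip(actual, est))

def mdar(actual, est):
    """Median of the absolute residuals."""
    return median(abs(y - e) for y, e in zip(actual, est))

actual = [2.0, 4.0, 8.0]      # hypothetical observed values
estimate = [1.0, 5.0, 8.0]    # hypothetical model estimates
print(mse(actual, estimate), mavd(actual, estimate),
      mmre(actual, estimate), mmer(actual, estimate), mdar(actual, estimate))
```

MMRE and MMER are undefined when an actual value or an estimate is zero, so in practice such observations need separate handling.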

PAGE 70

In [39], it has been demonstrated that the linear regression models outperform other popular models from the literature in terms of the predictive accuracy of the Mean Time Between Failure (MTBF) estimates. These estimates were measured using the Mean Square Error (MSE) and Mean Absolute Value Difference (MAVD). However, for some of the software failure data, certain parametric assumptions, such as the normality assumption, do not hold. As a result, the inferential procedures described in [39] and many other references may not be applicable and hence may not be accurate. Also, the assumption of uncorrelated errors is more often violated when one uses software failure data, since it is collected over time. Many robust alternatives have been developed in the last decades to deal with these situations; see [23], [16] (Hampel et al.), [55], [18], and [19]. One of the alternative ways is to model the regression function non-parametrically so as to let the data decide on the functional form. Silverman [61] states that "An initial non-parametric estimate may well suggest a suitable parametric model (such as a linear regression), but nevertheless will give the data more of a chance to speak for themselves in choosing the model to be fitted". In these cases, the fitted values determined non-parametrically are superior to the fitted values obtained from a parametric model. A

PAGE 71

In this work, we apply two nonparametric methods: monotonic regression and rank regression. The main idea of the monotone rank regression approach is to use the rank transformation in regression problems. If there are observations of the dependent time between failures Y, we replace these observations by their corresponding ranks, where R(Y) is the rank assigned to the value of the time between failures. Similarly, replace each of the independent failure times $T_j$, $j = 1, 2, \ldots, k$, by its corresponding rank. Ties are replaced by assigning average ranks. Then, we perform the least squares regression analysis on these ranks to obtain the predictive regression estimates.

For the rank regression method, we present an asymptotically distribution-free rank-based procedure by computing the estimators of the regression parameter. Let $R_i(\beta)$ denote the rank of the residual as a function of $\beta$. The estimator is the value that minimizes the dispersion $D(Y_i - \beta T_i)$ defined in (3.2.2). These approaches are less sensitive to outliers. They represent an alternative to the parametric regression and least squares approaches in case the parametric assumptions are not appropriate. In addition, the monotone and the rank regression methods limit the influence of outliers and high leverage values, as shown in Sections (3.4) and (3.5).

In Section (3.2), we introduce some preliminary notation, definitions, and motivation for using non-parametric methods. Much of the analysis of outliers and influential high leverage values is discussed mainly in Section (3.3). In Sections (3.4)

PAGE 73

Now, we introduce some important terms and definitions. Let $T_1, T_2, \ldots, T_n$ be the failure times of the software, and let Y be the random variable representing the time to the next failure. Consider the following linear regression model for software failure prediction:

$Y_i = \alpha + \beta t_i + \varepsilon_i$, $1 \le i \le n$, (3.2.1)

where $\alpha$ is the intercept parameter, $\beta$ is the parameter representing the regression coefficient, and $\varepsilon_i$ represents the random errors.

Definition 3.2.1

PAGE 74

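where

$R_i^c(\beta) = R_i(\beta) - \frac{n+1}{2}$, (3.2.3)

are the centered ranks or mid-ranks and $R_i(\beta)$ are the ranks of the residuals $Y_i - \beta T_i$.

The purpose of the rank regression is to suggest a simple modification of the least squares method which yields an even, location-free measure of dispersion. Consider the residual $\varepsilon_i = Y_i - \beta T_i$. In the least squares regression we have $\sum_{i=1}^{n} \varepsilon_{(i)}^2 = \sum_{i=1}^{n} \varepsilon_{(i)}\varepsilon_{(i)}$, where $\varepsilon_{(i)}$ is the i-th ordered residual from a symmetric distribution centered at zero. For the rank regression, replace the residuals by their ranks. Let $R_i(\beta)$, $R_i^c(\beta)$, $i = 1, \ldots, n$, respectively be the ranks and centered ranks of the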

PAGE 75

ItfollowsthatnPi=1Rci()=0,andRc1()Rc2():::Rcn()(monotone); Now,weintroducetheerrormeasurecriteriathatwewilluseinlieuofleastsquares.Theorem3.2.4 Proof.FromDenition(3.2.1),itisenoughtoprovethat1. ShowthatD(;)=D()D(;)=D((YiTi))=nXi=1Rci()(YiTi)=nXi=1Rci()(YiTi)nXiRci()=nXi=1Rci()(YiTi)nXi=1Rci()=nXi=1Rci()(YiTi)=D()64

PAGE 76

Note that since $\sum_{i=1}^{n} R_i^c(\beta) = 0$, we have an even and location-free measure of dispersion.
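To make the dispersion measure concrete, the sketch below evaluates $D(\beta) = \sum_i R_i^c(\beta)(Y_i - \beta T_i)$ for a given slope, using average ranks for ties. The helper names `average_ranks` and `dispersion` and the sample data are hypothetical; this is an illustration of the definition, not the procedure used for the reported results.

```python
def average_ranks(values):
    """Ranks 1..n of `values`, with tied values given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def dispersion(beta, t, y):
    """D(beta): sum of centered residual ranks times the residuals y_i - beta*t_i."""
    resid = [yi - beta * ti for ti, yi in zip(t, y)]
    n = len(resid)
    ranks = average_ranks(resid)
    return sum((r - (n + 1) / 2) * e for r, e in zip(ranks, resid))

t = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 5.0, 5.0, 9.0]
print(dispersion(1.0, t, y))                       # 6.5
print(dispersion(1.0, t, [yi + 5.0 for yi in y]))  # 6.5 again: shifting Y changes nothing
```

Because the centered ranks sum to zero, adding a constant to every residual leaves D unchanged, which is the location-free property noted above.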

PAGE 77

Note that: hi1

The standardized residuals can provide a useful tool to detect outlier values. A standardized residual that exceeds 2 suggests that the residual is more than 2 standard deviations above or below the regression line. Assuming normally distributed errors, standardized residuals should fall outside this range only 5% of the time [65]. It is clear that the further the T value is from its mean, the larger the leverage value. We used $h_i > 6/n$ in our computations as a threshold condition to identify leverage observations.

A good leverage point is a point that is unusually large or small among the observations and is not a regression outlier. The point is relatively removed from the majority of the observations, but reasonably close to the regression line. In the Project 1 data set, there are two outliers (0.69 and 1.10) and three leverage values (0.69, 1.10, and 1.39). Thus, we can consider the observation 1.39 as a good leverage point. Additionally, in System 40, there are no outliers but one leverage point that is considered to be a good leverage value.

A bad leverage point is a point that has an unusually large residual corresponding to some regression lines. This point is situated far from the regression line of the bulk of the observations. A bad leverage point is an outlier among all observations as well. Bad leverage points can rapidly affect the estimated value of the slope. Such an effect has been seen in the case of Apollo 8, with the leverage value 33. The normality test reveals that the Apollo 8 data are not normal. Project 1 and Project 5 have outliers and depart slightly from the normal distribution. In such cases, it can substantially reduce our ability to detect a true association between the failure time and the time between failures. However, the outliers among the failure time T values inflate the sample variance $s_t^2$ and decrease the standard error of the least squares estimate of the slope. This suggests that outlier points are beneficial in terms of increasing our ability to detect regression

PAGE 78

Now, we demonstrate in Table (3.1) the effect of outlier and leverage values in the Apollo 8 data on the error measurements of the software reliability regression model of the first degree (SRRM1), using the least squares procedure.

Table 3.1: The Effects of Outlier and Leverage Values on SRRM1 (Apollo 8)

Measurement Criteria    Complete Data    Data without Leverage Values '91'    Data without Outliers '33, 91'
SRRM1 MSE               151.9420         29.3014                              7.1063
SRRM1 MAVD              6.3229           3.1597                               1.9085
SRRM1 MMER              0.5914           0.57                                 0.5147
SRRM1 MMRE              2.2680           1.01                                 0.7385
SRRM1 MdAR              2.2045           2.18                                 2.0853
SRRM1 V(LSR)            341.25           11.64                                73.6
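The outlier and leverage screening described in this section can be sketched as below for a simple linear fit. The leverage formula $h_i = 1/n + (t_i - \bar{t})^2 / S_{tt}$ and the internally standardized residual are the textbook forms and are assumed here; the cut-offs 6/n and 2 are the ones quoted in the text, while the function name and the data are hypothetical.

```python
from math import sqrt

def leverage_and_std_residuals(t, y):
    """Leverages h_i and standardized residuals for the least squares line y = a*t + b."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    stt = sum((ti - t_bar) ** 2 for ti in t)
    a = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y)) / stt
    b = y_bar - a * t_bar
    resid = [yi - (a * ti + b) for ti, yi in zip(t, y)]
    s2 = sum(e * e for e in resid) / (n - 2)          # residual variance estimate
    h = [1.0 / n + (ti - t_bar) ** 2 / stt for ti in t]
    std = [e / sqrt(s2 * (1.0 - hi)) for e, hi in zip(resid, h)]
    return h, std

# Hypothetical data with one failure time far from the rest.
t = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 50.0]
y = [2.0, 4.0, 5.0, 8.0, 11.0, 12.0, 13.0, 17.0, 18.0, 60.0]

h, std = leverage_and_std_residuals(t, y)
n = len(t)
leverage_points = [i for i, hi in enumerate(h) if hi > 6.0 / n]
outliers = [i for i, r in enumerate(std) if abs(r) > 2.0]
print(leverage_points, outliers)
```

A point flagged as a leverage point but not as an outlier corresponds to what the text calls a good leverage point.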

PAGE 79

Measurement Criteria    Complete Data    Data without Leverage Values '4.70'    Data has no Outliers
SRRM1 MSE               2.5657           2.3428
SRRM1 MAVD              1.2370           1.1736
SRRM1 MMER              0.1224           0.1138
SRRM1 MMRE              0.1320           0.1223
SRRM1 MdAR              1.0701           1.0336
SRRM1 V(LSR)            404              400

PAGE 80

Measurement Criteria    Complete Data    Data without Leverage Values    Data without Outliers
SRRM1 MSE               1.7965           1.6384                          1.7209
SRRM1 MAVD              0.9479           0.9195                          0.9428
SRRM1 MMER              1.1691           0.1628                          0.1694
SRRM1 MMRE              0.2733           0.2277                          0.2423
SRRM1 MdAR              0.6596           0.6125                          0.6577
SRRM1 V(LSR)            343              357.65                          355

PAGE 81

The least squares regression analysis is performed on the ranks of T and Y. The regression equation which expresses $\hat{R}(Y_i)$ in terms of $R(T_i)$ is

$\hat{R}(Y_i) = (n+1)/2 + \hat{\beta}\,(R(T_i) - (n+1)/2) + \varepsilon_i$ (3.4.7)

PAGE 82

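1. Acquire the ranks $R(T_i)$ and $R(Y_i)$ of T and Y respectively. In case of ties, use the average of the tied ranks.

2. Identify the least squares regression estimators of equation (3.4.6),

$\hat{\beta} = \frac{\sum_{i=1}^{n} R(T_i)R(Y_i) - n(n+1)^2/4}{\sum_{i=1}^{n} R(T_i)^2 - n(n+1)^2/4}$

A rank $R(t_0)$ for $t_0$ can be acquired by applying the next algorithms: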

PAGE 83

Ift0equalsoneoftheobservedpointsTi,letR(t0)equaltherankofthatTi.(b)Ift0existsbetweentwoadjacentvaluesTiandTjwhereTi
PAGE 84

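For each rank of Y, find the estimated rank of $T_i$, $\hat{R}(T_i)$, from Equation (3.4.7):

$\hat{R}(T_i) = [R(Y_i) - \hat{\alpha}]/\hat{\beta}$, $i = 1, 2, \ldots, n$. (3.4.13)

3. Transform each $\hat{R}(T_i)$ to an estimate $\hat{T}_i$ in the manner of the preceding step (5). More specifically:

(a) If $\hat{R}(T_i)$ equals the rank of some observation $T_j$, let $\hat{T}_i$ equal that observed value.

(b) If $\hat{R}(T_i)$ is between the ranks of two adjacent observations $T_j$ and $T_k$, where $T_j - T_k < 0$, obtain $\hat{T}_i$ by using the following equation:

$\hat{T}_i = T_j + \frac{\hat{R}(T_i) - R(T_j)}{R(T_k) - R(T_j)}\,(T_k - T_j)$

For all observed ranks of T values, if $\hat{R}(T_i) < \min(R(T_i))$ or $\hat{R}(T_i) > \max(R(T_i))$, then there is no estimate for $\hat{T}_i$.

4. Plot each of the points found in step (3), with $Y_i$ as the ordinate and $\hat{T}_i$ as the abscissa. Also plot the end points found in step (1), with $E(\hat{Y} \mid T)$ as the ordinate and $T_{(1)}$ or $T_{(n)}$ as the abscissa. All these points are monotonic, increasing if $\hat{\beta} > 0$ and decreasing if $\hat{\beta} < 0$.

5. The estimate of the regression of Y on T is represented by lines joining the points in step (4).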
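The prediction side of the rank-transform procedure can be sketched as follows: rank T and Y, fit a least squares line to the ranks, map a new time $t_0$ to a rank by interpolation, predict the rank of Y, and interpolate back to the Y scale. This is a simplified illustration with hypothetical function names. It assumes that the intercept on the rank scale is $(1 - \hat{\beta})(n+1)/2$, that no prediction is made for $t_0$ outside the observed range, and that predicted ranks beyond the observed ranks are clamped to the extreme Y values.

```python
def average_ranks(values):
    """Ranks 1..n with ties replaced by the average of the tied ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def interpolate(x0, xs, ys):
    """Piecewise-linear interpolation through (xs, ys); clamps outside the range."""
    pts = sorted(zip(xs, ys))
    if x0 <= pts[0][0]:
        return pts[0][1]
    if x0 >= pts[-1][0]:
        return pts[-1][1]
    for (x1, y1), (x2, y2) in zip(pts, pts[1:]):
        if x1 <= x0 <= x2:
            if x2 == x1:
                return y1
            return y1 + (x0 - x1) * (y2 - y1) / (x2 - x1)

def monotone_predict(t, y, t0):
    """Predict Y at t0 by least squares regression on the ranks of T and Y."""
    if t0 < min(t) or t0 > max(t):
        return None                       # no extrapolation in this sketch
    n = len(t)
    rt, ry = average_ranks(t), average_ranks(y)
    c = n * (n + 1) ** 2 / 4
    beta = (sum(a * b for a, b in zip(rt, ry)) - c) / (sum(a * a for a in rt) - c)
    alpha = (1 - beta) * (n + 1) / 2
    r_t0 = interpolate(t0, t, rt)         # rank assigned to t0
    r_y0 = alpha + beta * r_t0            # predicted rank of Y
    return interpolate(r_y0, ry, y)       # back-transform the rank to the Y scale

t = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]           # hypothetical monotone, nonlinear data
print(monotone_predict(t, y, 3.0), monotone_predict(t, y, 2.5))
```

For strictly monotone data the rank regression has slope one, so observed points are reproduced exactly and intermediate points are linearly interpolated.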

PAGE 85

        Complete Data    No Outliers (33) and (91)    No Leverage Values (91)
MSE     326.7174         9.8692                       42.9201
MAVD    6.9378           2.5810                       3.7246
MMER    1.4040           0.5796                       0.7987
MMRE    1.2132           0.9779                       1.1072
MdAR    2.3256           2.1249                       2.4462

        Complete Data    No Outliers Exist    No Leverage Values
MSE     2.9877           No outliers exist    2.7345
MAVD    1.3286           No outliers exist    1.2868
MMER    0.1295           No outliers exist    0.1249
MMRE    0.1460           No outliers exist    0.1365
MdAR    1.0331           No outliers exist    1.0166

PAGE 86

(Project 1)

        Complete Data    No Outliers    No Leverage Values
MSE     2.0517           19.6718        16.05
MAVD    1.0592           4.1360         3.77
MMER    0.0408           2.9756         2.10
MMRE    0.1981           0.7150         0.64
MdAR    0.7494           4.31           3.91

PAGE 87

        Complete Data    No Outliers    No Leverage Values
MSE     2.7975           2.4576         2.3085
MAVD    1.3348           1.2819         1.2525
MMER    0.0148           0.0109         0.0090
MMRE    0.0597           0.0451         0.0399
MdAR    1.1471           1.1108         1.1

The following graphs represent the lines (curves) obtained using the linear regression and monotonic regression methods, along with the scatter plot of the data values.

PAGE 88

Figure3.3:AnalysisofMonotoneRegression(Apollo8,NoOutliers)77

PAGE 89

2)=0 thenD(;)=nXi=1Rci():(YiTi) Equation(3.5.15)representsthesumthatinvolvestheresiduals(YiTi)oftheregressionequation(3.2.1).Insteadofequation(3.5.15),theleast-squaresestimators(LSE)ofandarefoundbyminimizing:C(;)=nXi=1(YiTi)2(3.5.16) TheLSEof^cfromequation(3.5.16)is:^c=nPi=1(YiY)(TiT) Equation(3.5.15)canbeminimizedfor,butthecomputationsareverydicult.78

PAGE 90

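instead of $C(\alpha, \beta)$ in equation (3.5.16). But the parameter estimates $\hat{\alpha}_1$ and $\hat{\beta}_1$, computed by minimizing the function $E(\alpha, \beta)$ in equation (3.5.18), are not easy to analyze for the proposed rank regression model. This leads us to use Theil's statistics procedure [21], [66], [67], and [68] to compute the slope estimator of equation (3.2.1). Theil's estimator is:

$\hat{\beta}_{TH} = \mathrm{median}\{S_{ij}\}$, $1 \le i < j \le n$, where $S_{ij} = \frac{Y_j - Y_i}{T_j - T_i}$ and $T_j - T_i > 0$. (3.5.20)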
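Theil's slope estimator is straightforward to compute directly: form the slope between every pair of points with distinct T values and take the median. The sketch below is illustrative; the function names and data are hypothetical, and taking the intercept as the median of $Y_i - \hat{\beta} T_i$ is one common convention that is assumed here rather than taken from the text.

```python
from itertools import combinations
from statistics import median

def theil_slope(t, y):
    """Median of the pairwise slopes (y_j - y_i)/(t_j - t_i) over pairs with t_i != t_j."""
    slopes = [
        (yj - yi) / (tj - ti)
        for (ti, yi), (tj, yj) in combinations(zip(t, y), 2)
        if tj != ti
    ]
    return median(slopes)

def theil_intercept(t, y, slope):
    """Assumed convention: median of the residuals y_i - slope * t_i."""
    return median(yi - slope * ti for ti, yi in zip(t, y))

# Hypothetical data on the line y = 2t with one gross outlier at the end.
t = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 100.0]

slope = theil_slope(t, y)
print(slope, theil_intercept(t, y, slope))
```

With these data six of the ten pairwise slopes equal 2, so the median is unaffected by the outlier, whereas a least squares slope would be pulled strongly upward.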

PAGE 91

Proof.i. Equation(3.5.23))fZag=fYaTaghasnotiedvaluesas6=Z(i;j)=YjYi whichareatmostN=n(n1) 2innumber. SortthevaluesofZa=YaTa

PAGE 92

2+1intervals(Wk;Wk+1)for0kN,whereWkarethesortedvaluesZ(i;j),suchthatitissortedinanincreasingorderwithW0=,andWN+1=+1 ispiecewiselinearD()=Ak+Bk(3.5.24) whereWk<0;nXi=1Rci(T)Ti>081

PAGE 93

andwehavenootherchangeinranksRa()(a6=i;j)intheintervalWkb<
PAGE 94

IfthevaluesofTiarestrictlyincreasing,thenXX(i;j)(TjTi)=XX1i
PAGE 95

Thus,Sk=Bkinequation(3.5.30)istheslopeofD()in(Wk;Wk+1).i. ^=Wk1+Wk whereN8><>:=n(n1) 2;ifTihavenoties
PAGE 96

IfTiobservationsarenotconstant,thenQ>0.SincethefailuretimedataiscumulativeT10:(3.5.34) UsethesameargumenttoobtainD()=cXi=1Rci()(YiTi0:(3.5.35)III. fork=kT,therankregressionestimator^is:^=Wk=YjkYik

PAGE 97

wheret;Y,arethesamplemeans.3.6ResultsandComparisonsTable3.8:AnalysisofSoftwareRankRegression(Apollo8) CompleteData NoOutlierValues NoLeverageValues (33),and(91) values(91) MSE 429:4732 19:4206 73:7114 MAVD 8:2009 3:3299 4:8218 MMER 7:5311 2:0602 0:1770 MMRE 1:5341 NA NA NA 245:7090 209:9932 243:2727

PAGE 98

          Complete Data    No Outliers Exist    No Leverage Values
MSE       28.56            No outliers exist    24.41
MAVD      4.59             No outliers exist    4.26
MMER      0.73             No outliers exist    0.58
MMRE      0.46             No outliers exist    0.42
MdAR      4.44             No outliers exist    4.27
V(RRS)    1274.97          No outliers exist    1199.35

          Complete Data    No Outlier Values    No Leverage Values
MSE       21.8342          21.1284              20.1449
MAVD      3.9773           3.9337               3.8370
MMER      6.7568           0.2167               0.1967
MMRE      0.9864           0.8452               0.7964
MdAR      3.8216           3.7546               3.6655
V(RRS)    1343.90          1334                 1305

PAGE 99

        Complete Data    No Outlier Values (33) and (91)    No Leverage Values (91)
MSE     151.9420         7.1063                             29.3014
MAVD    6.3229           1.9085                             3.1597
MMER    0.5914           0.5147                             0.57
MMRE    2.2680           0.7385                             1.01
MdAR    2.2045           2.0853                             2.18

        Complete Data    No Outliers Exist    No Leverage Values
MSE     2.5657           No outliers exist    2.3428
MAVD    1.2370           No outliers exist    1.1736
MMER    0.1219           No outliers exist    0.1138
MMRE    0.1320           No outliers exist    0.1223
MdAR    1.0701           No outliers exist    1.0426

PAGE 100

        Complete Data    No Outlier Values    No Leverage Values
MSE     1.7965           1.7209               1.6384
MAVD    0.9479           0.9428               0.9195
MMER    1.1691           0.1694               0.1628
MMRE    0.2733           0.2423               0.2277
MdAR    0.6596           0.6577               0.6125

        Complete Data    No Outlier Values    No Leverage Values
MSE     2.7122           2.3911               2.2460
MAVD    1.3260           1.2738               1.2417
MMER    0.1435           0.1370               0.1329
MMRE    0.1679           0.1524               0.1459
MdAR    1.1517           1.1194               1.0949

                            Percentage Effects of Outliers    Percentage Effects of Leverage Values
Least Squares Regression    95.32                             80.72
Monotone Regression         96.98                             86.86
Rank Regression             14.5358                           0.9915

PAGE 101

PercentageEectsofOutliers PercentageEectsofLeverageValues LeastSquaresRegression 69:8161 50:0277 MonotoneRegression 62:7980 46:3144 RankRegression 59:3959 41:2040 PercentageEectsofOutliers PercentageEectsofLeverageValues LeastSquaresRegression Datahasnooutliers 3:6793 MonotoneRegression Datahasnooutliers 8:4747 RankRegression Datahasnooutliers 5:9311 PercentageEectsofOutliers PercentageEectsofLeverageValues LeastSquaresRegression DatahasnoOutliers 3:2094 MonotoneRegression DatahasNoOutliers 3:1462 RankRegression DatahasnoOutliers 7:1895 PercentageEectsofOutliers PercentageEectsofLeverageValues LeastSquaresRegression 4:21 8:8 MonotoneRegression RankRegression 0.7367 2:8946

PAGE 102

PercentageEectsofOutliers PercentageEectsofLeverageValues LeastSquaresRegression 0:7579 3:2105 MonotoneRegression RankRegression 1.0962 3:5275 PercentageEectsofOutliers PercentageEectsofLeverageValues LeastSquaresRegression 11:84 17:19 MonotoneRegression 12:15 17:48 RankRegression NA NA PercentageEectsofOutliers PercentageEectsofLeverageValues LeastSquaresRegression 3:9366 6:3574 MonotoneRegression 3:9631 6:1657 RankRegression NA

PAGE 103

Simulated Skewed Laplace Complete Data (n = 50)

        SRRM1     Monotone Regression    Rank Regression
MSE     0.7973    8.0781                 4.5430
MAVD    0.6556    2.6587                 1.7903
MMER    0.1681    2.2445                 1.4355
MMRE    0.2130    0.6840                 0.5821
MdAR    0.4806    2.4248                 1.6064
V                                        291.5592

Simulated Skewed Laplace Data without Outliers (n = 44)

        SRRM1     Monotone Regression    Rank Regression
MSE     0.3725    2.1137                 2.0108
MAVD    0.4451    1.2772                 1.1598
MMER    0.1237    0.5549                 0.4667
MMRE    0.1280    0.3334                 0.3276
MdAR    0.3180    1.1966                 1.0412
V                                        203.4474

PAGE 104

Simulated Skewed Laplace Failure Data of Size 50

                            Complete Data    No Outlier Values    Percentage Effect of Outliers
Least Squares Regression    MSE = 0.7973     MSE = 0.3725         53.2798
Monotone Regression         MSE = 8.0781     MSE = 2.1137         73.835
Rank Regression             V = 291.5592     V = 203.4474         30.2208

Simulated Truncated Skewed Laplace Complete Data (50)

        SRRM1     Monotone Regression    Rank Regression
MSE     1.2079    3.6117                 6.0844
MAVD    0.8585    1.4174                 1.9490
MMER    0.5010    99.0884                1.5016
MMRE    4.8654    0.9498                 7.6142
MdAR    0.8108    1.1812                 1.4425
V                                        223.3501

PAGE 105

Simulated Truncated Skewed Laplace Data Without Outliers (n = 42)

        SRRM1     Monotone Regression    Rank Regression
MSE     0.2709    1.2747                 1.4043
MAVD    0.4273    0.9481                 0.9449
MMER    0.4562    66.2997                4.0417
MMRE    2.5277    0.9410                 4.8870
MdAR    0.4446    0.8877                 0.8425
V                                        107.0164

Simulated Truncated Skewed Laplace Failure Data of Size 50

                            Complete Data    No Outlier Values    Percentage Effect of Outliers
Least Squares Regression    MSE = 1.2079     MSE = 0.2709         77.5726
Monotone Regression         MSE = 3.6117     MSE = 1.2747         64.7063
Rank Regression             V = 223.3501     V = 107.0164         52.0858

PAGE 106

(Using a Monotonic Data)

            Least Squares Method    Monotone Regression    Rank Regression
MSE         3.2814                  0.2011                 7.1656
MAVD        1.4008                  0.2829                 1.8184
MMER        0.1020                  0.0188                 0.1175
MMRE        0.1454                  0.0193                 0.2450
V           36.9                    116.10                 99.7500
R-Square    0.9512                  0.9692                 0.5767
R-adj       0.9485                  0.9675                 0.5532

PAGE 107

Normal Simulated Failure Data of Size 20 (mean = 203.3626, std = 38.4871)
(One outlier = 296.0650)

                            Complete Data    No Outlier Values    Percentage Effect of Outliers
Least Squares Regression    MSE = 0.0197     MSE = 0.0154         21.8274
Monotone Regression         MSE = 0.0240     MSE = 0.0188         21.6667
Rank Regression             V = 57.1451      V = 52.1208          8.7922

Normal Simulated Failure Data of Size 50 (mean = 249.7735, std = 38.34)
Three outliers (158.9660, 341.3610, 355.6750)

                            Complete Data    No Outlier Values    Percentage Effect of Outliers
Least Squares Regression    MSE = 0.0157     MSE = 0.0114         27.3885
Monotone Regression         MSE = 0.0192     MSE = 0.0129         32.8125
Rank Regression             V = 157.2260     V = 142.9994         9.0485

PAGE 112

[1] A. A. Abdallah, P. Y. Chan, and B. Littlewood. Evaluation of Competing Software Reliability Predictions. IEEE Trans. Software Eng., vol. 12, no. 9, pp. 950–967, 1986.
[2] ANSI: Recommended Practice for Software Reliability. American Institute of Aeronautics and Astronautics, 1992.
[3] H. Ascher and H. Feingold. Repairable Systems Reliability, Inference, Misconceptions and their Causes. Marcel Dekker, New York, 1984.
[4] R. Bartoszynski and M. Niewiadomska-Bugaj. Probability and Statistical Inference. John Wiley and Sons, New York, 1996.
[5] D. A. Besley, E. Kuh, and R. E. Welsch. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley and Sons, New York, 1980.
[6] S. Bittani. Lecture Notes in Computer Science. Springer-Verlag, Berlin Heidelberg New York London Paris Tokyo, 1988.
[7] L. C. Brian, T. Langley, and I. Wieczorek. A Replicated Assessment and Comparison of Common Software Cost Modeling Techniques. In Proc. Int'l. Conf. Software Eng., vol. 22, pp. 377–386, 2000.
[8] W. J. Conover. Practical Non-parametric Statistics, 2 ed. John Wiley and Sons, 1980.
[9] S. D. Conte, H. E. Dunsmore, and V. Y. Shen. Software Engineering Metrics and Models. Benjamin/Cummings, Menlo Park, California, 1984.

PAGE 113

[10] D. R. Cox and P. A. Lewis. The Statistical Analysis of Series of Events. Chapman and Hall, London, 1996.
[11] L. H. Crow. Reliability for Complex Systems, Reliability and Biometry. Society for Industrial and Applied Mathematics (SIAM), pp. 379–410, 1974.
[12] L. H. Crow and N. D. Singpurwalla. An Empirically Developed Fourier Series Model for Describing Failures. IEEE Trans. Reliability, R-33(2), pp. 176–183, 1984.
[13] A. L. Goel. A Guide Book for Software Reliability Assessment. Technical report, RADC-TR-83-176, Rome Air Development Center, Rome, New York, 1983.
[14] A. L. Goel. Software Reliability Models: Assumptions, Limitations and Applicability. IEEE Trans. on Soft. Eng., SE-11, pp. 1411–1423, 1985.
[15] A. L. Goel and K. Okumoto. Time-Dependent Error Detection Rate Model for Software Reliability and Other Performance Measures. IEEE Trans. Reliability, R-28, pp. 206–211, 1979.
[16] F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel. Robust Statistics: The Approach Based on Influence Functions. John Wiley and Sons, New York, 1986.
[17] D. J. Hand. Small Data Sets. Chapman and Hall/CRC, 1st edition, November 1, 1993.
[18] T. Hastie and R. Tibshirani. Generalized Additive Models: Some Applications. Journal of the American Statistical Association, vol. 82, no. 5, pp. 371–386, 1987.
[19] T. Hastie and R. Tibshirani. Generalized Additive Models. Chapman and Hall, 1990.
[20] T. P. Hettmansperger and J. W. McKean. A Robust Alternative Based on Ranks to Least Squares in Analyzing Linear Models. Technometrics, vol. 19, no. 3, August 1977.

PAGE 114

[21] M. Hollander and D. A. Wolf. Non-parametric Statistical Methods. John Wiley and Sons, New York, 1999.
[22] M. Horigome, N. D. Singpurwalla, and R. Soyer. A Bayes Empirical Bayes Approach for Software Reliability Growth. In Computer Science and Statistics (16th Symp. Interface, Atlanta, GA), North-Holland, pp. 47–55, 1984.
[23] P. Huber. Robust Statistics. John Wiley and Sons, New York, 1981.
[24] P. Huber. Robust Regression: Asymptotics, Conjectures and Monte Carlo. Ann. Statist., vol. 1, pp. 799–821, 1973.
[25] R. L. Iman and W. J. Conover. The Use of Rank Transform in Regression. Technometrics, vol. 21, no. 4, pp. 499–506, Nov. 1979.
[26] A. L. Jackel. Estimating Regression Coefficients by Minimizing the Dispersion of the Residuals. Ann. Math. Statist., vol. 43, no. 5, pp. 1449–1458, Oct. 1972.
[27] Z. Jelinski and P. B. Moranda. Software Reliability Research. Statistical Computer Performance Evaluation. Academic Press, New York, pp. 465–484, 1972.
[28] J. Jureckova. Non-parametric Estimate of Regression Coefficients. Ann. Math. Statist., vol. 42, 1971.
[29] P. K. Kapur and R. B. Garg. Optimum Software Release Policies for Software Reliability Growth Models Under Imperfect Debugging. R.A.I.R.O., 1991.
[30] N. Kareer, et al. An S-shaped Software Reliability Growth Model with Two Types of Error. Microelectronics and Reliability, vol. 30, pp. 1085–1090, 1990.
[31] B. A. Kitchenham, S. G. MacDonell, L. Pickard, and M. J. Shepperd. What Accuracy Statistics Really Measure. IEEE Pro. Software Engineering, 148, pp. 81–85, 2001.
[32] B. Littlewood and J. L. Verral. A Bayesian Reliability Model with a Stochastically Monotone Failure Rate. IEEE Trans. Reliab., R-23(1), pp. 108–114, 1974.

PAGE 115

[33] M. R. Lyu. Handbook of Software Reliability Engineering. McGraw-Hill, 1996.
[34] T. A. Mazzuchi and R. Soyer. A Bayes Empirical-Bayes Model for Software Reliability. IEEE Trans. Reliability, 37(2), pp. 248–254, 1988.
[35] R. L. Michael. Handbook of Software Reliability Engineering. McGraw-Hill, 1996.
[36] C. E. Modell. IEEE Standards for a Software Quality Metrics Methodology. IEEE Standard 1061-1998, Software Engineering Committee, New York, Dec. 1998.
[37] D. C. Montgomery, E. A. Peck, and G. G. Vining. Introduction to Linear Regression Analysis. John Wiley and Sons, New York, 2001.
[38] P. B. Moranda. Predictions of Software Reliability During Debugging. Proceedings of Annual Reliability and Maintainability Symposium, pp. 327–332, 1975.
[39] A. Mostafa, K. M. Ramachandran, and A. N. V. Rao. Regression Approach to Software Reliability Models. International Journal of Pure and Applied Mathematics (IJPAM), vol. 26, no. 2, 2006.
[40] J. D. Musa. http://www.dacs.dtic.mil/databases/sled/swrel.shtml.
[41] J. D. Musa. Theory of Software Reliability and its Application. IEEE Press, Orlando, Florida, United States, S. 312–327, 1975.
[42] J. D. Musa. Validity of the Execution Time Theory of Software Reliability. IEEE Trans. Reliability, R-28, pp. 181–191, 1979.
[43] J. D. Musa. Theory of Software Reliability. Data and Analysis Center for Software, January 1980.
[44] J. D. Musa, A. Iannino, and K. Okumoto. Software Reliability. McGraw-Hill, New York, 1987.

PAGE 116

[45] J. D. Musa and K. Okumoto. Application of Basic and Logarithmic Poisson Execution Time Models in Software Reliability Measurement. In Software System Design Methods, Ed. J. K. Skwirzynski, NATO Series, vol. F22, Springer-Verlag, Berlin, pp. 275–298, 1986.
[46] I. Myrtveit, E. Stensrud, and M. Shepperd. Reliability and Validity in Comparative Studies of Software Prediction Models. IEEE Transactions on Software Engineering, 31(5), pp. 380–391, 2005.
[47] H. Ohba, et al. S-shaped Software Reliability Growth Curve: How Good is it? COMPSAC '82, pp. 38–44, 1982.
[48] H. Ohba and S. Yamada. S-shaped Software Reliability Models. 4th Int. Conf. on Reliability and Maintainability, pp. 430–436, 1984.
[49] H. Qiao. Parametric and Nonparametric Statistical Modeling: Reliability Analysis. Ph.D. thesis, University of South Florida, 1993.
[50] H. Qiao and C. Tsokos. Best Efficient Estimates of the Intensity Function of the Power Law Process. Journal of Applied Statistics, 25(1), pp. 111–120, 1998.
[51] S. E. Rigdon and A. P. Basu. The Power Law Process: a Model for the Reliability of Repairable Systems. Journal of Quality Technology, 21(4), pp. 251–259, 1989.
[52] S. E. Rigdon and A. P. Basu. Statistical Methods for the Reliability of Repairable Systems. John Wiley and Sons, New York, 2000.
[53] S. E. Rigdon, M. Xiaolin, and K. M. Boden. Statistical Inference for Repairable Systems Using the Power Law Process. Journal of Quality Technology, 30(4), 1998.
[54] H. Roberts. Predicting the Performance of Software Systems via the Power Law Process. Ph.D. thesis, University of South Florida, Tampa, FL, 2000.
[55] P. J. Rousseeuw and A. M. Leroy. Robust Regression and Outlier Detection. John Wiley and Sons, New York, 1987.

PAGE 117

[56] P. T. Ryan. Modern Regression Methods. Wiley, New York, 1996.
[57] S. Sawyer. A Rank Regression Estimating Procedure. www.math.wustl.edu/sawyer/handouts, April 25th, 2003.
[58] G. J. Schick and R. W. Wolverton. An Analysis of Competing Software Reliability Models. IEEE Trans. Software Eng., SE-4, pp. 104–120, 1978.
[59] N. F. Schneidewind. Analysis of Error Processes in Computer Software. Proceedings of the International Conference on Reliable Software, IEEE Computer Society, 21–23, pp. 337–346, 1975.
[60] S. R. Searle. Linear Models. John Wiley and Sons, New York, 1971.
[61] B. W. Silverman. Some Aspects of the Spline Smoothing Approach to Non-parametric Regression Curve Fitting. Journal of the Royal Statistical Society, Series B, 47, pp. 1–21 (Discussion pp. 21–52), 1985.
[62] N. D. Singpurwalla, M. Horigome, and R. Soyer. A Bayes Empirical Approach for Software Reliability Growth. Computer Science and Statistics, The Interface, North Holland, pp. 47–55, 1985.
[63] N. D. Singpurwalla and S. P. Wilson. Statistical Methods in Software Engineering. Springer, New York, 1999.
[64] N. Suresh. Modeling and Analysis of Software Reliability. Ph.D. thesis, University of South Florida, Tampa, FL, 1992.
[65] H. Theil. A Rank-Invariant Method of Linear and Polynomial Regression Analysis. I. Proc. Kon. Ned. Akad. V. Wetensch. A. 53, pp. 386–392, 1950a.
[67] H. Theil. A Rank-Invariant Method of Linear and Polynomial Regression Analysis. II. Proc. Kon. Ned. Akad. V. Wetensch. A. 53, pp. 521–525, 1950b.

PAGE 118

[68] H. Theil. A Rank-Invariant Method of Linear and Polynomial Regression. III. Proc. Kon. Ned. Akad. V. Wetensch. A. 53, pp. 1397–1412, 1950c.
[69] F. Walkerden and R. Jeffrey. An Empirical Study of Analogy-Based Software Effort Estimation. Empirical Software Eng., 4(2), pp. 135–158, 1999.
[70] M. Xie. Software Reliability Modeling. World Scientific, Singapore, 1991.
[71] S. Yamada, et al. A Software Reliability Growth Model with Two Types of Errors. R.A.I.R.O., vol. 19, pp. 87–104, 1985.
[72] S. Yamada, H. Ohtera, and H. Narihisa. Software Reliability Growth Models with Testing Effort. IEEE Trans. Reliability, R-35, pp. 19–23, April 1986.
[73] S. Yamada and S. Osaki. Discrete Models for Software Reliability Evaluation. In Reliability and Quality Control, Ed. A. P. Basu, Elsevier, London, pp. 401–412, 1986.


xml version 1.0 encoding UTF-8 standalone no
record xmlns http:www.loc.govMARC21slim xmlns:xsi http:www.w3.org2001XMLSchema-instance xsi:schemaLocation http:www.loc.govstandardsmarcxmlschemaMARC21slim.xsd
leader nam Ka
controlfield tag 001 001801595
003 fts
005 20070803130939.0
006 m||||e|||d||||||||
007 cr mnu|||uuuuu
008 070803s2006 flu sbm 000 0 eng d
datafield ind1 8 ind2 024
subfield code a E14-SFE0001648
040
FHM
c FHM
035
(OCoLC)162168296
049
FHMM
090
QA36 (ONLINE)
1 100
Mostafa, Abdelelah M.
0 245
Regression approach to software reliability models
h [electronic resource] /
by Abdelelah M. Mostafa.
260
[Tampa, Fla] :
b University of South Florida,
2006.
3 520
ABSTRACT: Many software reliability growth models have been analyzed for measuring the growth of software reliability. In this dissertation, regression methods are explored to study software reliability models. First, two parametric linear models are proposed and analyzed: the simple linear regression and the transformed linear regression corresponding to a power law process. Some software failure data sets do not follow the linear pattern. Analysis of popular real-life data showed that these contain outliers and leverage values. Linear regression methods based on least squares are sensitive to outliers and leverage values. Even though the parametric regression methods give good results in terms of error measurement criteria, these results may not be accurate due to violation of the parametric assumptions. To overcome these difficulties, nonparametric regression methods based on ranks are proposed as alternative techniques to build software reliability models. In particular, monotone regression and rank regression methods are used to evaluate the predictive capability of the models. These models are applied to real-life data sets from various projects as well as to diverse simulated data sets. Both the monotone and the rank regression methods are robust procedures that are less sensitive to outliers and leverage values. In particular, the regression approach explains predictive properties of the mean time to failure for modeling the patterns of software failure times. In order to decide on model preference and to assess the predictive accuracy of the mean time between failure estimates for the defined data sets, the following error measurement criteria are used: the mean square error, mean absolute value difference, mean magnitude of relative error, mean magnitude of error relative to the estimate, median of the absolute residuals, and a measure of dispersion.
The methods proposed in this dissertation, when applied to real software failure data, give less error in terms of all the measurement criteria compared to other popular methods from the literature. Experimental results show that the regression approach offers a very promising technique in software reliability growth modeling and prediction.
502
Dissertation (Ph.D.)--University of South Florida, 2006.
504
Includes bibliographical references.
516
Text (Electronic dissertation) in PDF format.
538
System requirements: World Wide Web browser and PDF reader.
Mode of access: World Wide Web.
500
Title from PDF of title page.
Document formatted into pages; contains 107 pages.
Includes vita.
590
Adviser: K. M. Ramachandran, Ph.D.
653
Linear regression.
Non-homogenous poisson process.
Power law process.
Non-parametric.
Monotone regression.
Rank regression.
690
Dissertations, Academic
z USF
x Mathematics
Doctoral.
773
t USF Electronic Theses and Dissertations.
4 856
u http://digital.lib.usf.edu/?e14.1648