USF Libraries
USF Digital Collections

Study of FPGA implementation of entropy norm computation for IP data streams


Material Information

Title:
Study of FPGA implementation of entropy norm computation for IP data streams
Physical Description:
Book
Language:
English
Creator:
Nagalakshmi, Subramanya
Publisher:
University of South Florida
Place of Publication:
Tampa, Fla
Publication Date:

Subjects

Subjects / Keywords:
IP anomaly
Reconfigurable hardware
Randomized algorithm
Hardware design
Traffic analysis
Dissertations, Academic -- Computer Engineering -- Masters -- USF   ( lcsh )
Genre:
non-fiction   ( marcgt )

Notes

Abstract:
Recent literature has reported the use of entropy measurements for anomaly detection in IP data streams. Space-efficient randomized algorithms for estimating the entropy of data streams are available in the literature; however, no hardware implementation of these algorithms is available. The main challenges for software implementations on IP data streams have been the storage of large volumes of data and the high speed at which the data must be analyzed. In this thesis, a recent randomized algorithm from the literature is analyzed for hardware implementation. Software/hardware simulations indicate that it is possible to implement a large portion of the algorithm on a low-cost Xilinx Virtex-II Pro FPGA, with trade-offs for real-time operation. The thesis reports on the feasibility of this algorithm's FPGA implementation and the corresponding trade-offs and limitations.
Thesis:
Thesis (M.S.Cp.)--University of South Florida, 2008.
Bibliography:
Includes bibliographical references.
System Details:
Mode of access: World Wide Web.
System Details:
System requirements: World Wide Web browser and PDF reader.
Statement of Responsibility:
by Subramanya Nagalakshmi.
General Note:
Title from PDF of title page.
General Note:
Document formatted into pages; contains 59 pages.

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 002000372
oclc - 318895528
usfldc doi - E14-SFE0002477
usfldc handle - e14.2477
System ID:
SFS0026794:00001




Full Text



Study of FPGA Implementation of Entropy Norm Computation for IP Data Streams

by

Subramanya Nagalakshmi

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Engineering
Department of Computer Science and Engineering
College of Engineering
University of South Florida

Co-Major Professor: Srinivas Katkoori, Ph.D.
Co-Major Professor: Rahul Tripathi, Ph.D.
Hao Zheng, Ph.D.

Date of Approval: April 18, 2008

Keywords: IP anomaly, reconfigurable hardware, randomized algorithm, hardware design, traffic analysis

© Copyright 2008, Subramanya Nagalakshmi

DEDICATION

For Putta...

ACKNOWLEDGEMENTS

I would like to thank Dr. Srinivas Katkoori and Dr. Rahul Tripathi for giving me an opportunity to work on this thesis. They provided me with the initial key references from which to begin the work and to carry out this study. They have guided me throughout the thesis.

I would like to thank Mr. Joe Rogers of Academic Computing, University of South Florida, for providing me with the LAN capture data on which I could run some of my simulations. I would also like to thank Mr. Peter Schiavo and Mr. Daniel Prieto of CSE Technical Support for their help.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF CODES
ABSTRACT
CHAPTER 1 INTRODUCTION
  1.1 Thesis Motivation
  1.2 Thesis Organization
CHAPTER 2 RANDOMIZED ALGORITHM FOR ESTIMATING ENTROPY OF DATA STREAMS
  2.1 Estimating the Entropy Norm
    2.1.1 Estimating the Number of Counters $s_1$ and $s_2$
    2.1.2 Simulation Results for Entropy Norm Computation
CHAPTER 3 REDUCING COUNTERS IN ENTROPY CALCULATION ALGORITHM
  3.1 Probabilistic Analysis Tools
    3.1.1 Chebyshev's Inequality
    3.1.2 Chernoff Bound on Median Estimator
    3.1.3 New Reduced Estimates for Counter $s_2$
    3.1.4 Simulation Results with Reduced Counters
  3.2 Useful Inequalities for Tightening the Variance Bound
    3.2.1 Bound on $X$
    3.2.2 Bound on Sum of Logarithms
  3.3 Bound on Variance of the Random Variable $X$
CHAPTER 4 HARDWARE IMPLEMENTATION IN XILINX FPGA
  4.1 Mapping the Algorithm into Hardware
  4.2 Design 1: Counter Module
    4.2.1 Design 1: Operation
    4.2.2 Design 1: Advantages and Disadvantages
  4.3 Design 2: Counter Module Using the Block SelectRAM (bRAM)
    4.3.1 Design Considerations
    4.3.2 Design 2: Advantages and Disadvantages
  4.4 Logarithm Module
    4.4.1 Design of the Log Circuit
    4.4.2 Log Circuit Performance
  4.5 RNG Module
  4.6 Integrated Module
    4.6.1 Hardware and Software Comparisons of the Integrated Module
  4.7 Median Module
CHAPTER 5 CONCLUSIONS
  5.1 Alternative Design Considerations
  5.2 Future Work
REFERENCES

LIST OF TABLES

Table 1.1 Anomalies and Packet Features to be Monitored [5]
Table 2.1 $F_H$ Computation Simulations - Random Traffic
Table 2.2 $F_H$ Computation Simulations - Skewed Traffic
Table 3.1 $F_H$ Computation Simulations with Reduced Counters - Random Traffic
Table 3.2 $F_H$ Computation Simulations with Reduced Counters - Skewed Traffic
Table 4.1 Accuracy and Speed of the Log Circuit in Comparison with Matlab
Table 4.2 Comparison of Hardware Simulations Vs Software Simulations - Part 1
Table 4.3 Comparison of Hardware Simulations Vs Software Simulations - Part 2
Table 4.4 $F_H$ Computation Comparison - One Integrated Module
Table 5.1 Device Utilization Summary of FPGA Synthesis on XC2VP30
Table 5.2 Summary of Design Capabilities

LIST OF FIGURES

Figure 3.1 Upper Bound, True Value, and Lower Bound of $X/m$
Figure 3.2 Bound and True Value of Sum of Logarithms
Figure 3.3 Bound and True Value of Sum of Squares of Logarithms
Figure 4.1 A Counter Module
Figure 4.2 Initial Setup for 16 Counter Modules Simulation Waveforms
Figure 4.3 Counters in Operation Simulation Waveforms
Figure 4.4 Counter Module with bRAM Simulation Waveforms
Figure 4.5 Logarithm Module Simulation Waveforms for Computing log2(106)
Figure 4.6 Matlab Simulation of Random Number Generator
Figure 4.7 RNG Module Simulation Waveforms
Figure 4.8 The Integrated Module
Figure 4.9 Pre-epoch Phase: Initialization
Figure 4.10 Epoch Phase: a
Figure 4.11 Epoch Phase: b
Figure 4.12 Epoch Phase: c
Figure 4.13 Post-epoch Phase: Random Variable Computation and Averaging
Figure 4.14 Block Diagram of all 112 Modules and their Interconnect
Figure 4.15 Median Module Simulation Waveforms

LIST OF CODES

Code 2.1 Pseudocode for Computing the Random Variable $X$ [13]
Code 4.1 Pseudocode of Algorithm for Logarithm Computation

Study of FPGA Implementation of Entropy Norm Computation for IP Data Streams

Subramanya Nagalakshmi

ABSTRACT

Recent literature has reported the use of entropy measurements for anomaly detection purposes in IP data streams. Space efficient randomized algorithms for estimating entropy of data streams are available in the literature. However, no hardware implementation of these algorithms is available. The main challenge to software implementation for IP data streams has been in storing large volumes of data, along with the requirement of high speed at which they have to be analyzed. In this thesis, a recent randomized algorithm available in the literature is analyzed for hardware implementation. Software/hardware simulations indicate it is possible to implement a large portion of the algorithm on a low cost Xilinx Virtex-II Pro FPGA with trade-offs for real-time operation. The thesis reports on the feasibility of this algorithm's FPGA implementation and the corresponding trade-offs and limitations.

CHAPTER 1
INTRODUCTION

The Internet has become the most popular means of communication in every aspect, from simple information exchange to economic activity and transfer of personal and sensitive data. It is economical, viable and the fastest medium for information exchange for a number of everyday activities. But it has also become an easy target for fraud, theft and other forms of attack. With increased security risks, it has become crucial to monitor and classify any anomalous IP traffic behavior. Attacks on the Internet can have a serious economic impact on the Internet service provider or the organization under attack. The quicker we become aware of an attack, the faster we can respond to safeguard stakeholder interests. One way to monitor the IP traffic involves processing and analyzing these data streams. While software analyzers are popular among network administrators, they rely on capture of the IP data stream information in real time and a posteriori off-line analysis of the captured data using various rules [1, 2]. For a 10 Gbit/sec router, with an assumed average packet size of 300 bytes (backbone network [3]), a one minute capture of packet headers at peak load can generate roughly 4.6 GB of data, and the analysis thereof can take considerable time. Even on a slower router of 1 Gbit/sec with average packet size ranging from 50 bytes (Email type traffic) to 1500 bytes (Ethernet MTU: maximum transfer unit), 2.8 GB to 95 MB of data, respectively, needs to be captured in 1 minute and analyzed. Thus valuable time that could be used to respond to an attack is often lost, and large storage requirements are an essential feature of such software analyzers. It has also become important to design some means for summarizing network traffic information in a succinct way, so that one may perform useful analysis (for example: detecting denial of service attacks, detecting spam broadcast) of a network stream in real time. Additionally, it makes sense to maintain a summary of network traffic at dedicated points that may be used at any time to analyze traffic flow across two or more points [4].

The measurement of anomalous behavior in Internet traffic using packet header features such as addresses and port numbers through Shannon's entropy has been a growing research area in recent years. Entropy has been shown to be a sensitive measure for detecting anomalies [5]. It has also been shown to present many challenges [6]. Automating the process of detecting anomalies is quite challenging because of the difficulty in dividing the traffic into what is normal and what is anomalous behavior [5]. Variations in the traffic flow that would be considered anomalous at a local level can be small enough to be drowned out as insignificant at a macro level. By using entropy calculations on packet destination/source addresses and destination/source port numbers as features in a higher level principal component analysis algorithm, the authors in [5] show success in sensitive classification between normal and anomalous behavior. In [8], while the entropy based features are used, the principal component analysis algorithm is shown to need considerable tuning to provide good classification between normal and anomalous behavior. General consensus in the research community seems to be that entropy of features of packet headers is a sensitive measure for use in higher level algorithms to separate normal and anomalous traffic. In Table 1.1, the entropy of features to be monitored for successful detection of various types of attacks from [5] is reproduced.

Table 1.1. Anomalies and Packet Features to be Monitored [5]

| Anomaly type | Explanation | Packet features to check |
|---|---|---|
| Alpha flows | Usually large volume P2P | Source & destination address |
| DOS | Denial of Service Attack (distributed or single-source) | Destination & source address |
| Flash Crowd | Unusual burst of traffic to single destination, from a typical distribution of sources | Destination address & port |
| Port Scan | Probes to many destination ports on a small set of destination addresses | Destination address & port |
| Outage events | Traffic shifts due to equipment failures or maintenance | Source & destination address |
| Point to Multipoint | SPAM broadcast | Source & destination address |
| Worms | Malicious code broadcast | Destination address & port |

For IP traffic data, the Shannon entropy of a packet feature which takes $n$ different values $1, \ldots, n$ with corresponding discrete probabilities $p_i$ is given by

$$H = -\sum_{i=1}^{n} p_i \log_2(p_i) \le \log_2(n) \tag{1.1}$$

The inequality in (1.1) holds with equality when the feature is uniformly distributed. Computing entropy requires the probabilities $p_i$. A simple approach to estimating the probabilities involves the computation of a histogram. For example, consider the IP port numbers ranging from 0 to 65535, which are divided into 3 categories by [7] as Well Known Ports (0 to 1023), Registered Ports (1024 to 49151), and Dynamic and/or Private Ports (49152 to 65535). In [9] packets with a destination port in the Well Known Ports are divided into bins of 10 port numbers each. Since packets with port number 80 comprise the majority of the network traffic, they are separated in [9] into a single bin. This produces 104 packet bins. Packets with destination port in the Registered Ports are divided into 482 additional bins, with each bin covering 100 port numbers with the exception of the bin that covers the last 28 port numbers from 49124 to 49151. Packets with Dynamic and/or Private Port numbers are grouped into a single bin. The captured IP data can now be analyzed using the 587 different bins, and the frequency of each bin divided by the total number of packets analyzed is taken to represent the probabilities $p_i$ needed to compute the entropy in (1.1).

While entropy calculations are simple to conceptualize, the direct approach of histogram computation and estimating probabilities from it suffers from the same drawback as capture of data in current software packages, viz. large memory requirements and slow analysis. Using such entropy features in higher level algorithms, therefore, also suffers. It should be noted that while the port number range is known a priori, the range of packet addresses in a stream of IP traffic cannot be known a priori, and therefore dynamic bin allocation for histogram computation on packet addresses may be needed, adding to the complexity of such schemes.
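The histogram approach described above can be sketched in a few lines of Python. This is only an illustration: the function names and the ordering of the bin indices are assumptions made here, not taken from [9]; only the bin boundaries (port 80 alone, bins of 10 Well Known ports, bins of 100 Registered ports, one Dynamic/Private bin) follow the text.

```python
import math
from collections import Counter

NUM_BINS = 587  # 104 + 482 + 1, as counted in the text

def port_bin(port: int) -> int:
    """Map a destination port (0..65535) to a bin index (index order is an assumption)."""
    if not 0 <= port <= 65535:
        raise ValueError("port out of range")
    if port == 80:
        return 0                           # port 80 gets its own bin
    if port <= 1023:
        return 1 + port // 10              # 103 bins of up to 10 Well Known ports
    if port <= 49151:
        return 104 + (port - 1024) // 100  # 482 bins; the last one holds 49124..49151
    return 586                             # all Dynamic and/or Private ports

def shannon_entropy(counts) -> float:
    """Entropy in bits of the empirical distribution given by bin counts, as in (1.1)."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts)

def port_entropy(ports) -> float:
    """Histogram the ports into bins, then compute the entropy of the bin frequencies."""
    histogram = Counter(port_bin(p) for p in ports)
    return shannon_entropy(histogram.values())
```

Note that the histogram must hold one counter per bin (587 here); for packet addresses, whose range is not known in advance, the number of counters is not bounded in the same way, which is the drawback the text points out.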

Recently, a randomized algorithm [13] that uses less memory in computing entropy on data streams has been proposed. The data stream view of Internet traffic is aimed at avoiding capture of the packet header data. Instead, as each packet header arrives, decisions are made immediately and processing of that packet is completed before the next packet arrives. Thus all such algorithms are single pass in nature. The randomized algorithm proposed for entropy calculations on data streams has been studied on data collected from routers [14] and shown to be successful in computing the entropy of the selected feature in an off-line software analysis. The algorithm uses a large number of independent identically distributed random variables. Their arithmetic averages in groups of $s_1$ are computed first, and then the median over $s_2$ of these group averages is computed to estimate a measure of the entropy. The variance of the final estimate is greatly reduced by this two stage process.

1.1 Thesis Motivation

A reasonable assessment of attacks requires monitoring of IP traffic for reasonable time periods (epochs). Furthermore, if the intent is to use the detected anomalies for real-time control of the traffic and to stop an attack, then the analysis has to be rapid. Currently, most studies in the research literature are a posteriori analyses of anomalies on data collected from routers; the run-time of the software is high, and the schemes are not conducive to near real-time application. To be useful, such anomaly detection schemes have to keep up with the high data rates, with an analysis throughput comparable to epochs of short duration, so that response measures can be activated promptly.

While entropy calculations are simple if the requisite probabilities are known, and entropy has been shown to be sensitive in detecting anomalies, the direct approach of histogram computation and estimating probabilities suffers from the same drawback as capture of data in current software packages, viz. large memory requirements and slow analysis. The randomized algorithm of [13] uses less memory in computing entropy. This algorithm has been studied on data collected from routers [14] and shown to be successful in computing the entropy of the selected features in an off-line software analysis. No attempt to implement this algorithm in hardware has been reported in the literature. Dedicated hardware for such applications should be several orders of magnitude faster and thus may be conducive to the use of entropy for real-time analysis. This thesis is a feasibility study of implementing some of the core parts of the randomized algorithm for entropy calculation in hardware, using the Xilinx FPGA tools.

1.2 Thesis Organization

In Chapter 2, the randomized algorithm of [13] that is studied in this thesis is discussed and simulation results of the algorithm in Matlab are presented. The limitations of the algorithm for hardware implementation are pointed out. In Chapter 3, a new derivation for the number of counters needed for the algorithm of [13] is carried out to reduce the overall number of counters. An attempt at tightening the variance bounds derived in the literature for the algorithm is also discussed. In Chapter 4, the hardware design of the counter modules is considered, design trade-offs are explored and simulated, and the results are presented. A logarithm computation circuit, along with the averaging and median calculating circuitry needed for post-epoch processing in the algorithm, is also designed and studied. The designs are integrated into a module. Chapter 5 concludes the thesis with some possible alternative design strategies and suggestions for future work.

CHAPTER 2
RANDOMIZED ALGORITHM FOR ESTIMATING ENTROPY OF DATA STREAMS

A number of results using entropy as a measurement for anomaly detection purposes are available in the literature [5, 9, 10, 11]. This thesis focuses on the randomized algorithm for entropy and entropy norm estimation based on the work available in [13], which has been further studied by [14]. In [13] the metric called entropy norm $F_H$, as defined in (2.1), is estimated.

$$F_H = \sum_{i=1}^{n} m_i \log_2(m_i) \tag{2.1}$$

In (2.1) it is assumed that there are $n$ distinct packet headers (packet feature) $1, \ldots, i, \ldots, n$ and that, for each $i$, the packet header repeats $m_i$ times in the stream. The total number of packets in the stream is $\sum_i m_i = m$. If the probability of item $i$ is taken as $m_i/m$, the entropy norm defined in (2.1) can be related to the entropy of the data stream of length $m$ as follows:

$$H = \sum_{i=1}^{n} \frac{m_i}{m} \log_2\!\left(\frac{m}{m_i}\right) = \log_2(m) - \frac{F_H}{m} \tag{2.2}$$

The key contribution of [13] is that the algorithm for estimating $F_H$ has sub-linear space requirements, from which the entropy $H$ can be reasonably estimated to within a $(1 + \varepsilon)$ factor.

2.1 Estimating the Entropy Norm

In [13], the authors introduce an estimator $Y$, defined in (2.3), a random variable which gives the estimate for $F_H$. The algorithm uses $s_1 \times s_2$ independent basic estimators $\{X_{ij} : 1 \le i \le s_1,\ 1 \le j \le s_2\}$, where each $X_{ij}$ is an independent random variable distributed identically to the random variable $X$ defined below in Code 2.1. The proof that the random variable $Y$ estimates $F_H$ to within a $(1 \pm \varepsilon)$ factor with a (confidence limit) probability greater than $(1 - \delta)$ is shown in [13]. The choice of values for the variables $s_1, s_2$, which represent the number of counters needed to compute $Y$, is based on $\varepsilon, \delta$ as discussed below. It is also shown in [13] that the individual random variable $X$ is such that $E(X) = F_H$. Thus the repeated averaging of the random variables $X_{ij}$ over $s_1$ counters reduces the variance of the estimated $F_H$, while the median computation over the subsequent $s_2$ averages reduces the variance further and ensures the confidence limit.

$$Y = \operatorname*{median}_{1 \le j \le s_2} \left( \frac{1}{s_1} \sum_{i=1}^{s_1} X_{ij} \right) \tag{2.3}$$

Input Stream: $A = \langle a_1, a_2, \ldots, a_m \rangle$, where each $a_i \in \{1, 2, \ldots, n\}$
1. Choose $p$ uniformly at random from $\{1, \ldots, m\}$.
2. Let $r = |\{q : a_q = a_p \text{ and } p \le q \le m\}|$. Here $r \ge 1$.
3. Let $X = m\,(r \log_2(r) - (r-1) \log_2(r-1))$, with the convention that $0 \log_2(0) = 0$.

Code 2.1. Pseudocode for Computing the Random Variable $X$ [13]

2.1.1 Estimating the Number of Counters $s_1$ and $s_2$

For any $\varepsilon, \delta \in (0, 1]$, it is shown in [13] that if $s_1 \ge 8\,\mathrm{Var}[X] / (\varepsilon^2 E(X)^2)$ and $s_2$ is chosen as an integer $\ge 4 \log_2(1/\delta)$, then the final estimator $Y$ in (2.3) deviates from $E(X)$ by no more than $\varepsilon E(X)$ with probability at least $1 - \delta$. Assuming that $F_H > m/\Delta$, it is shown in [13] that $\mathrm{Var}[X] / E(X)^2 < \Delta\,[\log_2(e) + \log_2(m) - 1] = O(\Delta \log_2(m))$. $\Delta > 0$ is assumed in [13], while $\Delta$ is assumed to be 1 or lower in [14]. The condition $\Delta = 1$ implies $m > 2n$ [14]. Using the bound and $\Delta = 1$, the minimum number of counters is $s_1 \ge 8\,[\log_2(e) - 1 + \log_2(m)] / \varepsilon^2$. For $\varepsilon = 0.2$ and $\delta = 0.2$ with $m = 10^6$, the corresponding values of $s_1, s_2$ are thus 4075 and 10 respectively. With $m = 10^5$, the corresponding values of $s_1, s_2$ are thus 3411 and 10 respectively.
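The estimator of Code 2.1, the median-of-means combination in (2.3) and the counter counts quoted above can be sketched in Python as follows. This is an off-line illustration that keeps the whole stream in memory, not the single-pass streaming form of [13] and not the thesis's Matlab code; the function names are hypothetical, and `counters` uses the variance bound for the case $m > 2n$ quoted in the text.

```python
import math
import random
from statistics import median

def counters(m, eps, delta):
    """Counter counts s1, s2 from the bounds quoted in Section 2.1.1 (case m > 2n)."""
    s1 = math.ceil(8 * (math.log2(math.e) - 1 + math.log2(m)) / eps ** 2)
    s2 = math.ceil(4 * math.log2(1 / delta))
    return s1, s2

def g(r):
    """r * log2(r), with the convention 0 * log2(0) = 0."""
    return r * math.log2(r) if r > 0 else 0.0

def basic_estimator_at(stream, p):
    """Random variable X of Code 2.1 for a given 0-based position p."""
    m = len(stream)
    # r = number of occurrences of stream[p] from position p to the end of the stream
    r = sum(1 for q in range(p, m) if stream[q] == stream[p])
    return m * (g(r) - g(r - 1))

def estimate_entropy_norm(stream, s1, s2, rng):
    """Estimator Y of (2.3): median over s2 groups of the mean of s1 basic estimators."""
    m = len(stream)
    means = []
    for _ in range(s2):
        total = sum(basic_estimator_at(stream, rng.randrange(m)) for _ in range(s1))
        means.append(total / s1)
    return median(means)

def entropy_from_norm(f_h, m):
    """Entropy H recovered from the entropy norm via (2.2)."""
    return math.log2(m) - f_h / m
```

With these definitions, `counters(10**6, 0.2, 0.2)` gives the 4075 and 10 counters quoted in the text, and averaging `basic_estimator_at` over every position of a stream reproduces $F_H$ exactly, which is the statement $E(X) = F_H$.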

2.1.2 Simulation Results for Entropy Norm Computation

Since the number of counters needed for the algorithm depends on the quantity $m$, consideration was given to how many of these could fit on an FPGA without, at the same time, underutilizing it. Based on an initial trial and error process, an epoch size ($m$) of a million packets was fixed. The algorithm was coded in Matlab to examine its performance. A stream of size $m = 10^6$ with $n$ different integers is generated using the Matlab random function to simulate the packets. The numbers of counters $s_1, s_2$ are computed as per the above subsection. Although not possible in practice, because the number of distinct packet headers ($n$) is not known a priori for a given epoch size ($m$), in simulations we have the luxury of computing the histogram of the packets in our epoch. Using this histogram, the true entropy norm can be easily computed in the simulations. In each case in Table 2.1, the numbers of counters used are $s_1 = 4075$ and $s_2 = 10$, i.e., a total of 40750 counters, with $\varepsilon = 0.2$, $\delta = 0.2$ and $\Delta = 1$. To study the variability of the algorithm of [13], the same stream is analyzed ten times (with the 40750 counters being selected at random locations in the stream each time). Each run of the algorithm in the Matlab code takes about 14 minutes for the analysis of a million packet stream on a Windows dual core Pentium PC with a clock of 1.54 GHz. The nomenclature used in Table 2.1, where the results of these simulations are reported, is as follows:

1. $n$ = number of distinct packets in the stream. The requirement that $m > 2n$ is maintained in these simulations.
2. $F_{hm}$ = Entropy norm of [13] (the first run value out of the ten is given).
3. $F_t$ = Entropy norm computed from the histogram of the stream. Note that this computation is possible only in simulations.
4. Percentage error in entropy norm = $100\,|F_{hm} - F_t| / F_t$. This is computed for each run and the maximum value over all 10 runs is reported.
5. The ratio $F_{hm}/F_t$ should lie within 0.8 to 1.20 (in at least 80% of the simulation runs). This is reported as the minimum, average, and maximum over all 10 runs.

Table 2.1. $F_H$ Computation Simulations - Random Traffic

| $n$ | $F_{hm}$ | $F_t$ | % error in norm | ratio $F_{hm}/F_t$ (min, avg, max) |
|---|---|---|---|---|
| 1500 | $9.3763 \times 10^6$ | $9.3820 \times 10^6$ | 0.1666 | 0.9983, 0.9998, 1.0009 |
| 10000 | $6.6476 \times 10^6$ | $6.6512 \times 10^6$ | 0.1895 | 0.9984, 0.9996, 1.0019 |
| 50000 | $4.3501 \times 10^6$ | $4.3586 \times 10^6$ | 0.3529 | 0.9965, 1.0002, 1.0032 |
| 500000 | $1.4110 \times 10^6$ | $1.4111 \times 10^6$ | 1.2181 | 0.9878, 1.0005, 1.0068 |

The above simulations were carried out at the relatively high value of $\varepsilon = 0.2$ because the number of counters $s_1$ depends on this value. A smaller value of $\varepsilon = 0.1$ increases the counters by a factor of 4. Since the errors in computing the entropy norm are very small in spite of $\varepsilon = 0.2$ and $\delta = 0.2$, it is quite clear that the bounds given in [13] are quite loose if the traffic is considered random. A random traffic pattern implies that every packet header up to the number being considered ($n$) is equally likely to occur in the stream. Roughly, each packet header then repeats $m/n$ times in the stream. To study the situation of traffic which is somewhat more skewed, as is likely in an attack type situation, another set of simulations has been considered. In these simulations the stream is generated from two vectors $q, f$, where each element of $q$ is the number of individually distinct packet headers and the corresponding element of $f$ is their frequency of repetition. Thus $q = [1, 40700, 9299]$, $f = [9301, 2, 1]$ implies that the first packet header is repeated 9301 times, 40700 distinct successive packet headers are each repeated 2 times, and the last 9299 distinct packet headers are repeated just once. The total number of distinct packet headers is $n = \sum_{i=1}^{3} q_i = 5 \times 10^4$, while the total number of elements in the stream is $m = \sum_{i=1}^{3} q_i f_i = 10^5$. The randomness in the stream is now in the location, within the overall stream of length $m$, of the particular packet headers whose repetitions are specified in $q, f$. Two cases are considered in Table 2.2 with the corresponding $q, f$ as given below:

Case 1: $q_1 = [1, 40700, 9299]$, $f_1 = [9301, 2, 1]$
Case 2: $q_2 = [5, 10, 100, 2000, 47885]$, $f_2 = [4823, 1000, 100, 4, 1]$

A stream described by vectors $q, f$ is generated in Matlab and the generated stream is analyzed by the algorithm. Each of the two cases has $m = 10^5$, $n = 5 \times 10^4$, and the other algorithm parameters are $\varepsilon = 0.2$, $\delta = 0.2$ and $\Delta = 1$. On each stream (Case), the algorithm is run ten times. The corresponding $s_1, s_2$ used for each run are 3411 and 10 respectively, and the total number of counters for these cases is $s_1 \times s_2 = 34110$. The results of these simulations are presented in Table 2.2 with similar nomenclature.

Table 2.2. $F_H$ Computation Simulations - Skewed Traffic

| $q, f$ | $F_{hm}$ | $F_t$ | % error in norm | ratio $F_{hm}/F_t$ (min, avg, max) |
|---|---|---|---|---|
| Case 1 | $2.0108 \times 10^5$ | $2.0402 \times 10^5$ | 2.3975 | 0.9760, 0.9966, 1.0154 |
| Case 2 | $4.8363 \times 10^5$ | $4.7716 \times 10^5$ | 1.4486 | 0.9991, 1.0053, 1.0145 |

It is seen from Tables 2.1 and 2.2 that in all cases the algorithm seems to provide a better estimate than prescribed by the choice of $\varepsilon, \delta$. This is largely the result of overestimation of the variance bound and, consequently, of the number of counters needed by the algorithm. For an epoch of a million packets, as will be seen later, the overall count of 40750 counters is well above the number of counters implementable in the available FPGA. While $\varepsilon, \delta$ can be tightened, the drawback is that this would increase the number of counters further. This is the limitation of the algorithm given in [13]. There have been no further improvements to these bounds in the literature. Some attempts to reduce the number of counters in this thesis are documented in the next chapter.
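The construction of a skewed stream from the vectors $q, f$ and the histogram-based "true" entropy norm $F_t$ can be sketched as below. The thesis's simulations were done in Matlab; this Python version, including its function names and the use of a shuffle to randomize locations, is an illustrative assumption rather than the author's code.

```python
import math
import random
from collections import Counter

def skewed_stream(q, f, rng):
    """Build a stream in which q[k] distinct headers each occur f[k] times, then shuffle."""
    stream, header = [], 0
    for count, freq in zip(q, f):
        for _ in range(count):
            stream.extend([header] * freq)  # this header occurs freq times
            header += 1                     # next distinct header value
    rng.shuffle(stream)                     # randomness is only in the locations
    return stream

def true_entropy_norm(stream):
    """F_t: entropy norm (2.1) computed from the full histogram of the stream."""
    return sum(c * math.log2(c) for c in Counter(stream).values())

def percent_error(f_hm, f_t):
    """Percentage error in the entropy norm, as defined in item 4 of the nomenclature."""
    return 100 * abs(f_hm - f_t) / f_t

case1 = skewed_stream([1, 40700, 9299], [9301, 2, 1], random.Random(0))
case2 = skewed_stream([5, 10, 100, 2000, 47885], [4823, 1000, 100, 4, 1], random.Random(0))
```

Evaluating `true_entropy_norm` on these two streams gives approximately $2.0402 \times 10^5$ and $4.7716 \times 10^5$, matching the $F_t$ column of Table 2.2. Because $F_t$ depends only on the histogram, it is unaffected by the shuffle.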

CHAPTER 3
REDUCING COUNTERS IN ENTROPY CALCULATION ALGORITHM

In this chapter, the counters needed for the algorithm in [13] are reduced by first using a stronger bound on median computation from [12], and then an attempt at tightening the bound on $\mathrm{Var}(X)$ is shown. While many of the techniques used in the derivations carried out in this chapter are available in the literature, their particularization to the problem at hand is new in this thesis. The basic probabilistic analysis tools on which the algorithm of [13] rests are Chebyshev's inequality and the Chernoff bound for medians [12], and these are discussed next. The goal is to use a stronger result on the Chernoff bound for medians from [12] to obtain a reduction in overall counters.

3.1 Probabilistic Analysis Tools

In this section Chebyshev's inequality, which describes the distribution of a random variable around its expectation, is stated. This inequality is useful as it holds independent of the underlying probability distribution function of the random variable. Thereafter, the application of the Chernoff bound to median computation based on a theorem in [12] is also provided. These two results and the definition of a basic random variable $X$ are the basis of the randomized algorithm of [13].

3.1.1 Chebyshev's Inequality

The inequality states that for a random variable $X$ with expectation $E(X)$ and variance $\mathrm{Var}(X)$, for any $a > 0$,

$$\Pr(|X - E(X)| \ge a) \le \frac{\mathrm{Var}(X)}{a^2} \tag{3.1}$$

Specifically, with $a = \varepsilon E(X)$, Chebyshev's inequality states

$$\Pr(|X - E(X)| \ge \varepsilon E(X)) \le \frac{\mathrm{Var}(X)}{\varepsilon^2 E(X)^2} \tag{3.2}$$

If $s_1$ independent, identically distributed random variables with the same distribution as $X$ are arithmetically averaged, then the resulting random variable $Y$ has the properties

$$Y = \frac{\sum_{i=1}^{s_1} X_i}{s_1}; \quad E(Y) = E(X); \quad \mathrm{Var}(Y) = \frac{\mathrm{Var}(X)}{s_1} \tag{3.3}$$

Thus by Chebyshev's inequality (3.2) the distribution of $Y$ satisfies

$$\Pr(|Y - E(X)| \ge \varepsilon E(X)) \le \frac{\mathrm{Var}(X)}{s_1 \varepsilon^2 E(X)^2} = \frac{1}{K} \tag{3.4}$$

Thus arithmetical averaging of the random variables ensures that the new estimator $Y$ has a reduced probability of deviating from its expectation. $K$ in (3.4) is usually chosen as a positive integer ($> 2$, as required in the next subsection). For a given choice of $K$, $s_1$ can be obtained as

$$s_1 = \frac{K\,\mathrm{Var}(X)}{\varepsilon^2 E(X)^2} \tag{3.5}$$

3.1.2 Chernoff Bound on Median Estimator

Consider independent identically distributed random variables $Y_i$, $i = 1, \ldots, s_2$, each of which is as described in (3.3). Consider the median of these $s_2$ random variables. Let this be denoted $Z$, with $E(Z) = E(X)$. In this subsection the relationship of $s_2$ to $\delta$ is determined such that

$$\Pr(|Z - E(X)| \ge \varepsilon E(X)) \le \delta \tag{3.6}$$

Let random variables $Q$ and $R$ denote the number of $Y_i$ such that $Y_i < (1 - \varepsilon) E(X)$ and $Y_i > (1 + \varepsilon) E(X)$ respectively. Then

$$\Pr(|Z - E(X)| \ge \varepsilon E(X)) = \Pr\!\left(Q \ge \frac{s_2}{2} \text{ or } R \ge \frac{s_2}{2}\right) \le \Pr\!\left(Q \ge \frac{s_2}{2}\right) + \Pr\!\left(R \ge \frac{s_2}{2}\right) \tag{3.7}$$

$E(Q)$ and $E(R)$ are both $\le s_2/K$. The Chernoff Bound Theorem 4.4(a) of [12] states that, for random variables of type $Q$, $R$, the probabilities on the right hand side of (3.7) can be bounded for $\gamma > 0$ (this implies $K > 2$ in (3.8)), taking $E(Q)$ at its upper bound $s_2/K$ so that $1 + \gamma = K/2$, as:

$$\Pr\!\left(Q \ge \frac{K}{2}\,\frac{s_2}{K}\right) = \Pr\big(Q \ge (1+\gamma) E(Q)\big) \le \left(\frac{e^{\gamma}}{(1+\gamma)^{1+\gamma}}\right)^{E(Q)} = \left(\frac{e^{(K/2 - 1)}}{(0.5K)^{0.5K}}\right)^{s_2/K} \tag{3.8}$$

Hence (3.7) can be simplified to

$$\Pr(|Z - E(X)| \ge \varepsilon E(X)) \le 2 \left(\frac{e^{(K/2 - 1)}}{(0.5K)^{0.5K}}\right)^{s_2/K} \tag{3.9}$$

For a given $\delta$ and $K$, an odd integer $s_2$ can be solved for by using (3.9). For example, with $\delta = 0.2$ and for $K = 5, 6, 7, 8$, the corresponding values for $s_2$ are 15, 11, 9, 7 respectively.

3.1.3 New Reduced Estimates for Counter $s_2$

For any $\varepsilon, \delta \in (0, 1]$ and integer $K > 2$ specified by the user, if $s_1 \ge K\,\mathrm{Var}[X] / (\varepsilon^2 E(X)^2)$ and $s_2$ is determined as above for the choice of $K$, then the final estimator $Y$ in (2.3) deviates from $E(X)$ by no more than $\varepsilon E(X)$ with probability at least $1 - \delta$. The tighter bound in the above subsection gives the value of $s_2$ as 7 for $K = 8$ with $\varepsilon = 0.2$ and $\delta = 0.2$, while for [13] the corresponding value of $s_2$ is 10. Since each of the $s_2$ estimates is obtained using $s_1$ independent random variables $X$, using a smaller value of $s_2$ leads to a smaller overall count. For the numerical values illustrated there is a 30% reduction in overall counters ($s_1 \times s_2 = 28525$ instead of 40750). Computing the product $s_1 \times s_2$ for various values of $K$ using the results of the previous subsections indicates that the optimum choice minimizing the product is at $K = 8$ with $s_2 = 7$.

3.1.4 Simulation Results with Reduced Counters

The same streams used in Chapter 2, Section 2.1.2 are now used to compute the entropy norm with the reduced counters as obtained above with $\varepsilon = 0.2$, $\delta = 0.2$ and $\Delta = 1$. The corresponding tables, with the same nomenclature as in Tables 2.1 and 2.2, are given in Tables 3.1 and 3.2 respectively. In Table 3.1, 28525 random variables (counters) are used for the epoch of $10^6$ packet headers. In Table 3.2, for an epoch of $10^5$ packet headers, a total of 23877 random variables (counters) are used. The two cases in Table 3.2 are as in Table 2.2, with the corresponding $q, f$ as given below:

Case 1: $q_1 = [1, 40700, 9299]$, $f_1 = [9301, 2, 1]$
Case 2: $q_2 = [5, 10, 100, 2000, 47885]$, $f_2 = [4823, 1000, 100, 4, 1]$

Table 3.1. $F_H$ Computation Simulations with Reduced Counters - Random Traffic

| $n$ | $F_{hm}$ | $F_t$ | % error in norm | ratio $F_{hm}/F_t$ (min, avg, max) |
|---|---|---|---|---|
| 1500 | $9.3827 \times 10^6$ | $9.3820 \times 10^6$ | 0.1435 | 0.9986, 1.0000, 1.0014 |
| 10000 | $6.6434 \times 10^6$ | $6.6512 \times 10^6$ | 0.2313 | 0.9981, 1.0000, 1.0023 |
| 50000 | $4.3724 \times 10^6$ | $4.3586 \times 10^6$ | 0.3929 | 0.9961, 1.0007, 1.0039 |
| 500000 | $1.4013 \times 10^6$ | $1.4111 \times 10^6$ | 1.6461 | 0.9835, 0.9963, 1.0118 |

Table 3.2. $F_H$ Computation Simulations with Reduced Counters - Skewed Traffic

| $q, f$ | $F_{hm}$ | $F_t$ | % error in norm | ratio $F_{hm}/F_t$ (min, avg, max) |
|---|---|---|---|---|
| Case 1 | $2.0796 \times 10^5$ | $2.0402 \times 10^5$ | 2.2569 | 0.9916, 1.0100, 1.0226 |
| Case 2 | $4.7938 \times 10^5$ | $4.7716 \times 10^5$ | 1.1306 | 0.9952, 1.0028, 1.0113 |

These tables reveal that the reduced counters still produce entropy norm values well within the bounds prescribed by $\varepsilon = 0.2$ and $\delta = 0.2$. However, with the decreased counters, both the errors and the min, max spread of the ratio of the norm estimate to the true norm have increased slightly in most cases. This is only to be expected. It is important to realize that a 30% reduction in the counters still keeps these errors to about 2% even in the worst case, while the expectation with $\varepsilon = 0.2$ is that the errors will be within 20%. This indicates that the theoretically imposed variance bound, from which the number of counters is derived, is very conservative. An attempt to reduce the variance bound is developed in the following sections of this chapter.

3.2 Useful Inequalities for Tightening the Variance Bound

In this section, some useful bounds (inequalities) are shown, either from the literature or by deriving them.

3.2.1 Bound on $X$

The following bound on $X$ is first shown.

$$\big(\log_2 e + \log_2(r-1)\big) < \frac{X}{m} = \big(r \log_2(r) - (r-1) \log_2(r-1)\big) < \big(\log_2 e + \log_2(r)\big) \tag{3.10}$$

Of the two bounds in (3.10), the upper bound is available in [14] while the lower bound is a new derivation. The upper bound follows by considering

$$2^{X/m} = \frac{r^r}{(r-1)^{r-1}} = r \left(\frac{r}{r-1}\right)^{r-1} = r \left(1 + \frac{1}{r-1}\right)^{r-1} < r\,e \tag{3.11}$$

since for finite $r > 1$, $(1 + 1/(r-1))^{r-1}$ is $< e$. For the lower bound, the same quantity is written as

$$2^{X/m} = (r-1) \left(1 + \frac{1}{r-1}\right)^{r-1} \left(1 + \frac{1}{r-1}\right) > e\,(r-1) \tag{3.12}$$

The last step follows from the fact that for finite $r > 1$, $(1 + 1/(r-1))^{r-1} (1 + 1/(r-1))$ is $> e$ and asymptotically approaches $e$ as $r \to \infty$.

A plot of the inequalities of (3.10) is shown in Fig. 3.1. The inequalities are within 1% of the true value of $X/m$ for $r > 14$ and within 5% for $r > 4$. The right-hand inequality of (3.10) will be exploited later in Chapter 4.
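Two numerical claims made so far in this chapter can be checked with a short script: the $s_2$ values obtained from (3.9), and the tightness of the bounds in (3.10). The sketch below is an illustration in Python with hypothetical function names; it solves (3.9) for the real-valued $s_2$ at which the bound equals $\delta$, which is one possible reading of how the quoted integers were obtained.

```python
import math

def chernoff_base(K):
    """Base of the bound in (3.8): e^(K/2 - 1) / (K/2)^(K/2), for K > 2."""
    return math.exp(K / 2 - 1) / (K / 2) ** (K / 2)

def median_failure_bound(K, s2):
    """Right hand side of (3.9) for given K and s2."""
    return 2 * chernoff_base(K) ** (s2 / K)

def s2_required(K, delta):
    """Real-valued s2 at which the bound (3.9) equals delta."""
    return K * math.log(delta / 2) / math.log(chernoff_base(K))

def x_over_m(r):
    """X/m = r log2(r) - (r-1) log2(r-1), with 0 * log2(0) = 0."""
    lower_term = (r - 1) * math.log2(r - 1) if r > 1 else 0.0
    return r * math.log2(r) - lower_term

def x_bounds(r):
    """Lower and upper bounds of (3.10), valid for r >= 2."""
    log2e = math.log2(math.e)
    return log2e + math.log2(r - 1), log2e + math.log2(r)

for K in (5, 6, 7, 8):
    print(K, round(s2_required(K, 0.2), 2), K * round(s2_required(K, 0.2)))
```

For $\delta = 0.2$ the real-valued solutions are about 14.56, 10.66, 8.55 and 7.24 for $K = 5, 6, 7, 8$; rounding to the nearest integer reproduces the quoted 15, 11, 9, 7. Under this reading, the $K = 8$ value is rounded down, and `median_failure_bound(8, 7)` evaluates to roughly 0.216, marginally above $\delta = 0.2$. The product $K \cdot s_2$, to which $s_1 s_2$ is proportional, is smallest at $K = 8$ among the four values.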


Figure 3.1. Upper Bound, True Value, and Lower Bound of X/m

3.2.2 Bound on Sum of Logarithms

A number of useful inequalities involving logarithmic series are derived in this section. The logarithm function is increasing and concave (its second derivative is $< 0$). Interpreting the sum in the following logarithmic series as a rectangular approximation of the area under the logarithm function, an inequality can be developed as per (3.13):

$$\sum_{r=2}^{m}\log_2(r) < \frac{1}{\ln(2)}\int_{2}^{m+1}\ln(x)\,dx = \frac{(m+1)\ln(m+1) - 2\ln(2) - m + 1}{\ln(2)} = (m+1)\log_2(m+1) - 2 - (m-1)\log_2(e) \tag{3.13}$$

The last step in (3.13) is obtained by integration by parts. The upper limit of the integration is $m+1$, as this is needed to encompass the last rectangle in the series. A plot of the bound and the true values for various values of $m$ is depicted in Fig. 3.2, and the error as $m$ increases is within 2.5% for $m > 22$.

Next, the following sum of squares logarithmic series can also be bounded similarly:

$$\sum_{r=2}^{m}[\log_2(r)]^2 < \frac{1}{[\ln(2)]^2}\int_{2}^{m+1}[\ln(x)]^2\,dx = \frac{(m+1)[\ln(m+1)]^2 - 2[\ln(2)]^2 - 2\int_{2}^{m+1}\ln(x)\,dx}{[\ln(2)]^2} = (m+1)(\log_2(m+1))^2 - 2 - 2\log_2(e)\int_{2}^{m+1}\log_2(x)\,dx \tag{3.14}$$
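The bounds (3.13) and (3.14) can be checked numerically with a short Python sketch. It uses the closed form of $\int_2^{m+1}\log_2(x)\,dx$ from (3.13) inside (3.14); the function names are illustrative only.

```python
import math

LOG2_E = math.log2(math.e)

def integral_log2(m):
    """Closed form of the integral of log2(x) from 2 to m+1, as in (3.13)."""
    return (m + 1) * math.log2(m + 1) - 2 - (m - 1) * LOG2_E

def bound_sum_log(m):
    """Right-hand side of (3.13)."""
    return integral_log2(m)

def bound_sum_log_sq(m):
    """Right-hand side of (3.14), with the remaining integral in closed form."""
    return (m + 1) * math.log2(m + 1) ** 2 - 2 - 2 * LOG2_E * integral_log2(m)

def sum_log(m):
    return sum(math.log2(r) for r in range(2, m + 1))

def sum_log_sq(m):
    return sum(math.log2(r) ** 2 for r in range(2, m + 1))

if __name__ == "__main__":
    for m in (5, 23, 25, 100, 1000):
        e1 = 100 * (bound_sum_log(m) - sum_log(m)) / sum_log(m)
        e2 = 100 * (bound_sum_log_sq(m) - sum_log_sq(m)) / sum_log_sq(m)
        print(f"m={m:5d}  (3.13) error={e1:.2f}%  (3.14) error={e2:.2f}%")
```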


Figure 3.2. Bound and True Value of Sum of Logarithms

On the right hand side of (3.14), the remaining integral can be evaluated into a closed form expression. This is not shown here because, in the typical use of this inequality in this thesis, the integral cancels with another, as will be seen in (3.20) in the next subsection. A plot of the bound of (3.14) and the true values for various values of $m$ is depicted in Fig. 3.3, and the error as $m$ increases is within 3.5% for $m > 24$.

Figure 3.3. Bound and True Value of Sum of Squares of Logarithms

Two conservative inequalities which are used in the following section are next derived.

$$\frac{\sum_{i=1}^{n}(\log_2(m_i))^2}{\sum_{i=1}^{n} m_i\log_2(m_i)} \le k = 0.5283, \qquad \frac{\sum_{i=1}^{n}\log_2(m_i)}{\sum_{i=1}^{n} m_i\log_2(m_i)} \le k = 0.5 \tag{3.15}$$
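The two constants in (3.15) can be reproduced with a small search. The sketch below assumes that each ratio of sums is bounded by its largest per-term ratio, $x_i/m_i$ with $x_i = \log_2(m_i)$ for the first inequality and $x_i = 1$ for the second, and that terms with $m_i = 1$ contribute nothing. The helper name and the search limit are illustrative.

```python
import math

def worst_case_ratio(x_of_m, m_max=10_000):
    """Largest x(m)/m over integer m = 2..m_max, and the m that attains it."""
    best_m = max(range(2, m_max + 1), key=lambda m: x_of_m(m) / m)
    return x_of_m(best_m) / best_m, best_m

# First inequality of (3.15): x_i = log2(m_i)
k1, m1 = worst_case_ratio(math.log2)
# Second inequality of (3.15): x_i = 1
k2, m2 = worst_case_ratio(lambda m: 1.0)

if __name__ == "__main__":
    print(f"first inequality:  k = {k1:.4f} at m_i = {m1}")
    print(f"second inequality: k = {k2:.4f} at m_i = {m2}")
```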


The approach taken to derive them is to assume there exists a bound $k$ for each inequality and then to calculate it conservatively. With $x_i = \log_2(m_i)$ and $x_i = 1$ for the left hand side of the first and second inequality in (3.15) respectively, the numerator and denominator on the left hand side can be written as below, and with a bound $k$ the implication in (3.16) is obtained:

$$\frac{\log_2\left(\prod_{i=1}^{n} m_i^{x_i}\right)}{\log_2\left(\prod_{i=1}^{n} m_i^{m_i}\right)} \le k \;\Rightarrow\; \prod_{i=1}^{n} m_i^{x_i - k m_i} \le 1 \tag{3.16}$$

The inequality in (3.16) can be conservatively satisfied if

$$k \ge \max \frac{x_i}{m_i}, \qquad m_i = 2, 3, \ldots \tag{3.17}$$

In (3.17) the value of $m_i$ is $> 1$, since 1 raised to any power (in (3.16)) is 1 and does not contribute to the bound. A simple computation now reveals that for the first inequality the max in (3.17) is at $k = 0.5283$ when $m_i = 3$, and for the second inequality it is at 0.5 when $m_i = 2$, as claimed in (3.15).

3.3 Bound on Variance of the Random Variable X

In this section, the variance of the random variable $X$ is bounded. While the section follows the tight bound section derived in [14], the bounding is done using the inequalities derived in the previous sections in an attempt to obtain new insight on the variance.

$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 \tag{3.18}$$

$$E(X^2) = m\sum_{i=1}^{n}\sum_{r=1}^{m_i}\left(r\log_2(r) - (r-1)\log_2(r-1)\right)^2$$

$$= m^2(\log_2(e))^2 + m\sum_{i=1}^{n}\sum_{r=1}^{m_i}\left[2\log_2(e)\log_2(r) + (\log_2(r))^2\right]$$

$$\cdots\ \left[m_i m_j (a_i - a_j)^2\right] \tag{3.22}$$

In (3.22) the last step can be seen by careful expansion of the rightmost two bracketed quantities of the previous step and simplifying the resultant expansions systematically.


Finally, for use in Chebyshev's inequality [12], (3.22) is modified as:

$$\frac{\mathrm{Var}(X)}{E(X)^2} < \frac{3.1\,m^2 + m\sum_{i=1}^{n}\left(2m_ia_i + a_i^2 + 2a_i\right) + \sum_{i=1}^{n-1}\sum_{j>i} m_im_j(a_i - a_j)^2}{\left(\sum_{i=1}^{n} m_ia_i\right)^2}$$

$$= \frac{3.1\,m^2}{E(X)^2} + \frac{m}{E(X)}\left[\frac{2E(X) + \sum_{i=1}^{n}\left(a_i^2 + 2a_i\right)}{E(X)}\right] + \frac{\sum_{i=1}^{n-1}\sum_{j>i} m_im_j(a_i - a_j)^2}{E(X)^2} \tag{3.23}$$

To proceed, assumptions are made on the value of $E(X)$: in [13], $E(X)$ is assumed to exceed $m$ divided by a positive constant, and in [14] this constant is taken as 1. The entropy of $n$ packet headers assuming a uniform random distribution is $H = \log_2(n)$. Entropy is related to the entropy norm [13] by $H = (m\log_2(m) - E(X))/m$, where $E(X) = F_H$ is the entropy norm. The expression for the variance can now be bounded as

$$\frac{\mathrm{Var}(X)}{E(X)^2} < 3.2 + 3.53 + O(\log_2(m)) \tag{3.24}$$

Except for the last term in (3.23), the rest of the expressions can be bounded by finite quantities. The last term must be of the form $O(\log_2(m))$, i.e. $k\log_2(m)$, where $k$ is conjectured to be $\ll 1$ based on the simulations conducted in Chapter 2. However, at the time of writing this thesis, it has not been possible to prove this mathematically. If this can be done, then the total number of counters can be reduced significantly, which will be very effective for hardware implementation. This will be an area for future work.

While the procedure followed here in bounding $\mathrm{Var}(X)$ takes [14] as the starting point, what has been presented here is a completely different bounding technique and produces different results. Primarily, the key result in this section is the closed form expression in (3.23), which leads to (3.24). Though the authors of [14] claim that their bounds are the tightest in the literature, the authors in [13] have a much stronger theoretical bound. The same analysis used in this chapter to arrive at (3.23) can also be extended to the expression in [13].
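The relation $H = (m\log_2(m) - F_H)/m$ quoted above can be illustrated with the two skewed-traffic cases of Table 3.2. The sketch below assumes that each entry of q is the number of distinct addresses that occur f times each; this reading is an assumption, chosen because it reproduces the total of $10^5$ packets and the true norms $F_t$ listed in Table 3.2. The function names are illustrative.

```python
import math

def entropy_norm(q, f):
    """F_H = sum of m_i*log2(m_i), where q[k] distinct addresses each occur f[k] times."""
    return sum(qk * fk * math.log2(fk) for qk, fk in zip(q, f))

def total_packets(q, f):
    return sum(qk * fk for qk, fk in zip(q, f))

def entropy(q, f):
    """H = (m*log2(m) - F_H) / m, the relation quoted from [13]."""
    m = total_packets(q, f)
    return (m * math.log2(m) - entropy_norm(q, f)) / m

CASE1 = ([1, 40700, 9299], [9301, 2, 1])
CASE2 = ([5, 10, 100, 2000, 47885], [4823, 1000, 100, 4, 1])

if __name__ == "__main__":
    for name, (q, f) in (("Case 1", CASE1), ("Case 2", CASE2)):
        print(f"{name}: m={total_packets(q, f)}  "
              f"F_H={entropy_norm(q, f):.1f}  H={entropy(q, f):.4f} bits")
```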


CHAPTER 4
HARDWARE IMPLEMENTATION IN XILINX FPGA

The target Xilinx FPGA chosen for the study of feasibility of the hardware implementation is the XC2VP30. The reason for this choice is that the Xilinx University Program (XUP) board is available in the USF laboratory. The design is carried out using the Xilinx software tools version 9.2i. The software packages available in this version are the Xilinx ISE and the Xilinx EDK. Xilinx EDK 9.2 no longer supports the XUP board, and the entire design was carried out using only the Xilinx ISE. Xilinx indicates that a future version of the EDK will support the board. The resources available in the XC2VP30 are: 27392 4-input LUTs, 136 18 kbit Block SelectRAMs (bRAM, configurable in standard data widths only) and 136 18 × 18 Block multipliers. There are up to 8 digital clock managers (DCMs) available on the chip, which can be configured to increase the internal clock frequency. The period of the clock of the XUP board is 10 ns. Using these DCMs the chip can be internally clocked at a period of 5 ns. This is the maximum clock rate for the bRAM that was obtained during simulations for this chip. This is also the maximum performance estimate as given by the Xilinx bRAM notes [18]. ModelSim 6.1 XE starter version is the tool of choice for post-route simulations of designs. The waveforms shown in this chapter are obtained from the post-route design simulated with ModelSim. The hardware description language in which the design is carried out is Verilog.

The USF Academic Computing provided a one-minute capture of IP data that had approximately $2.6 \times 10^6$ packets. While it was indicated that this capture was based on a very low-traffic flow during off-peak hours, it is to be anticipated that normal IP data traffic will far exceed this number. An online free software [16] was used to extract this captured data and organize it into its constituent IP packet header features like source/destination


MAC address, IP version, source/destination IP address, source/destination port, application using the IP protocol etc. A simple program was then developed to extract from this only what is relevant for this work, i.e. the IP header source/destination address information. Only the IP destination addresses will be considered for all further analysis.

Normally on a 1 Gbit/sec router, based on an assumed average packet size of 300 bytes [3], approximately $450 \times 10^3$ packets can be expected in a second. On a 10 Gbit/sec router, this number would be approximately $4.5 \times 10^6$ in a second. The number of counters needed for the algorithm is based on the total number of packets $m$, fixed beforehand. As stated in Chapter 2, by trial and error, an epoch size of $m = 10^6$ was fixed, taking into account the resources on the FPGA board but without under-utilizing the board. Hence for all the hardware designs and simulations too, the epoch size of a million packets has been used.

4.1 Mapping the Algorithm into Hardware

The hardware design has to handle 32 bit data because in Internet Protocol, version 4 (IPv4) the source and destination addresses need 32 bits for storage. Although the newer IPv6 has a larger address field of 128 bits, since the USF LAN data capture showed IPv4 headers, this version was used for the study. 32 bit registers are needed to store the packet destination addresses, and the count value ($r$ in Code 2.1) for each of the randomly selected packets will be stored in 16 or 32 bit counters. Initially a 16 bit register for the count value was selected to save hardware resources; subsequently, in the bRAM based designs, a 32 bit count value is used. The 32 bit register needed to store the packet destination addresses serves a dual purpose: initially it contains a random number that signifies when a packet will be picked for monitoring, and from that instant on it stores the address of the packet header that was available at that time instant on the bus. To be able to select the packet header (address) at a time instant, there is a need for a global sample (sequence) count to be available to all modules so that, at the appropriate instant, the packet headers can be picked up. There is also the need for a global packet header bus on which the packet


addresses are placed in the design. The counter associated with each 32 bit packet header register contains the count, or in other words, the number of times that packet header address showed up in the data stream from its instant of sampling during the entire epoch. This is the basic module that implements the computation of $r$ for the random variables $X$ of Chapter 2, as shown in Code 2.1. The design requires $s_1 \times s_2$ such random variables, and hence that many counters. Once an epoch of $m$ packet headers is analyzed for count values $r$, post-processing of the $r$ to compute the logarithms and the value of the random variable $X$ of step 3 of Code 2.1 can be carried out. Finally, the averaging and median computations of (2.3) of Chapter 2 are also required in the post-processing (post-epoch) phase. Lastly, in the pre-epoch phase, a random number generator to store initial values in the packet header register is also needed. If the epoch size is $m = 2^{20}$ then a 20 bit random number generator would suffice, from which the first $s_1 \times s_2$ values would be selected and placed into the packet header registers before an epoch begins. Thus the whole algorithm requires three phases: a pre-epoch phase (initialization phase), the epoch phase (monitoring phase) and a post-epoch phase (computation phase).

Ideally, high speeds can be achieved during an epoch if the $s_1 \times s_2$ (given by the expressions in Chapters 2, 3) packet header registers and counters which are used to monitor the data stream can all operate in parallel. This is the basic counter module and is the first design considered. Only after exploring this module's design space or modifications of it in the next two sections is the post-processing phase considered in subsequent sections.

4.2 Design 1: Counter Module

The block diagram of the Counter Module is shown in Fig. 4.1.

4.2.1 Design 1: Operation

The input to the module consists of two buses:

1. The 32 bit wide packet header bus (phbus).


Figure 4.1. A Counter Module


2. A 16 or 20 bit wide (the width depends on how many total packets $m$ are planned a priori for the complete epoch) global sample or global sequence counter bus (gscbus). For a million packets, this bus is 20 bits wide. Should the design call for monitoring more or fewer packets in an epoch, this bus width can be increased or decreased to match the number of packets to be monitored. The content of this bus is the time instant used for choosing packets; the sampling instant of each packet, a random number, is stored in the ph register during reset of this module. The value on the gscbus is incremented every time a packet shows up on the packet header bus, and is synchronized with the positive edge of the clock.

A random number generator is needed for storing the initial value in the ph register in the module. At the time of designing this module, one separate random number generator module for the entire design was envisaged, which would be used to initialize the 32 bit ph registers with random values during reset. During testing of this design, the random number generator in Verilog was used in the testbench fixture. For testing, random numbers were also used to serve as packet headers. The phbus served both to initially carry the random numbers for sampling at reset into the ph register, and to carry the packet headers in the epoch phase. With reset high, the random number on the phbus is loaded into the ph register by exercising the ld line of the register. Simultaneously, the D flip-flop in the circuit is cleared. In testing, since only random numbers are used for simulations, the phbus always has a stream of random numbers with every clock cycle.

The two modes of operation in the Counter Module are as follows.

1. Reset/Load Phase: A high on the reset line makes the output of the D flip-flop (f/f) low. The output of the f/f serves as the select signal $S$ for the multiplexer. If $S$ is low, the gscbus is chosen for comparison, else the phbus is chosen for comparison. When $S = 0$ the contents of the 32 bit register are compared with the gscbus. If there is a match, the comparator has an output of 1, which then forces the output of the AND gate A high, which serves as the load signal for the 32 bit register. The phbus contents then get stored in the register. Thus, the 32 bit register which


originally contained the random number which signals the time instant at which a packet header (address) has to be picked now stores the packet header available on the phbus, and the flip-flop is set to 1.

2. Increment Phase: When $S = 1$, the multiplexer chooses the packet header contents for comparison. If the 32 bit ph register containing the IP header address matches the contents of the phbus, then both the f/f and the comparator output are high, which enables the other AND gate B to increment the 16 bit counter value associated with the 32 bit ph register in the module.

A project with 16 such Counter Modules in parallel was simulated to test the design, and its post-route simulation is shown in Fig. 4.2 and Fig. 4.3. Since each of the modules now needs to be addressed, there is additional circuitry needed for an address decoder and a separate address enable (ae) line. During the initial phase of operation, when the reset and the address enable (ae) signals are both high, random numbers available on the phbus get stored into the 32 bit registers of every module. The address lines start at address 0 and increment until address 15. During the reset phase the D flip-flops in each module are also cleared as explained above. When the reset goes low, the value on the gscbus is checked against the random number in the 32 bit register of each module, and if there is a match, then the current contents of the phbus are loaded into the registers and the corresponding counter value is set to 1. Thereafter, every time a new packet arrives on the phbus, these registers are checked for a match and the corresponding counters incremented.

4.2.2 Design 1: Advantages and Disadvantages

The Counter Module as described in the previous section is excellent for speed, as many such modules operating in parallel can achieve very high speeds only limited by the maximum clock frequency possible on the XUP board and the DCMs. If the number of modules $= s_1 \times s_2$, then all random variables $X$ can be operated in parallel. By its design, since it is a dedicated module, any number can be added, provided routing requirements to multiple parallel modules are taken care of. However, the FPGA XC2VP30 has a total


Figure 4.2. Initial Setup for 16 Counter Modules Simulation Waveforms


Figure 4.3. Counters in Operation Simulation Waveforms


of 27392 4-input LUTs. The number of such Counter Modules that can be fitted on the board is only about 500, since each module occupied 54 LUTs. Since $s_1 \times s_2$ is 28525, even on the largest Virtex-II Pro chip in the Xilinx family this design cannot be accommodated, as only about 2000 Counter Modules can be fitted. The limitation of this design seems to be only with available resources and not with the design itself, and hence this design is abandoned. An alternative, although slower, design that could satisfy the algorithm requirements and possibly fit on the XC2VP30 chip is investigated in the next section.

4.3 Design 2: Counter Module Using the Block SelectRAM (bRAM)

4.3.1 Design Considerations

As described in the previous section, the parameters that need to be stored and tracked are the IP destination address and the associated counter value. The aim is to be able to fit 28525 packet headers and ($s_1 \times s_2 = 28525$) counters. The IP packet header source/destination addresses need 32 bits of storage space. The count value (in the case of normal IP traffic patterns) for each of these distinct packet headers should normally require only 16 bits, which can store a count value of not more than 65535. The usage of a bRAM fits the algorithm's data structure of a matrix, because of its requirement of $s_1 \times s_2$ counters.

The Xilinx XUP Virtex-II Pro board has 136 bRAMs, each of which is 18 kbit [18]. The actual available number of blocks depends on the width of the bRAM used. If the data width is more than 32 bits, then the available number drops, because the Xilinx ISE through its IP core option cascades/tiles two or more blocks to accommodate the width needed. For example, since the IP addresses are 32 bits long, and if the counter values required are assumed to be 16 bits, then the memory width for the design would have been 48 bits. But when the design was simulated using 48 bit bRAM, the ISE ended up using 3 bRAMs instead of 1. The effective number of counters accommodated on the XC2VP30 would then be a maximum of 11605 ($136/3 \times 256$) counters instead of 28525 counters. Hence


this design has to be changed for something that would not end up using so many bRAMs and, at the same time, would satisfy the counter requirements for an epoch of one million packets.

Configuring a 512 by 32 bRAM as a dual port RAM, with 256 even address locations serving to hold the packet header address and 256 corresponding odd address locations holding the counter value, enables the use of just one bRAM per such configuration. Using 112 bRAMs, and therefore 256 counters per bRAM, in this configuration provides a total of 28672 counters and therefore allows an epoch of 1 million packets to be handled. Since both even and odd memory addresses can be simultaneously accessed and independently written to, this approach to storing the counters and packet address values is quite efficient. However, the penalty is in speed: for each packet address available on the phbus, every even location must be searched to see if the corresponding packet header exists at that location, and if it exists, the corresponding counter at the odd address, which is simultaneously available, is incremented and written back. To handle the initialization with random sampling instants at the even locations, the initial random numbers can be loaded through a file for the bRAM that the ISE provides. However, this works only for simulations, as a hardware reset does not load the file again. The random number generator and entry of data into these bRAM locations will be addressed later for this design in Section 4.5. It is to be noted that for an epoch of a million packet headers, the maximum count value is not more than $2^{20}$, and therefore coding the counter value requires at most 20 bits per packet header. The extra bits available in the counters can be used to store additional information.

Thus by increasing the counter width to 32 bits instead of 16 bits, a better design emerges due to the limitations imposed by the Xilinx tools when using the bRAMs.

4.3.2 Design 2: Advantages and Disadvantages

This is a good design for fitting in several counters within one block. This hardware design also fits the general data structure (sketch) of the algorithm, i.e. the notion of having $s_1 \times s_2$ counters. Several such blocks, in this case 16 bRAM blocks, could be linked


Figure 4.4. Counter Module with bRAM Simulation Waveforms


to get 4096 counters $> s_1 = 4075$ (from Chapter 2) counters for an epoch of a million packets, and there would be $s_2 = 7$ such groups to cover the 112 bRAMs. Furthermore, the additional circuitry needed for comparing packet headers on the phbus and global sample (sequence) counter values, and for controlling the state machine of this module of 256 counters in one bRAM, occupies 141 LUTs and allows 112 such modules to fit into the XC2VP30 with additional room to spare for other circuits. The drawback is now speed. A 5 ns clocking period (200 MHz clocking frequency) of this design is possible. To compare a packet header with each location requires a READ state to read from memory the contents of the packet header area and counter value, a WAIT state to settle the outputs, a WRIT state in which all comparisons and increments are made as appropriate and the contents are written back to the memory locations, and a LOOP state where the address increment occurs or, if all locations are done, a finish signal is raised to provide handshaking with the testbench module so that it provides the next packet header on the phbus. The module goes back to a STRT state where it awaits a start signal from the testbench module when the next packet header is applied. Thus if $1 \le N \le 256$ counters are present in the bRAM, then the smallest time period before the next packet header can be applied is $1 + N \times 4$ clock periods.
If $N$ is 256 then 1025 clock periods elapse before a new packet header can be processed by this design. Since all (112) bRAMs can be operated in parallel, with a 5 ns clocking period the packet headers can be applied to the design at intervals of 5.125 µs. However, it should be remembered that the design requires additional post-processing circuitry, and the addition of this circuitry and post-route simulations of the composite design may change this maximum clocking period due to delays introduced by routing. Consequently a more realistic expectation for this design is a 10 ns clocking period, leading to a minimum packet header interval of 10.25 µs. The simulations shown in Fig. 4.4 are with a 10 ns clocking period. The design allows trading speed against the number of counters in each bRAM. In the XC2VP100, with a total of 444 bRAMs, using 64 counters in each bRAM provides almost the required number of counters, and the overall packet header throughput can be increased by 4 times from the numbers indicated above.
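The timing and capacity figures above follow from simple arithmetic, reproduced in the Python sketch below. The function names are illustrative; the cycle count assumes exactly one STRT cycle per packet plus the four states READ, WAIT, WRIT and LOOP per counter location, as described in the text.

```python
def cycles_per_packet(n_counters):
    """One STRT cycle plus READ, WAIT, WRIT, LOOP for each of the N counter locations."""
    if not 1 <= n_counters <= 256:
        raise ValueError("N must be between 1 and 256 for one 512 x 32 bRAM")
    return 1 + 4 * n_counters

def packet_interval_us(n_counters, clock_ns):
    """Smallest interval between packet headers, in microseconds."""
    return cycles_per_packet(n_counters) * clock_ns / 1000.0

def packets_per_second(n_counters, clock_ns):
    return 1e6 / packet_interval_us(n_counters, clock_ns)

if __name__ == "__main__":
    print("counters with 112 bRAMs x 256:", 112 * 256)
    print("counters with 444 bRAMs x 64: ", 444 * 64)
    for n, clk in ((256, 5), (256, 10), (64, 5), (64, 10)):
        print(f"N={n:3d}, clock={clk:2d} ns: "
              f"{cycles_per_packet(n)} cycles, "
              f"{packet_interval_us(n, clk):.3f} us per packet, "
              f"{packets_per_second(n, clk):,.0f} packets/s")
```

Comparing the resulting packets per second with the packet rates quoted earlier for 1 Gbit/sec and 10 Gbit/sec routers shows how much of the speed trade-off remains for each choice of $N$.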


During the initial part of this work, it was planned that the Xilinx EDK 9.2i would be used for simulating the above designs. The original design was to use the PowerPC on the XUP board to handle the post-processing phase of the algorithm. Since it was later realized that this version of EDK does not support the available Xilinx FPGA board, the additional circuitry described in the next section for post-processing of the counter values had to be designed.

4.4 Logarithm Module

A logarithm (base 2) circuit (log circuit) is needed in the design to compute the random variables $X_{ij}$ as shown in Code 2.1, with $X = m(r\log_2(r) - (r-1)\log_2(r-1))$. In Chapter 3 it was shown with (3.10) that this expression is well approximated by the right or left hand side of (3.10), to 5% error for $r \ge 4$ and to 1% error for $r \ge 15$. To keep the errors introduced by this approximation low, for values of $r \le 15$ the equation in Code 2.1 is precomputed and looked up, while beyond that it is computed using the right hand side of (3.10). The right hand side of (3.10) reduces the computation to just 1 logarithm computation and 1 addition, instead of 2 multiplications, 2 logarithms, 1 subtraction and 1 addition.

4.4.1 Design of the Log Circuit

The input to the log circuit is a positive integer and represents the final count value of how many times a packet showed up in the stream. Since the maximum epoch is assumed to handle up to a million packets, the maximum counter value is coded in 20 bits, and the logarithm value therefore requires at most 5 bits for its integer portion ($\log_2(2^{20}) = 20$). If the overall logarithm value has to fit into an 8 bit register, then it leaves 3 bits for the fractional part. The algorithm of the log circuit is described in Code 4.1. The output of the log circuit has an overall accuracy of three places after the binary point. The maximum error is ±0.125. During normal operation, the final count value for each of the counters may be small, but during anomalies, they can reach high values, and one place


after the decimal point is all that is required in such cases. So the choice of three places after the binary point is quite adequate.

Begin
    MCnt = 0; SCnt = 0;
    Read InputNumber
    While (InputNumber ≠ 0)
    Begin
        Shift
        SCnt++;
    End
    Set Size
    Set base
    While ((base * Size) ≤ InputNumber)
        MCnt++;
    Return LogNum = SCnt + MCnt
End

Code 4.1. Pseudocode of Algorithm for Logarithm Computation

The design uses the basic multiplier module available in the Xilinx library. There are three different ways a multiplier style can be configured on the Xilinx FPGA, and these are as follows:

1. Block multiplier (preconfigured as an 18 × 18 bit multiplier producing a 36 bit result; there are 136 such multipliers on the XC2VP30 chip).
2. LUT multiplier (bit size configured as required).
3. Pipelined LUT multiplier (bit sizes configured as required, but usually requires more LUTs than choice 2 for multiplier style).

Of these three, preliminary simulations show that the pipelined LUT multiplier can be clocked the fastest, the LUT multiplier is the next fastest and the Block multiplier is the slowest for this application. A choice of the Block multiplier would mean that, since the input is more than 18 bits, the Xilinx tools use 2 Block multipliers for each log circuit. A Block multiplier is efficient in terms of overall LUT utilization because it is a dedicated hardware multiplier. LUT based multipliers need additional LUTs. With the LUT multiplier based design, the Logarithm Module takes 363 LUTs.
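The shift-and-multiply idea of Code 4.1 can be sketched in software as follows. This is only an interpretation under stated assumptions, not the thesis circuit: Size is taken to be the largest power of two not exceeding the input, base is taken to be $2^{1/8}$ (one step of the 3 bit fraction), and the result is truncated to a multiple of 0.125. Because the hardware multiplier works with finite-precision constants, this sketch need not reproduce every entry of Table 4.1.

```python
import math

def log2_fixed(n, frac_bits=3):
    """Approximate log2(n), truncated to frac_bits fractional bits, for an integer n >= 1."""
    if n < 1:
        raise ValueError("input must be a positive integer")
    # Integer part: count how many right shifts reduce n to 1 (SCnt in Code 4.1).
    scnt = 0
    value = n
    while value > 1:
        value >>= 1
        scnt += 1
    size = 1 << scnt                 # assumed meaning of Size: largest power of two <= n
    steps = 1 << frac_bits           # 8 fractional steps for 3 bits
    base = 2.0 ** (1.0 / steps)      # assumed meaning of base: 2**(1/8)
    # Fractional part: count how many times size can be scaled by base without exceeding n.
    mcnt = 0
    factor = base
    while mcnt < steps - 1 and factor * size <= n:
        mcnt += 1
        factor *= base
    return scnt + mcnt / steps

if __name__ == "__main__":
    for n in (1, 47, 243, 1024, 50832, 10**6):
        print(f"n={n:8d}  fixed={log2_fixed(n):7.3f}  exact={math.log2(n):.4f}")
```

With truncation the result never exceeds the true logarithm and is at most 0.125 below it, which is consistent with the ±0.125 error budget stated for the circuit.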


Figure 4.5. Logarithm Module Simulation Waveforms for Computing $\log_2(10^6)$

4.4.2 Log Circuit Performance

The log circuit takes a maximum of 29 clock cycles to compute the logarithm of an integer from 1 to $2^{20}$ and is clockable at a 5 ns period, which is the period used in the test results of this section. In Figure 4.5, the waveforms of the log circuit for computing the logarithm alone are shown. In Table 4.1 a comparison of the performance of the log circuit, both in terms of numerical accuracy and speed of computation, is made with Matlab computations of the log running on a 1.54 GHz Pentium (dual core) machine. The Matlab timing results in Table 4.1 are computed as an average of a million repeated logarithm computations. From the results of Table 4.1 it can be seen that the log circuit operates at a clock rate that is 8 times slower than the Pentium processor clock, but is still faster than the Matlab computations. Only the logarithm computation timing is shown in the table, as the lookup for count values $r \le 15$ and the addition needed in equation (3.10) can be accomplished in less than this time or in about the same time.


Table 4.1. Accuracy and Speed of the Log Circuit in Comparison with Matlab

Number     Log of Number             Error      Time taken (ns)
           Log circuit   Matlab                 Log circuit       Matlab
                                                (200 MHz clk)     (1.54 GHz processor)
47         5.625         5.5546      -0.0704    70                203
243        7.875         7.9248       0.0498    90                219
999        10.0          9.9643      -0.0357    100               219
1024       10.0          10           0.0       65                109
7021       12.875        12.7775      0.0975    114               203
9999       13.375        13.2876      0.0874    100               219
25000      14.625        14.6096     -0.0154    115               203
50832      15.625        15.6334      0.0084    120               203
73222      16.25         16.1600      0.09      110               219
111111     16.75         16.7616     -0.0116    130               218
499502     18.875        18.9301      0.0551    130               203
637421     19.375        19.2819      0.0931    130               203
872119     19.75         19.7342     -0.0158    145               218
1000000    20.000        19.9316     -0.0684    145               219

4.5 RNG Module

The RNG Module is a random number generator. Xilinx supports loading the bRAM locations with initial values using an external file. However, as stated in Section 4.3.1, this method works well only for simulations (one time). If the circuit is reset during a simulation or at the end of a simulation, the bRAM locations do not hold their original initial values. Hence a random number generator has to be part of the design, along with the additional circuitry needed to initialize the bRAM locations with random values on reset. During the initialization phase, just before the beginning of an epoch, the bRAM (even) addresses are initialized with random numbers denoting the sampling instant. The random number generator design is a simple LFSR (linear feedback shift register) with feedback tap points well documented in the literature for various bit sizes [17]. This design is chosen because Matlab simulations, whose histogram is shown in Fig. 4.6, show that the random numbers generated by this method have a uniform distribution. The design selected uses a 20 bit LFSR with tap points [17] at various bit locations, and with an initial seed that is provided in the 20 bit register at reset. Random numbers with a periodicity of 1048575, i.e. ($2^{20} - 1$),


are generated by this random number generator, which suffices for the million packet epoch. A test run of the random number generator is shown in Fig. 4.7. This is sufficient for our application, as only 28672 values are needed at most (112 bRAMs × 256 counters in each bRAM). Using one such random number generator per module adds minimally to the overall LUTs (just one additional LUT), and as long as the seed provided in each module is relatively prime with the seeds in the other modules, the random number sequences for each bRAM module are independent. For test simulations the module with a smaller word size (8 bits) with appropriate taps [17] is used.

Figure 4.6. Matlab Simulation of Random Number Generator

Figure 4.7. RNG Module Simulation Waveforms
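A software model of such an LFSR is sketched below. The thesis does not list the tap positions it takes from [17], so the taps used here are assumptions: (8, 6, 5, 4) for the 8 bit test generator and (20, 17) for the 20 bit generator, both commonly tabulated maximal-length choices. XOR feedback is also assumed, so the all-zero state must be avoided as a seed.

```python
TAPS_8 = (8, 6, 5, 4)    # assumed taps for the 8 bit test generator
TAPS_20 = (20, 17)       # assumed taps for the 20 bit generator

def lfsr_step(state, taps, width):
    """Advance a Fibonacci LFSR by one step.

    taps are 1-indexed bit positions (position `width` is the MSB); the XOR of
    the tapped bits is shifted in at bit 0.
    """
    feedback = 0
    for t in taps:
        feedback ^= (state >> (t - 1)) & 1
    return ((state << 1) | feedback) & ((1 << width) - 1)

def lfsr_period(seed, taps, width):
    """Number of steps until the LFSR returns to its (non-zero) seed."""
    state = lfsr_step(seed, taps, width)
    steps = 1
    while state != seed:
        state = lfsr_step(state, taps, width)
        steps += 1
    return steps

def sampling_instants(seed, taps, width, count):
    """First `count` LFSR states, usable as initial sampling instants."""
    values, state = [], seed
    for _ in range(count):
        state = lfsr_step(state, taps, width)
        values.append(state)
    return values

if __name__ == "__main__":
    print("8 bit period:", lfsr_period(1, TAPS_8, 8))
    print("first sampling instants:", sampling_instants(1, TAPS_20, 20, 8))
```

The same `lfsr_period` call with `TAPS_20` and width 20 can be used to check the periodicity quoted in the text for the 20 bit generator.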


Figure 4.8. The Integrated Module

4.6 Integrated Module

The block diagram of the Integrated Module is shown in Fig. 4.8. To implement the whole algorithm for an epoch of $10^6$ packets, 112 such modules are needed. Due to design and routing constraints, it was necessary to design all three parts of the algorithm, i.e. the pre-epoch phase, the epoch phase, and the computing and local averaging of all the 256 random variables of the post-epoch phase of the algorithm, within the same module. Each Integrated Module contains a Counter Module using the bRAM, an RNG Module, and a Logarithm Module. Finally, at the end of the epoch, it returns the average of all the 256


computed random variables, i.e. it computes $\left(\sum_{i=1}^{256} X_i\right)/256$. Simulation waveforms representing the pre-epoch, epoch and post-epoch phases are shown in Fig. 4.9 through Fig. 4.13.

1. In Fig. 4.9 the waveforms show the write signal wea being enabled to write to each (even address) of the 256 locations of the bRAM.
2. In Fig. 4.10 the waveforms show that bRAM location 12 has a random number = 3.
3. In Fig. 4.11 the waveforms show signals wea and web enabled to store packet header 9 at time instant 3. Here random numbers generated by the simulator act as packet headers.
4. In Fig. 4.12 the waveforms show bRAM location 12 containing packet header 9, and the counter value, i.e. bRAM location 13, incremented by 1.
5. In Fig. 4.13 the waveforms show the random variable $X$ computation for each counter value (odd addresses) and a moving sum over all 256 bRAM locations.
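The three phases can be summarized by a behavioural software model of one Integrated Module. The sketch below is not the Verilog design: it assumes that the global sequence count starts at 1 for the first packet, that a location whose sampling instant never occurs keeps a count of 0 and contributes $X = 0$, and it uses floating point throughout. All names are illustrative.

```python
import math

def x_over_m(r):
    """X/m = r*log2(r) - (r-1)*log2(r-1); taken as 0 for r <= 1."""
    if r <= 1:
        return 0.0
    return r * math.log2(r) - (r - 1) * math.log2(r - 1)

def integrated_module(stream, instants):
    """Model of one module: returns (counts, average of X/m over all locations)."""
    n = len(instants)
    header = [None] * n    # "even addresses": hold the packet header once sampled
    count = [0] * n        # "odd addresses": hold the counter value r
    # Epoch phase: gsc plays the role of the global sequence counter.
    for gsc, ph in enumerate(stream, start=1):
        for i in range(n):
            if header[i] is None:
                if instants[i] == gsc:       # sampling instant reached
                    header[i] = ph
                    count[i] = 1
            elif header[i] == ph:            # monitored header seen again
                count[i] += 1
    # Post-epoch phase: compute X/m per counter and average locally.
    average = sum(x_over_m(r) for r in count) / n
    return count, average

if __name__ == "__main__":
    stream = [9, 3, 9, 7, 9, 3, 9, 9]
    counts, avg = integrated_module(stream, [1, 2, 4, 6])
    print("counts:", counts, " F_H estimate:", avg * len(stream))
```

Multiplying the returned average by the epoch size $m$ gives the entropy norm estimate contributed by this module, mirroring the averaging step described above.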


Figure 4.9. Pre-epoch Phase: Initialization


Figure 4.10. Epoch Phase: a


Figure 4.11. Epoch Phase: b


Figure 4.12. Epoch Phase: c


Figure 4.13. Post-epoch Phase: Random Variable Computation and Averaging


4.6.1 Hardware and Software Comparisons of the Integrated Module

A comparative summary of the software simulations with Matlab and the hardware simulations in ModelSim of the Integrated Module using the same synthesized traffic is provided in this section in tabular form, for the waveforms shown for the run of Fig. 4.9 through Fig. 4.13. The purpose is twofold: first, to provide a detailed comparison and verification of the working of the hardware, and second, to provide a comparison of the final estimates of the entropy norm provided by the hardware and software for a few runs.

The synthesized traffic is generated by using 1000 Verilog random numbers in the testbench for the Integrated Module. A 1000 packet post-route hardware simulation in ModelSim of one Integrated Module on a 1.54 GHz Pentium PC takes approximately 13 hours to run, although this is only approximately 10.29 ms in actual hardware simulation time.

The traffic stream generated by the Verilog testbench and the random numbers generated by the hardware for the initial sampling instants stored at the 256 locations in the bRAM were captured manually and used in the software (Matlab) simulations to verify the hardware simulations and check the correctness of the hardware design.

Just after the epoch phase is completed, the contents of the counters in the bRAM and the corresponding counts from the software are depicted in Table 4.2 as $r$. The values match exactly at all 256 locations in the Integrated Module. This proves the correctness of the epoch phase part of the module as well as the pre-epoch phase (initialization) aspects of the designed hardware module.

In the post-epoch phase, the random variable for each of these counters is computed as $X/m = r\log_2(r) - (r-1)\log_2(r-1)$ in software (Matlab) in floating point and printed in Table 4.2 to 4 decimal places. In the hardware the corresponding $X/m$ is computed using fixed point arithmetic as described earlier in this chapter, and these values are also shown in Table 4.2. The slight discrepancies are due to the use of fixed point arithmetic in hardware versus floating point arithmetic in software. The percentage error between these two is shown in the middle column of Table 4.2.
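The fixed point versus floating point discrepancy can be illustrated for small counts with the sketch below. It assumes that the hardware value of $X/m$ equals the exact value rounded to the nearest multiple of 1/8; this is an inference that matches the rows of Table 4.2 for $r = 2$ to $8$, not a statement of how the circuit is built. The function names are illustrative.

```python
import math

def x_over_m_float(r):
    """Floating point X/m = r*log2(r) - (r-1)*log2(r-1), for r >= 2."""
    return r * math.log2(r) - (r - 1) * math.log2(r - 1)

def x_over_m_fixed(r, frac_bits=3):
    """Assumed hardware value: exact X/m rounded to the nearest 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(x_over_m_float(r) * scale) / scale

def percent_error(r):
    """Percentage error of the fixed point value relative to floating point."""
    software = x_over_m_float(r)
    return 100.0 * (software - x_over_m_fixed(r)) / software

if __name__ == "__main__":
    for r in range(2, 9):
        print(f"r={r}  hardware={x_over_m_fixed(r):.4f}  "
              f"software={x_over_m_float(r):.4f}  error={percent_error(r):+.4f}%")
```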


The Integrated Module in the post-epoch phase concurrently keeps a running total of the computed $X/m$ variables. This cumulative sum from the hardware is shown progressively in Table 4.3, together with the corresponding values obtained from floating point simulations in Matlab for the same run and the error percentage between these two quantities.

Finally, the hardware computes the average of the random variables $X/m$ by right shifting the cumulative total by 8 bits (divide by 256) in the last clock cycle of the post-epoch phase. The software computes the average by dividing by 256 in floating point. The results are 3.1875 and 3.2375 respectively, and represent $F_H/m$. The error between the two values is 1.5%. In Table 4.3 the final row shows the average value of the cumulative sum of all 256 random variables as computed by the hardware and software.

The entropy norm $F_H$ is $X/m\,(\mathrm{avg}) \times 10^3$, where $10^3$ is the number of packets $m$ used in the hardware simulations shown in Fig. 4.9 through 4.13. Therefore, the final values of 3187.5 ($3.1875 \times 10^3$) from hardware and 3237.5 ($3.2375 \times 10^3$) from software are the estimates of the entropy norm ($F_H$) as computed with 256 counters (in 1 Integrated Module) and are within 1.5% of each other.

Two additional 1000 packet runs on two other different distributions were made to show the comparison between hardware and software estimates for $F_H$. The results are tabulated in Table 4.4. Each run of the hardware simulations tabulated represents approximately 10.5 ms of actual hardware time, which includes the pre-epoch, epoch and post-epoch phases. Therefore, the post-epoch phase adds very little additional hardware computation time.

To follow the pattern of the data structure of the algorithm, 16 such modules ($256 \times 16 = 4096$) have to be grouped together; the average of all such modules then becomes 1 of 7 ($s_2 = 7$) values. The median of 7 such averages multiplied by the epoch size, in this case $m = 10^6$, is the value of the entropy norm. The block diagram of this approach is illustrated in Fig. 4.14. Each module has its own random number generator and should be provided with a different seed that is relatively prime to the others, to ensure that different modules do not pick up the same packets for monitoring purposes. Initialization is done at the


Table 4.2. Comparison of Hardware Simulations Vs Software Simulations - Part 1

Hardware Simulations                            Software Simulations
Counter   r    X/m      X/m         % Error     X/m       r
Addr           (Hex)    (Decimal)
1         3    02C      2.7500       0.1774     2.7549    3
3         3    02C      2.7500       0.1774     2.7549    3
5         4    034      3.2500      -0.1506     3.2451    4
7         7    042      4.1250       0.4034     4.1417    7
9         5    03A      3.6250      -0.4255     3.6096    5
11        6    03E      3.8750       0.6445     3.9001    6
13        2    020      2.0000       0          2.0000    2
15        4    034      3.2500      -0.1506     3.2451    4
17        4    034      3.2500      -0.1506     3.2451    4
19        4    034      3.2500      -0.1506     3.2451    4
21        2    020      2.0000       0          2.0000    2
23        3    02C      2.7500       0.1774     2.7549    3
25        6    03E      3.8750       0.6445     3.9001    6
27        6    03E      3.8750       0.6445     3.9001    6
29        7    042      4.1250       0.4034     4.1417    7
.         .    .        .            .          .         .
483       5    03A      3.6250      -0.4255     3.6096    5
485       6    03E      3.8750       0.6445     3.9001    6
487       4    034      3.2500      -0.1506     3.2451    4
489       3    02C      2.7500       0.1774     2.7549    3
491       6    03E      3.8750       0.6445     3.9001    6
493       8    046      4.3750      -0.6090     4.3485    8
495       6    03E      3.8750       0.6445     3.9001    6
497       3    02C      2.7500       0.1774     2.7549    3
499       4    034      3.2500      -0.1506     3.2451    4
501       6    03E      3.8750       0.6445     3.9001    6
503       2    020      2.0000       0          2.0000    2
505       4    034      3.2500      -0.1506     3.2451    4
507       3    02C      2.7500       0.1774     2.7549    3
509       5    03A      3.6250      -0.4255     3.6096    5
511       3    02C      2.7500       0.1774     2.7549    3


Table 4.3. Comparison of Hardware Simulations vs. Software Simulations - Part 2

Counter  Hardware Simulations      % Error   Software Simulations
Addr     CumSum of X/m                       CumSum of X/m
         Hex         Decimal
1        0002C       2.7500        0.1774    2.7549
3        00058       5.5000        0.1774    5.5098
5        0008C       8.7500        0.0558    8.7549
7        000CE       12.8750       0.1675    12.8966
9        00108       16.5000       0.0378    16.5062
11       00146       20.3750       0.1537    20.4064
13       00166       22.3750       0.1400    22.4064
15       0019A       25.6250       0.1032    25.6515
17       001CE       28.8750       0.0747    28.8966
19       00202       32.1250       0.0520    32.1417
21       00222       34.1250       0.0489    34.1417
23       0024E       36.8750       0.0585    36.8966
25       0028C       40.7500       0.1145    40.7967
27       002CA       44.6250       0.1608    44.6969
29       0030C       48.7500       0.1814    48.8386
...
483      030E0       782.0000      0.0610    782.4775
485      0311E       785.8750      0.0639    786.3776
487      03152       789.1250      0.0630    789.6227
489      0317E       791.8750      0.0634    792.3776
491      031BC       795.7500      0.0663    796.2778
493      03202       800.1250      0.0626    800.6263
495      03240       804.0000      0.0654    804.5264
497      0326C       806.7500      0.0658    807.2813
499      032A0       810.0000      0.0649    810.5264
501      032DE       813.8750      0.0677    814.4266
503      032FE       815.8750      0.0676    816.4266
505      03332       819.1250      0.0667    819.6717
507      0335E       821.8750      0.0671    822.4266
509      03398       825.5000      0.0649    826.0362
511      033C4       828.2500      0.0653    828.7911

X/m(avg) 033H        3.1875        1.5444    3.2375
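
The last two rows of Table 4.3 can be checked with a few lines of Python. This is only a sketch of the arithmetic: it assumes the cumulative sum is held as an integer word with 4 fractional bits (033C4 hex / 16 = 828.25), and all variable names are illustrative.

```python
FRAC_BITS = 4        # assumed fixed-point format: 4 fractional bits
NUM_COUNTERS = 256   # counters in one Integrated Module
M_PACKETS = 10**3    # packets in the simulated run

hw_cumsum_word = 0x033C4  # final hardware cumulative sum (Table 4.3)
sw_cumsum = 828.7911      # final software cumulative sum (Table 4.3)

# Divide by 256 with an 8-bit right shift; the low 8 bits are discarded.
hw_avg_word = hw_cumsum_word >> 8
hw_avg = hw_avg_word / (1 << FRAC_BITS)

# Software reference: ordinary floating point division.
sw_avg = sw_cumsum / NUM_COUNTERS
error_pct = (sw_avg - hw_avg) / sw_avg * 100.0

print(f"hardware average: {hw_avg_word:03X}H = {hw_avg}")
print(f"software average: {sw_avg:.4f}")
print(f"error: {error_pct:.4f}%")
print(f"F_H estimates: {hw_avg * M_PACKETS} (hardware), {sw_avg * M_PACKETS:.1f} (software)")
```

Note that the hardware cumulative sum itself (828.25) is within 0.07% of the software value; dividing it exactly would give about 3.2354. Under the assumptions above, most of the 1.5% difference in the average therefore comes from the bits discarded by the right shift rather than from the per-counter logarithm values.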


Table 4.4. F_H Computation Comparison - One Integrated Module

Packet Header   Hardware Simulations      % Error   Software Simulations
Range           F_H = X/m (avg) × 10^3              F_H = X/m (avg) × 10^3
1...255         3.1875 × 10^3             1.5444    3.2375 × 10^3
1...49          5.6875 × 10^3             -1.9065   5.5811 × 10^3
0...10          7.8125 × 10^3             -0.7298   7.7559 × 10^3

time of the reset signal, before the epoch starts. This integrated design approach is highly modular, self contained and easily scalable.

Approximately 585 LUTs are needed per module, so the XC2VP30 can accommodate only 27392/585 ≈ 46 such modules overall. Thus in either case, using single counter modules (of Section 4.2) or the bRAM based Counter Module (of Section 4.3), the XC2VP30 cannot hold the entire design.

An alternative design approach would be to have the Counter Module with bRAM as one separate module, and to have either one or several (optional) Logarithm Modules as the other module for post-epoch processing. This design would probably fit comfortably in the XC2VP30 FPGA, and the number of LUTs used overall would be reduced. But at the time of connecting the modules together, it was realized that Xilinx does not provide enough tri-state buffers to isolate the bRAMs from a common bus. A multiplexed design approach would consume additional LUTs to connect each of the 112 modules one by one to the Logarithm Module, along with additional circuitry for addressing each of these modules. This design approach would greatly reduce the speed of the post-epoch phase. Hence it had to be modified to something that would access the bRAM output bus without bus contention. The cost of this is embedding the Counter Module with bRAM into the Logarithm Module. This takes additional LUTs, not only because the additional module is embedded into each block, but also because of the additional routing and states needed to control the Integrated Module.

While 112 of the Integrated Modules do not fit on the XC2VP30, the XC2VP100 chip in the same Virtex-II Pro family, with resources of 444 bRAMs, 444 Block multipliers and 99 × 10^3 LUTs, can accommodate the entire design quite comfortably as envisaged. Other FPGA boards in the same Virtex-II Pro family can handle this requirement.
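
The capacity argument above reduces to simple arithmetic, sketched here in Python. The figures are the ones quoted in the text; the variable names are illustrative, and the estimate deliberately ignores the Median Module, routing overhead and any other glue logic, so it is a rough lower bound rather than a synthesis result.

```python
LUTS_XC2VP30 = 27392          # LUT count for the XC2VP30 quoted in the text
LUTS_XC2VP100 = 99 * 10**3    # approximate LUT count quoted for the XC2VP100
BRAMS_XC2VP100 = 444          # bRAM count quoted for the XC2VP100
LUTS_PER_MODULE = 585         # approximate LUTs per Integrated Module
BRAMS_PER_MODULE = 1          # one bRAM (256 counters) per Integrated Module
GROUPS = 7                    # s_2
MODULES_PER_GROUP = 16

modules_needed = GROUPS * MODULES_PER_GROUP
luts_needed = modules_needed * LUTS_PER_MODULE
brams_needed = modules_needed * BRAMS_PER_MODULE

# Integer division: only whole modules can be placed.
modules_on_xc2vp30 = LUTS_XC2VP30 // LUTS_PER_MODULE
fits_xc2vp30 = luts_needed <= LUTS_XC2VP30
fits_xc2vp100 = luts_needed <= LUTS_XC2VP100 and brams_needed <= BRAMS_XC2VP100

print(f"modules needed: {modules_needed}, LUTs needed: {luts_needed}")
print(f"modules that fit on the XC2VP30: {modules_on_xc2vp30}")
print(f"fits XC2VP30: {fits_xc2vp30}, fits XC2VP100: {fits_xc2vp100}")
```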


Figure 4.14. Block Diagram of all 112 Modules and their Interconnect


Figure 4.15. Median Module Simulation Waveforms

4.7 Median Module

The estimate of the entropy norm, as discussed in Chapter 2, is the median of all s_2 = 7 values. A hardware module based on a simple sorting algorithm, the selection sort, is used to sort the input numbers and pick the median. The input to the module is a set of values, each of which represents the average of the random variables of 16 Integrated Modules as shown in Fig. 4.14, computed during the post-epoch phase. The hardware simulation waveforms are shown in Fig. 4.15, and this module occupies 309 LUTs.
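
As a software illustration of what the Median Module computes, the following Python sketch averages the per-module results in groups of 16, sorts the 7 group averages with a selection sort, and picks the middle value. It is a behavioural sketch only, not the hardware design: the function names and the input values are hypothetical.

```python
def selection_sort(values):
    """Return a sorted copy of values using selection sort."""
    data = list(values)
    n = len(data)
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):
            if data[j] < data[smallest]:
                smallest = j
        # Move the smallest remaining element to position i.
        data[i], data[smallest] = data[smallest], data[i]
    return data


def median_of_group_averages(module_averages, modules_per_group=16):
    """Average each group of modules, then return the median of the group averages."""
    groups = [
        module_averages[i:i + modules_per_group]
        for i in range(0, len(module_averages), modules_per_group)
    ]
    group_averages = [sum(group) / len(group) for group in groups]
    ordered = selection_sort(group_averages)
    return ordered[len(ordered) // 2]  # middle element for an odd count such as 7


# Hypothetical X/m averages: 7 groups of 16 modules, identical within each group.
module_averages = [v for v in (3.0, 1.0, 2.0, 7.0, 5.0, 6.0, 4.0) for _ in range(16)]
m = 10**6  # epoch size in packets
f_h_over_m = median_of_group_averages(module_averages)
print(f"median of group averages: {f_h_over_m}, F_H estimate: {f_h_over_m * m}")
```

Selection sort needs a fixed number of compare steps for a fixed input size, which is one plausible reason it suits a small hardware sorter for only 7 values.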


CHAPTER 5
CONCLUSIONS

The current work has concentrated on two aspects: first, to explore the possibility of reducing the overall number of counters, and second, to study the feasibility of a hardware (FPGA) implementation of the randomized algorithm in [13] for entropy norm computation of IP data streams.

In Chapter 2, some theoretical aspects as well as Matlab simulations of the algorithm were discussed. Simulations were carried out for both a uniformly random distribution of packet headers and a skewed distribution. This initial simulation study was necessary to study the overall algorithm structure as well as to get a feel for the total number of counters needed for the algorithm.

In Chapter 3, it was shown that it is possible to reduce the overall number of counters by about 30% just by reducing the s_2 counters. This is necessary to fit a million-packet epoch conceptually on the XC2VP30 FPGA. Further tightening of the variance bound was attempted in Chapter 3, and a new closed form expression for Var(X)/E[X]^2 has been derived. Although it has been conjectured that the constant k << 1 for the last term in expression (3.24), at the time of writing this thesis it has not been possible to prove this mathematically and so give a much stronger bound on the overall expression for Var(X)/E[X]^2. This area will be a subject for future work. A tighter bound in the closed form expression shown in (3.24) would significantly reduce the number of counters overall, and hence would make a possible ASIC/FPGA implementation much more feasible.

In Chapter 4, hardware designs along with their simulations were presented. A design approach has been explored with design trade-offs. Constant modifications had to be made in order to balance routing to all modules, while at the same time arriving at a design that could fit into the hardware and operate at speeds capable of real-time packet handling. The averaging of the logarithm computations of all the 256 counters available in 1 bRAM and the median computations can be accomplished after the epoch is completed, and the entropy norm computation is achievable in near real-time in an FPGA. Table 5.1 summarizes the total number of LUTs each designed module occupies.

The hardware designs discussed in Chapter 4 can broadly be classified as supporting three different phases of operation of the algorithm.

1. Pre-epoch phase
   In this phase, the registers that ultimately store the packet headers must be initialized with random numbers generated by the RNG Module (Section 4.5). If the Integrated Module design approach is considered, since the RNG will be replicated with every module, the initial seeds to each of these modules have to be such that the same random sequence is not replicated. This is easily achieved by choosing numbers that are relatively prime.

2. Epoch phase
   Counter Module (Section 4.2) and Counter Module with bRAM (Section 4.3) discussed two design options. In this phase, packet headers will be stored into registers based on the time instant in the registers, and monitoring of the packets will be done. The counter associated with each packet header increments every time that header, once sampled, repeats in the stream. The epoch for this study was fixed at 1 million packets.
   The first design is excellent for high processing speeds. Row 1 of Table 5.2 shows the packet handling capability of this design. The second design is a trade-off between speed and device utilization. The bRAM locations representing the counters contain the frequency of packet headers for that epoch.

3. Post-epoch phase
   Logarithm Module (Section 4.4) and Median Module (Section 4.7) are needed to


Table 5.1. Device Utilization Summary of FPGA Synthesis on XC2VP30

Design type               bRAM block  Logic Utilization  Route through  No of 4 input LUTs
Counter Module (a)        None        37                 46             83
Counter Module (b)        None        593                271            864
Counter Module with bRAM  1           85                 62             147
Logarithm Module          None        343                20             363
Median Module             None        293                16             309
RNG Module                None        1                                 1
Integrated Module         1           540                44             584

(a) 1 Module Simulated
(b) 16 Modules Simulated

   compute the entropy norm of the IP data stream. In this phase, the random variables associated with each counter of the bRAM will be computed using the Logarithm Module, and the average of all locations of 1 bRAM will be computed. Several such modules have to be grouped and, further, their average computed. The median of several (s_2) such averages will give F_H/m.

The final design presented, the Integrated Module (Section 4.6), is highly modular, self contained and scalable. For large scale applications the existing hardware can be scaled up by adding Integrated Modules. Should the design call for entropy norm estimation of other packet features, this information can be loaded on the bus instead of the destination address that is presently used. Should simultaneous entropy norm calculations be desired for multiple features, such as destination addresses and port numbers, then duplication of the modules and buses will be needed.

5.1 Alternative Design Considerations

In Chapter 4, the Counter Module using the bRAM was discussed. The design considered had 256 bRAM locations for counters, i.e., it fully utilized each available bRAM, for optimality. This leads to a slower speed of operation while providing better utilization of the FPGA. Higher processing speeds can be achieved if the design utilizes only 32 bRAM locations for counters. In Table 5.2 row 5, the increase in the packet handling


Table 5.2. Summary of Design Capabilities

Design type               No of clock cycles  Packet Handling Capability (packets/min)
Counter Module            2/3                 4 Billion         8 Billion
                                              200 MHz clock     400 MHz clock
RAM Based, 256 counters   1025                5.85 Million      9.75 Million
                                              100 MHz clock     166.66 MHz clock
RAM Based, 32 counters    129                 46.5 Million      77.5 Million
                                              100 MHz clock     166.66 MHz clock

capacity with this approach is shown. But then, more bRAMs would be needed, and an increase in routing should also be taken into account.

5.2 Future Work

The next goal would be to find how to interface with the designed hardware and to study the performance of the resultant hardware on IP data traffic.

The current study is by no means complete and exhaustive in terms of being able to detect anomalies. Due to the challenging nature of the problem area, sustained and detailed statistical studies have to be made using entropy results of not just one packet feature but several different features simultaneously to arrive at conclusions. This takes a considerable amount of effort. Studies have to be made at different times of the day over several months to study the traffic pattern and then use this information to draw conclusions about anomalous behavior of the stream. The challenges that lie ahead are in the following issues:

1. What threshold to consider for the entropy norm in order to set up a conclusive boundary between normal and anomalous traffic behavior.
2. What is a good epoch period for monitoring such traffic patterns.

Future study should also focus on reducing the number of counters, because a vastly reduced number of counters will be more conducive to an ASIC/FPGA implementation. On the hardware side, the focus should be on interfacing the data from network links to provide appropriate input to the existing hardware for entropy norm computations.
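
The packet handling figures in Table 5.2 follow from the clock rate and the number of clock cycles spent per packet. The Python sketch below recomputes them; the function name is illustrative, and the Counter Module entries assume the 3-cycle case, since that is the value that matches the tabulated 4 and 8 billion packets/min.

```python
def packets_per_minute(clock_hz: float, cycles_per_packet: int) -> float:
    """Packets handled per minute when each packet costs a fixed number of clock cycles."""
    return clock_hz * 60.0 / cycles_per_packet


# (design, cycles per packet, clock in Hz), taken from Table 5.2.
DESIGNS = [
    ("Counter Module", 3, 200e6),
    ("Counter Module", 3, 400e6),
    ("RAM Based, 256 counters", 1025, 100e6),
    ("RAM Based, 256 counters", 1025, 166.66e6),
    ("RAM Based, 32 counters", 129, 100e6),
    ("RAM Based, 32 counters", 129, 166.66e6),
]

for name, cycles, clock_hz in DESIGNS:
    rate = packets_per_minute(clock_hz, cycles)
    print(f"{name:26s} {clock_hz / 1e6:7.2f} MHz  {rate / 1e6:10.2f} million packets/min")
```

The cycle counts 1025 and 129 are one more than four times the number of counters (4 × 256 + 1 and 4 × 32 + 1), so in the RAM based design the per-packet cost grows linearly with the number of bRAM locations used, which is why the 32-counter variant is roughly eight times faster.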


REFERENCES

[1] http://www.snort.org, SNORT: The open source network intrusion detection system.
[2] V. Paxson, "Bro: A system for detecting network intruders in real time," Proceedings of the 7th USENIX Security Symposium, San Antonio, TX, January 1998.
[3] L. L. Peterson & B. S. Davie, "Computer Networks: A systems approach," 4th edition, Elsevier Publishing Inc., 2007.
[4] Personal communications with Dr. R. Tripathi, University of South Florida, Tampa.
[5] A. Lakhina, M. Crovella & C. Diot, "Mining anomalies using traffic feature distributions," Computer Communications Review, Vol. 35, No. 4, pp. 217-228, 2005.
[6] A. Soule, F. Silveira, H. Ringberg & C. Diot, "Challenging the supremacy of traffic matrices in anomaly detection," Proc. 7th ACM SIGCOMM conference on Internet measurement (IMC 2007), pp. 105-110, 2007.
[7] http://www.iana.org/assignments/port-numbers, INTERNET ASSIGNED NUMBERS AUTHORITY.
[8] H. Ringberg, A. Soule, J. Rexford & C. Diot, "Sensitivity of PCA for traffic anomaly detection," SIGMETRICS/07, ACM SIGMETRICS Performance Evaluation Review, Vol. 35, No. 1, pp. 109-120, 2007.
[9] Y. Gu, A. McCallum & D. Townsley, "Detecting anomalies in network traffic using maximum entropy estimation," Proc. 5th ACM SIGCOMM conference on Internet measurement (IMC 2005), pp. 345-350, 2005.
[10] A. Wagner & B. Plattner, "Entropy based worm and anomaly detection in fast IP networks," 14th IEEE International workshops on Enabling Technologies. Infrastructures for Collaborative Enterprises (WETICE), STCA security workshop, Linkoping, Sweden, June 2005.
[11] K. Xu, Z. Zhang & S. Bhattacharya, "Profiling internet backbone traffic. Behavior models and applications," Proc ACM SIGCOMM, 2005.
[12] M. Mitzenmacher & E. Upfal, "Probability and Computing: Randomized algorithms and probabilistic analysis," Cambridge University Press, 2005.
[13] A. Chakrabarti, K. D. Ba & S. Muthukrishnan, "Estimating entropy and entropy norm on data streams," LNCS 3884, pp. 196-205, Springer Verlag, 2006.


[14] A. Lall, V. Sekar, M. Ogihara, J. Xu & H. Zhang, "Data streaming algorithms for estimating entropy of network traffic," Proc. ACM SIGMETRICS/06, pp. 145-156, June 2006.
[15] Xilinx University Program Virtex-II Pro Development System, "Hardware Reference Manual," UG069 (v1.0), March 8, 2005.
[16] http://www.wireshark.com
[17] Dallas Semiconductor, MAXIM, APPLICATION NOTE 1743, "Pseudo-Random Number Generation Routine for the MAX765x Microprocessor," Sept 25, 2002.
[18] Xilinx Application Notes, "Using Block RAM in Spartan-3 FPGAs," XAPP463 (v1.1.2), July 23, 2003.