
A VLSI architecture for Rijndael, the advanced encryption standard


Material Information

Title:
A VLSI architecture for Rijndael, the advanced encryption standard
Physical Description:
Book
Language:
English
Creator:
Kosaraju, Naga M
Publisher:
University of South Florida
Place of Publication:
Tampa, Fla.
Publication Date:

Subjects

Subjects / Keywords:
cryptography
aes
hardware architecture
real time key scheduling
Dissertations, Academic -- Computer Engineering -- Masters -- USF   ( lcsh )
Genre:
government publication (state, provincial, territorial, dependent)   ( marcgt )
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Summary:
ABSTRACT: The increasing application of cryptographic algorithms to ensure secure communications across virtual networks has led to an ever-growing demand for high-performance hardware implementations of the encryption/decryption methods. The inevitable inclusion of cryptographic algorithms in network communications has led to the development of several encryption standards, one of the most prominent of which is Rijndael, the Advanced Encryption Standard. Rijndael was chosen as the Advanced Encryption Standard (AES) by the National Institute of Standards and Technology (NIST) in October 2000, as a replacement for the Data Encryption Standard (DES). This thesis presents an architecture for the VLSI implementation of Rijndael, the Advanced Encryption Standard algorithm. Rijndael is an iterated, symmetric block cipher with a variable key length and block length. The block length is fixed at 128 bits by the AES standard [4]. The key length can be 128, 192 or 256 bits. The VLSI implementation presented in this thesis is based on feedback logic and supports a key length of 128 bits. The present architecture is implemented in the Electronic Code Book (ECB) mode of operation. The proposed architecture is further optimized for area through resource sharing between the encryption and decryption modules. The architecture includes a Key-Scheduler module for forward-key and reverse-key scheduling during encryption and decryption respectively. The subkeys required for each round of the Rijndael algorithm are generated in real time by the Key-Scheduler module by expanding the initial secret key. The proposed architecture is designed using the Custom-Design Layout methodology with the Cadence Virtuoso tools and tested using the Avanti Hspice and Nanosim CAD tools. Implementation of the algorithm using an iterative architecture resulted in a throughput of 232 Mbits/sec on a 0.35 µm CMOS technology. Using the same 0.35 µm CMOS technology, implementation of the algorithm using a pipelined architecture resulted in a throughput of 1.83 Gbits/sec. The performance of this implementation is compared with similar architectures reported in the literature.
Thesis:
Thesis (M.S.Cp.E.)--University of South Florida, 2003.
Bibliography:
Includes bibliographical references.
System Details:
System requirements: World Wide Web browser and PDF reader.
System Details:
Mode of access: World Wide Web.
Statement of Responsibility:
by Naga M. Kosaraju.
General Note:
Title from PDF of title page.
General Note:
Document formatted into pages; contains 93 pages.

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001441490
oclc - 53962291
notis - AJM5930
usfldc doi - E14-SFE0000163
usfldc handle - e14.163
System ID:
SFS0024859:00001




Full Text

A VLSI Architecture for Rijndael, the Advanced Encryption Standard

by

Naga M Kosaraju

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Engineering
Department of Computer Science and Engineering
College of Engineering
University of South Florida

Major Professor: Dr. Murali R. Varanasi, Ph.D.
Dr. N. Ranganathan, Ph.D.
Dr. Sanjukta Bhanja, Ph.D.

Date of Approval: November 13, 2003

Keywords: Hardware architecture, AES, cryptography, real time key scheduling

© Copyright 2003, Naga M Kosaraju

DEDICATION

To My Grandfather

ACKNOWLEDGEMENTS

I extend my deepest gratitude to my major professor Dr. Murali R. Varanasi for his invaluable suggestions and support throughout my Master's program. I thank Dr. Varanasi for giving me the opportunity to work on this topic. Without his patience and his valuable suggestions, this thesis would not have been accomplished. I am also grateful to Dr. Ranganathan and Dr. Bhanja for guiding me as my committee members.

I am fortunate to have been supported by Dr. Dunning, whose influence extends well beyond this thesis. I would also like to thank the US Geological Survey for extending financial support during my last semester of research. I would like to take this opportunity to thank the Department of Computer Science and Engineering for all their help and cooperation in providing excellent laboratory facilities during my thesis work. I wish to thank my family and friends for their encouragement and moral support throughout these years.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1 INTRODUCTION
    1.1 Encryption
        1.1.1 Private Key Encryption and its Characteristics
            1.1.1.1 Data Encryption Algorithm
            1.1.1.2 International Data Encryption Algorithm
            1.1.1.3 Blowfish Algorithm
        1.1.2 Public Key Encryption and its Characteristics
            1.1.2.1 RSA
            1.1.2.2 ElGamal
            1.1.2.3 LUC Algorithm
    1.2 Thesis Organization
CHAPTER 2 BACKGROUND
    2.1 Related Work
        2.1.1 Data Encryption Standard Algorithm and Its Variations
        2.1.2 Advanced Encryption Standard Algorithms
    2.2 Requirements of Advanced Encryption Standard Algorithm
    2.3 Description and Implementation of AES Candidate Algorithms
        2.3.1 MARS
        2.3.2 RC6
        2.3.3 Rijndael
        2.3.4 Serpent
        2.3.5 Twofish
    2.4 Block Cipher Modes of Operation
    2.5 Cryptanalysis
CHAPTER 3 RIJNDAEL ALGORITHM AND ITS ARCHITECTURE
    3.1 Rijndael Algorithm
        3.1.1 Data Unit
            3.1.1.1 Byte Substitution Transformation (ByteSub)
            3.1.1.2 Shift Row Transformation (ShiftRow)
            3.1.1.3 Mix Column Transformation (MixColumn)
            3.1.1.4 Round Key Addition (AddRoundKey)
        3.1.2 Key Unit
            3.1.2.1 Key Expansion
            3.1.2.2 Key Scheduling
CHAPTER 4 DESIGN OF SUBSYSTEMS
    4.1 Hardware Architecture and VLSI Implementation
        4.1.1 Implementation of Data Unit
            4.1.1.1 Implementation of Byte Substitution Transformation
            4.1.1.2 Implementation of Shift Row Transformation
            4.1.1.3 Implementation of Mix Column Transformation
            4.1.1.4 Implementation of Round Key Addition Transformation
        4.1.2 Implementation of Key Scheduling
        4.1.3 Iterative Implementation
    4.2 Order of Implementation of the Transformations
    4.3 Implementation Analysis
    4.4 Memory Architecture Optimization
CHAPTER 5 SIMULATION AND PERFORMANCE
    5.1 Design Flow
    5.2 Performance Evaluation
CHAPTER 6 CONCLUSIONS
REFERENCES

LIST OF TABLES

Table 1.1 Block State
Table 1.2 Key State
Table 3.1 Length of Expanded Key for Varying Key Sizes
Table 3.2 Shift Offsets for Different Block Lengths
Table 3.3 Data Block Represented in a State Matrix
Table 3.4 State Matrix After the Shiftrow Transformation, in Encryption
Table 3.5 State Matrix After the Shiftrow Transformation, in Decryption
Table 4.1 The Affine Mapping Operation
Table 4.2 The Inverse Affine Mapping Operation
Table 4.3 List of Round Constants for each Standard Round
Table 5.1 Components of the AES-128 Module
Table 5.2 Summary of the Performance of the AES-128 Module

LIST OF FIGURES

Figure 1.1 Encryption and Decryption with a Symmetric Cipher
Figure 1.2 Encryption and Decryption with a Public Key Cipher
Figure 1.3 Rijndael Encryption
Figure 2.1 Electronic Code Book Mode
Figure 2.2 Cipher Block Chaining Mode
Figure 2.3 Cipher Feed Back Mode
Figure 2.4 Output Feed Back Mode
Figure 3.1 Byte Substitution Transformation
Figure 3.2 Multiplicative Inverse using S-Box
Figure 3.3 Shift Row Transformation for Encryption
Figure 3.4 Shift Row Transformation for Decryption
Figure 3.5 Mix Column Transformation for Encryption
Figure 3.6 Mix Column Transformation for Decryption
Figure 3.7 Key Addition Transformation
Figure 3.8 Key Scheduling for Encryption
Figure 3.9 Key Scheduling for Decryption
Figure 4.1 Top level View of the Rijndael Algorithm
Figure 4.2 Standard Round Architecture
Figure 4.3 Hardware Implementation of Affine Mapping and its Inverse for a Byte
Figure 4.4 Hardware Implementation of Affine Mapping and its Inverse for 128 Bits
Figure 4.5 Hardware Implementation of the Shift Row Transformation for 32-bits
Figure 4.6 Hardware Implementation of Computation of Y, Z in Mix Column Transformation
Figure 4.7 Hardware Implementation of Multiplication by X (hex "02")
Figure 4.8 Hardware Implementation of Mix Column Transformation for 128 bits
Figure 4.9 Hardware Implementation of the Shift Row Transformation for 32-bits
Figure 4.10 Hardware Implementation of Round Key Addition Transformation
Figure 4.11 Key Scheduling for Encryption and Decryption Processes
Figure 4.12 Implementation of the Round Constants
Figure 4.13 128-128 (block-key) Key Alignment
Figure 4.14 Order of Operations
Figure 5.1 NanoSim Environment
Figure 5.2 0→1 and 1→0 Propagation Delays for a 2-input CMOS XOR Gate
Figure 5.3 1 Bit Counter Operation
Figure 5.4 Register
Figure 5.5 AND Operation
Figure 5.6 1-bit RAM implementation
Figure 5.7 Multiplicative Inverse Layout for Encryption and Decryption
Figure 5.8 Affine and Inverse Affine Mapping Layout for Encryption and Decryption
Figure 5.9 Mix Column and Inverse Mix Column Transformation Layout for Encryption and Decryption
Figure 5.10 Key Generation Layout for Encryption and Decryption

A VLSI ARCHITECTURE FOR RIJNDAEL, THE ADVANCED ENCRYPTION STANDARD

Naga M Kosaraju

ABSTRACT

The increasing application of cryptographic algorithms to ensure secure communications across virtual networks has led to an ever-growing demand for high-performance hardware implementations of the encryption/decryption methods. The inevitable inclusion of cryptographic algorithms in network communications has led to the development of several encryption standards, one of the most prominent of which is Rijndael, the Advanced Encryption Standard. Rijndael was chosen as the Advanced Encryption Standard (AES) by the National Institute of Standards and Technology (NIST) in October 2000, as a replacement for the Data Encryption Standard (DES). This thesis presents an architecture for the VLSI implementation of Rijndael, the Advanced Encryption Standard algorithm.

Rijndael is an iterated, symmetric block cipher with a variable key length and block length. The block length is fixed at 128 bits by the AES standard [4]. The key length can be 128, 192 or 256 bits. The VLSI implementation presented in this thesis is based on feedback logic and supports a key length of 128 bits. The present architecture is implemented in the Electronic Code Book (ECB) mode of operation. The proposed architecture is further optimized for area through resource sharing between the encryption and decryption modules. The architecture includes a Key-Scheduler module for forward-key and reverse-key scheduling during encryption and decryption respectively. The subkeys required for each round of the Rijndael algorithm are generated in real time by the Key-Scheduler module by expanding the initial secret key.

The proposed architecture is designed using the Custom-Design Layout methodology with the Cadence Virtuoso tools and tested using the Avanti Hspice and Nanosim CAD tools. Implementation of the algorithm using an iterative architecture resulted in a throughput of 232 Mbits/sec on a 0.35 µm CMOS technology. Using the same 0.35 µm CMOS technology, implementation of the algorithm using a pipelined architecture resulted in a throughput of 1.83 Gbits/sec. The performance of this implementation is compared with similar architectures reported in the literature.

CHAPTER 1
INTRODUCTION

The increased reliance on computer systems and on information sent over networks makes it essential to take steps to protect the systems and the information from known risks. On vast networks such as the Internet, which have no central administrator, the risk is even greater, as every computer along the route that the data traverses can attack what is being sent or received. Fortunately, numerous techniques have been developed to keep the data secure and private. The essential technology underlying virtually all automated network and computer security applications is known as encryption. Encryption was originally used primarily for military and espionage purposes. The need for secure transactions in e-commerce, virtual private networks and secure messaging has moved encryption into the commercial realm. Some of the uses of encryption of information are:

- Encryption ensures data integrity by protecting the data from being corrupted or modified. Checksum and hash-function techniques are used to provide data integrity.
- Authentication of users is provided by encryption by checking the identity of the user. RSA and the Digital Signature Algorithm (DSA) are the most commonly used methods for authentication.
- Encryption facilitates non-repudiation and ensures that the endpoint users cannot deny their participation. RSA and the Data Encryption Standard (DES) are used to establish the non-repeatable connections.
- Confidentiality is maintained by encryption. Symmetric encryption algorithms such as Triple-DES and Blowfish, as well as asymmetric encryption algorithms, are used for maintaining confidentiality.

1.1 Encryption

Encryption is the process of disguising readable communications (plaintext) as a scramble of characters (ciphertext). All encryption algorithms are based on two general principles, the fundamental requirement being that no information is lost:

- Substitution, in which each element in the plaintext is mapped into another element.
- Transformation, in which elements in the plaintext are rearranged.

The strength of the encryption depends on the difficulty of discovering the key. The difficulty of discovering the key depends on the length of the key and the cipher used. Hence, for a given cipher, the key length determines the strength of the encryption. Two types of encryption key systems are generally used:

- Private Key Encryption
- Public Key Encryption

1.1.1 Private Key Encryption and its Characteristics

Private Key (Symmetric) Encryption: Private key encryption is also referred to as conventional, symmetric or single-key encryption. This encryption is used in military, government and private sector applications. Private key encryption algorithms make use of a single key for encryption and decryption. All the users involved in the transfer of data share a single key. The security of conventional encryption depends on the secrecy of the key. The use of the single secret key for both encryption and decryption is shown in Figure 1.1. The algorithms are designed in such a way that it is impractical to decrypt a message on the basis of the ciphertext and knowledge of the encryption/decryption algorithm alone, without knowledge of the key.

These algorithms operate on fixed-size plaintext blocks (block ciphers) or on a stream of plaintext bits (stream ciphers). Usually, more than one round of a nonlinear transfer function is used to produce the ciphertext from the plaintext. A common arrangement of these rounds is the Feistel structure, invented by Feistel of IBM in the early 1970s. The main advantage of the Feistel structure is its easy inversion for decryption. DEA, IDEA and Blowfish are some examples of private key symmetric cryptosystems. The following subsections describe these cryptographic systems briefly.
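The easy inversion of the Feistel structure mentioned above can be illustrated with a minimal sketch. The round function `round_function`, the 64-bit block split into two 32-bit halves, and the four subkeys used in the usage note are all hypothetical choices made for illustration; this is a toy network, not DEA or Blowfish, and it offers no real security.

```python
import hashlib


def round_function(half, subkey):
    """Toy nonlinear round function F (hypothetical, for illustration only)."""
    data = half.to_bytes(4, "big") + subkey.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def feistel_encrypt(block, subkeys):
    """Run a 64-bit block through one Feistel round per 32-bit subkey."""
    left, right = block >> 32, block & 0xFFFFFFFF
    for subkey in subkeys:
        # Classic Feistel step: the right half passes through unchanged,
        # the left half is XORed with F(right, subkey).
        left, right = right, left ^ round_function(right, subkey)
    # Swap the halves on output so that decryption can reuse this routine.
    return (right << 32) | left


def feistel_decrypt(block, subkeys):
    """Decryption is the same network with the subkeys in reverse order."""
    return feistel_encrypt(block, list(reversed(subkeys)))
```

For example, with the assumed subkeys `[0x01234567, 0x89ABCDEF, 0xDEADBEEF, 0x0BADF00D]`, calling `feistel_decrypt(feistel_encrypt(m, keys), keys)` returns `m` for any 64-bit `m`. Note that `round_function` itself never has to be inverted, which is why the same hardware can serve both directions.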

[Figure 1.1. Encryption and Decryption with a Symmetric Cipher: Plain Text -> Encryption -> Cipher Text -> Decryption -> Plain Text, with a single Key used by both stages]

1.1.1.1 Data Encryption Algorithm

The Data Encryption Standard (DES) is the Federal Information Processing Standard (FIPS 46-1) [63], [64], which describes the Data Encryption Algorithm (DEA). The DEA has also been defined in the ANSI standard X9.32. The first draft of DEA, known as Lucifer [80], was developed by IBM. NSA (National Security Agency) and NBS (National Bureau of Standards) contributed to the final stages of the development [10] of the algorithm. The DEA, also called DES, became a federal standard in 1976. DEA has a 16-round Feistel structure; it operates on 64-bit plaintext blocks and requires a 56-bit secret key (the 8 parity bits are removed from the 64-bit secret key block to generate the 56-bit secret key). The main advantage of the DEA is its Feistel structure, as it is well suited for hardware implementation. The main disadvantage of the DEA is the relatively small length of its secret key (56 bits), which makes the algorithm susceptible to ciphertext-only attacks. Several brute-force recoveries of the 56-bit DEA secret key have been reported recently.

1.1.1.2 International Data Encryption Algorithm

IDEA is a universally applicable block encryption algorithm which provides effective protection of transmitted and stored data against unauthorized access. The IDEA algorithm was originally proposed under the name PES (Proposed Encryption Standard) by Lai and Massey [49]. The authors strengthened the original algorithm against the differential cryptanalysis demonstrated by Biham and Shamir [11] and changed its name to IPES (Improved Proposed Encryption Standard). The name IDEA was suggested in 1992. IDEA is a secret-key block cipher algorithm. The plaintext and ciphertext blocks are 64 bits wide while the secret key is 128 bits wide. The key sequence is usually user specified. A guideline for generating the long key sequences required by the IDEA algorithm can be found in [50]. The IDEA algorithm has 8 rounds of operations followed by an output transformation. The algorithm is symmetric and the encryption process is the same as the decryption process. The decryption subkeys are calculated from the encryption subkeys, and their order is modified to give the inverse of the nonlinear transfer function used during the encryption process. The secret key length of 128 bits makes exhaustive search of the key space impractical. The most significant cryptanalytic result against IDEA is due to Daemen et al. [55], who discovered a large class of weak keys which can be recovered easily. Apart from these weak keys, the algorithm is considered safe from cryptanalysis, although differential attacks are being tried against some variants of the algorithm.

1.1.1.3 Blowfish Algorithm

Schneier [77] developed Blowfish, a 64-bit block cipher. It has a Feistel structure with 16 rounds. Each round consists of a key-dependent permutation and a key-and-data-dependent substitution. All operations in the algorithm are based on bitwise exclusive-OR and modular addition operations. The secret key can have a variable length (the maximum being 448 bits) and it is used to generate several 32-bit subkeys. Each round has its own set of subkeys. The main advantage of the Blowfish cipher is that it is designed for 32-bit machines and is significantly faster than the DEA. The algorithm can be optimized for encryption/decryption throughput rates by using on-chip static storage to store the subkeys for each round. The algorithm is considered safe; however, a certain class of weak keys has been identified and differential attacks are being tried against some variants of the algorithm.

The conventional encryption algorithms are attacked either by brute-force methods or by cryptanalysis methods. Cryptanalysis is a form of attack that exploits the characteristics of the algorithm to deduce a specific plaintext or the key used. Private key encryption has five major parts:

- Plaintext - This is the text message to be encrypted.
- Encryption Algorithm - It performs the necessary mathematical operations to conduct substitutions and transformations on the plaintext.
- Ciphertext - This is the encrypted or scrambled message produced by applying the encryption algorithm to the plaintext message using the secret key.
- Secret Key - This is the input for the algorithm, as the encrypted outcome depends on the value of the key.
- Decryption Algorithm - This is the encryption algorithm in reverse. It uses the ciphertext and the secret key to derive the plaintext message.

Characteristics: The typical characteristics of a conventional encryption algorithm are:

- The larger the block size and the key size, the greater is the security. A larger block length increases the range of possible patterns that can be applied at the input/output of a sequence of rounds. This extends the attacks by one more round of operations.
- The key that is given as input to the algorithm is used to generate the required number of subkeys using the sub-key generation algorithm. The greater the complexity of the algorithm, the greater is the difficulty of cryptanalysis.

1.1.2 Public Key Encryption and its Characteristics

The concept of public-key encryption was proposed by Diffie and Hellman [30] in order to solve the key management problem. The scheme requires each user to get two keys, one public and one private. Each encryption/decryption process requires at least one public key and one private key. The public key is used for encryption and the private key is used for decryption. The public keys, together with user names, are published in a directory where anyone can look them up. This allows a complete stranger to send an encrypted message to a user. Since decryption requires the private key, only the valid recipient of the message can decrypt it. The use of the public and private keys for encryption and decryption is shown in Figure 1.2. RSA, ElGamal and LUC [46] are examples of public-key cryptographic systems. The following subsections describe these cryptographic systems.

[Figure 1.2. Encryption and Decryption with a Public Key Cipher: Plain Text -> Encryption (Public Key) -> Cipher Text -> Decryption (Private Key) -> Plain Text]

1.1.2.1 RSA

Rivest, Shamir and Adleman [65] proposed the RSA public-key encryption algorithm in 1978. The algorithm has been successfully implemented in the PGP (Pretty Good Privacy) system and in Netscape Navigator. The algorithm provides public-key encryption and a means for signing documents. The RSA algorithm takes two large prime numbers, $p$ and $q$, and computes their product $n = pq$. A number $e$ is selected such that $e$ and $(p-1)(q-1)$ are relatively prime to each other. Since $e$ and $(p-1)(q-1)$ have no factors in common except 1, a value $d$ can be obtained such that $ed - 1$ is completely divisible by $(p-1)(q-1)$. The values $e$ and $d$ are called the public and private exponents respectively. $[n, e]$ is the public key pair and $[n, d]$ is the private key pair. The security of the RSA algorithm is based on the assumption that factoring $n$ into prime numbers is computationally difficult. If it is possible to factor $n$ into $p$ and $q$, the algorithm can be easily broken. The RSA algorithm can also be used as a signature algorithm to sign and authenticate data.

1.1.2.2 ElGamal

The ElGamal [33] public-key cryptosystem is based on the discrete logarithm problem. It consists of both encryption and signature variants, like the RSA algorithm. The encryption algorithm is similar in nature to the Diffie-Hellman [30] key agreement protocol. The ElGamal signature algorithm is similar to the ElGamal encryption algorithm. The ElGamal encryption algorithm takes a large prime number $p$ and an integer $g$. The public key, $y$, has the form $y \equiv g^{x} \pmod{p}$, where $x$ is the private key. In order to encrypt the message, the encryption routine needs to choose a random number $k$ such that $k$ is relatively prime to $p - 1$, and then calculate $a \equiv g^{k} \pmod{p}$ and $b = (y^{k} \bmod p) \oplus M$, where $M$ is the message to be encrypted and $\oplus$ represents the bitwise exclusive-OR operation. The ciphertext block $(a, b)$ can then be decoded as $M = (a^{x} \bmod p) \oplus b$. ElGamal also proposed a signature algorithm that is similar to the ElGamal encryption algorithm. The public key and the private key have the same form in the signature algorithm. The main disadvantages of the ElGamal cryptosystem are the need for randomness in generating $k$ and the slow speed of operation of the signature algorithm. The Digital Signature Algorithm (DSA) is based on the ElGamal signature algorithm.

1.1.2.3 LUC Algorithm

Smith and Skinner [79] proposed the LUC public-key cryptosystem. The cipher implements the analogs of Diffie-Hellman [30], ElGamal and RSA over Lucas sequences. Lucas sequences used in cryptosystems are the general second-order linear recurrence relations given as

$T_n = P\,T_{n-1} - Q\,T_{n-2}$

where $P$ and $Q$ are large prime numbers. The encryption makes use of an iterative recurrence, unlike the exponentiation used in RSA and Diffie-Hellman [30]. LUCDIF is the Lucas sequence analog of Diffie-Hellman [30], and LUCELG and LUCRSA are the analogs of ElGamal and RSA respectively. LUC, however, is not as secure as the exponentiation-based algorithms such as RSA and ElGamal.

The public-key encryption algorithms have found increasing use on the Internet for message authentication and integrity. The idea of having two different keys for encryption and decryption is that even knowledge of one key should not give information about the other key. Public key encryption has six major parts:

- Plaintext - This is the text message to be encrypted.
- Encryption Algorithm - It performs the necessary mathematical operations to conduct substitutions and transformations on the plaintext.
- Public and Private Keys - This is a pair of keys where one is used for encryption and the other for decryption.
- Ciphertext - This is the encrypted or scrambled message produced by applying the encryption algorithm to the plaintext message using the key.
- Decryption Algorithm - Using the matching key, this algorithm deciphers the ciphertext to produce the plaintext.
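The RSA key relationship described in Section 1.1.2.1 can be made concrete with a small sketch. The function names `make_keys` and `apply_key`, and the tiny primes in the usage note, are assumptions chosen for illustration; this is textbook RSA without padding, so it should not be read as a usable cryptosystem.

```python
from math import gcd


def make_keys(p, q, e):
    """Derive a toy RSA key pair from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be relatively prime to (p - 1)(q - 1)")
    # Private exponent: e * d - 1 is divisible by (p - 1)(q - 1).
    # pow() with a negative exponent and a modulus needs Python 3.8 or later.
    d = pow(e, -1, phi)
    return (n, e), (n, d)


def apply_key(message, key):
    """Encrypt with the public key or decrypt with the private key."""
    n, exponent = key
    return pow(message, exponent, n)
```

As a usage example under these assumptions, `make_keys(61, 53, 17)` gives a public and a private key for the modulus 3233; applying the public key and then the private key to any message smaller than the modulus returns the original message. Anyone who can factor 3233 back into 61 and 53 can repeat the `make_keys` computation, which is exactly the weakness the section attributes to factorable moduli.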

The Data Encryption Standard had been used as the standard of the U.S. National Institute of Standards and Technology (NIST) as FIPS PUB 46 since 1977 [77], [12], [43]. It is a symmetric key block cipher with a block length of 64 bits and a key length of 64 bits. Of these 64 bits, 56 bits are randomly generated and are used by the algorithm, and the other 8 bits are used for error detection. They are set to make the parity of each 8-bit byte of the key odd. In the encryption mode of DES, the block to be encrypted is subjected to an initial permutation, then to 16 rounds of complex key-dependent computations, and finally to an inverse initial permutation. In the decryption mode, the same algorithm is used on the encrypted block, taking care that each iteration of the computation uses the same block of key bits as was used in the encryption. Several methods were implemented to increase the strength of the DES encryption algorithm [8], [44], [72]. Variations of DES were developed because of its insufficient security. One of the variations is Triple-DES. The key for Triple-DES consists of three DES keys. The U.S. Government initiated the development of an Advanced Encryption Standard algorithm because DES and Triple-DES are vulnerable to cryptanalytic attacks.

In October 2000, the National Institute of Standards and Technology (NIST) chose the Rijndael algorithm to be adopted as the Advanced Encryption Standard by the U.S. Department of Commerce, replacing the aging Data Encryption Standard (DES), which had been the standard since 1977 [3]. The Rijndael algorithm was designed by Joan Daemen and Vincent Rijmen as a candidate algorithm for the AES [27]. Rijndael is a round-based symmetric block cipher which provides effective protection of transmitted and stored data against cryptanalytic attacks [23], [60], [81].

The Rijndael algorithm is an iterated, symmetric block cipher [22], [26] that encrypts and decrypts data in 128-bit data blocks (B) using a 128-bit, 192-bit or 256-bit key (K). The algorithm consists of an initial round-key addition, the required number of standard rounds and a final round. Each standard round has four different transformations that are applied to the data block sequentially. In Rijndael, the plaintext and ciphertext blocks can be 128, 192 or 256 bits wide; the AES standard [4] has fixed the data block length at 128 bits. A data block to be encrypted by Rijndael is split into an array of bytes, and each encryption operation is byte-oriented. The algorithm applies different transformations to the data block, and the intermediate result is called the State. The Block State is represented as a rectangular array of bytes. This array has $N_b = \text{block length}/32$ columns and four rows, with each column representing a 32-bit word. So for a block length of 128 bits, the Block State has four rows and four columns. The Key (K) is also represented as a rectangular array. The number of rows in the Key State is four and the number of columns is $N_k = \text{key length}/32$. Therefore, the Key State has four rows and four columns for a key length of 128 bits. Both the Key State and the Block State are arranged in column-major order. The representation of the Block State and the Key State is shown in Table 1.1 and Table 1.2 respectively. Each entry in the matrix is 8 bits, or 1 byte, in length. The 128-bit data block can be represented in the Block State as 16 entries.

[Table 1.1. Block State]

[Table 1.2. Key State]

The round key generated by the key schedule module from the initial key should be equal in length to the data block, which is 128 bits. The total number of standard rounds that should be implemented depends on the data block and key length [9]. For a data block length of 128 bits, the number of rounds is 10, 12 or 14 for a key size of 128, 192 or 256 bits respectively. The present implementation focuses on a data block length of 128 bits and a key length of 128 bits. The number of standard rounds that needs to be implemented is $N_r = 10$, including the final round. Hence the number of round keys needed is 11. These 11 round keys are obtained from the initial key by expansion of the key [78]. The number of round key bits is calculated as $32 \cdot N_b \cdot (N_r + 1)$. For a block length of 128 bits, $N_b$ is equal to 4. Hence the total number of round key bits is 1408, i.e., 44 32-bit words.

Rijndael's standard round consists of four steps [51]. In the first step, an 8x8 S-box transformation is applied to each byte. The second and third steps are linear mixing layers, in which the rows of the array are shifted and the columns are mixed. In the fourth step, subkey bytes are XORed into each byte of the array. The final round is similar to the standard round except that it does not contain the third step, i.e., the mixing of columns. These operations are performed on 128-bit blocks. The Rijndael encryption module is depicted briefly in Figure 1.3. The Rijndael algorithm consists of the following mathematical operations:

- Bitwise exclusive-OR (XOR) of two n-bit blocks
- Multiplicative inverse of a byte over $GF(2^8)$
- Cyclical left shift of n-bit blocks over a certain offset
- Polynomial multiplication over $GF(2^8)$, modulus $x^4 + 1$
- Logical AND, OR, NAND, NOR operations

Since the adoption of the Rijndael algorithm as AES, the algorithm has been implemented in software using C, C++, Java and assembly languages [25], [38]. Software encryption is more flexible and allows different algorithms to be implemented. The advantages of a software implementation are ease of use, portability and flexibility. Software implementations offer limited physical security with respect to key storage, as the stored key can be altered easily. Hardware implementation of cryptographic algorithms is more physically secure and cannot be attacked as easily by cryptanalysts. Hardware encryption uses a special-purpose chip for encryption, while software encryption uses a general-purpose computer to execute encryption as a program. Hardware encryption has the advantages of speed and of being resistant to tampering or accidental change, but it can limit the flexibility of a device and is often more expensive than software encryption. Hardware implementations provide significantly higher processing speed than software implementations. In recent years, many hardware architectures have been proposed, by Mangard et al. [52], Sklavos et al. [61] and Satoh et al. [74]. In this thesis, a hardware implementation of the Rijndael algorithm is investigated and compared with analogous architectures reported in the literature. The proposed architecture provides high performance while keeping the chip size small by using the same design for similar operations in both encryption and decryption modes. High performance is achieved using parallelism and the feedback technique.
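The byte-level field operations listed above, and the round-key arithmetic for the 128-bit case, can be sketched in a few lines. The reduction polynomial $x^8 + x^4 + x^3 + x + 1$ (hex 11B) and the rule $N_r = \max(N_b, N_k) + 6$ come from the published Rijndael/AES specification rather than from the text of this chapter; the function names are hypothetical, and the inverse is computed by repeated multiplication purely for clarity, not as a model of the S-box hardware designed in this thesis.

```python
AES_MODULUS = 0x11B  # x^8 + x^4 + x^3 + x + 1, per the AES specification


def xtime(a):
    """Multiply a field element by x (hex 02): one shift, one conditional XOR."""
    a <<= 1
    if a & 0x100:
        a ^= AES_MODULUS
    return a


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using only shifts and XORs."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def gf_inverse(a):
    """Multiplicative inverse in GF(2^8); zero is mapped to zero."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):  # a^254 equals a^-1 because a^255 = 1 for a != 0
        result = gf_mul(result, a)
    return result


def round_key_words(block_bits=128, key_bits=128):
    """Return (number of rounds, number of 32-bit round-key words)."""
    nb, nk = block_bits // 32, key_bits // 32
    nr = max(nb, nk) + 6
    return nr, nb * (nr + 1)
```

With the default arguments, `round_key_words()` reproduces the figures quoted above: 10 rounds and 44 words, i.e., 44 x 32 = 1408 round-key bits. `gf_mul` also shows why multiplication by a constant needs no general multiplier: it decomposes into repeated `xtime` steps combined with XOR.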

[Figure 1.3. Rijndael Encryption: Plain Text -> Standard Round (ByteSub, ShiftRow, MixColumn, Round Key Addition) -> Final Round (ByteSub, ShiftRow, Round Key Addition) -> Cipher Text]

1.2 Thesis Organization

In this thesis, an architecture for implementing the Rijndael algorithm is proposed. The proposed architecture is optimized for high throughput, in terms of the encryption and decryption data rates, using the feedback technique [13]. The multiplication by a polynomial is implemented using XOR operations [61] instead of multipliers, so that the complexity of the implementation decreases. In the present architecture, both the encryption and decryption processes operate on the same device. The architecture is designed in such a way that both modes use common hardware resources [52]. The architecture was designed using the Cadence Virtuoso layout tools and simulated using the Avanti Hspice and Nanosim tools. Encryption and decryption throughput rates of 232 Mbit/s were achieved at a system clock frequency of 20 MHz. A brief review of related work on the Advanced Encryption Standard, the hardware issues of all five AES candidate algorithms and the modes of operation are presented in Chapter 2. The description of the Rijndael algorithm is presented in Chapter 3. The design of the operations [48], [52], [61], [78], [56] in the algorithm is discussed in detail in Chapter 4. The design methodology, simulation results and performance estimates are presented in Chapter 5. Conclusions and future plans are discussed in Chapter 6.
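The throughput figure quoted above can be checked with simple arithmetic. The sketch below assumes that the iterative design spends 11 clock cycles on each 128-bit block (one for the initial round-key addition plus one per round); that cycle count is an assumption made here for illustration and is not stated in this section, although it is consistent with the reported 232 Mbit/s at 20 MHz.

```python
def throughput_bps(block_bits, clock_hz, cycles_per_block):
    """Bits per second for a design that emits one block every N cycles."""
    return block_bits * clock_hz / cycles_per_block


# Assumed: 128-bit block, 20 MHz clock, 11 cycles per block (hypothetical).
iterative_mbps = throughput_bps(128, 20e6, 11) / 1e6
print(f"{iterative_mbps:.1f} Mbit/s")
```

The same helper makes the trade-off between the two architectures visible: reducing `cycles_per_block` (as a pipelined design does once its pipeline is full) raises throughput without changing the clock frequency.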

PAGE 23

CHAPTER 2
BACKGROUND

In recent years many researchers have proposed hardware architectures for cryptographic algorithms that can be implemented in VLSI. The features of hardware implementation have resulted in a significant amount of work on the design of architectures to increase the throughput. A number of architectures have been proposed for the VLSI implementation of the DES algorithm and its variations and of the AES algorithms, some of which are discussed here in brief. We have discussed the basic requirements of the Advanced Encryption Standard algorithms. We have also provided a brief summary of the implementation of the four other algorithms considered for AES. The modes of operation in which the cryptographic algorithms can be implemented are also discussed. We now present a summary of these works, classified into DES and AES algorithms.

2.1 Related Work

2.1.1 Data Encryption Standard Algorithm and Its Variations

DES was a popular secret-key encryption standard until recently, when it was replaced by Rijndael as the AES. It was, and still is, used in many commercial and financial applications, as it has resisted most practical forms of cryptanalysis. The algorithms can be implemented in software as well as in hardware. Hardware implementations give a significant improvement in speed by exploiting performance enhancement features like parallelism and pipelining, and other methods like loop unrolling.

The SNL DES ASIC [69] was developed by the Sandia National Laboratories. According to a survey by SNL, it was the fastest implementation, with a speed 10 times faster than the DES chips then available. The SNL DES ASIC chip is a high speed, fully pipelined implementation which provides encryption and decryption with a unique key input. It supports algorithm bypassing on each clock cycle: in each cycle, data may be encrypted or decrypted using a unique key, or may be passed on without any change. When operated on 64-bit data at 105 MHz the throughput was greater than 6.4 Gbps, while simulations showed it capable of speeds of over 9.8 Gbps. It was fabricated using 0.6 micron CMOS technology, and its operational frequency was tested over a voltage range of 4.5 to 5.5 V and a temperature range of -55 to 125 degrees C. It consumed a power of 6.5 Watts.

Trimberger et al. [82] describe the implementation and optimization of encryption and decryption for an FPGA core which has a data rate of 8.4 Gbps with 16 cycles of latency and 12 Gbps with 48 cycles of latency. The core takes a key and data to encrypt or decrypt, both of which may change on a cycle-to-cycle basis. The design was simulated in Verilog and targeted for FPGA. The implementation uses a multiplexer to select the key bits depending on the round and on whether the data is encrypted or decrypted. Look-up tables of 64 x 4 are used for the implementation of the S-box calculations, and a pipeline register is used to store the results of the S-box calculations. The resulting circuit ran at 132 MHz, encrypting at 8.4 Gbps with 16 cycles of latency. The speed of this design is approximately three times that of the then fastest comparable FPGA implementations.
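The throughput figures quoted above follow from simple arithmetic. The sketch below is illustrative only: the function name `throughput_mbps` is hypothetical, and it assumes that a fully pipelined core delivers one 64-bit block per clock cycle, which the cited papers are not quoted as stating in these exact terms.

```python
def throughput_mbps(block_bits: int, clock_mhz: float, cycles_per_block: float) -> float:
    """Throughput in Mbit/s = block size * clock frequency (MHz) / cycles per block."""
    return block_bits * clock_mhz / cycles_per_block

# Assumption: a fully pipelined DES core emits one 64-bit block every clock cycle.
snl_asic = throughput_mbps(64, 105, 1)   # 6720 Mbit/s, consistent with "greater than 6.4 Gbps"
fpga_core = throughput_mbps(64, 132, 1)  # 8448 Mbit/s, consistent with the quoted 8.4 Gbps

print(snl_asic, fpga_core)
```

For an iterative (non-pipelined) design, `cycles_per_block` would instead be the number of clock cycles needed to process one block.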
Mistry [54] proposed a new VLSI architecture for the International Data Encryption Algorithm (IDEA). The proposed architecture is a 16-bit coprocessor that operates under the control of a master processor. The datapath comprises a semi-systolic arrangement of 9 processing elements. The first 8 processing elements implement the 8 rounds of operation of the IDEA algorithm and the last processing element implements the output transformation. It includes a high speed multiplier design and a novel algorithm for calculating the multiplicative inverse of the encryption subkeys, achieved by extending the binary GCD algorithm. The architecture was implemented using the Cadence Opus Design Suite and Berkeley VLSI CAD tools. Encryption and decryption rates of 2 Gbit/s were achieved at a system clock frequency of 125 MHz.

2.1.2 Advanced Encryption Standard Algorithms

As technology advanced, DES could no longer provide sufficient security, so a new AES was selected by NIST from various algorithms. Rijndael thus became the new AES in October 2000, replacing DES because of its enhanced security level.


Mangard et al. in their paper [52] discuss two implementations of the architecture proposed for the Rijndael algorithm. The high performance version provides enough throughput to be used as an acceleration module in high-end servers. The proposed architecture is very modular and provides a high level of scalability. In this architecture, combinational paths are relatively short and balanced. The S-boxes are implemented using combinational logic, and a pipelined implementation of the S-box is used in the architecture. The data unit of the standard version consists of 16 data cells and 4 S-boxes. The high performance version of the architecture consists of 16 data cells and 16 S-boxes. Both versions of the AES-128 module were implemented in VHDL and were synthesized for a 0.6 µm CMOS process. The standard version needs 64 clock cycles and the high performance version needs 34 cycles to perform an AES-128 encryption or decryption. The standard and high performance versions achieve throughputs of 128 Mbits/sec and 241 Mbits/sec respectively.

The design of Satoh et al. [74] requires 54 clock cycles to perform an encryption, and has a throughput of 311 Mbits/sec. The gate count is based on a core datapath without mechanisms for I/O, CBC registers, or key storage. The critical path includes the four transformations within a clock cycle, which causes a delay.

Different architectural optimizations for the VLSI implementation of the Rijndael algorithm were proposed by Kuo et al. [48]. There are two modules in this implementation: an encryption module, which generates the intermediate encryption data, and a key scheduling module, which generates the intermediate round keys from the initial key. In the implementation of the algorithm only one hardware round unit is used for encryption, and it is reused to complete the whole encryption process in order to conserve area. Keys are generated in real time to reduce the amount of buffer storage. Thus the hardware generates one set of subkeys and reuses it for calculating all other subkeys, taking one clock cycle for each subkey generation. To get all the required operations done in one clock cycle for one round, some of the modules, such as the S-boxes, need to be duplicated. The hardware architecture for the design was described in Verilog-XL and synthesized by Synopsys with a 0.18 micron standard cell library. The results show that the design has about 173,000 gates and that data encryption can be done at a rate of 1.82 Gbps.


Kuo et al. [48] use a look-up table for the implementation of the shift row module and for the generation of the round constants in the key scheduling module. The look-up table used in the shift row module has 24 entries and is repeated four times in the encryption module. The look-up table used in the key scheduling module has 30 entries. It is accessed once in each round, to read the round constant appropriate for that round.

McLoone et al. in their paper [7] discussed high performance single-chip FPGA implementations of Rijndael. These designs were implemented on the Virtex-E FPGA family of devices. Their encryptor core was capable of supporting different key sizes: the 192-bit key designs run at 5.8 Gbps and the 256-bit key designs run at 5.2 Gbps. The 128-bit key encryptor had a throughput of 7 Gbps, which was then 3.5 times faster than similar existing hardware designs and 21 times faster than the then known software implementations, and it was claimed to be the fastest fully pipelined single-chip FPGA Rijndael encryptor core.

Gaj et al. [36] present and analyze the results of implementations of all five AES finalists using Xilinx Field Programmable Gate Arrays. The performance of four alternative hardware architectures is discussed and compared. The AES candidates are divided into three classes depending on their hardware performance characteristics, and a recommendation regarding the optimum choice of the algorithm for AES is provided. The first class includes Twofish and RC6. Both ciphers guarantee compact, low-cost implementations with medium speed compared to the other candidates. In particular, because of the area constraints, Twofish and RC6 are the only ciphers that can be implemented using low cost FPGA devices from the Xilinx XC4000 family. Both ciphers can be substantially sped up by outer-round pipelining. Of the two, Twofish is in some respects superior to RC6. It is about 70% faster and is more suitable for inner-round pipelining. Both ciphers use comparable area, and as a result their potential for loop unrolling and outer-round pipelining is similar. The second class includes Serpent and Rijndael. Both ciphers guarantee very high speed at the cost of a relatively large area compared to the ciphers from the first class.

The primary way of speeding up these ciphers for non-feedback cipher modes (ECB and counter mode) is inner-round pipelining. Both ciphers have a similar speed in the basic architecture. Rijndael can be implemented using about 35% less area. The more regular architecture of Serpent makes it significantly more suitable for multi-stage inner-round pipelining. The third class consists of MARS alone. This cipher shows the worst hardware characteristics of all five candidates. It is more than twice as slow as the next slowest candidate (RC6), and over 8 times slower than the fastest AES cipher (Serpent). It also takes over twice the area used by the ciphers from the first group, Twofish and RC6. Further optimizations of the MARS implementation are certainly possible, but would require a higher development effort than that devoted to the other AES candidates.

2.2 Requirements of Advanced Encryption Standard Algorithm

The Advanced Encryption Standard (AES) was issued as a Federal Information Processing Standard (FIPS) by the National Institute of Standards and Technology (NIST) as a successor to the DES algorithm. The basic requirements for an algorithm to be considered as an AES algorithm are [1]:

- The algorithm must implement symmetric (secret) key cryptography, which uses a single key that has to be shared between the users.
- The algorithm must be a block cipher, which operates on fixed-size plaintext blocks rather than on a stream of plaintext bits.
- The candidate algorithm should be capable of supporting key-block combinations with sizes of 128-128, 192-128, and 256-128 bits.

As mentioned before, Rijndael was selected by NIST as the AES algorithm in October 2000. In the Third AES Candidate Conference [2], the proposed algorithms from different research groups were presented [29], [32], [31], [36], [86], [37]. The main evaluation criterion for the AES finalist algorithms was hardware implementation performance, considering the general purpose architectures for each algorithm.

Each encryption key size causes the algorithm to behave slightly differently. An increase in key size not only offers a larger number of bits with which the plaintext can be scrambled, but also increases the complexity of the cipher algorithm. The AES algorithm repeats its core a required number of times depending on the key size. These loop iterations are called rounds, as in DES. Unlike DES, the AES algorithm is not truly symmetric. The terms that are commonly used in the AES algorithms can be listed as:

- A Round is an iteration of the main part of the AES algorithms. AES algorithms contain a variable number of rounds, depending on the key size.


- Plaintext is the original unencrypted data.
- Ciphertext is the encrypted data.
- An S-box is a look-up table representing the multiplicative inverse of a byte.
- The AES algorithm expands the 128, 192 or 256 bit key into a number of 32-bit values. Each round of the algorithm receives a new 128-bit key from the key schedule module. The total size of the key schedule depends on the key size.

The algorithms selected as AES finalist candidate algorithms [41], [83] are MARS, RC6, Rijndael, Serpent and Twofish. Based on the basic requirements, the Rijndael algorithm was standardized as the AES algorithm. The AES finalists were analyzed and evaluated [24] taking into consideration security, computational efficiency, memory requirements, flexibility, hardware and software suitability, and simplicity. Regardless of whether a feedback or non-feedback mode is used, the Rijndael algorithm has high performance in both hardware and software. Rijndael's very low memory requirements make it very well suited for restricted-space environments, in which it also demonstrates excellent performance. The operations involved in the Rijndael algorithm are less complex to implement than those of the other AES finalist algorithms.

2.3 Description and Implementation of AES Candidate Algorithms

During the second Advanced Encryption Standard (AES) candidate conference, five algorithms were chosen by NIST. The five AES finalists [7] chosen by NIST are MARS, RC6, Rijndael, Serpent and Twofish. This section briefly summarizes the hardware implementation of the algorithms [40].

2.3.1 MARS

MARS was submitted by IBM [75], [32], [31], [14], [15]. This algorithm supports a 128-bit data block and a variable key size from 128 bits to 448 bits. The design of the algorithm results in a much improved security/performance tradeoff over existing ciphers. MARS offers better security than Triple DES and runs significantly faster than the DES algorithm. MARS has several layers: key addition as pre-whitening, 8 rounds of unkeyed forward/backward mixing, 8 rounds of keyed forward/backward transformation, and key subtraction as post-whitening. The mixing and the keyed rounds are modifications of the Feistel-cipher rounds. The encryption module consists of four kinds of round functions. The MARS algorithm is not suited for fast hardware implementation because of the complex arithmetic operations involved, especially the addition mod $2^{32}$ and multiplication mod $2^{32}$ operations. The components used in the encryption part of MARS are:

- The initial key addition can be implemented using four addition mod $2^{32}$ operations.
- Eight rounds of the unkeyed forward/backward mixing can be implemented using two addition mod $2^{32}$ operations/two subtraction mod $2^{32}$ operations and 4 look-up tables with 8-bit input/32-bit output.
- Eight rounds of the keyed forward and backward transformations can each be implemented using six addition mod $2^{32}$ operations, two multiplication mod $2^{32}$ operations and four data-dependent rotations.
- The final key addition is implemented using four subtraction mod $2^{32}$ operations.

2.3.2 RC6

RC6 was submitted by RSA Laboratories and was designed by Ron Rivest, Matt Robshaw, Ray Sidney and Yiqun Lisa Yin [18], [32], [31], [68]. This algorithm supports a 128-bit data block, a variable key size up to 2040 bits, and 20 rounds. The algorithm uses the Feistel-cipher structure. The round functions include the rotation of the data by a quadratic function. The RC6 algorithm is not suited for fast hardware implementation because the complex arithmetic operations involved take a longer time. The hardware components used are:

- two addition mod $2^{32}$ and two multiplication mod $2^{32}$ operations
- two data-dependent rotations

2.3.3 Rijndael

This algorithm [34], [58], [57] supports a variable data block and a variable key length of 128, 192 or 256 bits. The number of rounds depends upon the data block and key lengths. If the maximum of the data block and key lengths is 128, 192 or 256 bits, then the number of rounds is 10, 12 or 14 respectively. Since the components used are logical operations and look-up tables implemented using memory elements such as ROM, the Rijndael algorithm is well suited for hardware implementation. The round keys are generated by the key generation module by expanding the initial cipher key. The round function of Rijndael for a 128-bit data block is composed of four transformations, namely, Byte Substitution, Shift Row, Mix Column and Key Addition. The hardware components used are:

- The ByteSub transformation can be implemented using 16 look-up tables with 8-bit input/8-bit output.
- The ShiftRow transformation can be implemented using cyclic left shift operations.
- The MixColumn transformation can be implemented using logical AND and XOR operations.
- The Round Key Addition is implemented using logical XOR operations. The addition is applied to the 128-bit data and the appropriate key generated by the key generation module.
- The round keys are generated from the initial key using logical XOR operations and 4 look-up tables with 8-bit input/8-bit output.

2.3.4 Serpent

Serpent was submitted by Ross Anderson, Eli Biham and Lars Knudsen [5], [6], [32], [31]. Serpent is a substitution-linear transformation network. Serpent runs much faster than the DES algorithm and its design supports a very efficient bit-slice implementation. This algorithm has 32 rounds with initial and final permutations. The round function consists of three layers: the key addition operation, 32 parallel applications of the S-boxes, and a linear transformation. In the last round, the linear transformation is replaced by a key addition operation. The algorithm is suited for hardware, except that the large number of rounds that need to be implemented reduces the processing speed. The hardware components used are:

- 32 look-up tables with 4-bit input/4-bit output
- logical and rotate shifts
- logical XOR operations


2.3.5 Twofish

Twofish was submitted by Bruce Schneier, John Kelsey, Niels Ferguson, Doug Whiting, David Wagner and Chris Hall [17]. Twofish is a 128-bit block cipher that accepts a variable-length key up to 256 bits. The cipher is a 16-round Feistel network with a bijective F function made up of four key-dependent S-boxes, a fixed 4-by-4 maximum distance separable matrix over $GF(2^8)$, a pseudo-Hadamard transform, bitwise rotations, and a key schedule. This algorithm has a 16-round Feistel-like structure that consists of an XOR operation at the input and output. The number of rounds (n) is 12, 16 or 20 for key lengths of 128, 192 or 256 bits. The hardware components used are:

- four addition mod $2^{32}$ operations
- n look-up tables with 8-bit input/8-bit output
- logical XOR and AND operations

The number of 8 x 8 look-up tables needed for an implementation with a key length of 128 bits is 48, whereas it is 10 in the case of the Rijndael algorithm. This makes the Twofish algorithm unsuitable for hardware implementation.

2.4 Block Cipher Modes of Operation

Rijndael, or the AES standard, as a symmetric block cipher, can be used in many forms, called modes, by high-level security protocols. For example, some kinds of pipelining cannot be used with some modes. Because of that, it is necessary to define the different modes that will be addressed. A mode of operation is an algorithm that features the use of a symmetric block cipher algorithm to provide an information service, such as confidentiality or authentication. The properties of a mode of operation can be listed as:

- Performance
- Parallelizability
- Error Expansion
- Crypto Synchronization


The Rijndael algorithm operates on fixed-size plaintext blocks (block ciphers) rather than on a stream of plaintext bits (stream ciphers). Hence the Rijndael algorithm is called a block cipher algorithm. The key used in both the encryption and decryption modes is the same, which is the concept of private-key cryptosystems. Block ciphers are symmetric-key encryption algorithms that transform a fixed-length plaintext into a fixed-length ciphertext using a single private key. The decryption is similar to the encryption except that the inverse transformations are applied in reverse order using the same key used in the encryption. Block ciphers operate in the Electronic Code Book (ECB) or Cipher Block Chaining (CBC) mode and stream ciphers operate in the Cipher Feedback (CFB) or Output Feedback (OFB) mode. As Rijndael is a block cipher algorithm it operates in either ECB or CBC mode. The AES modes of operation are briefly described here, with $P_i$ and $C_i$ denoting the $i$-th plaintext and ciphertext blocks and $E_K$, $D_K$ denoting encryption and decryption under the key $K$. These are:

- Electronic Code Book (ECB) Mode: In the ECB mode of operation, each plaintext block is encrypted independently into a ciphertext block. This allows parallelism and can be efficiently implemented in hardware. The parallelism provides a faster implementation. An error occurring during transmission is not propagated to other ciphertext blocks. The main disadvantage is that identical plaintext blocks result in identical ciphertext blocks, which makes the data vulnerable to cryptanalytic attacks. The ECB mode of operation can be illustrated as in Figure 2.1. The ECB mode of operation can be characterized as:

  $C_i = E_K(P_i)$, for encryption
  $P_i = D_K(C_i)$, for decryption

- Cipher Block Chaining (CBC) Mode: In the CBC mode, a bitwise XOR operation is applied between the plaintext block and the previous ciphertext block, and the result is then encrypted to get the ciphertext block. Therefore, an initial seed value is needed to encrypt the initial data block. Unlike ECB, in CBC mode, if the seed value is chosen carefully, identical ciphertext blocks are not generated for identical plaintext blocks. The CBC mode of operation can be illustrated as in Figure 2.2. The CBC mode of operation can be characterized as:

  $C_i = E_K(P_i \oplus C_{i-1})$, for encryption
  $P_i = D_K(C_i) \oplus C_{i-1}$, for decryption, where $C_0$ is the initial seed value


Figure 2.1. Electronic Code Book Mode

Figure 2.2. Cipher Block Chaining Mode


Figure 2.3. Cipher Feed Back Mode

- Cipher Feedback (CFB) Mode: In the CFB mode of operation, the previous ciphertext block is encrypted and the output produced is combined with the input plaintext block using a bitwise XOR operation to produce the current ciphertext block. If a plaintext block is changed, all the subsequent ciphertext blocks will be affected. An initial seed value is required for the first plaintext block to be encrypted. The CFB mode of operation can be illustrated as in Figure 2.3. With $P_i$ and $C_i$ the $i$-th plaintext and ciphertext blocks and $E_K$ the encryption under key $K$, the CFB mode of operation can be characterized as:

  $C_i = E_K(C_{i-1}) \oplus P_i$, for encryption
  $P_i = E_K(C_{i-1}) \oplus C_i$, for decryption, where $C_0$ is the initial seed value

- Output Feedback (OFB) Mode: The OFB mode of operation is similar to the CFB mode of operation, except that the plaintext is XORed with a data block that is independent of the plaintext and ciphertext blocks. An initial seed value $S_0$ is encrypted and the result is XORed with the plaintext block to generate the ciphertext. Subsequent values for the $S_i$ are generated by applying the encryption transformation on $S_{i-1}$. If a plaintext block is changed, only the corresponding ciphertext block is altered. The OFB mode of operation can be illustrated as in Figure 2.4. The OFB mode of operation can be characterized as:

  $C_i = P_i \oplus S_i$, for encryption
  $P_i = C_i \oplus S_i$, for decryption, where $S_i = E_K(S_{i-1})$
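The four modes described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis implementation: `toy_encrypt` and `toy_decrypt` are hypothetical, insecure stand-ins for the Rijndael block transformation, and all function names are chosen here for illustration only.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encrypt(block: bytes, key: bytes) -> bytes:
    # Toy stand-in for E_K (NOT secure): add key bytes mod 256, rotate left one byte.
    mixed = bytes((b + k) % 256 for b, k in zip(block, key))
    return mixed[1:] + mixed[:1]

def toy_decrypt(block: bytes, key: bytes) -> bytes:
    # Inverse of toy_encrypt: rotate right one byte, subtract key bytes mod 256.
    unrotated = block[-1:] + block[:-1]
    return bytes((b - k) % 256 for b, k in zip(unrotated, key))

def ecb_encrypt(blocks, key):
    return [toy_encrypt(p, key) for p in blocks]       # blocks are independent

def ecb_decrypt(blocks, key):
    return [toy_decrypt(c, key) for c in blocks]

def cbc_encrypt(blocks, key, iv):
    out, prev = [], iv
    for p in blocks:
        prev = toy_encrypt(xor(p, prev), key)          # C_i = E(P_i xor C_{i-1})
        out.append(prev)
    return out

def cbc_decrypt(blocks, key, iv):
    out, prev = [], iv
    for c in blocks:
        out.append(xor(toy_decrypt(c, key), prev))     # P_i = D(C_i) xor C_{i-1}
        prev = c
    return out

def cfb_encrypt(blocks, key, iv):
    out, prev = [], iv
    for p in blocks:
        prev = xor(toy_encrypt(prev, key), p)          # C_i = E(C_{i-1}) xor P_i
        out.append(prev)
    return out

def cfb_decrypt(blocks, key, iv):
    out, prev = [], iv
    for c in blocks:
        out.append(xor(toy_encrypt(prev, key), c))     # P_i = E(C_{i-1}) xor C_i
        prev = c
    return out

def ofb_crypt(blocks, key, iv):
    # Encryption and decryption are the same operation in OFB.
    out, s = [], iv
    for b in blocks:
        s = toy_encrypt(s, key)                        # S_i = E(S_{i-1})
        out.append(xor(b, s))
    return out

key = bytes(range(16))
iv = bytes([0xA5] * 16)
plain = [b"sixteen byte blk", b"sixteen byte blk", b"another 16 bytes"]

ecb = ecb_encrypt(plain, key)
cbc = cbc_encrypt(plain, key, iv)
print(ecb[0] == ecb[1])   # identical plaintext blocks give identical ECB ciphertext
print(cbc[0] == cbc[1])   # chaining hides the repetition in CBC
```

Only the ECB functions process each block without reference to an earlier result, which is why that mode lends itself to parallel and pipelined hardware.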


Figure 2.4. Output Feed Back Mode

In the proposed architecture, a number of encryption standard rounds are pipelined to enhance the throughput. Such pipelined implementations have only limited application for block ciphers [21], [42] and are generally used in the Electronic Code Book (ECB) mode. As the other three modes, namely CBC, CFB and OFB, use feedback of the ciphertext to the input, pipelining does not contribute to an increase in throughput. Hence the present implementation considers the Electronic Code Book mode, which allows for pipelining.

2.5 Cryptanalysis

Cryptanalysis is the science of breaking a cipher. The need for robust encryption algorithms has made cryptanalysis necessary to detect and correct weaknesses in the algorithm and the key schedule. The cryptanalytic attacks on cryptosystems [47], [62], [67] can be classified into the following six categories:

- Ciphertext-only Attack: A ciphertext-only attack requires samples of the ciphertext without the plaintext associated with it.
- Known-plaintext Attack: A known-plaintext attack requires samples of the ciphertext along with the associated plaintext.
- Chosen-plaintext Attack: A chosen-plaintext attack is a special case of the known-plaintext attack where the ciphertext for a chosen sample set of plaintext is obtained.


- Adaptive Chosen-plaintext Attack: An adaptive chosen-plaintext attack is a special case of the chosen-plaintext attack where the cryptanalyst is able to choose the plaintext blocks dynamically depending upon the results of previous encryptions.
- Chosen-ciphertext Attack: A chosen-ciphertext attack requires a chosen sample set of the ciphertext, and the corresponding plaintext resulting from the decryption is collected.
- Adaptive Chosen-ciphertext Attack: An adaptive chosen-ciphertext attack is a special case of the chosen-ciphertext attack where the cryptanalyst is able to choose the ciphertext blocks dynamically depending upon the results of previous decryptions.

Any block cipher can theoretically be attacked by exhaustively trying all possible keys. The feasibility of this approach depends upon the size of the key. If an attack faster than exhaustive search is possible, then the cipher is theoretically regarded as broken.
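How the feasibility of exhaustive search depends on key size can be illustrated with a short calculation. The search rate of $10^{12}$ keys per second used below is a purely hypothetical assumption chosen for illustration, and the function name `years_to_search` is not from the thesis.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

def years_to_search(key_bits: int, keys_per_second: float) -> float:
    """Worst-case time, in years, to try every key of the given length."""
    return 2 ** key_bits / keys_per_second / SECONDS_PER_YEAR

# Hypothetical attacker testing 10^12 keys per second.
for bits in (56, 128, 192, 256):
    print(bits, f"{years_to_search(bits, 1e12):.3g} years")
```

Under this assumption a 56-bit key space is exhausted in well under a year, while each of the AES key lengths remains far out of reach.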


CHAPTER 3
RIJNDAEL ALGORITHM AND ITS ARCHITECTURE

This chapter describes the operations of the Rijndael algorithm in detail and discusses the possible implementations of the operations in hardware. The operations in the algorithm are classified into data unit operations and key unit operations. The description and implementation of the data unit and key unit transformations are explained in the following subsections.

3.1 Rijndael Algorithm

The Rijndael algorithm is a block cipher [28] algorithm developed by Joan Daemen and Vincent Rijmen [27]. It is an iterated block cipher with variable key length and variable block length. The block length and the key length can be independently specified as 128, 192 or 256 bits. The algorithm consists of:

- An initial data/key addition
- Nine (128 bits), eleven (192 bits) or thirteen (256 bits) rounds of the standard round
- A final round, which is a variation of a standard round.

The number of standard rounds [88], [59] depends on the block and key lengths. The initial key is expanded to generate the round keys, each of size equal to the block length. Each round of the algorithm receives a new round key from the key schedule module. Due to its regular structure it can be implemented very efficiently in hardware [85], [53] and software. A hardware implementation of the Rijndael algorithm gives faster data encryption and decryption and higher security than a software implementation [32]. The length of the expanded key for variable key sizes is illustrated in Table 3.1.
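The dependence of the round count and the expanded key length on the block and key lengths can be sketched as follows. The relations $N_r = \max(N_b, N_k) + 6$ and an expanded key of $N_b (N_r + 1)$ 32-bit words are taken from the Rijndael specification rather than from the text above; the function name `rijndael_parameters` is hypothetical.

```python
def rijndael_parameters(block_bits: int, key_bits: int):
    """Return (Nb, Nk, Nr, expanded key length in 32-bit words)."""
    nb = block_bits // 32            # number of 32-bit columns in the data block
    nk = key_bits // 32              # number of 32-bit columns in the key
    nr = max(nb, nk) + 6             # total rounds, including the final round
    expanded_words = nb * (nr + 1)   # one round key per round plus the initial key
    return nb, nk, nr, expanded_words

for key_bits in (128, 192, 256):
    print(key_bits, rijndael_parameters(128, key_bits))
```

For a 128-bit block this gives 10, 12 and 14 rounds, that is, nine, eleven or thirteen standard rounds followed by the final round.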


Table 3.1. Length of Expanded Key for Varying Key Sizes

Data Block Length, $N_b$                4     4     4
Key Block Length, $N_k$                 4     6     8
Number of Rounds, $N_r$                10    12    14
Expanded Key Length (32-bit words)     44    52    60

The AES standard [4] has fixed the data block length to be 128 bits wide. In 128-bit block Rijndael, plaintext and ciphertext are processed in blocks of 128 bits. A data block to be encrypted is split into an array of bytes, and each encryption operation is byte-oriented. The algorithm applies different transformations to the data block, and the intermediate result is called the State. The Block State is represented as a rectangular array of bytes. Both the Key State and the Block State are arranged in column-major order. The intermediate values of Rijndael are represented as a Block State matrix of $(4 \times N_b)$ bytes, where $N_b$ = block length / 32, as shown in Table 1.1. Similarly, as shown in Table 1.2, the initial and round keys of size 128 bits are represented as a Key State matrix of $(4 \times N_k)$ bytes, where $N_k$ = key length / 32. Hence for a 128-bit block and a 128-bit key, the values of $N_b$ and $N_k$ are $N_b = 4$ and $N_k = 4$. For a 128-bit block and a 128-bit key, the number of rounds needed is 10, which includes the standard rounds and the final round. A Rijndael round transforms the data using permutations, nonlinear substitutions, additions and Galois field multiplications.

Each standard round includes four fundamental algebraic function transformations on arrays of bytes. These transformations are:

- Byte Sub Transformation
- Shift Row Transformation
- Mix Column Transformation
- Round Key Addition

The final round of the algorithm is similar to the standard round, except that it does not have the Mix Column operation. Decryption is computed by applying the inverse transformations of the round functions. The sequence of operations for the standard round function in decryption differs from that in encryption. The computational performance differs between encryption and decryption because the inverse transformations in the round function are more complex than the corresponding transformations for encryption. The inverse mix column transformation is more complex than the mix column transformation because the coefficients of the fixed polynomial used during decryption are of higher order. The data unit consists of the four transformations included in the standard round, namely, Byte Substitution, Shift Row, Mix Column and Round Key Addition. The key unit consists of the description of the key expansion and key scheduling.

3.1.1 Data Unit

3.1.1.1 Byte Substitution Transformation (ByteSub)

The Byte Substitution Transformation is constructed as the composition of two transformations. The transformations are:

- Multiplicative inverse in $GF(2^8)$
- An affine mapping/inverse affine mapping over GF(2) for the encryption/decryption process

The field $GF(2^8)$ can be defined by finding a polynomial f(x) of degree 8 which is irreducible over GF(2). There are 30 of these to choose from. Then the polynomials of degree less than 8 over GF(2) form a set of size $2^8$. The Byte Substitution Transformation operates on each of the State bytes individually in a non-linear manner. It consists of substitution boxes which replace each byte by its multiplicative inverse computed in $GF(2^8)$ (Galois field). The Rijndael S-box is based on the mapping $x \rightarrow x^{-1}$, where $x^{-1}$ denotes the multiplicative inverse in the field. There exist several efficient methods to calculate multiplicative inverses in a finite field $GF(2^m)$. In [39], an algorithm is presented that is based on Euclid's algorithm. It has an area complexity of O(m) and requires 2m time steps. In [66] the calculations are done in GF(16) by taking an irreducible polynomial of degree 2. The present implementation considers the Galois field represented as polynomials modulo an irreducible polynomial of degree 8. The irreducible polynomial of degree 8 is

$$m(x) = x^8 + x^4 + x^3 + x + 1 \qquad (3.1)$$


Figure 3.1. Byte Substitution Transformation (encryption: multiplicative inverse of a state byte over $GF(2^8)$, then linear affine mapping over GF(2); decryption: inverse linear affine mapping over GF(2), then multiplicative inverse over $GF(2^8)$)

In the encryption process, the multiplicative inverse of a state byte is taken first and then the affine mapping transformation is applied. In the decryption process, the inverse of the affine mapping is applied to the state byte and then the multiplicative inverse of the result is taken. The operations in the decryption are the inverses of the operations that are performed in the encryption. They are performed in the reverse order. The inverse transformation is called the Inverse Byte Sub Transformation (InvByteSub). The Byte Sub Transformation is depicted in Figure 3.1.

The affine mapping and the inverse affine mapping transformations are applied to bytes. The affine mapping over GF(2) is a linear transformation applied to the multiplicative inverse of the data byte. There exist two constants, '0x63' and '0x05' (0x stands for hexadecimal representation), which are used in the affine mapping transformation for encryption and decryption respectively. The affine mapping over GF(2) is defined as:

$$
\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1
\end{bmatrix}
\begin{bmatrix} x_0 \\ x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
+
\begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}
$$

where $x_0, \ldots, x_7$ are the bits of the multiplicative inverse of the data byte ($x_0$ being the least significant bit), $y_0, \ldots, y_7$ are the bits of the result, and the constant vector is '0x63'.

The multiplicative inverse over $GF(2^8)$ can be implemented either by using combinational logic or by using a look-up table. The combinational logic takes a larger number of clock cycles, while the look-up table implementation consumes more area. In the present implementation, the multiplicative inverse is implemented using a look-up table called the S-Box. The representation of the multiplicative inverse function using the look-up table is shown in Figure 3.2.

Figure 3.2. Multiplicative Inverse using S-Box (each state byte $B_{i,j}$ is replaced through an 8 x 256 S-Box look-up table)
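The two steps of ByteSub described above, the multiplicative inverse in $GF(2^8)$ followed by the affine mapping with the constant '0x63', can be sketched in software as follows. This is an illustrative sketch and not the look-up table circuit of the thesis; the function names are hypothetical, and the inverse is computed as $a^{254}$, one of several possible methods.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo m(x) = x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a              # addition in GF(2^8) is XOR
        high = a & 0x80
        a = (a << 1) & 0xFF          # multiply a by x
        if high:
            a ^= 0x1B                # reduce modulo m(x)
        b >>= 1
    return result

def gf_inverse(a: int) -> int:
    """Multiplicative inverse in GF(2^8), computed as a^254; 0 is mapped to 0."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def affine(x: int) -> int:
    """Affine mapping over GF(2): y_i = x_i ^ x_(i+4) ^ x_(i+5) ^ x_(i+6) ^ x_(i+7) ^ c_i."""
    y = 0
    for i in range(8):
        bit = (0x63 >> i) & 1                      # constant c = 0x63
        for shift in (0, 4, 5, 6, 7):
            bit ^= (x >> ((i + shift) % 8)) & 1
        y |= bit << i
    return y

# Forward table for ByteSub, and its inverse for InvByteSub obtained by inverting the table.
SBOX = [affine(gf_inverse(x)) for x in range(256)]
INV_SBOX = [0] * 256
for x, y in enumerate(SBOX):
    INV_SBOX[y] = x

print(hex(SBOX[0x00]), hex(SBOX[0x53]))
```

Building the 256-entry table once and indexing it per state byte corresponds to the look-up table option; evaluating `gf_inverse` and `affine` per byte corresponds to computing the substitution directly.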


3.1.1.2 Shift Row Transformation (ShiftRow)

In the Shift Row Transformation, the rows of the State matrix [20] are cyclically shifted over different offsets. The offset corresponds to the row number. The State matrix has four rows, as mentioned in Section 1.1. The shift row operation ensures that the different bytes of each row do not interact with the corresponding byte in other rows. Each row of the state matrix is shifted to the left during encryption and to the right during decryption, by a certain offset. The offset by which each row of the state matrix is shifted is given by the row number. Row 0 is not shifted. Row 1, Row 2 and Row 3 are shifted by C1, C2 and C3 bytes respectively. The shift offsets C1, C2 and C3 depend on the block length $N_b$. The different values of the offsets are specified in Table 3.2.

Table 3.2. Shift Offsets for Different Block Lengths

Data Block Length    Shift Offset for Row 1    Shift Offset for Row 2    Shift Offset for Row 3
                     (C1)                      (C2)                      (C3)
4                    1                         2                         3
6                    1                         2                         3
8                    1                         3                         4

From Table 3.2, we can infer that for a data block of length 128 bits, Row 0 is not shifted, Row 1 is shifted over 1 byte, Row 2 is shifted over 2 bytes and Row 3 is shifted over 3 bytes. In the decryption process, the inverse shift row transformation is applied. The rows Row 1, Row 2 and Row 3 are cyclically shifted over an offset of $N_b - C1$, $N_b - C2$ and $N_b - C3$ bytes respectively. Alternatively, in the decryption, the rows Row 1, Row 2 and Row 3 are cyclically shifted over an offset of C1, C2 and C3 bytes respectively to the right.

In the encryption the shift offsets are explained as:

- The first row (Row 0) is cyclically shifted by 0 bytes.
- The second row (Row 1) is cyclically shifted by 1 byte to the left (i.e., 8 bits)
- The third row (Row 2) is cyclically shifted by 2 bytes to the left (i.e., 16 bits)
- The fourth row (Row 3) is cyclically shifted by 3 bytes to the left (i.e., 24 bits)


Figure 3.3. Shift Row Transformation for Encryption

In the decryption the shift offsets are the inverses of the offsets in the encryption over the block length. This is explained as:

- The first row (Row 0) is cyclically shifted by 0 bytes.
- The second row (Row 1) is cyclically shifted by 3 bytes to the left (i.e., 24 bits) or 1 byte to the right (i.e., 8 bits)
- The third row (Row 2) is cyclically shifted by 2 bytes to the left (i.e., 16 bits) or 2 bytes to the right (i.e., 16 bits)
- The fourth row (Row 3) is cyclically shifted by 1 byte to the left (i.e., 8 bits) or 3 bytes to the right (i.e., 24 bits)

The Shift Row transformation for 128 bits for encryption and decryption is represented in Figure 3.3 and Figure 3.4 respectively. The data block of length 128 bits is stored in the State matrix as shown in Table 3.3. In the encryption and decryption processes, after the application of the shift row transformation, the order of bits in the State matrix is as shown in Table 3.4 and Table 3.5 respectively.


Figure 3.4. Shift Row Transformation for Decryption

Table 3.3. Data Block Represented in a State Matrix

  8-1       40-33     72-65     104-97
  16-9      48-41     80-73     112-105
  24-17     56-49     88-81     120-113
  32-25     64-57     96-89     128-121

Table 3.4. State Matrix After the Shift Row Transformation, in Encryption

  8-1       40-33     72-65     104-97
  48-41     80-73     112-105   16-9
  88-81     120-113   24-17     56-49
  128-121   32-25     64-57     96-89

Table 3.5. State Matrix After the Shift Row Transformation, in Decryption

  8-1       40-33     72-65     104-97
  112-105   16-9      48-41     80-73
  88-81     120-113   24-17     56-49
  64-57     96-89     128-121   32-25
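The column-major State layout and the row rotations shown in the tables above can be sketched as follows for the 128-bit case (C1 = 1, C2 = 2, C3 = 3). The function names are hypothetical and the State is modelled simply as a list of four rows.

```python
def bytes_to_state(block: bytes):
    """Load 16 bytes into a 4 x 4 State in column-major order."""
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]

def shift_rows(state):
    """Encryption: row r is cyclically shifted left by r bytes."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]

def inv_shift_rows(state):
    """Decryption: row r is cyclically shifted right by r bytes."""
    return [row[len(row) - r:] + row[:len(row) - r] for r, row in enumerate(state)]

state = bytes_to_state(bytes(range(16)))
for row in shift_rows(state):
    print(row)
```

With bytes numbered 0 to 15, Row 1 of the shifted State is `[5, 9, 13, 1]`, matching the second row of the encryption table, where the first byte of the row has moved to the last column.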


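3.1.1.3 Mix Column Transformation (MixColumn)

The Mix Column transformation performs a linear operation on the columns of the state matrix. It operates on one column of the state matrix at a time, i.e., on 32 bits. It causes every byte in a column to affect every other byte. The state matrix is viewed as column polynomials over $GF(2^8)$, and the transformation consists of a matrix multiplication of the state with a polynomial over a finite field. The mix column transformation step is the only place in Rijndael's round transformation where the columns are mixed. This step works with the ShiftRow step to ensure that all parts of the block affect each other. For a data block of 128 bits, the state matrix has 4 rows. Therefore, each column of the state matrix is viewed as a polynomial of degree less than 4 with coefficients in $GF(2^8)$. In encryption, each column of the state matrix is multiplied by the fixed polynomial

$$c(x) = \text{'03'}\,x^3 + \text{'01'}\,x^2 + \text{'01'}\,x + \text{'02'} \qquad (3.2)$$

The coefficients of the fixed polynomial are in hexadecimal and are elements of $GF(2^8)$. The Mix Column transformation for encryption and decryption is represented in Figures 3.5 and 3.6 respectively. This can be represented in algebraic form as a matrix multiplication. Let B(x) = C(x) * A(x).

$$
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix}
=
\begin{bmatrix}
02 & 03 & 01 & 01 \\
01 & 02 & 03 & 01 \\
01 & 01 & 02 & 03 \\
03 & 01 & 01 & 02
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}
$$

The multiplication by a fixed polynomial over $GF(2^8)$ is calculated using shifts and exclusive OR operations. The resulting equations for each byte in the column are as follows:

$$
\begin{aligned}
b_0 &= (02 \cdot a_0) \oplus (03 \cdot a_1) \oplus a_2 \oplus a_3 \\
b_1 &= a_0 \oplus (02 \cdot a_1) \oplus (03 \cdot a_2) \oplus a_3 \\
b_2 &= a_0 \oplus a_1 \oplus (02 \cdot a_2) \oplus (03 \cdot a_3) \\
b_3 &= (03 \cdot a_0) \oplus a_1 \oplus a_2 \oplus (02 \cdot a_3)
\end{aligned}
$$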
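The calculation of the Mix Column step with shifts and exclusive OR operations can be sketched as follows. The helper name `xtime` (multiplication by '02') follows common usage in AES descriptions and is not taken from the thesis; multiplication by '03' is formed as `xtime(a) ^ a`.

```python
def xtime(a: int) -> int:
    """Multiply a byte by '02' in GF(2^8): shift left, then reduce on overflow."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B                   # x^8 + x^4 + x^3 + x + 1
    return a

def mix_column(col):
    """Apply the encryption Mix Column matrix to one 4-byte column."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,    # 02*a0 ^ 03*a1 ^ a2 ^ a3
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,    # a0 ^ 02*a1 ^ 03*a2 ^ a3
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),    # a0 ^ a1 ^ 02*a2 ^ 03*a3
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),    # 03*a0 ^ a1 ^ a2 ^ 02*a3
    ]

print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

Because the only multipliers are '01', '02' and '03', each output byte needs at most two `xtime` operations and a handful of XORs, which is why the forward transformation is cheap in hardware.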


Figure 3.5. Mix Column Transformation for Encryption

In decryption, each column of the state matrix is multiplied by the fixed polynomial

$$d(x) = \text{'0B'}\,x^3 + \text{'0D'}\,x^2 + \text{'09'}\,x + \text{'0E'} \qquad (3.3)$$

The coefficients of the fixed polynomial are in hexadecimal and are elements of $GF(2^8)$. The Galois field $GF(2^8)$ is a finite field of 256 elements generated by an irreducible polynomial of degree 8. The elements of $GF(2^8)$ are represented as polynomials of degree less than eight with coefficients taken modulo 2. As the coefficients are taken modulo 2, addition is equivalent to the XOR operation. An irreducible polynomial is a polynomial which is divisible only by 1 and itself. In $GF(2^n)$, n represents the number of bits required to represent the polynomials in bit representation. This can be represented as a matrix multiplication. Let A(x) = D(x) * B(x).

$$
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}
=
\begin{bmatrix}
0E & 0B & 0D & 09 \\
09 & 0E & 0B & 0D \\
0D & 09 & 0E & 0B \\
0B & 0D & 09 & 0E
\end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix}
$$
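That the decryption matrix undoes the encryption matrix can be checked with a short sketch. The general multiplier `gf_mul` and the helper `mat_mul_column` are hypothetical names introduced here; a hardware design would hard-wire the constant multiplications rather than use a general multiplier.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        high = a & 0x80
        a = (a << 1) & 0xFF
        if high:
            a ^= 0x1B
        b >>= 1
    return result

MIX = [[0x02, 0x03, 0x01, 0x01],
       [0x01, 0x02, 0x03, 0x01],
       [0x01, 0x01, 0x02, 0x03],
       [0x03, 0x01, 0x01, 0x02]]

INV_MIX = [[0x0E, 0x0B, 0x0D, 0x09],
           [0x09, 0x0E, 0x0B, 0x0D],
           [0x0D, 0x09, 0x0E, 0x0B],
           [0x0B, 0x0D, 0x09, 0x0E]]

def mat_mul_column(matrix, col):
    """Multiply a 4 x 4 matrix over GF(2^8) by a 4-byte column."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, byte in zip(row, col):
            acc ^= gf_mul(coeff, byte)     # addition in GF(2^8) is XOR
        out.append(acc)
    return out

column = [0xDB, 0x13, 0x53, 0x45]
mixed = mat_mul_column(MIX, column)
print(mat_mul_column(INV_MIX, mixed) == column)
```

The larger coefficients '09', '0B', '0D' and '0E' each expand into several doublings and XORs, which illustrates why the inverse transformation costs more than the forward one.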

PAGE 47

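[Figure: each column (B_{0,j}, B_{1,j}, B_{2,j}, B_{3,j}) of the state is multiplied by the matrix with rows 0E 0B 0D 09 / 09 0E 0B 0D / 0D 09 0E 0B / 0B 0D 09 0E to give the corresponding column of the output state.]
Figure 3.6. MixColumn Transformation for Decryption

The multiplication by a fixed polynomial over $GF(2^8)$ is calculated using shifts and exclusive-OR operations. The resulting equations for each byte in the column are as follows:

$$\begin{aligned} A_0 &= \mathtt{0E} \cdot B_0 \oplus \mathtt{0B} \cdot B_1 \oplus \mathtt{0D} \cdot B_2 \oplus \mathtt{09} \cdot B_3 \\ A_1 &= \mathtt{09} \cdot B_0 \oplus \mathtt{0E} \cdot B_1 \oplus \mathtt{0B} \cdot B_2 \oplus \mathtt{0D} \cdot B_3 \\ A_2 &= \mathtt{0D} \cdot B_0 \oplus \mathtt{09} \cdot B_1 \oplus \mathtt{0E} \cdot B_2 \oplus \mathtt{0B} \cdot B_3 \\ A_3 &= \mathtt{0B} \cdot B_0 \oplus \mathtt{0D} \cdot B_1 \oplus \mathtt{09} \cdot B_2 \oplus \mathtt{0E} \cdot B_3 \end{aligned}$$

3.1.1.4 Round Key Addition (AddRoundKey)

In the Round Key Addition transformation, a round key generated from the key scheduler is applied to the state by a simple bitwise XOR. This transformation is self-inverting. Hence, the key addition is the same for both the encryption and decryption processes. The Key Addition transformation for encryption and decryption is represented in Figure 3.7.

In the Rijndael algorithm, the ShiftRow and MixColumn operations are responsible for diffusion [19]. Diffusion is defined as the minimum number of active S-boxes in a linear or differential characteristic. Diffusion is important because, within block cipher cryptanalysis, almost all the attacks have a complexity that depends on the number of active S-boxes. The cryptanalytic complexity also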

PAGE 48

[Figure: the 128-bit state (B_{0,0} ... B_{3,3}) is XORed with the round key words W_i, W_{i+1}, W_{i+2}, W_{i+3} to give the 128-bit output state.]
Figure 3.7. Key Addition Transformation

is affected by the input/output correlation of the individual S-boxes. Diffusion seeks to make the statistical relationship between the plaintext and the ciphertext as complex as possible in order to thwart attempts to deduce the key.

3.1.2 Key Unit

The round keys that are created by the key scheduling process can be computed offline before the encryption process starts, or they can be generated on-the-fly. Generating the round keys beforehand is advantageous when the same key is used for all the data blocks. The generated round keys are stored in a look-up table. This approach is good for batch mode encryption applications, such as secure document storage, but not for applications where keys are changed frequently. The approach of generating the round keys on-the-fly is used in applications like Internet routers with IPSEC support [71]. In an IPSEC router, the key and the round key material (1280 bits for 128-bit data blocks) is potentially different for each secured packet route. Due to the large number of active routes, round keys cannot be stored in on-chip memory. They need to be either calculated online or else moved onto the chip together with the packet payload. However, half of the Internet packets are only 64 bytes in length (512 bits) [87]. With offline calculation and off-chip storage of subkeys, the router will therefore use more bus bandwidth for the communication of the round keys than for the communication of useful data payloads. The effective solution for this is online computation of round keys, as is done in our architecture.

PAGE 49

Round keys can be generated either during the encryption or decryption process when needed, or they can be generated earlier and stored in the look-up table. The advantage of storing the generated sub-keys is that blocks of data can be processed at the same time, if the key is the same for all the blocks of the input. The disadvantage is that it consumes a lot of area, and clock cycles are spent generating all the sub-keys, using the pipelining method, before they can be stored in the look-up table. The key scheduling consists of two parts: Key Expansion and Key Scheduling. These are explained in brief in the following subsections.

3.1.2.1 Key Expansion

The round keys are generated by expanding the initial key. The 128-bit initial key needs to be expanded to $N_b (N_r + 1)$ 32-bit words. This results in 44 32-bit words. As the round key should be of length 128 bits, the length of the expanded key can be expressed in terms of 128-bit round keys. This results in eleven 128-bit round keys. The initial key is the cipher key and is used in the initial round of the algorithm. All the subsequent keys are derived from the respective predecessor using a function f. The function can be written as:

$$K_i = f(K_{i-1}) \quad \text{for all } 1 \le i \le N_r$$

The initial key is represented as a linear array W. The next round keys are obtained from the initial key,

$$W[i] = W[i-4] \oplus W[i-1] \quad \text{where } i \bmod 4 \ne 0$$

$$W[i] = W[i-4] \oplus \mathrm{SBOX}(\mathrm{ROT}(W[i-1])) \oplus \mathrm{RCON}[i/4] \quad \text{where } i \bmod 4 = 0$$
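The expansion of a 128-bit key into 44 words can be sketched in software as follows. This is a behavioural model only, written under the assumption that the key schedule follows the standard Rijndael recurrence with ROT, SBOX and RCON as in Figure 3.8; the function names are illustrative, and the S-box is computed here by brute force rather than read from a ROM.

```python
def xtime(b):
    """Multiply a byte by x (hex 02) in GF(2^8)."""
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using shifts and XORs."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def build_sbox():
    """ByteSub table: multiplicative inverse followed by the affine mapping."""
    sbox = [0x63]  # 0 has no inverse and is mapped to the affine constant
    for x in range(1, 256):
        inv = next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for k in range(1, 5):  # XOR with the inverse rotated left by 1..4 bits
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


def expand_key(key):
    """Expand a 16-byte key into 44 32-bit words, i.e. eleven round keys."""
    sbox = build_sbox()
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(4)]
    rcon = 0x01
    for i in range(4, 44):
        temp = w[i - 1]
        if i % 4 == 0:
            temp = ((temp << 8) | (temp >> 24)) & 0xFFFFFFFF        # ROT
            temp = sum(sbox[(temp >> s) & 0xFF] << s for s in (24, 16, 8, 0))  # SBOX
            temp ^= rcon << 24                                      # RCON
            rcon = xtime(rcon)
        w.append(w[i - 4] ^ temp)
    return w


words = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(len(words), f"{words[4]:08x}")  # prints 44 a0fafe17
```

The key used in the example is the 128-bit test key from the AES standard; each group of four consecutive words forms one 128-bit round key.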

PAGE 50

[Figure: the round key words W(i), W(i+1), W(i+2), W(i+3) are transformed into W(i+4), W(i+5), W(i+6), W(i+7) through the ROT, SBOX and RCON blocks.]
Figure 3.8. Key Scheduling for Encryption

3.1.2.2 Key Scheduling

The key length is variable, whereas the input block length is fixed to 128 bits as per the AES standard. The key length can be 128, 192 or 256 bits. The total number of round keys to be generated depends on the value $N_r$. $N_r + 1$ round keys are needed for $N_r$ rounds of the basic block implementation plus the initial key. Each key has the length of the input block. $N_b$ is the block length, i.e., the number of columns in the matrix representation of the input block. As the block length is fixed to 128 bits, the number of columns in the matrix is equal to 4. So the total number of round key words to be generated is given by $N_b (N_r + 1)$. $N_r$ is the number of rounds the basic block needs to be implemented. The value $N_r$ varies with the key length. For key lengths of 128, 192 and 256 bits, the number of rounds is 10, 12 and 14 respectively.

For a 128-bit key and 128-bit input block, the round keys that need to be generated comprise 4 * (10 + 1) = 4 * 11 = 44 words of 32 bits. Similarly, for a 192-bit key and 128-bit input block, they comprise 4 * (12 + 1) = 4 * 13 = 52 words of 32 bits, and for a 256-bit key and 128-bit input block, they comprise 4 * (14 + 1) = 4 * 15 = 60 words of 32 bits. The subkey generation for the encryption and decryption modes is shown in Figures 3.8 and 3.9.
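The word counts above follow directly from the number of rounds. As a small worked check (the function name `round_key_words` is an assumption made for this example):

```python
def round_key_words(key_bits, block_bits=128):
    """Number of 32-bit round key words, Nb * (Nr + 1), for a given key length."""
    nb = block_bits // 32                        # columns in the state matrix
    nr = {128: 10, 192: 12, 256: 14}[key_bits]   # rounds for each key length
    return nb * (nr + 1)


for key_bits in (128, 192, 256):
    words = round_key_words(key_bits)
    print(key_bits, words, words * 32)  # key length, words, total bits
```

For the 128-bit key this gives 44 words, i.e. 1408 bits including the initial key, of which the 1280 bits quoted in Section 3.1.2 are the ten derived round keys.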

PAGE 51

[Figure: the round key words W(i+4), W(i+5), W(i+6), W(i+7) are transformed back into W(i), W(i+1), W(i+2), W(i+3) through the ROT, SBOX and RCON blocks.]
Figure 3.9. Key Scheduling for Decryption

In the decryption mode, the round keys used in the encryption mode are used, but in the reverse order. Using the inverse of the function f, the round keys are derived recursively from the round key used in the final round of the algorithm. The round keys thus generated are scheduled to the data unit. As each round of the data unit takes one clock cycle, a round key should be generated and scheduled to the data unit once every clock cycle.
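The recursive derivation in the reverse direction can be sketched as follows. This is an illustrative model under the assumption that f is the standard 128-bit Rijndael key-schedule step, so that its inverse recovers the previous round key from the current one; all function names are hypothetical, and the starting value is the final round key of the AES standard's 128-bit test key.

```python
def xtime(b):
    """Multiply a byte by x (hex 02) in GF(2^8)."""
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using shifts and XORs."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def build_sbox():
    """ByteSub table: multiplicative inverse followed by the affine mapping."""
    sbox = [0x63]
    for x in range(1, 256):
        inv = next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for k in range(1, 5):
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def g(word, rcon):
    """ROT, SBOX and RCON applied to one 32-bit word."""
    word = ((word << 8) | (word >> 24)) & 0xFFFFFFFF
    out = 0
    for shift in (24, 16, 8, 0):
        out |= SBOX[(word >> shift) & 0xFF] << shift
    return out ^ (rcon << 24)


def previous_round_key(k, rcon):
    """Invert one key-schedule step: round key r -> round key r - 1."""
    p3 = k[3] ^ k[2]
    p2 = k[2] ^ k[1]
    p1 = k[1] ^ k[0]
    p0 = k[0] ^ g(p3, rcon)   # the forward step computed k[0] = p0 ^ g(p3)
    return [p0, p1, p2, p3]


def reverse_schedule(last_key):
    """Yield the eleven round keys in decryption order, final round key first."""
    key, rcon = list(last_key), 0x36  # last constant used by a 128-bit schedule
    yield key
    for _ in range(10):
        key = previous_round_key(key, rcon)
        # divide the round constant by x (the inverse of xtime)
        rcon = ((rcon ^ 0x11B) >> 1) if rcon & 1 else rcon >> 1
        yield key


keys = list(reverse_schedule([0xD014F9A8, 0xC9EE2589, 0xE13F0CC8, 0xB6630CA6]))
print(" ".join(f"{w:08x}" for w in keys[-1]))
# prints 2b7e1516 28aed2a6 abf71588 09cf4f3c
```

Each reverse step needs only the current round key, which is what allows one round key to be produced per clock cycle without storing the whole expanded key.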

PAGE 52

CHAPTER 4
DESIGN OF SUBSYSTEMS

This chapter describes the architecture used for the implementation of the Rijndael algorithm. The VLSI implementation of the Byte Substitution, ShiftRow, MixColumn and Round Key Addition transformations of the algorithm is discussed in detail. The implementation of the key unit is also discussed. We also discuss the implementation analysis and the memory optimization.

4.1 Hardware Architecture and VLSI Implementation

The architecture we propose for the Rijndael algorithm is aimed at achieving high throughput and reducing the required hardware resources [70], [73]. Feedback logic is used after each standard round to enhance the throughput. Both the encryption and the decryption processes are implemented on the same hardware device [35], [84], [76], which reduces the number of hardware resources needed for the implementation. The key, however, must be used in the reverse order for decryption. In the implementation of the algorithm, only one hardware round is used for encryption, and it is reused to complete the whole encryption process in order to conserve area; keys are generated in real time to reduce the amount of buffer storage. For achieving an increase in speed, a number of single-round encryption modules are pipelined. A 128-bit data block is encrypted in every clock cycle, although there will be a latency of 10 clock cycles through the entire system. The implementation of the transformations used in the algorithm is discussed in the following subsections. As discussed in Section 3.1, the data unit consists of four transformations. The order of operation and control between the transformations is shown in Figure 4.1.

PAGE 53

[Figure: the Input and the Initial Key enter the initial round; the round then passes through the ByteSub, ShiftRow, MixColumn and AddRoundKey transformations under Encryption/Decryption and Final Round/Initial Round control, producing the Output.]
Figure 4.1. Top level View of the Rijndael Algorithm

PAGE 54

4.1.1 Implementation of Data Unit

The data unit consists of the initial round of key addition, $N_r - 1$ rounds of the Standard round, and a final round. The architecture for the Standard Round is shown in Figure 4.2. As mentioned in the previous chapter, each standard round is composed of four basic blocks: ByteSub, ShiftRow, MixColumn and AddRoundKey. Decryption is performed by the application of the inverse transformations of the round functions. In decryption, the sequence of operations for the standard round function differs from that in encryption. For each block, both the transformation and the inverse transformation, needed for encryption and decryption respectively, are designed on the same device.

As the keys are generated in real time, we generate one subkey and reuse it for calculating all the other subkeys. Each subkey is generated in one clock cycle. However, some of the modules need to be duplicated to get all the required operations of one round done in one clock cycle.

4.1.1.1 Implementation of Byte Substitution Transformation

In this transformation, each byte of the block is replaced by its substitution in an S-Box table. The implementation of the S-Box consists of a mathematical function, the multiplicative inverse of each byte of the Block State in the finite field $GF(2^8)$.

There are two ways of implementing the multiplicative inverse in hardware. One way is to calculate the multiplicative inverse using combinational logic. The architectures for the multiplicative inverse in $GF(2^m)$ use arrays of basic inversion block cells [39], [16], [45]. This approach has large time and area requirements. These architectures need a higher number of cycles per inversion, i.e., between m and (3m + 2), in order to achieve the multiplicative inverse in $GF(2^m)$, which is unacceptable for a high speed implementation of a cryptographic algorithm. In order to overcome this performance bottleneck, the second approach, the implementation of multiplicative inverses using look-up tables, is used in this architecture. The multiplicative inverse of each byte is stored in the look-up table. The execution time is significantly less, and the look-up takes one time step. The implementation with look-up tables consumes more area compared to the other architectures [39], [16], [45], which is a minor consideration for current FPGA technology. A straightforward way to implement the byte substitution transformation is to store in the ROM the values obtained after taking the multiplicative inverse and applying the affine mapping, and also their inverses. This requires a 512 byte ROM, with

PAGE 55

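[Figure: the 128-bit datapath of the standard round. ByteSub consists of 8 x 256 ROM cells holding the multiplicative inverse, followed by the Affine Mapping and Inverse Affine Mapping selected by a MUX; this is followed by the Shift Row multiplexer block, the Mix Column block with its XOR blocks, Y and Z signals and MUX, and the Round Key Addition XOR block fed by the round key.]
Figure 4.2. Standard Round Architecture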

PAGE 56

an overhead for address decoding and output signal conditioning. This overhead outweighs the area requirements of the ROM matrix.

Alternatively, the multiplicative inverses in $GF(2^8)$ can be stored in a 256 byte ROM, and the affine transformation and its inverse are calculated on the output obtained from the ROM cells. S-boxes are implemented using ROM cells. The multiplicative inverses of all the possible $2^8$ values of an 8-bit binary number are stored in the table. The table is organized as 8 * 256, i.e., 256 entries of 8 bits each.

An 8-bit input value is given as input to the look-up table and is used as the index of the table. The multiplicative inverse of the 8-bit input, which is also 8 bits in length, is given as the output. So for a 128-bit input block we need 16 copies of the 8 * 256 look-up table. The components used here are ROM cells, decoders and C-gates.

A ROM cell is used to store the multiplicative inverse of each of the 256 possible values. All the 8 bits of a given input byte are ANDed so that the output for that particular input becomes high. The output of the AND operation is used as the address of the ROM cell in which the corresponding multiplicative inverse is stored. A decoder is used to address the corresponding ROM cell and retrieve the multiplicative inverse for the given input. C-gates are used to implement the decoder function. The control signal to the decoder is the output of the AND operation on the input bits. Multiplexers are used to give an 8-bit output from the 256 possible 8-bit values, the control being the decoder output.

The output of the S-box is given as input to the linear affine mapping function. The implementation includes the affine mapping of the input in both the encryption and decryption processes. The components used for implementing the affine mapping are C-gates and XOR blocks.

C-gates are used to choose between encryption and decryption. The linear transformation operates on 8 bits at a time. Therefore, the linear transformation on 128 bits is performed by replicating the 8-bit linear transformation implementation 16 times, with the inputs of each replica connected appropriately. The affine mapping, as mentioned in Chapter 3, can be represented as:

PAGE 57

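$$\begin{bmatrix} Out_0 \\ Out_1 \\ Out_2 \\ Out_3 \\ Out_4 \\ Out_5 \\ Out_6 \\ Out_7 \end{bmatrix} = \begin{bmatrix} 1&0&0&0&1&1&1&1 \\ 1&1&0&0&0&1&1&1 \\ 1&1&1&0&0&0&1&1 \\ 1&1&1&1&0&0&0&1 \\ 1&1&1&1&1&0&0&0 \\ 0&1&1&1&1&1&0&0 \\ 0&0&1&1&1&1&1&0 \\ 0&0&0&1&1&1&1&1 \end{bmatrix} \begin{bmatrix} In_0 \\ In_1 \\ In_2 \\ In_3 \\ In_4 \\ In_5 \\ In_6 \\ In_7 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}$$

The above representation of the affine mapping can be implemented in hardware as follows, with all bit indices taken modulo 8:

Affine Mapping:

$$Out[i] = In[i] \oplus In[i+4] \oplus In[i+5] \oplus In[i+6] \oplus In[i+7] \oplus CE[i]$$

where CE = 01100011, the leftmost bit being the most significant bit.

Inverse Affine Mapping:

$$Out[i] = In[i+2] \oplus In[i+5] \oplus In[i+7] \oplus CD[i]$$

where CD = 00000101, the leftmost bit being the most significant bit.

CE and CD are the constants used in encryption and decryption respectively.

The Affine Mapping function in encryption is carried out among the bits of the byte. The bits on which the XOR operation is performed to get the transformed byte are shown in Table 4.1. Similarly, the bits that need to be XORed to get the transformed byte in decryption are shown in Table 4.2. The hardware implementation of the linear transformation for a byte and for 128 bits is shown in Figure 4.3 and Figure 4.4 respectively.

4.1.1.2 Implementation of ShiftRow Transformation

In this transformation, the rows of the block state are shifted over different offsets. The number of shifts is determined by the block length, as shown in Table 3.2. This can be implemented either by using a look-up table or by using combinational logic. The present architecture implements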

PAGE 58

Table 4.1. The Affine Mapping Operation

Bit Number   CE[i]   In[i]   In[i+4]   In[i+5]   In[i+6]   In[i+7]
bit 7        0       bit 7   bit 3     bit 4     bit 5     bit 6
bit 6        1       bit 6   bit 2     bit 3     bit 4     bit 5
bit 5        1       bit 5   bit 1     bit 2     bit 3     bit 4
bit 4        0       bit 4   bit 0     bit 1     bit 2     bit 3
bit 3        0       bit 3   bit 7     bit 0     bit 1     bit 2
bit 2        0       bit 2   bit 6     bit 7     bit 0     bit 1
bit 1        1       bit 1   bit 5     bit 6     bit 7     bit 0
bit 0        1       bit 0   bit 4     bit 5     bit 6     bit 7

Table 4.2. The Inverse Affine Mapping Operation

Bit Number   CD[i]   In[i+5]   In[i+7]   In[i+2]
bit 7        0       bit 4     bit 6     bit 1
bit 6        0       bit 3     bit 5     bit 0
bit 5        0       bit 2     bit 4     bit 7
bit 4        0       bit 1     bit 3     bit 6
bit 3        0       bit 0     bit 2     bit 5
bit 2        1       bit 7     bit 1     bit 4
bit 1        0       bit 6     bit 0     bit 3
bit 0        1       bit 5     bit 7     bit 2

[Figure: for each output bit, the Affine Mapping path XORs CE[i], In[i], In[i+4], In[i+5], In[i+6] and In[i+7]; the Inverse Affine Mapping path XORs CD[i], In[i+2], In[i+5] and In[i+7]; a multiplexer selects the Affine Mapping for encryption and the Inverse Affine Mapping for decryption.]
Figure 4.3. Hardware Implementation of Affine Mapping and its Inverse for a Byte
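The XOR patterns of Tables 4.1 and 4.2 can be checked with a short bit-level model. This is a sketch only: the function names are assumptions, the bit indices are taken modulo 8, and the byte CA used in the example is the multiplicative inverse of 53 in the Rijndael field, so that the affine mapping of CA gives the S-box entry for 53.

```python
CE = 0x63  # 01100011, constant used in encryption
CD = 0x05  # 00000101, constant used in decryption


def bit(value, i):
    """Return bit (i mod 8) of a byte."""
    return (value >> (i % 8)) & 1


def affine(b):
    """Affine mapping of Table 4.1, one output bit at a time."""
    out = 0
    for i in range(8):
        o = (bit(CE, i) ^ bit(b, i) ^ bit(b, i + 4) ^ bit(b, i + 5)
             ^ bit(b, i + 6) ^ bit(b, i + 7))
        out |= o << i
    return out


def inverse_affine(b):
    """Inverse affine mapping of Table 4.2, one output bit at a time."""
    out = 0
    for i in range(8):
        o = bit(CD, i) ^ bit(b, i + 2) ^ bit(b, i + 5) ^ bit(b, i + 7)
        out |= o << i
    return out


print(f"{affine(0xCA):02x}", f"{inverse_affine(0xED):02x}")  # prints ed ca
```

Because each output bit depends on a fixed set of input bits, every bit can be computed by its own small XOR tree, which matches the per-bit structure of Figure 4.3.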

PAGE 59

[Figure: sixteen Linear Transformation (Encryption, Decryption) blocks in parallel, one for each byte of the block, from the 1st byte to the 16th byte.]
Figure 4.4. Hardware Implementation of Affine Mapping and its Inverse for 128 Bits

the shift row operation using combinational logic, considering the offset by which a row should be shifted. It is implemented using multiplexers, such that both the encryption and decryption modes are implemented on the same device. In both the encryption and decryption algorithms, the offsets by which the first and third bytes of the row need to be shifted are the same. Hence, we have a single input to the multiplexer for these bytes. For the second and fourth bytes of the row, the offsets are different for encryption and decryption, so we need two inputs. The hardware implementation of the ShiftRow transformation is shown in Figure 4.5.

4.1.1.3 Implementation of MixColumn Transformation

In this transformation, each column of the block state is considered as a polynomial over $GF(2^8)$. It is multiplied by a constant polynomial C(x) or D(x) over a finite field, in encryption or decryption respectively. In hardware, the multiplication by the corresponding polynomial is done by XOR operations and multiplication of a block by x, i.e., by the hexadecimal value 02.

Multiplication of an 8-bit number (represented as a polynomial) by x is implemented in hardware as follows. The input block of length 8 bits is represented as a polynomial of degree 7. To multiply the 8-bit hexadecimal number by 02 or x,

PAGE 60

[Figure: four multiplexers, one each for the 1st, 2nd, 3rd and 4th bytes, controlled by the encryption/decryption signal.]
Figure 4.5. Hardware Implementation of the ShiftRow Transformation for 32-bits

1. Left shift the 8-bit number by 1 bit.
2. If the most significant bit of the initial polynomial is 1, then the output is obtained by XORing the left-shifted value with 1B; else, if the most significant bit of the initial polynomial before left shifting is 0, then the output is just the left-shifted value.

The reduction in step 2 is done using a simple XOR operation. The selection is implemented using a multiplexer, the control being the most significant bit (being 1 or 0). The equations implemented in hardware for MixColumn in the encryption and decryption processes are as shown below:

In encryption process,

In decryption process,

PAGE 61

where In2Trans(K) is the multiplication of the byte by X (hexadecimal value 02) over $GF(2^8)$.

In0 is the least significant 8 bits of a column of the matrix. The same operations need to be performed for all the four columns in the matrix. One multiplexer is used to choose between encryption and decryption. The connections are made appropriately so that Y and Z are used at the places where they are needed.

The computation of Y and Z for the corresponding column of the block state is shown in Figure 4.6. The Trans() function, which is implemented as "Multiplication by X", is shown in Figure 4.7. The implementation of the MixColumn operation for 128 bits is shown in Figure 4.8. The functionality of "out()" used in the MixColumn operation is shown in Figure 4.9.

4.1.1.4 Implementation of Round Key Addition Transformation

In this transformation, the round key obtained from the key scheduler is XORed with the block state obtained from the MixColumn transformation or the ShiftRow transformation, depending on the type of round being implemented. In the standard round, the round key is XORed with the output obtained from the MixColumn transformation. In the final round, the round key is XORed with the output obtained from the ShiftRow transformation. In the initial round, the XOR operation is performed between the initial round key and the initial state block. A bitwise exclusive-OR is performed on the round key and the state block. The hardware implementation of the Round Key Addition transformation is represented in Figure 4.10.

4.1.2 Implementation of Key Scheduling

In this key scheduling module, the initial key is expanded and the generated round keys are stored in the registers Register 0, Register 1, Register 2 and Register 3. In the present implementation of the key scheduling, both the forward and reverse key scheduling are done in the same device. The ByteSub operation required in the key expansion unit is implemented using the S-Boxes. Four

PAGE 62

[Figure: the 8-bit inputs IN0, IN1, IN2 and IN3 of a column are combined through XOR and Trans blocks to form T0 and the 8-bit signals Y and Z; a multiplexer selects between encryption and decryption.]
Figure 4.6. Hardware Implementation of Computation of Y, Z in MixColumn Transformation
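One way in which a single MixColumn datapath can serve both modes is sketched below. The definitions of `t0`, `y` and `z` in this model are an assumption made for illustration (one decomposition that is consistent with sharing the encryption logic), and may differ from the exact signals of Figure 4.6: the decryption output is taken as the encryption output XORed with a correction term, Y for bytes 0 and 2 and Z for bytes 1 and 3.

```python
def xtime(b):
    """Trans(): multiply a byte by x (hex 02) in GF(2^8)."""
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b


def mix_column_shared(col, decrypt=False):
    """MixColumn for one column; decryption reuses the encryption terms."""
    t0 = col[0] ^ col[1] ^ col[2] ^ col[3]
    if decrypt:
        t8 = xtime(xtime(xtime(t0)))                 # 08 * T0
        y = t8 ^ xtime(xtime(col[0] ^ col[2]))       # correction for bytes 0 and 2
        z = t8 ^ xtime(xtime(col[1] ^ col[3]))       # correction for bytes 1 and 3
    else:
        y = z = 0                                    # multiplexer selects zero
    out = []
    for i in range(4):
        # encryption term: 02*a[i] xor 03*a[i+1] xor a[i+2] xor a[i+3]
        o = xtime(col[i] ^ col[(i + 1) % 4]) ^ col[i] ^ t0
        out.append(o ^ (y if i % 2 == 0 else z))
    return out


mixed = mix_column_shared([0xDB, 0x13, 0x53, 0x45])
print([f"{b:02x}" for b in mixed])  # prints ['8e', '4d', 'a1', 'bc']
print([f"{b:02x}" for b in mix_column_shared(mixed, decrypt=True)])
# prints ['db', '13', '53', '45']
```

The decomposition works because the decryption coefficients 0E, 0B, 0D and 09 differ from the encryption coefficients 02, 03, 01 and 01 by 0C, 08, 0C and 08 respectively, so only multiplications by 04 and 08 of shared sums have to be added.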

PAGE 63

[Figure: the input bits a0 to a7 pass through eight XOR gates and eight multiplexers, each multiplexer controlled by (a7 ; a7'), with 0 entering at the least significant position.]
Figure 4.7. Hardware Implementation of Multiplication by X (hex "02")

S-Boxes are needed for a 128-bit key and 128-bit data block. Multiplexers, driven by a control signal, are used to distinguish between the initial key and the round key obtained from the initial key by the key expansion unit. Four 32-bit registers, designed using D flip-flops, are used to store the generated round key. The S-boxes are implemented using the 8 * 256 ROM cells. The least significant 32 bits of the 128-bit key are cyclically shifted to the left by a byte. This left shift operation is implemented using combinational logic. The resulting word after the left shift operation is sent through the S-boxes and the affine mapping operation, in order to perform the ByteSub transformation. The affine mapping is implemented using XOR gates and multiplexers. The ByteSub transformation is performed in both the encryption and decryption modes. Then, the key resulting from the ByteSub transformation is XORed with the Round Constant (RCON). The round constant can be generated either by using look-up tables or by using combinational logic. In the present architecture, the round constant is generated using combinational logic. The round constant generated should correspond to the round key being generated. The result obtained after the XOR operation is stored in the register. The registers are implemented using edge-triggered D flip-flops.

The total number of round constants that need to be generated is equal to the number of rounds. In the present implementation, the key length is equal to 128 bits and the block length is equal to 128 bits. Hence, the total number of round constants needed is 10. The round constants can be stored in a look-up table or can be computed in real time. The look-up table consists of all the

PAGE 64

[Figure: the first, second, third and fourth 32-bit columns of the state matrix each feed a "MixColumn computation of Y, Z (Encryption, Decryption)" block producing Y[0], Z[0] to Y[3], Z[3] and the output bytes Out[0][0] to Out[3][3].]
Figure 4.8. Hardware Implementation of MixColumn Transformation for 128 bits

PAGE 65

[Figure: the 8-bit inputs IN[i mod 4] and IN[(i+1) mod 4] are combined through XOR and Trans blocks, together with Y or Z, to form the 8-bit output OUT[i mod 4].]
Figure 4.9. Hardware Implementation of the ShiftRow Transformation for 32-bits

PAGE 66

[Figure: for each of the 16 bytes, the state byte B(i) is XORed with the key byte K(i), from K_00(i) and B_00(i) (1st byte) to K_33(i) and B_33(i) (16th byte), to give the next state byte B(i+1).]
Figure 4.10. Hardware Implementation of Round Key Addition Transformation

required 10 round constants. The round constant values for the 10 rounds are listed in hexadecimal format in Table 4.3. As mentioned in Chapter 3, the round constant is obtained by multiplying the previous round constant by X. This is amenable to implementation in hardware using XOR operations.

For the reverse key scheduling, the last round key must first be generated with the forward key scheduling. The last round key is then expanded to generate the reverse round keys. The corresponding round key is generated in one clock cycle. Decryption requires more cycles than encryption because it needs pre-scheduling to generate the last key value. The generation of the round keys required for the implementation of the algorithm is shown in Figure 4.11. The implementation of the generation of the round constants is shown in Figure 4.12. The implementation of the "Multiplication by X" module is shown in Figure 4.7.

Since the Rijndael algorithm allows different key lengths and block lengths, each round key is carefully set to have the same length as the data block. According to the specification of the algorithm, the original key is used to generate a sequence forming the entire round key stream, and chunks of round keys are selected for the encryption module according to the block length.

PAGE 67

[Figure: four 32-bit paths for initialkey0 to initialkey3, each with a MUX controlled by initial_normal and a MUX controlled by enc_dec, feeding Register 0 to Register 3, which hold Round Key 0 to Round Key 3; a feedback path passes through the left shift, the S-box and the round constant to form the 128-bit round key.]
Figure 4.11. Key Scheduling for Encryption and Decryption Processes

PAGE 68

[Figure: an 8-bit register with feedback through the "Multiplication by X (hex "02")" block; a multiplexer selects between the Initial Round value and the Normal Round feedback.]
Figure 4.12. Implementation of the Round Constants
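The register-and-feedback structure of Figure 4.12 can be modelled in a few lines. This is a behavioural sketch, not the circuit itself; the function names are illustrative, and the initial register value 01 is taken from Table 4.3.

```python
def xtime(b):
    """Multiplication by X (hex 02) in GF(2^8)."""
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b


def round_constants(count=11, initial=0x01):
    """Return the successive register values, starting from the initial round."""
    value, values = initial, []
    for _ in range(count):
        values.append(value)
        value = xtime(value)   # feedback path taken in a normal round
    return values


print(" ".join(f"{rc:02X}" for rc in round_constants()))
# prints 01 02 04 08 10 20 40 80 1B 36 6C
```

The first eight values are plain left shifts; the reduction by the field polynomial only takes effect from 80 onwards, which is where the values 1B, 36 and 6C come from.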

PAGE 69

Table 4.3. List of Round Constants for each Standard Round

Round Number   Round Constant Value in hexadecimal
Initial        01
1              02
2              04
3              08
4              10
5              20
6              40
7              80
8              1B
9              36
10             6C

In the case of 128-128 (block-key length), the generated round keys can be fed to the encryption module directly without any reorganization. In the case where the key length and the block length are not equal, the previous, the current and also the next round keys are needed in order to generate the appropriate set of round keys that are fed into the encryption module. The key alignment for 128-128 (block-key) is shown in Figure 4.13.

4.1.3 Iterative Implementation

The key length is fixed at 128 bits in the present implementation of the Rijndael algorithm. The implementation includes 9 standard rounds and one final round. A control signal "RoundType" is used to distinguish the standard round from the final round. When the input to the control signal is final round, the output of the ShiftRow transformation is passed on to the Key Addition module directly, which results in the ciphertext. The "Mode" control signal distinguishes between encryption and decryption. When the encryption mode is selected, the order of flow of the data is as follows: ByteSub -> ShiftRow -> MixColumn -> KeyAddition. When the RoundType signal has normal round as the input, we feed back the output resulting from the Key Addition module to the ByteSub module of the next round. When the RoundType signal has final round as the input, the output resulting from the Key Addition module is the ciphertext.

Similarly, when the decryption mode is selected, the order of flow of the data is as follows: ShiftRow -> ByteSub -> KeyAddition -> MixColumn. When the RoundType signal has normal

PAGE 70

[Figure: the 128-bit key enters the Key Schedule block, which delivers a 128-bit subkey.]
Figure 4.13. 128-128 (block-key) Key Alignment

round as the input, we feed back the output resulting from the MixColumn module to the ShiftRow module of the next round. When the RoundType signal has final round as the input, the output resulting from the Key Addition module is the plaintext.

A 4-bit counter is used to determine the type of round. When the count is less than or equal to 9, the round type is considered to be the standard round. When the counter has the value 10, the round type is the final round.

4.2 Order of Implementation of the Transformations

In the Rijndael algorithm, encryption and decryption use the same operations but in a different order. In decryption, the inverse transformations of the round functions are applied, and the sequence in which the transformations of the round function are applied differs from that in encryption. The initial round key addition involves XORing the input key with the plaintext during encryption and with the ciphertext during decryption. The second through tenth key additions involve XORing the round key with the MixColumn output for encryption and with the inverse ByteSub output for decryption. The final key addition involves XORing the final round key with the output of the ShiftRow for encryption and with the output of the inverse ByteSub for decryption. The order of operations is shown in Figure 4.14.
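The effect of the "Mode" and "RoundType" control signals on the datapath can be summarized with a small sequencing model. It is only a sketch of the control flow described above: the function name and the string labels are assumptions made for this example, and the loop variable stands in for the 4-bit round counter.

```python
def round_schedule(mode="encrypt", rounds=10):
    """Return the sequence of transformations applied to one 128-bit block."""
    steps = ["KeyAddition"]                      # initial round
    for count in range(1, rounds + 1):           # models the 4-bit round counter
        final_round = count == rounds            # RoundType signal
        if mode == "encrypt":
            order = ["ByteSub", "ShiftRow", "MixColumn", "KeyAddition"]
        else:
            order = ["InvShiftRow", "InvByteSub", "KeyAddition", "InvMixColumn"]
        if final_round:
            # the final round bypasses the MixColumn stage
            order = [step for step in order if "MixColumn" not in step]
        steps.extend(order)
    return steps


encryption = round_schedule("encrypt")
print(len(encryption), encryption.count("MixColumn"), encryption[-1])
# prints 40 9 KeyAddition
```

In both modes the last operation applied to the block is a key addition, and the MixColumn stage is used in nine of the ten rounds.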

PAGE 71

[Figure 4.14: order of operations for ENCRYPTION (plain text to cipher text) and DECRYPTION (cipher text to plain text). Initial round: KA with the initial key. Normal round ($N_r - 1$ times): INV -> LT -> SR -> MC -> KA with the round key. Final round: INV -> LT -> SR -> KA with the final key. Decryption uses the inverse transformations, including the inverse MC, with the keys applied from the final key to the initial key. Legend: INV -> multiplicative inverse; LT -> linear transformation; SR -> shift row; MC -> mix column; KA -> key addition; $N_r$ = 10 for 128-128 (block-key) length.]
PAGE 72

The output of each standard round has to be stored in registers. The content of the registers is fed back as input to the next round of the algorithm. The output of the key addition transformation is stored in the registers and is then fed back as input to the ByteSub transformation, thus allowing for pipelining. The registers used to store the intermediate value are called internal registers. By using the pipelining concept, the components of the round need not be duplicated, which reduces the amount of area required.

4.3 Implementation Analysis

When implementing the Rijndael algorithm, it was first determined that the Rijndael S-Boxes were the dominant element of the round function in terms of required logic resources. Each Rijndael round requires sixteen copies of the S-Boxes, each of which is an 8-bit to 8-bit look-up table, requiring significant hardware resources. However, the remaining components of the Rijndael round function (byte swapping, constant Galois field multiplication, and key addition) were found to be simpler in structure, so these elements of the round function require few hardware resources. Additionally, it was found that the synthesis tools could not minimize the overall size of a Rijndael round sufficiently to allow for a fully unrolled or fully pipelined implementation of the entire ten rounds of the algorithm within the target FPGA [32].

A partially pipelined implementation of one round with one sub-pipeline stage provided the most area-optimized solution. As compared to a one-stage implementation with no sub-pipelining, the addition of a sub-pipeline stage afforded the synthesis tool greater flexibility in its optimizations, resulting in a more area-efficient implementation. The 2-stage loop unrolling was found to yield the highest throughput when operating in Feed Back (FB) mode.

4.4 Memory Architecture Optimization

Since the design is based on one clock cycle for each encryption round, we have to duplicate the memory modules several times. For example, in the Byte Substitution transformation, the S-boxes need to be duplicated 16 times to obtain the result in one clock cycle. Consequently, the choice of memory architecture is very critical. Since all the table entries are fixed and defined in the standard, the use of Read Only Memory (ROM) is preferred to Random Access Memory (RAM).

PAGE 73

Specifically, the algorithm requires a lot of small ROM modules instead of one large module, since each look-up is based on at most an 8-bit address, which translates to 256 entries. We implemented the multiplicative inverse function using a look-up table of size 8 x 256. It is a 256-entry table with each entry of length 8 bits. We have a total of 20 copies of the S-boxes in our design: 16 of them in the encryption module and 4 in the key scheduling module.

Kuo et al. [48] use look-up tables for the implementation of the shift row module and for the generation of the round constants in the key scheduling module. Our implementation uses combinational logic instead of look-up tables, thus reducing the area.
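As a quick worked check of the storage implied by these figures (a calculation from the numbers stated above, not a measured area), the total S-box ROM capacity is:

```latex
\[
\underbrace{16 \times 256}_{\text{data unit}} + \underbrace{4 \times 256}_{\text{key unit}}
  = 4096 + 1024 = 5120\ \text{bytes},
\qquad
20 \times 256 \times 8\ \text{bits} = 40\,960\ \text{bits}.
\]
```

Had the 512-byte ROM variant described in Section 4.1.1.1 been used for every S-box instead, the same count would come to 20 x 512 = 10 240 bytes.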

PAGE 74

CHAPTER 5
SIMULATION AND PERFORMANCE

In this chapter, we present the methodology used to design, simulate, and verify the proposed architecture for the Rijndael algorithm. This chapter presents a set of experimental results to verify the accuracy and efficiency of each module of the algorithm. Each module was designed and tested separately, and later these modules were integrated to form a comprehensive design for the algorithm. The performance is estimated using the simulation results and design rules. We conclude this chapter by presenting a comparison of the proposed architecture with the existing VLSI realizations of the Rijndael encryption algorithm.

5.1 Design Flow

The design of the Rijndael algorithm includes the creation of the layout for the design, the extraction of the layout, and its simulation using tools like HSPICE or NanoSim in order to optimize the performance. A layout describes the masks from which the design will be fabricated. The layout is critical, as it determines the working of the chip. There are two ways to do the layout: manual and automated. Manual layout enables the designer to pack the devices into a smaller area compared to automated layout. There are design rules for the layout; the design rules used are the MOSIS Scalable CMOS Rules. These rules include guidelines about the spacing between wells, the sizes of contacts, the minimum space requirements between a poly layer and a metal layer, and so on. The cell library of 0.35 µm CMOS primitive standard cells supports n-well, p-well, one poly and three metal layers. The layers in the layout describe the physical characteristics of the device being designed. The layout editor used for the design is the Cadence Virtuoso Layout Editor. The Virtuoso Layout Editor provides a powerful, streamlined set of commands which incorporates advanced editing and layout techniques to support all design methodologies and process technologies. The first step in the layout design is to lay out the transistors in the circuit in such a way that the area occupied by

PAGE 75

the layout design is small. Once the creation of the layout design is finished, the I/O pins have to be added to the circuit. We need to make sure that there are single VDD and GROUND connections for the circuit. Then the circuit is checked against the design rules. These rules give the minimum requirements to avoid a failure of the circuit due to fabrication faults.

The general procedure for analysing a circuit is to extract the layout design and perform a post-layout simulation. The post-layout simulation gives an idea of how the design would work from the layout. The post-layout simulation creates a netlist file of the layout design in the HSPICE format from Cadence. The netlist file in the HSPICE format has the .sp extension. The simulator is then run to generate output files and view the waveforms. The simulators that are compatible with the HSPICE format netlist file are HSPICE and NanoSim.

HSPICE: HSPICE is a commercially available extended version of the SPICE circuit simulator developed at the University of California at Berkeley. HSPICE performs detailed transistor-level simulations. HSPICE is a text-based circuit simulator capable of performing transient, steady state and frequency domain analyses. It allows for circuit optimization, hierarchical node naming, input and output for parameterized cells, and interactive waveforms with AvanWaves. An input netlist file is necessary to begin the design entry and the simulation process. HSPICE is case-insensitive and sets all characters to lower case.

NanoSim: NanoSim is a transistor-level power simulator and analysis tool for CMOS and BiCMOS circuit designs. NanoSim runs transistor-level simulations with a run time 10 to 1000 times faster than SPICE simulations with the same accuracy. It displays instantaneous current waveforms. NanoSim provides the Analog Circuit Engine feature, which performs simulation on much larger designs than those that are supported by SPICE simulations, with an accuracy equal to that provided by SPICE simulations. NanoSim runs in batch mode by default. The necessary and optional files required by the NanoSim simulator are shown graphically in Figure 5.1.

The netlist file is required to run the simulation. It describes the system or circuit to be simulated. The formats that are supported by the NanoSim simulator are EPIC, HSPICE/SPICE, EDIF, LSIM, Verilog and Cadence SPF. The format we used for the simulation is HSPICE. The netlist generated

PAGE 76

[Figure: the NanoSim Simulator takes necessary input files (netlist files, technology files) and optional input files (vector/stimulus files, model libraries, configuration files) and produces the NanoSim output files for display.]
Figure 5.1. NanoSim Environment

in the HSPICE format has the .sp extension. Simulation with HSPICE takes more time, so we used NanoSim to save simulation time.

The proposed architecture was implemented using the Cadence Virtuoso layout design tool. The method adopted was a custom design at the transistor level, based on a custom cell library of 0.35 µm CMOS primitive standard cells. A hierarchical approach was followed in the implementation of the algorithm. A custom cell design methodology was used for generating the layout for the transformations. The layout for each module was generated and later integrated to obtain the final chip. The subsystems described in Chapter 4 were implemented as modules at the transistor level and tested exhaustively by applying suitable test stimuli. The MOSIS CMOS design rules were used to lay out primitive cells using the Cadence Virtuoso layout editor. The layout was made free of Design Rule Check (DRC) errors and extraction errors. The generated cells were then converted to a netlist. The generated netlist was then simulated with HSPICE using the MOSIS CMOS model parameters to generate the waveforms.

Figure 5.2. 0→1 and 1→0 Propagation Delays for a 2-input CMOS XOR Gate (SimWave waveform plot; time axis 0 to 100 ns; traces v(10), v(9), v(4))

The basic operations needed for the implementation of the Rijndael algorithm are: the XOR gate; the AND gate; a 1-bit register built using a D flip-flop with Set and Reset control signals; a 2×1 multiplexer; and a 1-bit RAM cell.

To illustrate the design methodology, Figure 5.2 shows the 0→1 and 1→0 transitions at the output of a 2-input bitwise XOR gate, under the best operating conditions, due to 0→1 and 1→0 transitions at either of the inputs. The possible outputs of all the basic gates used are shown in Figures 5.3-5.6.

As discussed in Section 4.1.1, the data unit of the Rijndael algorithm consists of four transformations: Byte Substitution, ShiftRow, MixColumn and Round Key Addition. They are applied in a specific order in the encryption and the decryption, as shown in Figure 4.14. These transformations are designed in the Cadence Virtuoso layout designer using custom-designed basic cells such as XOR, AND, NAND, multiplexer, RAM cell, counter and register. The design layouts of the transformations are shown in Figures 5.7-5.10.

Figure 5.3. 1-Bit Counter Operation (SimWave waveform plot; time axis 0 to 100 ns; traces v(27), v(17), v(14))
Figure 5.4. Register (SimWave waveform plot; time axis 0 to 100 ns; traces v(20), v(3), v(17), v(14))
Figure 5.5. AND Operation (SimWave waveform plot; time axis 0 to 100 ns; traces v(7), v(1), v(5))
Figure 5.6. 1-bit RAM Implementation (SimWave waveform plot; time axis 0 to 100 ns; traces v(2), v(6), v(5))
Figure 5.7. Multiplicative Inverse Layout for Encryption and Decryption
Figure 5.8. Affine and Inverse Affine Mapping Layout for Encryption and Decryption
Figure 5.9. MixColumn and Inverse MixColumn Transformation Layout for Encryption and Decryption
Figure 5.10. Key Generation Layout for Encryption and Decryption

Table 5.1. Components of the AES-128 Module

                                      Number of Components
Module/Component                      Proposed         Mangard et al
                                      Architecture     Architecture [52]
DATA UNIT
  S-Boxes                             16               16
  32-bit Registers using D-cells      8                16
  Multiplexers                        240              384
  32-bit Multiplexers                 180              NA
  128-bit Multiplexers                60               NA
  Multipliers                         0                16
KEY UNIT
  S-Boxes                             4                NA
  32-bit Registers using D-cells      4                NA
  32-bit Multiplexers                 4                NA

5.2 Performance Evaluation

An AES-128 encryption/decryption of a 128-bit block was done in 11 clock cycles using the feedback logic. In each clock cycle, one round of transformations is executed and, at the same time, the appropriate key for the next round is calculated. The whole process ends when 10 rounds of transformations are completed. The Input Register is used to hold the transformed State after every round of operation. The State is forced into this register with the use of a feedback technique.

The analysis of the components used for the proposed architecture is shown in Table 5.1. The architecture proposed by Mangard et al [52] uses multipliers for the implementation of the MixColumn transformation. In the present implementation, the MixColumn transformation is implemented using XOR gates, multiplexers, inverters, etc., based on the algebraic equations discussed in Section 4.1.1.3. This reduces the complexity of the data unit, as multipliers occupy more area and are more complex.

The throughput achieved for the encryption/decryption process was 232 Mbits/sec. The frequency of the external clock with which the architecture operates was 20 MHz. The critical path was 50 ns. The throughput can be calculated as follows:

Throughput = (block size * clock frequency) / total clock cycles, where clock frequency = 1 / clock period for the critical path
           = (128 * 20 MHz) / 11 = 232.7 Mbits/sec
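To illustrate how a MixColumn stage can avoid general multipliers, the following is a minimal software sketch, assuming the standard AES field polynomial x^8 + x^4 + x^3 + x + 1. It is not the circuit of Section 4.1.1.3; the function names `xtime` and `mix_column` are illustrative. Every product by the constants 02 and 03 is reduced to a one-bit shift, a conditional XOR and further XORs, which are the kind of operations that map onto XOR gates and multiplexers.

```python
def xtime(b: int) -> int:
    """Multiply a GF(2^8) element by 02: shift left, then reduce by 0x11B if bit 8 is set."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def mix_column(col):
    """Forward MixColumn on one 4-byte column using only XOR and xtime.

    Row 0 of the MixColumn matrix gives 02*a0 ^ 03*a1 ^ a2 ^ a3, which equals
    a0 ^ t ^ xtime(a0 ^ a1) with t = a0 ^ a1 ^ a2 ^ a3; the other rows follow
    by rotation.
    """
    a0, a1, a2, a3 = col
    t = a0 ^ a1 ^ a2 ^ a3
    return [
        a0 ^ t ^ xtime(a0 ^ a1),
        a1 ^ t ^ xtime(a1 ^ a2),
        a2 ^ t ^ xtime(a2 ^ a3),
        a3 ^ t ^ xtime(a3 ^ a0),
    ]


if __name__ == "__main__":
    # Commonly used MixColumn test column.
    print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

The conditional XOR inside `xtime` corresponds to a 2-to-1 selection controlled by the top bit of the input byte, so no multiplier is required for the forward transformation.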

A higher throughput of 1.83 Gbits/sec is achieved using the proposed architecture with some variations. When the pipelining technique is used instead of the iterative feedback logic, the standard round is duplicated 9 times and the rounds are cascaded by pipelining registers. This increases the effective area. In a particular clock cycle, 10 blocks of data can be encrypted or decrypted using the pipelining technique. Based on the critical path obtained with our implementation, the throughput achieved with pipelining can be calculated as:

Throughput = (block size) / Total Clock, where Total Clock is the delay of a single round including the delays caused by the pipeline registers
           = 128 / 70 ns = 1.83 Gbits/sec

Table 5.2 summarizes the performance obtained and compares it with other existing implementations.

Table 5.2. Summary of the Performance of the AES-128 Module

Architecture                        Clock cycles    Throughput [Mbits/sec]
Proposed Architecture               11              232
Mangard et al [52] - Standard       64              128
Mangard et al [52] - High perf.     34              241
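Both throughput figures in this section follow from the same arithmetic, which the short sketch below reproduces using the values quoted in the text (128-bit block, 20 MHz clock, 11 cycles, 70 ns pipeline stage delay). The function names are illustrative.

```python
def iterative_throughput_mbps(block_bits: int, clock_mhz: float, cycles: int) -> float:
    """Feedback (iterative) design: one block completes every `cycles` clock cycles."""
    return block_bits * clock_mhz / cycles


def pipelined_throughput_gbps(block_bits: int, stage_delay_ns: float) -> float:
    """Fully pipelined design: one block completes every stage delay.

    Bits per nanosecond is numerically equal to Gbits/sec.
    """
    return block_bits / stage_delay_ns


if __name__ == "__main__":
    print(round(iterative_throughput_mbps(128, 20, 11), 1))  # 232.7
    print(round(pipelined_throughput_gbps(128, 70), 2))      # 1.83
```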

CHAPTER 6
CONCLUSIONS

We have presented a VLSI architecture for Rijndael, the AES algorithm. The proposed architecture uses feedback logic. It implements both the encryption and the decryption modules, with the data block and the key both equal to 128 bits. The Electronic Code Book (ECB) mode was used for the design of the architecture. The data path of the architecture comprises the rounds of the Rijndael basic block, which consists of four sequential operations, and the final processing element, which implements the output transformation. S-boxes are used for the implementation of the multiplicative inverses and are shared between encryption and decryption. There is a trade-off between speed and the use of resources. To increase the speed, resources such as the 8 × 256 S-boxes are repeated 16 times for the 128-bit data block. The round keys needed for each round are generated in real time. The forward and reverse key scheduling is implemented on the same device, which minimizes area. Although the algorithm is symmetrical, the hardware required is not, since encryption is simpler than decryption. The complexity of the decryption lies in the inverse MixColumn operation, and the key scheduling for decryption requires more cycles than for encryption because it has the overhead of pre-scheduling to generate the last round key. The Cadence Virtuoso layout design tool and the NanoSim and HSPICE simulation tools have been used for the design, simulation and verification of the architecture. Using a 0.35 µm CMOS standard cell library, we obtained a VLSI realization which performs the encryption at a rate of 232 Mbits/sec with a key length of 128 bits.

In the pipelined variant of the proposed architecture, the standard round is duplicated 9 times, followed by the final round. These rounds are cascaded using pipelining registers. Using this architecture, 10 blocks of data can be transformed at the same time, which results in high throughput. The limitation of this architecture is that the high throughput is achieved at the cost of area. Using the pipelining principles, a throughput of 1.83 Gbits/sec is achieved for a key and block length of 128 bits.

The Rijndael algorithm supports a key length of 128, 192 or 256 bits. The implementation of the key unit in the proposed architecture can be extended to keys of length 192 and 256 bits at the cost of throughput.
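The real-time forward and reverse key scheduling summarized above can be sketched in software as follows. This is an illustrative model of the standard AES-128 key expansion, not the thesis's circuit; all function names are hypothetical. The S-box is built from the multiplicative inverse followed by the affine mapping, each call to `next_round_key` produces the following round key from the current one, and `prev_round_key` runs the same recurrence backwards. Decryption therefore first needs `last_round_key`, which is the pre-scheduling overhead mentioned in the conclusions.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def sbox_entry(x: int) -> int:
    """Byte substitution: multiplicative inverse, then the affine mapping."""
    inv = 0
    if x:
        inv = 1
        for _ in range(254):              # x^254 equals x^-1 in GF(2^8)
            inv = gf_mul(inv, x)
    s = inv
    for shift in range(1, 5):             # XOR with four left rotations of inv
        s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return s ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]

RCON = [1]                                # round constants 01, 02, ..., 1b, 36
for _ in range(9):
    RCON.append(gf_mul(RCON[-1], 2))


def xor_words(a, b):
    return [x ^ y for x, y in zip(a, b)]


def g(word, rcon):
    """RotWord, SubWord, then XOR the round constant into the first byte."""
    t = [SBOX[b] for b in word[1:] + word[:1]]
    t[0] ^= rcon
    return t


def next_round_key(key, rcon):
    """Forward step: derive round key i+1 from round key i (16-byte lists)."""
    w = [key[i:i + 4] for i in range(0, 16, 4)]
    out, prev = [], g(w[3], rcon)
    for i in range(4):
        prev = xor_words(w[i], prev)
        out += prev
    return out


def prev_round_key(key, rcon):
    """Reverse step: recover round key i from round key i+1."""
    w = [key[i:i + 4] for i in range(0, 16, 4)]
    w3 = xor_words(w[3], w[2])
    w2 = xor_words(w[2], w[1])
    w1 = xor_words(w[1], w[0])
    w0 = xor_words(w[0], g(w3, rcon))
    return w0 + w1 + w2 + w3


def last_round_key(cipher_key):
    """Pre-scheduling for decryption: run all 10 forward steps."""
    key = list(cipher_key)
    for rcon in RCON:
        key = next_round_key(key, rcon)
    return key
```

For decryption, the round keys are obtained by starting from `last_round_key(key)` and calling `prev_round_key` with the round constants in reverse order, so only one round key needs to be stored at a time in either direction.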

REFERENCES

[1] "U.S. DEPARTMENT OF COMMERCE/National Institute of Standards and Technology. Announcing request for candidate algorithm nominations for the Advanced Encryption Standard (AES).", Sep 1997.
[2] "Third Advanced Encryption Standard (AES) Candidate Conference", Apr 2000. http://crscr.nist.gov/encryption/aes//round2/conf3/aes3conf.htm.
[3] "Advanced Encryption Standard Home Page", 2001. http://csrc.nist.gov/encryption/aes.
[4] "NIST Federal Information Processing Standards (FIPS) PUB 197 Advanced Encryption Standard", Nov 2001.
[5] R. Anderson, E. Biham, and L. Knudsen. "Serpent: A Proposal for the Advanced Encryption Standard". Available from http://www.cl.cam.ac.uk/rja14/serpent.html.
[6] R. Anderson, E. Biham, and L. Knudsen. "Serpent and Smartcards", 1999.
[7] K. Aoki and H. Lipmaa. "Fast Implementations of AES Candidates". In The Third Advanced Encryption Standard Candidate Conference, pages 106–120, New York, NY, USA, 13–14 Apr. 2000. National Institute of Standards and Technology.
[8] Biham and Biryukov. "How to Strengthen DES Using Existing Hardware". In ASIACRYPT: Advances in Cryptology, International Conference on the Theory and Application of Cryptology. LNCS, Springer-Verlag, 1994.
[9] E. Biham. "New Types of Cryptanalytic Attacks Using Related Keys". Lecture Notes in Computer Science, 765, 1994.
[10] E. Biham. "A Fast New DES Implementation in Software". Lecture Notes in Computer Science, 1267, 1997.
[11] E. Biham and A. Shamir. "Differential cryptanalysis of DES-like cryptosystems". Journal of Cryptology, 4(1):3–72, 1991.
[12] A. G. Broscius and J. M. Smith. "Exploiting parallelism in hardware implementation of the DES". Lecture Notes in Computer Science, 576:367–376, 1991.
[13] W. P. Burleson, M. Ciesielski, F. Klass, and W. Liu. "Wave-Pipelining: A Tutorial and Research Survey". IEEE Transactions on Very Large Scale Integration Systems, 6(3):464–474, Sep 1998.
[14] C. Burwick, D. Coppersmith, E. D'Avignon, R. Gennaro, S. Halevi, C. Jutla, S. Matyas, L. O'Connor, M. Peyravian, D. Safford, and N. Zunic. "MARS - A Candidate Cipher for AES". citeseer.nj.nec.com/burwick99mars.html.
[15] C. Burwick, D. Coppersmith, E. D'Avignon, R. Gennaro, S. Halevi, C. Jutla, S. M. Matyas, L. O'Connor, M. Peyravian, D. Safford, and N. Zunic. "The MARS Encryption Algorithm". citeseer.nj.nec.com/315913.html.
[16] C. C. Wang, T. Truong, H. Shao, L. Deutsch, J. Omura, and I. Reed. "VLSI Architectures for Computing Multiplications and Inverses in GF(2^m)". IEEE Transactions on Computers, 34(8):709–717, Aug 1985.
[17] P. Chodowiec and K. Gaj. "Implementation of the Twofish Cipher Using FPGA Devices", 1999.
[18] S. Contini, R. Rivest, M. Robshaw, and Y. Yin. "The Security of the RC6 Block Cipher", 1998.
[19] J. Daemen. "Cipher and hash function design strategies based on linear and differential cryptanalysis", March 1995.
[20] J. Daemen, R. Govaerts, and J. Vandewalle. "An efficient nonlinear shift-invariant transformation". In B. Macq, editor, 15th Symp. on Information Theory in the Benelux, Louvain-la-Neuve (B), 30-31 1994. Werkgemeenschap Informatie- en Communicatietheorie, Enschede (NL).
[21] J. Daemen, L. Knudsen, and V. Rijmen. "Linear frameworks for block ciphers". To appear in Design, Codes and Cryptography, 1267:149, 1997.
[22] J. Daemen, L. Knudsen, and V. Rijmen. "The Block Cipher SQUARE". Lecture Notes in Computer Science, 1267, 1997.
[23] J. Daemen and V. Rijmen. "A Specification for Rijndael, the AES algorithm".
[24] J. Daemen and V. Rijmen. "AES Public Comment from the Rijndael Team".
[25] J. Daemen and V. Rijmen. "The Rijndael Block Cipher, AES Proposal", March 1999. http://www.esat.kuleuven.ac.be/rijmen/rijndael.
[26] J. Daemen and V. Rijmen. "Rijndael for AES". In AES Candidate Conference, pages 343–348, 2000.
[27] J. Daemen and V. Rijmen. "AES Proposal: Rijndael". In National Institute of Standards and Technology, July 2001.
[28] J. Daemen and V. Rijmen. "The Design of Rijndael". In Springer-Verlag, 2002.
[29] A. Dandalis, V. K. Prasanna, and J. D. P. Rolim. "A comparative study of performance of AES final candidates using FPGAs". Proc. Third Advanced Encryption Standard (AES) Candidate Conf., April 2000.
[30] W. Diffie and M. E. Hellman. "New Directions in Cryptography". IEEE Transactions on Information Theory, IT-22(6):644–654, 1976.
[31] A. Elbirt, W. Yip, B. Chetwynd, and C. Paar. "An FPGA Based Performance Evaluation of the AES Block Cipher Candidate Algorithm Finalists", 2001. citeseer.nj.nec.com/elbirt01fpgabased.html.
[32] A. J. Elbirt, W. Yip, B. Chetwynd, and C. Paar. "An FPGA Implementation and Performance Evaluation of the AES Block Cipher Candidate Algorithm Finalists". In AES Candidate Conference, pages 13–27, 2000.
[33] T. ElGamal. "A public-key cryptosystem and a signature scheme based on discrete logarithms". IEEE Transactions on Information Theory, IT-31:469–472, 1985.
[34] N. Ferguson, J. Kelsey, S. Lucks, B. Schneier, M. Stay, D. Wagner, and D. Whiting. "Improved Cryptanalysis of Rijndael". In Seventh Fast Software Encryption Workshop, page 19, Berlin, Germany / Heidelberg, Germany / London, UK / etc., 2000. Springer-Verlag.
[35] V. Fischer and M. Drutarovsky. "Two Methods of Rijndael Implementation in Reconfigurable Hardware". In CHES, volume 2162, pages 77–92, 2001.
[36] K. Gaj and P. Chodowiec. "Comparison of the Hardware Performance of the AES Candidates Using Reconfigurable Hardware". Proc. Third Advanced Encryption Standard (AES) Candidate Conf., April 2000.
[37] K. Gaj and P. Chodowiec. "Fast Implementation and Fair Comparison of the Final Candidates for Advanced Encryption Standard Using Field Programmable Gate Arrays". Proc. RSA Security Conf., April 2001.
[38] B. Gladman. "Gladman, http://www.seven77.demon.co.uk/cryptographytechnology/AES2/index.htm".
[39] H. Brunner, A. Curiger, and M. Hofstetter. "On Computing Multiplicative Inverses in GF(2^m)". IEEE Transactions on Computers, 42(8):1010–1015, Aug 1993.
[40] T. Ichikawa, T. Kasuya, and M. Matsui. "Hardware Evaluation of the AES Finalists". In AES Candidate Conference, pages 279–285, 2000.
[41] T. Iwata and K. Kurosawa. "On the Pseudorandomness of the AES Finalists - RC6 and Serpent". Lecture Notes in Computer Science, 1978, 2001.
[42] T. Jakobsen and L. R. Knudsen. "The Interpolation Attack on Block Ciphers". Lecture Notes in Computer Science, 1267:28+, 1997.
[43] J. Kaps. "High Speed FPGA Architectures for the Data Encryption Standard", May 1998.
[44] J.-P. Kaps and C. Paar. "Fast DES Implementation for FPGAs and Its Application to a Universal Key-Search Machine". In Selected Areas in Cryptography, pages 234–247, 1998.
[45] K. Araki, I. Fujita, and M. Morisue. "Fast Inverters over Finite Field Based on Euclid's Algorithm". Transactions IEICE, 72(11):1230–1234, Nov 1989.
[46] J. Kelsey, B. Schneier, and D. Wagner. "Key-Schedule Cryptanalysis of IDEA, G-DES, GOST, SAFER, and Triple-DES". Lecture Notes in Computer Science, 1109:237–251, 1996.
[47] J. Kelsey, B. Schneier, D. Wagner, and C. Hall. "Cryptanalytic Attacks on Pseudorandom Number Generators". Lecture Notes in Computer Science, 1372:168–188, 1998.
[48] H. Kuo and I. Verbauwhede. "Architectural Optimization for a 1.82 Gbits/sec VLSI Implementation of the AES Rijndael Algorithm". In CHES, volume 2162, pages 51–64, 2001.
[49] X. Lai and J. Massey. "A proposal for a new block encryption standard". Proceedings of Eurocrypt, pages 389–404, 1990.
[50] X. Lai and J. Massey. "Markov ciphers and differential cryptanalysis". Proceedings of Eurocrypt, pages 17–38, 1991.
[51] Y. Lee and Y. Park. "Implementation of Rijndael Block Cipher Algorithm".
[52] S. Mangard, M. Aigner, and S. Dominikus. "A Highly Regular and Scalable AES Hardware Architecture". IEEE Transactions on Computers, 52(4):483–491, April 2003.
[53] M. McLoone and J. McCanny. "High Performance Single-Chip FPGA Rijndael Algorithm Implementations". Proc. Workshops Cryptographic Hardware and Embedded Systems, CHES 2001:65–76, 2001.
[54] V. Mistry. "A VLSI Architecture for IDEA Encryption Algorithm". Master's Thesis, August 1999.
[55] M. Matsui. "Linear cryptanalysis method for DES cipher". In Advances in Cryptology, Eurocrypt '93, pages 386–397, 1993.
[56] P. Mroczkowski. "Implementation of the Block Cipher Rijndael Using Altera FPGA", 2001.
[57] S. Murphy and M. Robshaw. "Further Comments on the Structure of Rijndael", 2000.
[58] S. Murphy and M. Robshaw. "New observations on Rijndael", 2000.
[59] S. Murphy and M. Robshaw. "Essential Algebraic Structure Within the AES". 2002. To appear in Proc. of Crypto 2002.
[60] J. Nechvatal. "Report on the Development of the Advanced Encryption Standard", 2000.
[61] N. Sklavos and O. Koufopavlou. "Architectures and VLSI Implementations of the AES proposal Rijndael". IEEE Transactions on Computers, 51(12):1454–1459, Dec 2002.
[62] K. Nyberg. "Differentially uniform mappings for cryptography". In Advances in Cryptology, Eurocrypt '93, pages 55–64, 1993.
[63] National Bureau of Standards. "Data Encryption Standard". U.S. Department of Commerce, FIPS PUB 46, January 1977.
[64] National Bureau of Standards. "Data Encryption Standard". U.S. Department of Commerce, FIPS PUB 46-1, January 1988.
[65] R. Rivest, A. Shamir, and L. Adleman. "A method for obtaining digital signatures and public-key cryptosystems". Communications of the ACM, 21(2):120–126, February 1978.
[66] V. Rijmen. "Efficient Implementation of the Rijndael Sbox".
[67] V. Rijmen. "Cryptanalysis and design of iterated block ciphers", October 1997.
[68] R. L. Rivest, M. J. B. Robshaw, R. Sidney, and Y. L. Yin. "The RC6 Block Cipher". citeseer.nj.nec.com/rivest98rc.html.
[69] P. J. Robertson, E. L. Witzke, D. C. Wilcox, L. G. Pierson, and K. Gass. "A DES ASIC Suitable for Network Encryption at 10 Gbps and Beyond". In CHES, volume 1717, pages 37–48, 2000.
[70] P. Rommens and W. Fichtner. "2 Gbit/s Hardware Realizations of RIJNDAEL and SERPENT: A Comparative Analysis". Technical Report, 2002.
[71] R. Oppliger. "Security at the Internet Layer". IEEE Computers, Sep 1998.
[72] G. Rouvroy, F.-X. Standaert, J.-J. Quisquater, and J.-D. Legat. "Efficient uses of FPGAs for implementations of DES and its experimental linear cryptanalysis". IEEE Trans. on Computers, 52(4):473–482, 2003.
[73] A. Rudra, P. K. Dubey, C. S. Jutla, V. Kumar, J. R. Rao, and P. Rohatgi. "Efficient Implementation of Rijndael Encryption With Composite Field Arithmetic".
[74] A. Satoh, S. Morioka, K. Takano, and S. Munetoh. "A Compact Rijndael Hardware Architecture with S-Box Optimization". Proc. Advances in Cryptology - ASIACRYPT 2001, pages 171–184, 2001.
[75] A. Satoh, N. Ooba, K. Takano, and E. D'Avignon. "High-Speed MARS Hardware". In AES Candidate Conference, pages 305–316, 2000.
[76] P. Schaumont, H. Kuo, and I. Verbauwhede. "Unlocking the Design Secrets of a 2.29 Gb/s Rijndael Core". Design Automation Conference, 2002.
[77] B. Schneier. "Applied Cryptography: Protocols, Algorithms and Source Code in C", 1996.
[78] J. H. Shim, D. W. Kim, Y. K. Kang, T. W. Kwon, and J. R. Choi. "A Rijndael Cryptoprocessor Using Shared On-the-fly Key Scheduler". In ASIC, 2002.
[79] P. Smith and C. Skinner. "A public-key cryptosystem and a digital signature system based on the Lucas function analogue to discrete logarithms". Advances in Cryptology, Asiacrypt '94, 1995.
[80] A. Sorkin. "LUCIFER, a cryptographic algorithm". Cryptologia, 8(1):22–41, 1984.
[81] E. Thiagarajan and M. Gourishetty. "Study of AES and its Efficient Software Implementation".
[82] R. P. S. Trimberger and A. Singh. "A 12 Gbps DES Encryptor/Decryptor Core in an FPGA". In CHES, volume 1717, pages 156–163, 2000.
[83] S. Vaudenay and S. Moriai. "Comparison of the randomness provided by some AES candidates".
[84] I. Verbauwhede, P. Schaumont, and H. Kuo. "Design and Performance Testing of a 2.29-GB/s Rijndael Processor". IEEE Journal of Solid-State Circuits, 38(3):569–572, Mar 2003.
[85] N. Weaver and J. Wawrzynek. "A Comparison of the AES Candidates Amenability to FPGA Implementation". Proc. Third Advanced Encryption Standard (AES) Candidate Conf., pages 28–39, April 2000.
[86] B. Weeks, M. Bean, T. Rozylowicz, and C. Ficke. "Hardware Performance Simulations of Round 2 Advanced Encryption Standard Algorithms". In AES Candidate Conference, pages 286–304, 2000.
[87] D. Whiting and S. B. B. Schneier. "AES Key Agility Issues in High Speed IPsec Implementations". In Public Comments on AES Candidate Algorithms Round 2.
[88] A. M. Youssef and S. E. Tavares. "On Some Algebraic Structures in the AES Round Function".


Record ID: 001441490
Identifier: E14-SFE0000163
OCLC number: (OCoLC)53962291
Call number: TK7885
Author: Kosaraju, Naga M.
Title: A VLSI architecture for Rijndael, the advanced encryption standard [electronic resource] / by Naga M. Kosaraju.
Publication: [Tampa, Fla.] : University of South Florida, 2003.
Thesis: Thesis (M.S.Cp.E.)--University of South Florida, 2003.
Bibliography: Includes bibliographical references.
Format: Text (Electronic thesis) in PDF format.
System requirements: World Wide Web browser and PDF reader.
Mode of access: World Wide Web.
Notes: Title from PDF of title page. Document formatted into pages; contains 93 pages.
ABSTRACT: The increasing application of cryptographic algorithms to ensure secure communications across virtual networks has led to an ever-growing demand for high performance hardware implementations of the encryption/decryption methods. The inevitable inclusion of cryptographic algorithms in network communications has led to the development of several encryption standards, one of the most prominent of which is Rijndael, the Advanced Encryption Standard. Rijndael was chosen as the Advanced Encryption Standard (AES) by the National Institute of Standards and Technology (NIST) in October 2000, as a replacement for the Data Encryption Standard (DES). This thesis presents the architecture for the VLSI implementation of Rijndael, the Advanced Encryption Standard algorithm. Rijndael is an iterated, symmetric block cipher with a variable key length and block length. The block length is fixed at 128 bits by the AES standard [4]. The key length can be 128, 192 or 256 bits. The VLSI implementation presented in this thesis is based on feedback logic and allows a key length specification of 128 bits. The present architecture is implemented in the Electronic Code Book (ECB) mode of operation. The proposed architecture is further optimized for area through resource sharing between the encryption and decryption modules. The architecture includes a Key-Scheduler module for the forward-key and reverse-key scheduling during encryption and decryption respectively. The subkeys required for each round of the Rijndael algorithm are generated in real time by the Key-Scheduler module by expanding the initial secret key. The proposed architecture is designed using the Custom-Design Layout methodology with the Cadence Virtuoso tools and tested using the Avanti Hspice and the Nanosim CAD tools. Successful implementation of the algorithm using the iterative architecture resulted in a throughput of 232 Mbits/sec on a 0.35 µm CMOS technology.
Using 0.35 µm CMOS technology, implementation of the algorithm using the pipelining architecture resulted in a throughput of 1.83 Gbits/sec. The performance of this implementation is compared with similar architectures reported in the literature.
Adviser: Varanasi, Murali
Keywords: cryptography; aes; hardware architecture; real time key scheduling.
Subject: Dissertations, Academic -- Computer Engineering -- Masters -- USF
Series: USF Electronic Theses and Dissertations.
URL: http://digital.lib.usf.edu/?e14.163