
Embedded cryptography


Material Information

Title:
Embedded cryptography an analysis and evaluation of performance and code optimization techniques for encryption and decryption in embedded systems
Physical Description:
Book
Language:
English
Creator:
Kandi, Jayavardhan R
Publisher:
University of South Florida
Place of Publication:
Tampa, Fla.
Publication Date:

Subjects

Subjects / Keywords:
co-synthesis
dsp
rijndael
aes
starcore
Dissertations, Academic -- Electrical Engineering -- Masters -- USF   ( lcsh )
Genre:
government publication (state, provincial, territorial, dependent)   ( marcgt )
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Summary:
ABSTRACT: It is clear that Cryptography is computationally intensive. It is also known that embedded systems have slow clock rates and less memory. The idea for this thesis was to study the possibilities for analysis of cryptography on embedded systems. The basic approach was the implementation of cryptographic algorithms on high-end, state-of-the-art, DSP chips in order to study the various parameters that optimize the performance of the chip while keeping the overhead of encryption and decryption to a minimum. Embedded systems are very resource sensitive. An embedded system is composed of different components, which are implemented in both hardware and software. Therefore, hardware-software co-synthesis is a crucial factor affecting the performance of embedded systems. Encryption algorithms are generally classified as data-dominated systems rather than ubiquitous control-dominated systems. Data-dominated systems have a high degree of parallelism. Embedded systems populate the new generation gadgets such as cell phones and Smartcards where the encryption algorithms are obviously an integral part of the system. Due to the proliferation of embedded systems in all the current areas, there is a need for the systematic study of encryption techniques from the embedded systems point of view. This thesis explored the different ways encryption algorithms can be made to run faster with much less memory. Some of the issues investigated were overlapped scheduling techniques for high-level synthesis, structural partitioning, real-time issues, reusability and functionality, random number and unique key generators, seamless integration of cryptographic code with other applications and architecture specific optimization techniques.
Thesis:
Thesis (M.S.E.E.)--University of South Florida, 2003.
Bibliography:
Includes bibliographical references.
System Details:
System requirements: World Wide Web browser and PDF reader.
System Details:
Mode of access: World Wide Web.
Statement of Responsibility:
by Jayavardhan R. Kandi.
General Note:
Title from PDF of title page.
General Note:
Document formatted into pages; contains 84 pages.

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001441478
oclc - 54018230
notis - AJM5918
usfldc doi - E14-SFE0000151
usfldc handle - e14.151
System ID:
SFS0024847:00001



Embedded Cryptography: An Analysis and Evaluation of Performance and Code Optimization Techniques for Encryption and Decryption in Embedded Systems

by

Jayavardhan R. Kandi

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering
Department of Electrical Engineering
College of Engineering
University of South Florida

Major Professor: Dr. Wilfrido Moreno, Ph.D.
Dr. James Leffew, Ph.D.
Dr. Kenneth Buckle, Ph.D.

Date of Approval: July 17th, 2003

Keywords: AES, Rijndael, DSP, Co-synthesis, StarCore

Copyright 2003, Jayavardhan R. Kandi

DEDICATION

To Dr. Wilfrido Moreno

ACKNOWLEDGMENTS

It gives me great pleasure to acknowledge the people who have helped me in this endeavor. I would like to thank and express my sincere gratitude to my Major Professor and advisor, Dr. Wilfrido Moreno, for all his support and the freedom he provided me in my Masters career. I would also like to thank Dr. James Leffew and Dr. Kenneth Buckle for consenting to be my committee members. I thank my colleagues Mr. Eduardo Zurek, Mr. Luis Navarrete and Mr. Jorge Galvis for teaching me a lot about how things work in this world. I thank my friends Mr. Lolla, Mr. Barri and Mr. White Murthy for their moral support. I also thank my roommates and the whole bunch of the Miguel Ct. Volley Ball team for their everlasting trust and belief in my work. I thank my family, to whom my life and all my achievements are dedicated. My final thanks go to the inspiration I have been receiving from beyond this physical realm.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF CODE SAMPLES
ABSTRACT

CHAPTER 1. INTRODUCTION

CHAPTER 2. EMBEDDED SYSTEMS
  2.1 Hardware/Software Co-design
  2.2 Structural Partitioning
  2.3 Functional Partitioning
  2.4 Classification
  2.5 Programming Languages

CHAPTER 3. CRYPTOGRAPHY
  3.1 Symmetric Algorithms
  3.2 Asymmetric Algorithms
  3.3 Building Blocks of an Algorithm
  3.4 Key Length
  3.5 Algorithm Modes
    3.5.1 Electronic Code Book
    3.5.2 Cipher Block Chaining
    3.5.3 Cipher Feedback Mode
    3.5.4 Output Feedback Mode
    3.5.5 Counter Mode
  3.6 Selection of Algorithm
  3.7 Hardware/Software Co-synthesis
    3.7.1 Hardware Encryption
    3.7.2 Software Encryption
    3.7.3 Hardware/Software Encryption
  3.8 Advanced Encryption Standard

CHAPTER 4. SYSTEM DESIGN
  4.1 StarCore-Hardware Overview
    4.1.1 SC140 Core
    4.1.2 System Interface Unit
    4.1.3 Communications Processor Module
    4.1.4 Buses
  4.2 Advanced Encryption Standard
    4.2.1 Round Transformations
    4.2.2 Key Expansion
    4.2.3 SubBytes() Transformation
    4.2.4 ShiftRows() Transformation
    4.2.5 MixColumns() Transformation
    4.2.6 AddRoundKey() Transformation
    4.2.7 Inverse Cipher

CHAPTER 5. IMPLEMENTATION
  5.1 Structure
  5.2 Development Process
    5.2.1 High-level Synthesis
    5.2.2 Low-level Synthesis
    5.2.3 Portability
    5.2.4 Modularization
    5.2.5 Compiler Exploitation
  5.3 Optimizations
    5.3.1 Structural Partitioning
    5.3.2 Critical Paths
    5.3.3 Computational Complexity
    5.3.4 Reusability and Functionality
    5.3.5 Parallel Tasks
    5.3.6 Instruction-level Parallelism
    5.3.7 Recursive Tasks
    5.3.8 Pipelining Tasks
    5.3.9 Conditional Tasks
  5.4 Critical Issues
    5.4.1 Interrupt Service Management
    5.4.2 Time-sliced Multi-tasking
    5.4.3 I/O Queues Management

CHAPTER 6. ATTACKS AND COUNTER MEASURES
  6.1 Implementation Attacks
  6.2 Side-channel Cryptanalysis
    6.2.1 Timing Attacks
    6.2.2 Power Attacks
    6.2.3 Probing Attacks
    6.2.4 Fault Induction Attacks
  6.3 Counter Measures
    6.3.1 Constant-time Implementation
    6.3.2 Power Attacks
    6.3.3 Probing Attacks
    6.3.4 Random Number and Unique Key Generators

CHAPTER 7. RESULTS AND DISCUSSION
  7.1 Results
  7.2 Discussion
    7.2.1 8-bit Platforms
    7.2.2 32/64-bit Platforms
    7.3.3 Optimization

CHAPTER 8. CONCLUSION AND RECOMMENDATIONS FOR FUTURE WORK
  8.1 Conclusion
  8.2 Recommendations for Future Work

REFERENCES
BIBLIOGRAPHY
APPENDICES
  APPENDIX A. Optimized C Code for the AES
INDEX

LIST OF TABLES

Table 4.2.1: Different key lengths and corresponding number of rounds
Table 7.1.1: Execution time in clock cycles at various stages of code development
Table 7.1.2: Profiling information for ‘opt-g 3’ stage
Table 7.1.3: Speed performance of various modules

LIST OF FIGURES

Figure 3.5.1: Block cipher encryption in electronic code book (ECB) mode
Figure 3.5.2: Block cipher encryption in cipher block chaining (CBC) mode
Figure 3.5.3: Stream cipher encryption in cipher feedback (CFB) mode
Figure 3.5.4: Stream cipher encryption in output feedback (OFB) mode
Figure 3.5.5: Stream cipher encryption in counter (CTR) mode
Figure 4.1.1: Block diagram of MSC8101, courtesy of Motorola Inc.
Figure 4.2.1: Input data layout in a 2-D array
Figure 4.2.2: SubBytes() transformation acts on the individual bytes
Figure 4.2.3: ShiftRows() transformation operating on individual rows
Figure 4.2.4: InvShiftRows() transformation operating on individual rows
Figure 4.2.5: MixColumns() operation on each column of the state
Figure 4.2.6: AddRoundKey() transformation
Figure 5.1.1: Encryption and decryption modules
Figure 5.2.1: Block diagram of the system development process
Figure 7.1.1: Graphical profile for encrypt() of ‘opt-g 3’ Stage
Figure 7.1.2: Functions main(), encrypt() and decrypt() from stages Level 0 to Opt-d Space stages
Figure 7.1.3: Functions main(), encrypt() and decrypt() from stages Opt-e to Opt-h Space stages
Figure 7.1.4: Cryptographic modules from Level 0 to Opt-d Space stages
Figure 7.1.5: Cryptographic modules from Opt-e 0 to Opt-h Space stages
Figure 7.1.6: SubBytes() module at all stages
Figure 7.1.7: ShiftRows() module at all stages
Figure 7.1.8: AddRoundKey() module at all stages
Figure 7.1.9: MixColumns() module at all stages
Figure 7.1.10: Encrypt() module at all stages
Figure 7.1.11: Decrypt() module at all stages
Figure 7.1.12: Main() function for compiler optimization
Figure 7.1.13: Encrypt() function for compiler optimization
Figure 7.1.14: Decrypt() function for compiler optimization
Figure 7.1.15: SubBytes() function for compiler optimization
Figure 7.1.16: ShiftRows() function for compiler optimization
Figure 7.1.17: MixColumns() function for compiler optimization
Figure 7.1.18: AddRoundKey() function for compiler optimization
Figure 7.1.19: Speed performance of various modules
Figure A.1: Function call tree

LIST OF CODE SAMPLES

Code Sample 4.2.1: Pseudo-code for an AES encrypt round transformation
Code Sample 4.2.2: Code for KeyExpansion()
Code Sample 4.2.3: Pseudo-code for AES decryption
Code Sample 5.1: Modification of MixColumns()
Code Sample 5.2: Loop unrolling for AddRoundKey()
Code Sample 5.3: Loop unrolling for SubBytes()
Code Sample 5.4: Loop unrolling and merging
Code Sample 5.5: Modification of InvMixColumns()
Code Sample 5.6: Removal of If-Then-Else Conditions from ShiftRows()

EMBEDDED CRYPTOGRAPHY: AN ANALYSIS AND EVALUATION OF PERFORMANCE AND CODE OPTIMIZATION TECHNIQUES FOR ENCRYPTION AND DECRYPTION IN EMBEDDED SYSTEMS

Jayavardhan R. Kandi

ABSTRACT

It is clear that Cryptography is computationally intensive. It is also known that embedded systems have slow clock rates and less memory. The idea for this thesis was to study the possibilities for analysis of cryptography on embedded systems. The basic approach was the implementation of cryptographic algorithms on high-end, state-of-the-art DSP chips in order to study the various parameters that optimize the performance of the chip while keeping the overhead of encryption and decryption to a minimum. Embedded systems are very resource sensitive. An embedded system is composed of different components, which are implemented in both hardware and software. Therefore, hardware-software co-synthesis is a crucial factor affecting the performance of embedded systems. Encryption algorithms are generally classified as data-dominated systems rather than ubiquitous control-dominated systems. Data-dominated systems have a high degree of parallelism. Embedded systems populate the new generation gadgets such as cell phones and Smartcards where the encryption algorithms are obviously an integral part of the system. Due to the proliferation of embedded systems in all the current areas, there is a need for the systematic study of encryption techniques from the embedded systems point of view. This thesis explored the different ways encryption algorithms can be made to run faster with much less memory. Some of the issues investigated were overlapped scheduling techniques for high-level synthesis, structural partitioning, real-time issues, reusability and functionality, random number and unique key generators, seamless integration of cryptographic code with other applications and architecture specific optimization techniques.

CHAPTER 1
INTRODUCTION

Ever since man developed his communication skills, he has embarked on a journey of technological developments. These communication skills have been developed to such an extent that the information passed must, at times, be secret and authenticable. The new conditions of secrecy, authenticity and integrity have given rise to a new field of science called cryptology. Cryptology is divided into cryptography and cryptanalysis. Cryptography deals with the art and science of encoding and decoding information, whereas cryptanalysis deals with breaking the encoded information.

As the human race advanced, it developed machines to perform strenuous physical tasks and computers to perform logical tasks. Presently, technology has advanced to a level where computers have invaded all spheres of science and technology. In the future, computing power will increase and become more pervasive in the form of embedded systems. An embedded system can be defined as a computing system assigned to a specific task, which is embedded in a larger multifarious system. A typical example of an embedded system is a router in a local area network.

As the need for secure data transmission grows, there is an urgent need to integrate cryptography into embedded systems in order to enable secure and reliable data transfer. This research explored the different factors that would enable the successful integration of cryptography into embedded systems. Embedded systems are built from microprocessors, microcontrollers, DSPs and FPGAs. The software that runs on these hardware devices must be both concise and precise. The cryptographic modules that encode and decode the data must be designed and implemented in a transparent manner so that they do not consume too much of the memory and processing resources.

The basic structure of this research was to incorporate the advanced encryption standard (AES) algorithm onto Motorola's network DSP, the StarCore. Hardware-software co-design formed the major contribution of this research. The AES algorithm was divided into different sub-modules, which could be run in parallel and scheduled in such a manner that processing resources were consumed only when needed. The objective of this research was to run the AES algorithm fast enough to enable the embedded system to work in a real-time environment without compromising the secure transfer of the data or incurring any data loss. The current technology uses specialized ASICs running DES and 3-DES algorithms for encryption. With the advent of AES as the successor to DES, and given the prospects of embedded systems, this research evaluates and analyzes the structure of the Rijndael algorithm (AES) from the embedded systems point of view.

Merely implementing an algorithm on hardware does not ensure that the system is secure. Most of the cryptographic systems on the market are not as secure as they claim. This is due to the lack of importance given to cryptography, since programmers deal with it as just another component of the program. A system cannot be made secure unless the cryptographic issues are kept in mind from conception to completion. Cryptographic systems are very different from other products. There is no outward difference between a strong cryptographic system and a weak one. Even though both may use the same algorithm and the same hardware, the secure system needs to consider all aspects of attacks and the means to prevent them. After all, a cryptographic system is only as strong as its weakest point. What makes implementing a cryptographic system challenging is that attackers do not follow any rules. Attackers try to breach the security protocols and tamper with the system in new ways that the designer might not even have thought about.
Many algorithms appear to be very strong from the mathematical point of view. The most often neglected part is the successful implementation of these algorithms. The first step toward a secure system is to define the threat model. The threat model should comprehensively consider how secure the data should be and what the motivations of the attackers are. Consideration of how to detect an attack and prevent system crashes is crucial. The threat model differs for different applications and roles.

A good cryptographer is one who is adept in areas such as number theory, complexity theory, information theory, probability theory and abstract algebra. Implementing a good cryptographic system entails far more than just understanding the algorithm. A simple flaw, such as a poor random number generator or a failure to discard the key after its use, can render the system useless. Therefore, a major effort of this thesis was to study the aspects that programmers have to deal with when attempting to implement a better cryptographic system.

CHAPTER 2
EMBEDDED SYSTEMS

An embedded device is a computing system that is part of a bigger system. The major difference between embedded devices and computers is that, unlike computers, embedded devices are designed and developed for fast and efficient execution of a specific assigned task. Generally, a single embedded device is assigned a fixed specific task for its lifetime. An embedded device must be quick enough to respond to high-priority events. In order to do so, care must be taken to reduce the functional overhead as much as possible. The functions of the embedded device must be kept in mind from the inception of the hardware design until the end of software execution [SJBW96]. There is a great deal of hardware-software interaction involved in the sharing and execution of the algorithms to be run on embedded devices. Therefore, programming for an embedded device is different from programming a conventional computer. The application and the controllers of an embedded system are integrated into both the hardware and the software. Therefore, the embedded systems programmer must have a thorough knowledge of both the advantages and limitations of the hardware architecture.

An embedded system's efficiency is closely related to the size of the code [RL00]. The efficiency of a program increases as the code size decreases and the execution speed increases. Therefore, implementation of programs under very tight memory constraints is a challenge and a requirement for every embedded systems programmer.

2.1 Hardware/Software Co-design

Some functions are better suited to run in hardware, and others are better suited to run in software. Co-design considers both the abilities of the hardware and the flexibility of the software so as to optimize the performance of the system. Some functions are implemented in both hardware and software. Such seamless integration forms the core of the type of programming required for embedded systems.

2.2 Structural Partitioning

Interfaces with the outside world are as important as the internal structure. Special attention must be given to the interfaces and, in almost all cases, they are kept outside the internal processing. This kind of structural partitioning helps in uninterrupted execution of assigned tasks, irrespective of possible overload in other structures.

2.3 Functional Partitioning

Some applications are better described by functionality rather than structure. Such functional partitioning yields simpler hardware design and results in time multiplexing of the signals. Digital Signal Processors are better suited for this type of behavioral synthesis.

2.4 Classification

Embedded systems are mainly classified as follows:

Microcontroller Design: These systems are principally used for control-dominated systems. They have a rich set of instructions for efficient bit-level data manipulation. They usually have Complex Instruction Set Computer (CISC) architectures.

RISC Architecture Systems: These systems have Reduced Instruction Set Computer (RISC) architectures and are suited for fast execution. They usually have a large number of registers in order to speed up instruction execution. The instruction set is composed of a deliberately chosen set of simple instructions that can be combined to perform complex tasks. In other words, a single CISC instruction is typically equivalent to several RISC instructions.

Digital Signal Processors: These kinds of systems are used for arithmetic-intensive applications such as speech analysis, encryption and image processing. The DSP architecture supports hardware multiplication, address generation units and separate data and address buses.

Field Programmable Gate Arrays: These kinds of systems often run different functions in parallel to maximize system performance. These systems are reconfigurable in nature, which means that their functionality can be upgraded or changed altogether when needed. This provides them with fault-tolerance ability since they can be reconfigured to remove a hardware or software fault.

Application Specific Processors: These kinds of systems are composed of specially designed integrated circuits called Application Specific Integrated Circuits, or ASICs. ASICs are often specialized enough that they are used as commercial off-the-shelf components. These systems do not have the capability of being upgraded and are mostly hardware oriented.

2.5 Programming Languages

Programming of an embedded system is a very important task. Although assembly-level programming gives the optimum level of performance, high-level programming is still needed to design the program structure at higher levels of abstraction. Some of the programming languages that have gained importance in the design of embedded systems are as follows:

C: The C language is a well-established and proven language in the programming community. The main advantages of the C language are that it is relatively easy to learn and compiles to code that is close to assembly. The strong point of the C language is direct memory access through pointers.

C++: C++ evolved from C. When applied to the programming of embedded systems, C++ generates a considerable amount of overhead that is detrimental for embedded programming. In order to increase the run-time efficiency and reduce the code size, a new standard called EC++, which stands for Embedded C++, is being developed.

Java: Java is an object-oriented programming language that was designed for Internet applications. However, it has become quite popular due to its flexibility, and some enthusiasts are applying Java to embedded systems programming. A modified version called embedded Java is gaining popularity due to such features as portability and software reuse.

CHAPTER 3
CRYPTOGRAPHY

Cryptography is the science and art of encoding and decoding data in order to give the data the property of secrecy. The data to be encoded is termed plain text. The encoded data is known as cipher text. Thus, the processes of encoding and decoding can also be termed encryption and decryption respectively. The system for encrypting or decrypting is called a cryptosystem, and the persons who design such systems are called cryptographers. The process of encryption involves an algorithm for combining the plain text with a key, resulting in the cipher. A key is a selected number or string of characters that should be known only to the sender and the recipient. Cryptanalysis deals with the techniques of breaking the codes in order to extract the plain text from the cipher without the consent of the sender or recipient. The persons who are adept at cryptanalysis are called cryptanalysts. Both cryptography and cryptanalysis fall under the broad science of cryptology.

The essence of using cryptography is to satisfy the following three conditions:

Secrecy: The data transmitted must be secret, and an eavesdropper should not be able to understand it.

Authenticity: The recipient must be guaranteed that the transmitted data is from an authentic sender and not from any other person.

Integrity: The transmitted data must be tamper resistant, and an eavesdropper should not be able to meddle with the cipher.

The security of a cryptographic algorithm should be based on the key and not on the secrecy of the algorithm. This means that the algorithm should be made public and opened to scrutiny by the intellectual community. The types of attacks it can withstand are the only gauges of the strength of an algorithm. An ideal algorithm should be so strong that it can be cracked only by knowing the key, so that an attacker's best option is to try every possible key. This type of attack, which exhausts the different combinations of keys, is known as a brute-force attack. The classification of cryptographic algorithms is based on the keys and is discussed next.

3.1 Symmetric Algorithms

Symmetric algorithms are those where the encryption key and the decryption key are the same or can be calculated from each other. If the algorithm uses a single key for both encryption and decryption, it is called a single-key algorithm. Symmetric algorithms can be divided into two types:

Block algorithms: These algorithms operate on a block of data at a time in order to convert it into a block of cipher. A typical block size would be 64 bits.

Stream algorithms: These algorithms operate on a stream of data a single bit or a single byte at a time.

3.2 Asymmetric Algorithms

Asymmetric algorithms are those that use different keys for encryption and decryption, where one key cannot be computed from the other. They are also called public-key algorithms because the encryption key can be made public. The recipient holds the private key, with which only the recipient can decrypt the cipher.

3.3 Building Blocks of an Algorithm

The basic building blocks of an algorithm consist of activities such as:

Substitution Cipher: This is a cipher where each character of the plain text is replaced by a predetermined cipher character. Substitution falls under the confusion scheme, whose purpose is to frustrate the eavesdropper.

Transposition Cipher: This is a cipher where the order of the characters of a fixed block of plain text is shuffled in a predetermined fashion. Transposition falls under the diffusion scheme, which spreads out the redundancies of the plain text in the cipher.

Exclusive-OR Cipher: This is a cipher formed by a simple bit-wise XOR operation on the plain text and the keyword.
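The three building blocks above can be illustrated with a minimal Python sketch. The function names (`substitute`, `transpose`, `xor_bytes`), the shift-based substitution table and the example permutation are assumptions chosen for illustration; they are not taken from the thesis, and none of these toys is secure on its own.

```python
import string


def substitute(text, shift):
    """Substitution: replace each lowercase letter by the letter `shift` places later."""
    alphabet = string.ascii_lowercase
    table = str.maketrans(alphabet, alphabet[shift:] + alphabet[:shift])
    return text.translate(table)


def transpose(block, order):
    """Transposition: reorder the characters of a fixed-size block.

    `order` lists, for each output position, the input position to copy from.
    """
    return "".join(block[i] for i in order)


def xor_bytes(data, key):
    """Exclusive-OR: combine each data byte with the (repeating) key bytes."""
    return bytes(d ^ key[i % len(key)] for i, d in enumerate(data))
```

Applying `xor_bytes` twice with the same key returns the original data, which is why the same XOR routine can serve for both encryption and decryption.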

3.4 Key Length

A brute-force attack should be made as difficult as possible. This is achieved by increasing the length of the key. If the key size is 64 bits, then there are 2^64 possible keys, which would require considerable processing time in order to test all the keys. If the key size is increased to 128 bits, then the number of possible keys is 2^128, and this huge figure makes it very difficult to find the right key. As technology grows, computing power also increases. Therefore, technological growth must be taken into account and the key size chosen so that the algorithm will remain strong into the future.

3.5 Algorithm Modes

Even though a basic algorithm is designed to be efficient, different cipher modes are implemented in order to make the algorithm effective at concealing patterns. The different types of cipher modes are as follows:

3.5.1 Electronic Code Book

Electronic Code Book is a straightforward method of converting a block of plain text into cipher text. The advantage of this mode is that each block can be enciphered independently of the others. However, this mode is vulnerable to attacks since the same plain text is always converted to the same cipher text. Figure 3.5.1 presents the block diagram for the Electronic Code Book (ECB) mode.

Figure 3.5.1: Block cipher encryption in electronic code book (ECB) mode
[Figure labels: Encryption: plaintext block (P), key (K), encryption (E), cipher (C). Decryption: cipher (C), key (K), decryption (D), plaintext block (P).]
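The weakness of ECB described above can be seen in a minimal Python sketch. The block transformation used here (`toy_encrypt_block`, an XOR with the key followed by a one-byte rotation) is a hypothetical stand-in for a real block cipher and offers no security; the 8-byte block size is likewise an assumption for the sketch.

```python
BLOCK = 8  # toy block size in bytes (assumption for this sketch)


def toy_encrypt_block(block, key):
    """Toy, insecure block transformation: XOR with the key, rotate left one byte."""
    mixed = bytes(b ^ k for b, k in zip(block, key))
    return mixed[1:] + mixed[:1]


def toy_decrypt_block(block, key):
    """Inverse of toy_encrypt_block: rotate right one byte, XOR with the key."""
    mixed = block[-1:] + block[:-1]
    return bytes(b ^ k for b, k in zip(mixed, key))


def ecb_encrypt(plaintext, key):
    """ECB: every block is enciphered independently with the same key."""
    if len(plaintext) % BLOCK or len(key) != BLOCK:
        raise ValueError("plaintext must be whole blocks and key one block long")
    return b"".join(
        toy_encrypt_block(plaintext[i:i + BLOCK], key)
        for i in range(0, len(plaintext), BLOCK)
    )


def ecb_decrypt(ciphertext, key):
    """ECB decryption: every block is deciphered independently."""
    return b"".join(
        toy_decrypt_block(ciphertext[i:i + BLOCK], key)
        for i in range(0, len(ciphertext), BLOCK)
    )
```

When two plain text blocks are identical, the corresponding cipher text blocks are identical as well, so repetition in the input remains visible to an eavesdropper regardless of how strong the underlying block cipher is.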

3.5.2 Cipher Block Chaining

In Cipher Block Chaining mode, the plain text is XORed with the previous cipher text block before encryption. Thus, the encryption of each block depends on all the previous blocks. Figure 3.5.2 presents the block diagram for the Cipher Block Chaining (CBC) mode.

Figure 3.5.2: Block cipher encryption in cipher block chaining (CBC) mode
[Figure labels: Encryption: initialization vector (IV), plaintext blocks P1 to P3, key (K), encryption (E), cipher blocks C1 to C3. Decryption: cipher blocks C1 to C3, key (K), decryption (D), IV, plaintext blocks P1 to P3.]

Compared to the ECB mode, this mode conceals patterns in the plain text. If a bit error occurs in a plain text block before encryption, it affects that cipher text block and all subsequent cipher text blocks. However, decryption reverses the effect, and the recovered plain text contains only the original single error.

3.5.3 Cipher Feedback Mode

Cipher Feedback Mode is used when data must be transmitted in units smaller than a full block. The incoming byte, or group of bytes, of plain text is XORed with the leftmost (most significant) bits of the output of a self-synchronizing stream cipher. A self-synchronizing stream cipher has a key stream in which every bit is a function of a fixed number of previous cipher text bits. Figure 3.5.3 presents the block diagram for the Cipher Feedback (CFB) mode.

Figure 3.5.3: Stream cipher encryption in cipher feedback (CFB) mode
[Figure labels: Encryption: IV, key (K), encryption (E), plaintext P1 to P3, cipher C1 to C3. Decryption: IV, key (K), cipher C1 to C3, plaintext P1 to P3.]

3.5.4 Output Feedback Mode

The Output Feedback Mode (OFB) uses a synchronous stream cipher, in which the feedback mechanism is independent of both the plain text and the cipher text stream. Instead of inserting the cipher text bits as feedback, the n bits of the output block are moved to the right side of the shift register. Therefore, in this mode the key stream can be computed even before the plain text arrives. When the plain text arrives, it is simply XORed with the output bits of the algorithm in order to form the cipher text. Figure 3.5.4 presents the block diagram for the Output Feedback mode.
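The idea that the OFB key stream can be prepared before the plain text arrives is sketched below in Python. Two assumptions are made for brevity: the keyed function `block_function` is a hypothetical stand-in built from SHA-256 rather than a real block cipher such as AES (OFB only ever uses the forward direction of the cipher, so invertibility is not needed here), and the whole output block is fed back instead of an n-bit slice of a shift register.

```python
import hashlib

BLOCK = 16  # block size in bytes (assumption for this sketch)


def block_function(key, block):
    """Stand-in for E_K(block); a real system would use a block cipher here."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def ofb_keystream(key, iv, nbytes):
    """Generate nbytes of key stream by repeatedly enciphering the previous output.

    Neither the plain text nor the cipher text is involved, so this can run
    ahead of time.
    """
    stream = b""
    feedback = iv
    while len(stream) < nbytes:
        feedback = block_function(key, feedback)
        stream += feedback
    return stream[:nbytes]


def ofb_xor(data, keystream):
    """Encrypt or decrypt by XORing the data with the prepared key stream."""
    if len(keystream) < len(data):
        raise ValueError("key stream is shorter than the data")
    return bytes(d ^ s for d, s in zip(data, keystream))
```

Because encryption and decryption are the same XOR against the same key stream, sender and receiver only need to agree on the key and the initialization vector and stay in step with each other.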

Figure 3.5.4: Stream cipher encryption in output feedback (OFB) mode
[Figure labels: Encryption: IV, key (K), encryption (E), plaintext P1 to P3, cipher C1 to C3. Decryption: IV, key (K), cipher C1 to C3, plaintext P1 to P3.]

3.5.5 Counter Mode

All of the above modes predate the AES. Except for the ECB mode, they all involve feedback, which is comparatively insecure and introduces performance delays. Therefore, a new type of mode was proposed, which is termed the counter mode. In this mode, a counter is encrypted to generate a key stream, which is simply XORed with the plain text in order to generate the cipher text. The advantage of counter mode is that there is no feedback or chaining. Figure 3.5.5 presents the block diagram for the Counter (CTR) mode.
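The absence of feedback in counter mode means that any block can be processed on its own, which the following Python sketch tries to show. As assumptions for illustration, `block_function` is a SHA-256-based stand-in for the block cipher (counter mode only uses its forward direction), and the counter block is formed by appending an 8-byte big-endian block index to an 8-byte nonce; neither detail is taken from the thesis.

```python
import hashlib

BLOCK = 16  # block size in bytes (assumption for this sketch)


def block_function(key, block):
    """Stand-in for E_K(block); a real system would use a block cipher here."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def ctr_process_block(key, nonce, index, chunk):
    """Encrypt or decrypt one block: XOR it with the enciphered counter value."""
    counter = nonce + index.to_bytes(8, "big")
    pad = block_function(key, counter)
    return bytes(c ^ p for c, p in zip(chunk, pad))


def ctr_process(key, nonce, data):
    """Encrypt or decrypt a whole message block by block.

    Each block depends only on its own counter value, so the blocks could
    equally be processed in parallel or out of order.
    """
    return b"".join(
        ctr_process_block(key, nonce, i // BLOCK, data[i:i + BLOCK])
        for i in range(0, len(data), BLOCK)
    )
```

Processing the second block alone with `ctr_process_block(key, nonce, 1, ...)` gives the same bytes as the second block of the full cipher text, which is the property that makes counter mode attractive for parallel hardware.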

Figure 3.5.5: Stream cipher encryption in counter (CTR) mode
[Figure labels: Encryption: counters CTR1 to CTR3, key (K), encryption (E), plaintext P1 to P3, cipher C1 to C3. Decryption: counters CTR1 to CTR3, key (K), cipher C1 to C3, plaintext P1 to P3.]

3.6 Selection of Algorithm

The strength of the algorithm is as important as the length of the key. Therefore, selection of an algorithm to design a cryptosystem forms the first and most formidable task. There are several options to consider in choosing a particular algorithm [BS96], such as:

Relying on a published algorithm, trusting that it has been open to public scrutiny and has not been broken
Relying on some commercial product
Relying on the algorithms proposed as standards
Writing a new algorithm

An algorithm can be considered reliable only when it has been scrutinized thoroughly and the intellectual community has found no attacks against it.


3.7 Hardware/Software Co-synthesis

3.7.1 Hardware Encryption

Until recently, encryption was performed by specialized hardware chips. The inherent advantage of using hardware for encryption is speed. These hardware devices were connected at the data transfer links in order to enable encryption and decryption. Hardware is also preferred in order to make the encryption system tamperproof.

3.7.2 Software Encryption

Software encryption is now widely used because software is portable and flexible. However, software encryption is comparatively slow and is insecure in many aspects of key management and program manipulation.

3.7.3 Hardware/Software Encryption

A blend of hardware and software is currently being used for encryption in order to combine the best features of both. This provides a considerable advantage in speed and security. Additionally, it provides ease of programming. Future devices will contain encryption modules along with other applications.

3.8 Advanced Encryption Standard

The National Institute of Standards and Technology has selected the Rijndael algorithm [DR01] as its current encryption standard. Henceforth this algorithm will be called the Advanced Encryption Standard (AES). The AES is deemed stronger than the older DES and triple-DES standards. The Rijndael algorithm, which was invented by Vincent Rijmen and Joan Daemen, supports data block and key lengths of 128, 160, 192, 224 or 256 bits. For the AES, however, a fixed data block length of 128 bits was standardized, with key sizes of 128, 192 and 256 bits. The corresponding names are AES-128, AES-192 and AES-256. A detailed analysis of the AES is provided in Chapter 4.


CHAPTER 4
SYSTEM DESIGN

4.1 StarCore-Hardware Overview

The Motorola MSC8101 is a 16-bit digital signal processor. It is based on the StarCore™ SC140 DSP core and is a fully static, low-power CMOS device that operates from 0 to 300 MHz. Figure 4.1.1 presents the block diagram of the MSC8101 processor.

Figure 4.1.1: Block diagram of MSC8101 – courtesy of Motorola Inc

4.1.1 SC140 Core

The SC140 core consists of the Data Arithmetic Logic Unit (Data ALU), the Address Generation Unit (AGU) and the Program Sequencer (PSEQ).


The Data ALU performs the core's arithmetic and logical operations on the data. It has sixteen 40-bit registers and four ALUs that execute in parallel. This provides the flexibility of executing four Multiply Accumulate (MAC) instructions in a single clock cycle. Each ALU consists of a MAC sub-unit and a bit field sub-unit (BFU). The MAC unit contains a high-speed adder and a multiplier that implement integer and fractional arithmetic instructions. The BFU handles the logical operations for the ALU.

The AGU consists of two address arithmetic units (AAUs), two stack pointers, a bit mask unit (BMU) and sixteen 32-bit address registers. An AAU calculates the effective address for memory access. Of the two stack pointers, one is used for normal mode execution (NSP) and the other for exception mode processing (ESP). The BMU performs the setting and resetting of the bits in any destination register.

The Program Sequencer fetches and executes the instructions. The PSEQ has a Program Counter (PC), which has four pairs of 32-bit loop start address registers and four hardware-based loop counters.

The StarCore has an on-chip memory bank of 512 KB, which helps in running longer programs without a need for external memory. The SC140 has two extended components:

- The enhanced filter coprocessor (EFCOP) implements a real/complex adaptive filter machine in parallel with the SC140 core.
- The HDI16 provides a 16-bit parallel interface that allows the device to interconnect with other microcontrollers, microprocessors and DSPs.

4.1.2 System Interface Unit

The system interface unit (SIU) provides the control and data signals necessary for the processor to interact with other peripherals.


4.1.3 Communications Processor Module

The communications processor module (CPM) is a 32-bit RISC processor that controls and manages the external interfaces for the device. The CPM controls the following modules:

- 155 Mbps ATM interface (including AAL 0/1/2/5)
- 10/100 Mbit Ethernet interface
- Up to four E1/T1 interfaces or one E3/T3 interface and one E1/T1 interface
- HDLC support up to T3 rates, or 256 channels

4.1.4 Buses

The buses of the SC140 perform the following functions:

- The SC140 uses different buses to access memory and data from the cache: one 128-bit program bus and two 64-bit data buses.
- An internal 64-bit PowerPC local bus moves data among the CPM, the DMA engine and the on-chip cache.
- The 64-bit PowerPC system bus manages data transfers among external memory/peripherals, the DMA engine and the SC140 core.
- A 128-bit QBus manages communications between the SC140 core and the extended core devices, EFCOP and HDI16. The QBus is also the interface between the processor core and the PowerPC system bus.

4.2 Advanced Encryption Standard

The Advanced Encryption Standard (AES) operates on 128-bit data blocks with key lengths of 128, 192 or 256 bits. The 128-bit input plain text is arranged in a rectangular array of bytes called the state. The state has four rows, and the number of columns is denoted by Nb, which is equal to the block length divided by 32 [G99]. Let the plaintext block be denoted by $p_0 p_1 p_2 p_3 \ldots p_{4 \cdot Nb - 1}$, where $p_0$ denotes the first byte and $p_{4 \cdot Nb - 1}$ denotes the last byte of the plaintext. Figure 4.2.1 presents the arrangement of the input bytes in a two-dimensional array.


$$
\begin{bmatrix}
p_0 & p_4 & p_8 & p_{12} \\
p_1 & p_5 & p_9 & p_{13} \\
p_2 & p_6 & p_{10} & p_{14} \\
p_3 & p_7 & p_{11} & p_{15}
\end{bmatrix}
$$

Figure 4.2.1: Input data layout in a 2-D array

Similarly, the key is arranged into a rectangular array of bytes with four rows. The number of columns varies with the length of the key.

Table 4.2.1: Different key lengths and corresponding number of rounds

             Block size (bits)   Key length (bits)   Number of rounds
    AES-128  128                 128                 10
    AES-192  128                 192                 12
    AES-256  128                 256                 14

4.2.1 Round Transformations

The whole AES algorithm is divided into a fixed number of round transformations, which depends on the key length. Table 4.2.1 shows the number of round transformations for the different key lengths. Each round transformation consists of four different transformations. The final round differs slightly from the rest of the rounds. Code Sample 4.2.1 presents the pseudo code for an AES encrypt round transformation.

4.2.2 Key Expansion

The KeyExpansion() function generates a key schedule for the different rounds from the cipher key. The key expansion generates a total of Nb(Nr + 1) words: the algorithm requires an initial set of Nb words, and each of the Nr rounds requires Nb words of key data. The resulting key schedule consists of a linear array of 4-byte words, denoted by W[i], with i in the range $0 \le i < Nb(Nr + 1)$. Code Sample 4.2.2 shows the code for KeyExpansion().
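As a small worked illustration of the state layout and of the key schedule size Nb(Nr + 1), consider the sketch below. The function names are hypothetical and chosen for this example; the round counts are those of Table 4.2.1.

```c
#include <stdint.h>

#define NB 4   /* columns in the AES state: 128-bit block / 32 */

/* Fill the state column by column, so that state[r][c] = p[r + 4*c]. */
void load_state(const uint8_t p[4 * NB], uint8_t state[4][NB])
{
    for (int c = 0; c < NB; c++)
        for (int r = 0; r < 4; r++)
            state[r][c] = p[r + 4 * c];
}

/* Number of rounds Nr for a key length in bits; 0 if unsupported. */
int rounds_for_key(int key_bits)
{
    switch (key_bits) {
    case 128: return 10;
    case 192: return 12;
    case 256: return 14;
    default:  return 0;
    }
}

/* Number of 4-byte words in the expanded key: Nb(Nr + 1). */
int key_schedule_words(int key_bits)
{
    int nr = rounds_for_key(key_bits);
    return nr ? NB * (nr + 1) : 0;
}
```

For AES-128 this gives 4 x (10 + 1) = 44 words, that is, 176 bytes of round-key material; AES-192 and AES-256 need 52 and 60 words respectively.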


Code Sample 4.2.1: Pseudo-code for an AES encrypt round transformation

    Round(state, roundkey)
    {
        SubBytes(state);
        ShiftRows(state);
        MixColumns(state);
        AddRoundKey(state, roundkey);
    }

    /* Final round is special; there is no MixColumns */
    FinalRound(state, roundkey)
    {
        SubBytes(state);
        ShiftRows(state);
        AddRoundKey(state, roundkey);
    }

Code Sample 4.2.2: Code for KeyExpansion( )

    KeyExpansion(word8 k[4][MAXKC], word8 W[MAXROUNDS+1][4][MAXBC])
    {
        int i, j, t=0, RCpointer = 1;
        word8 tk[4][MAXKC];

        for (j=0; j<KC; j++)
            for (i=0; i<4; i++)
                tk[i][j] = k[i][j];
        for (j=0; (j

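4.2.3 SubBytes( ) Transformation

This is a non-linear, byte-wise substitution applied to every byte of the state. The substitution table is called the S-box. The transformation acts on each byte of the state individually, $a' = S[a]$. Figure 4.2.2 presents the SubBytes() Transformation.

Figure 4.2.2: SubBytes() Transformation acts on the individual bytes

The S-box can be implemented by a look-up table or by the following formula, in which the bits $a_0 \ldots a_7$ on the right-hand side are those of the multiplicative inverse of the state byte in GF(2^8), with 00 mapped to itself:

$$
\begin{bmatrix} a'_0 \\ a'_1 \\ a'_2 \\ a'_3 \\ a'_4 \\ a'_5 \\ a'_6 \\ a'_7 \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \\ a_6 \\ a_7 \end{bmatrix}
\oplus
\begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix}
$$

The inverse of the S-box needs to be performed in decryption. It is implemented by the following formula, after which the multiplicative inverse in GF(2^8) of the result is taken:

$$
\begin{bmatrix} a'_0 \\ a'_1 \\ a'_2 \\ a'_3 \\ a'_4 \\ a'_5 \\ a'_6 \\ a'_7 \end{bmatrix}
=
\begin{bmatrix}
0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \\ a_6 \\ a_7 \end{bmatrix}
\oplus
\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
$$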
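The formula route to the S-box can be sketched as below. This is an illustration under stated assumptions rather than the thesis code: the helper names are hypothetical, the inverse is computed naively as a^254, and the affine maps are written as byte rotations, which is equivalent to the bit matrices above with constants 0x63 and 0x05.

```c
#include <stdint.h>

/* Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1)
            p ^= a;
        uint8_t carry = a & 0x80;
        a = (uint8_t)(a << 1);
        if (carry)
            a ^= 0x1B;                     /* reduce by the AES polynomial */
        b >>= 1;
    }
    return p;
}

/* Multiplicative inverse as a^254; maps 0 to 0. */
uint8_t gf_inv(uint8_t a)
{
    uint8_t r = 1;
    for (int i = 0; i < 254; i++)
        r = gf_mul(r, a);
    return r;
}

uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* Forward S-box: inverse in GF(2^8), then the affine map. */
uint8_t sbox_formula(uint8_t a)
{
    uint8_t x = gf_inv(a);
    return (uint8_t)(x ^ rotl8(x, 1) ^ rotl8(x, 2) ^ rotl8(x, 3)
                       ^ rotl8(x, 4) ^ 0x63);
}

/* Inverse S-box: inverse affine map, then inverse in GF(2^8). */
uint8_t inv_sbox_formula(uint8_t a)
{
    uint8_t x = (uint8_t)(rotl8(a, 1) ^ rotl8(a, 3) ^ rotl8(a, 6) ^ 0x05);
    return gf_inv(x);
}
```

The cost of `gf_inv` is what makes this route expensive at run time; a practical use of such a routine is to generate the 256-entry look-up table once, offline.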


4.2.4 ShiftRows( ) Transformation

In the ShiftRows() transformation, each row of the state is considered separately and the bytes in that row are cyclically shifted to the left by an offset that depends on the row and on the block length. For the 128-bit block used by the AES, the first row is unchanged, while the second, third and fourth rows are shifted by 1, 2 and 3 bytes respectively. Figure 4.2.3 presents the ShiftRows() operation on the state.

    a b c d         a b c d
    e f g h   -->   f g h e
    i j k l         k l i j
    m n o p         p m n o

Figure 4.2.3: ShiftRows( ) transformation operating on individual rows

The inverse of ShiftRows is called InvShiftRows. In the InvShiftRows transformation, the bytes are cyclically shifted to the right. Figure 4.2.4 presents the InvShiftRows( ) operation on the state.

    a b c d         a b c d
    f g h e   -->   e f g h
    k l i j         i j k l
    p m n o         m n o p

Figure 4.2.4: InvShiftRows( ) transformation operating on individual rows

4.2.5 MixColumns( ) Transformation

The MixColumns() Transformation is a bricklayer permutation operating on each column of the state. This operation is depicted in Figure 4.2.5.


Figure 4.2.5: MixColumns() operation on each column of the state

The columns of the state are considered as polynomials over GF(2^8) and multiplied modulo $x^4 + 1$ with a fixed polynomial c(x). The polynomial c(x) is given by

$$ c(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\} $$

so that each column a(x) is mapped to

$$ a'(x) = c(x) \cdot a(x) \pmod{x^4 + 1} $$

The MixColumns operation is implemented by:

$$
\begin{bmatrix} a'_0 \\ a'_1 \\ a'_2 \\ a'_3 \end{bmatrix}
=
\begin{bmatrix}
02 & 03 & 01 & 01 \\
01 & 02 & 03 & 01 \\
01 & 01 & 02 & 03 \\
03 & 01 & 01 & 02
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}
$$

The inverse operation for MixColumns is called InvMixColumns. It is implemented by the following formula:

$$
\begin{bmatrix} a'_0 \\ a'_1 \\ a'_2 \\ a'_3 \end{bmatrix}
=
\begin{bmatrix}
0E & 0B & 0D & 09 \\
09 & 0E & 0B & 0D \\
0D & 09 & 0E & 0B \\
0B & 0D & 09 & 0E
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}
$$

4.2.6 AddRoundKey( ) Transformation

The AddRoundKey( ) Transformation is a simple bitwise XOR operation of the state and the round key. A round key is a special key generated for a particular round by KeyExpansion( ). The length of the round key is equal to the block length.
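The MixColumns matrix product for a single column can be sketched as follows. The function names are assumptions made for this example. Multiplication by 02 is done by `xtime` (a shift with conditional reduction), and multiplication by 03 is `xtime(a) ^ a`.

```c
#include <stdint.h>

/* Multiply by {02} in GF(2^8): shift left, reduce if the top bit was set. */
uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

/* Apply the MixColumns matrix to one 4-byte column in place. */
void mix_column(uint8_t col[4])
{
    uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];

    col[0] = (uint8_t)(xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3);  /* 02 03 01 01 */
    col[1] = (uint8_t)(a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3);  /* 01 02 03 01 */
    col[2] = (uint8_t)(a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3));  /* 01 01 02 03 */
    col[3] = (uint8_t)((xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3));  /* 03 01 01 02 */
}
```

For example, the column (db, 13, 53, 45) is mapped to (8e, 4d, a1, bc), and a column of four equal bytes is left unchanged because each matrix row sums to 01.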


Figure 4.2.6 shows the AddRoundKey() Transformation. It performs a bitwise XOR of the state with the round key. AddRoundKey is its own inverse, and hence the same transformation is also used in decryption.

Figure 4.2.6: AddRoundKey() transformation

4.2.7 Inverse Cipher

Decryption is performed by reversing the encryption steps through the use of their inverse transformations. Code Sample 4.2.3 presents the pseudo code for the decryption round transformations.

Code Sample 4.2.3: Pseudo-code for AES decryption

    int Decrypt(word8 a[4][MAXBC], word8 rk[MAXROUNDS+1][4][MAXBC])
    {
        int r;

        /* Undo the final encryption round, which has no MixColumns */
        AddRoundKey(a, rk[ROUNDS]);
        InvSubBytes(a);
        InvShiftRows(a);

        /* Undo the remaining rounds in reverse order */
        for (r = ROUNDS-1; r > 0; r--) {
            AddRoundKey(a, rk[r]);
            InvMixColumns(a);
            InvSubBytes(a);
            InvShiftRows(a);
        }

        /* Undo the initial round-key addition */
        AddRoundKey(a, rk[0]);
        return 0;
    }
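The self-inverse property of AddRoundKey that the inverse cipher relies on can be illustrated with a short sketch. The name `add_round_key` and the fixed 4 x 4 state are assumptions for this example.

```c
#include <stdint.h>

/* XOR the round key into the state. Since (s ^ k) ^ k == s, applying
   the function twice with the same key restores the original state. */
void add_round_key(uint8_t state[4][4], uint8_t rk[4][4])
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            state[i][j] ^= rk[i][j];
}
```

This is why Code Sample 4.2.3 calls the same AddRoundKey() routine as encryption, only with the round keys taken in reverse order.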


CHAPTER 5
IMPLEMENTATION

5.1 Structure

Any embedded system has a set of specific tasks to perform. The objective here is to study the implementation issues of including cryptographic modules in an embedded system. An embedded system performs various functions, and at some point in its execution it may need to use the cryptographic module for either encryption or decryption of data. Figure 5.1.1 presents the cryptographic module that was developed and analyzed. The inputs from other modules were taken into the cryptographic module in fixed block sizes. The encryption or decryption was performed on the data, and the result was returned to the host function through the output buffer.

Figure 5.1.1: Encryption and decryption modules (blocks shown: input buffer, output buffer, encryption, decryption, SubBytes, ShiftRows, MixColumns, InvMixColumns, InvShiftRows, key scheduling)

5.2 Development Process

The program was coded in the C language, a programmer-friendly high-level language defined at a high level of abstraction. A high-level language needs to be compiled into a low-level language before execution. A low-level language is defined at the register level in order to achieve optimum performance in terms of processing speed, low memory requirements or both. Figure 5.2.1 presents the development process of the cryptographic system.

Figure 5.2.1: Block diagram of the system development process (blocks shown: C source code, assembly code, libraries, compiler, assembler, linker, debugger, hardware simulator and profiler, optimization)

The AES algorithm was implemented in C, compiled, and then linked using the CodeWarrior Integrated Development Environment.

5.2.1 High-level Synthesis

The AES was implemented in C rather than assembly because of the high-level nature of the C language. High-level programming gives the programmer more flexibility in defining the objective of the code. A compiler translated the high-level code into machine code. High-level synthesis helped in realizing the project objectives in less time. The main advantage of using a high-level language is code portability.

5.2.2 Low-level Synthesis

Low-level programming involves coding in machine-level instructions. This requires a thorough knowledge of the hardware system at the register level. Since low-level synthesis does not support portability, it was not used in this implementation. When very high levels of optimization are required, however, it is advisable to code the program in low-level, architecture-specific instructions.

5.2.3 Portability

Portability is the ability of the code to be transferred to a different system or environment with minimal modification and redevelopment. Since the competition for quick release of products is so intense, rapid prototyping and timely marketing define the success of a product. Therefore, portability was strictly adhered to while developing the system.

5.2.4 Modularization

Modularization is the technique of splitting a large program into smaller modules. Its advantage is the ease of maintenance and code debugging. Modularization also helps in code reuse, which reduces run-time memory. A cryptographic system is developed as a separate module, with sub-modules implementing the details. In the C language, modularization is achieved by dividing the code into functions. When the embedded system needs to encrypt or decrypt data, it invokes the corresponding module, which executes its tasks and then returns the output to the host function. The main program was divided into modules implemented as the encrypt(), decrypt() and KeyExpansion() functions.

5.2.5 Compiler Exploitation

A program in a high-level language should be written so that the compiler can optimize the code to the highest level possible. Care must be taken to consider the abilities and the limitations of the compiler when the program is being optimized. The StarCore architecture has four ALUs, which can perform four operations in parallel. Thus, the algorithm coding was modified to take advantage of the four ALUs.

5.3 Optimizations

The AES algorithm was implemented in C for generalized key lengths of 128, 192 and 256 bits. The length of the key can be changed by the input parameter


specifications. For optimization purposes, the key length was fixed at 128 bits and the various optimization techniques were applied and analyzed. The main aspects that were considered for optimization were:

- Exploring parallelism in the algorithm
- Multi-sample processing and split summation
- Speed optimization at the expense of increasing the code size
- Function calls; argument passing increases overhead
- Compiler may use inline functions
- Task priority
- Interrupt service management
- Time-sliced multi-tasking
- I/O queues management
- Interrupt disable while generating the key

5.3.1 Structural Partitioning

Input and output buffers were kept separate so that the core could process data without any interrupts. External tasks must not be allowed to enter the critical path.

5.3.2 Critical Paths

In many cases, programs have a high-cost critical path that needs to be optimized. It makes sense to optimize the critical paths to a greater extent than the less critical paths. The MixColumns() function takes much longer than the other sub-modules. This is due to the mul() function, which needs to be called numerous times. The optimization for the MixColumns() function is presented in Code Sample 5.1.

5.3.3 Computational Complexity

Many programs need to perform highly complex sets of arithmetic functions. Such functions can be made simpler by exploring alternatives such as look-up tables and bit manipulation. SubBytes() can be implemented by using the formula, but this consumes a lot of processor cycles. Therefore, SubBytes() was implemented by using a look-up table.
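One common way of replacing GF(2^8) arithmetic with look-ups is a pair of logarithm and antilogarithm tables. The sketch below assumes this approach for illustration; the thesis does not show the body of mul(), so the table construction, the generator 0x03 and the names used here are assumptions, not the author's code.

```c
#include <stdint.h>

static uint8_t log_table[256];
static uint8_t alog_table[256];

/* Fill the tables with the powers of the generator {03}. */
void gf_tables_init(void)
{
    uint8_t x = 1;
    for (int i = 0; i < 255; i++) {
        alog_table[i] = x;
        log_table[x] = (uint8_t)i;
        /* x = x * {03} = x ^ (x * {02}) */
        x = (uint8_t)(x ^ (uint8_t)(x << 1) ^ ((x & 0x80) ? 0x1B : 0x00));
    }
    alog_table[255] = alog_table[0];
}

/* Table-based multiply: a*b = alog[(log a + log b) mod 255]. */
uint8_t gf_mul_table(uint8_t a, uint8_t b)
{
    if (a == 0 || b == 0)
        return 0;
    return alog_table[(log_table[a] + log_table[b]) % 255];
}
```

After `gf_tables_init()` has run once, each multiplication costs three table reads and an addition. Note that the zero test is a data-dependent branch, which matters for the timing considerations discussed in Chapter 6.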


Initial code:

    word8 b[4][MAXBC];
    int i, j;
    for(j=0; j

5.3.5 Parallel Tasks

Hardware typically offers a high level of parallelism compared to software. The design of an embedded device should take such hardware parallelism into account. ShiftRows() operates on one row at a time, so a parallel implementation of the four row shifts was feasible.

5.3.6 Instruction-level Parallelism

In the MixColumns() function, each output byte of a column is a function of the four bytes of that column. A matrix multiplication was performed to get the result. The single expression was broken into separate instructions capable of being executed in parallel, and the result was the XOR of the individual multiplications. Different register sets were used for the individual instructions, which resulted in instruction-level parallelism and made the code efficient on processors with multiple execution units, such as the four-ALU StarCore.

5.3.7 Recursive Tasks

Some tasks in a program need to be executed a finite number of times. Such repeated tasks are called recursive tasks here. They carry an overhead: on every iteration, a check is needed to decide whether the instruction sequence should jump out of the loop.

Loop Unrolling: For a small number of repetitions, the overhead can be removed altogether by replacing the loop with the loop body written out that fixed number of times. This technique is called loop unrolling. Code Sample 5.2 presents the loop unrolling for AddRoundKey() and Code Sample 5.3 presents the loop unrolling for SubBytes().


Code Sample 5.2: Loop unrolling for AddRoundKey( )

Initial code:

    int i, j;
    for (i=0; i<4; i++)
        for (j=0; j<4; j++)
            a[i][j] ^= rk[i][j];

Modified code:

    int i;
    for (i=0; i<4; i++) {
        /* inner loop over the four columns written out */
        a[i][0] ^= rk[i][0];
        a[i][1] ^= rk[i][1];
        a[i][2] ^= rk[i][2];
        a[i][3] ^= rk[i][3];
    }

Code Sample 5.3: Loop unrolling for SubBytes( )

Initial code:

    int i, j;
    for (i=0; i<4; i++)
        for (j=0; j<BC; j++)
            a[i][j] = box[a[i][j]];

Modified code:

    int i;
    for (i=0; i<4; i++) {
        /* inner loop over the four columns written out */
        a[i][0] = box[a[i][0]];
        a[i][1] = box[a[i][1]];
        a[i][2] = box[a[i][2]];
        a[i][3] = box[a[i][3]];
    }

Loop Merging: When two loops execute similar tasks that can be arranged sequentially, it is better to combine them into a single loop. This technique is called loop merging. It reduces the total overhead of executing multiple loops to the overhead of a single loop. Code Sample 5.4 presents the loop unrolling and merging with constant substitution for ShiftRows( ).

5.3.8 Pipelining Tasks

Two pipelining tasks were considered in order to reduce code execution time.

Multi-sample Processing: Sometimes, different samples can be processed simultaneously when there is no inter-dependency among them. This helps in conserving valuable clock cycles.


Code Sample 5.4: Loop unrolling and merging

Initial code:

    int i, j;
    for (i=0; i<4; i++) {
        for (j=0; j<BC; j++)
            tmp[j] = a[i][(j + shifts[BC-4][i]) % BC];
        for (j=0; j<BC; j++)
            a[i][j] = tmp[j];
    }

Modified code:

    int i;
    /* row 0 is not shifted, so the loop starts at row 1 */
    for (i=1; i<4; i++) {
        tmp[0] = a[i][(0 + i) % BC];
        tmp[1] = a[i][(1 + i) % BC];
        tmp[2] = a[i][(2 + i) % BC];
        tmp[3] = a[i][(3 + i) % BC];
        a[i][0] = tmp[0];
        a[i][1] = tmp[1];
        a[i][2] = tmp[2];
        a[i][3] = tmp[3];
    }

Split Summation: A complex equation can be made simpler by dividing it into smaller components so that they can be executed in parallel. Another advantage is that, by dividing the equation into smaller components, different registers can be used. This minimizes the number of memory transfers, which consume more cycles than simple register accesses. Split summation was performed on MixColumns() and InvMixColumns(). Code Sample 5.5 presents the modifications in InvMixColumns().

5.3.9 Conditional Tasks

Some tasks use conditional statements such as if-then-else, which consume a lot of cycles. A better approach is to remove the conditional statements as far as possible. Code Sample 5.6 presents the removal of the if-then-else statement from ShiftRows().
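To illustrate the split summation idea of Section 5.3.8 on InvMixColumns(), the sketch below first computes the partial products for each input byte into separate temporaries and only then combines them with XORs. It is a hedged illustration with hypothetical names, not a reproduction of Code Sample 5.5; it uses shift-based doubling instead of the mul() function.

```c
#include <stdint.h>

/* Multiply by {02} in GF(2^8). */
uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
}

/* InvMixColumns on one column, written as independent partial products
   followed by four independent XOR sums. */
void inv_mix_column(uint8_t col[4])
{
    uint8_t m9[4], mB[4], mD[4], mE[4];

    for (int i = 0; i < 4; i++) {
        uint8_t a  = col[i];
        uint8_t a2 = xtime(a), a4 = xtime(a2), a8 = xtime(a4);
        m9[i] = (uint8_t)(a8 ^ a);            /* {09} = 8 + 1     */
        mB[i] = (uint8_t)(a8 ^ a2 ^ a);       /* {0B} = 8 + 2 + 1 */
        mD[i] = (uint8_t)(a8 ^ a4 ^ a);       /* {0D} = 8 + 4 + 1 */
        mE[i] = (uint8_t)(a8 ^ a4 ^ a2);      /* {0E} = 8 + 4 + 2 */
    }

    /* Each output byte depends only on the temporaries above, so the
       four sums have no dependencies on one another. */
    col[0] = (uint8_t)(mE[0] ^ mB[1] ^ mD[2] ^ m9[3]);
    col[1] = (uint8_t)(m9[0] ^ mE[1] ^ mB[2] ^ mD[3]);
    col[2] = (uint8_t)(mD[0] ^ m9[1] ^ mE[2] ^ mB[3]);
    col[3] = (uint8_t)(mB[0] ^ mD[1] ^ m9[2] ^ mE[3]);
}
```

Whether a compiler actually schedules the four sums in parallel depends on the target and the optimization level, so the generated code should be inspected with the profiler.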


Code Sample 5.5: Modification of InvMixColumns( )

Initial code:

    word8 b[4][MAXBC];
    int i, j;
    for(j=0; j

Code Sample 5.6: Removal of If-Then-Else Conditions from ShiftRows( )

Initial code:

    ShiftRows(a, 0);    // for ShiftRows
    ShiftRows(a, 1);    // for InvShiftRows

    void ShiftRows(word8 a[4][MAXBC], word8 d)
    {
        word8 tmp[MAXBC];
        int i, j;

        if (d == 0)     // for ShiftRows-encryption
        {
            ... ...
        }
        else            // for ShiftRows-decryption
        {
            ... ...
        }
    }

Modified code:

    ShiftRows(a);       // for ShiftRows
    InvShiftRows(a);    // for InvShiftRows

    void ShiftRows(word8 a[4][MAXBC])       // for ShiftRows-encryption
    {
        ... ...
    }

    void InvShiftRows(word8 a[4][MAXBC])    // for InvShiftRows
    {
        ... ...
    }

5.4 Critical Issues

5.4.1 Interrupt Service Management

The cryptographic modules should be given the highest priority. If some other critical task must be performed, an interrupt routine should be programmed to check whether any cryptographic module is running at that time. If so, all cryptographic data should be deleted before the interrupt routine is serviced, and the cryptographic module should be executed again once the routine completes. Under no circumstances should the cryptographic data be sent to the stacks in order to perform interrupt routines.

5.4.2 Time-sliced Multi-tasking

Time-sliced multitasking of a cryptographic module with other applications also presents a vulnerability to attacks. Time slicing could help an attacker read the contents of the registers and obtain crucial information, which could lead to knowledge of the key.

5.4.3 I/O Queues Management

In order to run the cryptographic modules efficiently, the input and output modules should be structurally separated. When the embedded device has multiprocessor capability, separate processing should be provided for I/O data management.


CHAPTER 6
ATTACKS AND COUNTER MEASURES

A deliberate attempt at cryptanalysis to break a cryptosystem is called an attack. While developing a system, cryptography should not be thought of as the final task. Care should be taken to consider the cryptographic issues from the conception to the completion of the system. In terms of functionality alone, an insecure system is no different from a secure system. Security is the foremost priority in a cryptosystem, yet it cannot be evaluated from the system functionality, because security is different from functionality. Any designer can design a system with specified functionalities. For a cryptographic system, functionality is necessary but not sufficient. A cryptographic system should be designed with a view to the kinds of attacks it might face and the countermeasures needed to defeat them. A good algorithm is only the starting point.

6.1 Implementation Attacks

Attacks that rely on flaws in the implementation procedure are termed implementation attacks. In the process of optimizing code for speed and memory, programmers often neglect the transfer of variables and the deletion of trace elements. Care must be taken to delete the round keys and the main key when the encryption or decryption process is complete. It is also advantageous if only specific authorized modules have access to the cryptographic modules. If any attack or discrepancy is detected in the run-time environment, a specific data log must be created or appended and key scheduling must be executed again. Key scheduling takes care of deleting the prior keys and generating or obtaining a new set of keys.
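The deletion of round keys mentioned in Section 6.1 can be sketched as follows. This is a minimal illustration with a hypothetical function name; the volatile-qualified pointer is a commonly used way to discourage the compiler from removing the writes as dead stores, although the generated code should still be checked on the actual target.

```c
#include <stddef.h>
#include <stdint.h>

/* Overwrite a buffer holding key material with zeros. The writes go
   through a volatile pointer so that an optimizing compiler is less
   likely to drop them when the buffer is not read again. */
void wipe_bytes(void *buf, size_t len)
{
    volatile uint8_t *p = (volatile uint8_t *)buf;
    while (len--)
        *p++ = 0;
}
```

A caller would invoke `wipe_bytes(rk, sizeof rk)` on the round-key array, and likewise on the main key, as soon as encryption or decryption is complete.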


6.2 Side-channel Cryptanalysis

Attacks based on implementation flaws rather than on the algorithms are called side-channel attacks [K01]. These attacks rely upon the leakage of side-channel information such as execution time and power consumption.

6.2.1 Timing Attacks

Timing attacks exploit the execution times of the application. The execution time of the algorithm is measured in order to obtain information about the key. In the MixColumns() of the AES, there are different multiplication sequences that might take different times to execute depending upon the key. Such an attack is very powerful and will often compromise the security of the whole system.

6.2.2 Power Attacks

Power analysis attacks examine the power intake of the system and can obtain vital information about its inner workings [GLIPV03]. The power consumption of the device is measured to obtain information about the key.

Simple Power Analysis: This deals with analyzing the recorded power data and the data sets.

Differential Power Analysis: This deals with the statistical analysis of the power data by comparing different plain texts and cipher texts.

Since power analysis attacks are non-invasive, they are virtually undetectable. This poses a considerable threat to the security of the system, since the damage caused cannot be assessed.

6.2.3 Probing Attacks

A probing attack is a direct physical attack in which probes are placed on the hardware to examine the memory content and the data transfers on the buses. This can be avoided by physical shielding of the device. Optical probing is the newest technique for probing attacks. Apart from physical shielding, other methods of attack detection must be incorporated in case the attacker breaks the physical shield.
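To make the timing concern of Section 6.2.1 concrete, the sketch below contrasts a GF(2^8) doubling step that branches on the data with a branch-free variant. Both functions are illustrative assumptions with hypothetical names; they are not taken from the thesis code.

```c
#include <stdint.h>

/* Branching version: the path taken depends on the top bit of the
   (key-dependent) operand, so execution time may vary with the data. */
uint8_t xtime_branch(uint8_t a)
{
    if (a & 0x80)
        return (uint8_t)((a << 1) ^ 0x1B);
    return (uint8_t)(a << 1);
}

/* Branch-free version: the reduction constant is selected with a mask
   that is 0xFF when the top bit is set and 0x00 otherwise. */
uint8_t xtime_ct(uint8_t a)
{
    uint8_t mask = (uint8_t)(0u - (unsigned)(a >> 7));
    return (uint8_t)((a << 1) ^ (mask & 0x1B));
}
```

Both functions return the same values for every input. Whether the second one is actually constant-time depends on the instructions the compiler emits, which should be verified in the generated assembly.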


6.2.4 Fault Induction Attacks

Faults or errors are introduced into the device by crude means such as exposure to radiation [GLIPV02], and the outcomes are analyzed. With a statistical induction of errors, relevant information may be extracted.

6.3 Counter Measures

6.3.1 Constant-time Implementation

Encryption time should be made independent of the value of the key. This can be accomplished by implementing the instructions carefully so that they are time-independent. One way of achieving this is to use a look-up table that takes a fixed amount of time for every execution.

6.3.2 Power Attacks

Power attacks can be prevented in the following ways:

- Physical shielding of the device, so that the leaking signal is reduced.
- Adding noise to the power measurements. This ensures that the attacker will require more samples for analysis.
- Temporal obfuscation of the instructions. This is achieved by randomizing the execution of the instructions. This model controls power attacks if implemented effectively. If the randomization is not spread properly, it might even aid the attacker in obtaining the relevant information. Temporal obfuscation can also be achieved by randomized clock signals.

6.3.3 Probing Attacks

The hardware architecture of the system must be designed in such a way that optical probing does not reveal the state of a bit. Each bit should be represented as 'HL' or 'LH' instead of a single 'H' or 'L'.

6.3.4 Random Number and Unique Key Generators

Random number generation is a formidable task in cryptosystems. In truth, no finite machine could ever produce a true random number. Any number


generated that merely appears to be random is called a pseudo-random number. One way of generating pseudo-random numbers is through an algorithm based on linear feedback shift registers. Another approach is to use a conventional cryptographic algorithm. A user-generated number is used as input to the cryptographic algorithm; this number is called the seed of the pseudo-random number generator. The seed should be carefully selected to prevent any malicious user from guessing it or reproducing it by any other means [KSWH98]. The keys for cryptosystems should be based on random numbers, and care must be taken that such numbers pass the random number tests. The unique key generator for the cryptosystem should carefully perform the task of generating the session keys. The longer a session key is in use, the more vulnerable the cryptosystem is to attacks. The registers that hold the keys should be volatile, and the key should be deleted when it is no longer needed.
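A linear feedback shift register of the kind mentioned above can be sketched as follows. The 16-bit width, the tap positions (16, 14, 13, 11) and the function names are choices made for this illustration and are not taken from the thesis. An LFSR on its own is linear and predictable, so this is not a suitable key generator by itself.

```c
#include <stdint.h>

/* One step of a 16-bit Fibonacci LFSR with taps at bits 16, 14, 13
   and 11. The all-zero state is a fixed point and must not be used
   as a seed. */
uint16_t lfsr16_step(uint16_t state)
{
    uint16_t bit = (uint16_t)(((state >> 0) ^ (state >> 2) ^
                               (state >> 3) ^ (state >> 5)) & 1u);
    return (uint16_t)((state >> 1) | (bit << 15));
}

/* Count the steps until the register returns to its seed state. */
uint32_t lfsr16_period(uint16_t seed)
{
    uint16_t s = seed;
    uint32_t n = 0;
    do {
        s = lfsr16_step(s);
        n++;
    } while (s != seed);
    return n;
}
```

With these taps, the sequence visits every non-zero 16-bit state before repeating, so any non-zero seed gives the maximal period of 65535 steps.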


CHAPTER 7
RESULTS AND DISCUSSION

7.1 Results

The Advanced Encryption Standard (AES) was implemented in C, and various optimization techniques were applied without compromising security. The initial program is called 'opt-a'. Code optimizations were performed in successive stages, and the resulting versions of the code are named 'opt-c', 'opt-d' ... 'opt-h'. The suffix to the code name indicates the compiler optimization level. The suffix '0' indicates that no compiler optimizations were performed. The suffix '3', as in 'LEVEL 3', indicates that the compiler optimizations of scheduling, pipelining and bundling were performed on the code. The suffix 'space' indicates that space optimizations were performed rather than speed optimizations. The objective was to optimize the code in terms of speed without degrading the space parameter. The compiler option of space optimization was considered in order to show how the space optimization techniques affect the speed optimization process. The original unoptimized code became 'opt-a'. The program was compiled using the CodeWarrior IDE. The CodeWarrior profiler aided in obtaining the statistical information about the program execution. Table 7.1.1 presents the different stages of the code development and the corresponding execution time in clock cycles.


Table 7.1.1: Execution time in clock cycles at various stages of code development

    Stages           main( )   encrypt( )   decrypt( )
    LEVEL 0          453897    112706       153576
    LEVEL 3          217617     33057        52721
    LEVEL 3 Space    265766     43019        66184
    Opt-a 0          630850    111987       154316
    Opt-a 3          231068     38040        54457
    Opt-c 3          215032     34238        50244
    Opt-d 0          599351    106537       143926
    Opt-d 3          215032     34238        50244
    Opt-d 3 Space    260780     42738        63989
    Opt-d Space      271728     44693        67367
    Opt-e 0          557991     96577       133206
    Opt-e 3          211002     32801        49684
    Opt-e 3 Space    258734     42146        63558
    Opt-e Space      268471     43926        66504
    Opt-f 0          536111     91134       127709
    Opt-f 3          207203     31761        48802
    Opt-f 3 Space    254034     41069        62291
    Opt-f Space      265502     43189        65758
    Opt-g 0          516591     86264       122819
    Opt-g 3          194268     28676        45432
    Opt-g 3 Space    242633     38223        59405
    Opt-g Space      252066     39813        62432
    Opt-h 0          523464     85133       127651
    Opt-h 3          197996     29125        46882
    Opt-h 3 Space    246797     37966        61774
    Opt-h Space      253263     39656        63186


Table 7.1.2 presents the profiling information for the 'opt-g 3' stage. The table depicts the function (F) and descendent (D) time in clock cycles of the various functions.

Table 7.1.2: Profiling information for 'opt-g 3' stage

The CodeWarrior profiler also displays a graphical representation of a function and its descendents. Figure 7.1.1 presents the graphical display of encrypt() for the 'opt-g 3' stage. It can be deduced that encrypt() consumes 57352 cycles over two calls, which means that the average execution time for encrypt() is 28676 clock cycles. The main metric for this thesis is the execution time, defined as the number of clock cycles taken by the hardware to execute a particular function.


Figure 7.1.1: Graphical profile for encrypt() of ‘opt-g 3’ stage

Figure 7.1.2: Functions main(), encrypt() and decrypt() from Level 0 to Opt-d Space stages


Figure 7.1.3: Functions main(), encrypt() and decrypt() from Opt-e to Opt-h Space stages

Figure 7.1.4: Cryptographic modules from Level 0 to Opt-d Space stages

Figure 7.1.5: Cryptographic modules from Opt-e 0 to Opt-h Space stages


The encrypt() function is divided into four sub-functions: SubBytes(), ShiftRows(), MixColumns() and AddRoundKey(). The function and descendent times of the various functions are depicted in Figure 7.1.2 through Figure 7.1.5. Observations from the various stages of code optimization revealed that MixColumns() consumed more time than the other sub-modules of encrypt() combined. This was due to the mul() function called by MixColumns(), which performs the Galois Field (GF) multiplication on the data operands. GF multiplication was implemented with a look-up table to defeat timing attacks. Figure 7.1.2 reveals that decrypt() takes more time than encrypt(). This is due to the added complexity of the GF multiplication in InvMixColumns(), which is used by decrypt(). InvMixColumns() needs to perform four multiplications for each byte of the state, while MixColumns() needs to perform only two.

Figure 7.1.6: SubBytes() module at all stages

The initial SubBytes() function consumed 927 cycles without compiler optimization and 93 cycles with compiler optimization. After the optimization techniques were applied, the function required 684 clock cycles without compiler optimization and 49 cycles with compiler optimization. Thus SubBytes() showed a performance gain in execution speed of 26% without compiler optimization and 47% with compiler optimization. Figure 7.1.6 presents the execution time of SubBytes() at all stages of code optimization.

The initial ShiftRows() function consumed 2974 cycles without compiler optimization and 1354 cycles with compiler optimization. After the optimization techniques were applied, the function required 1453 clock cycles without compiler optimization and 945 cycles with compiler optimization. Thus ShiftRows() displayed a performance gain in execution speed of 51% without compiler optimization and 30% with compiler optimization. Figure 7.1.7 presents the execution time of ShiftRows() at all stages of code optimization.

Figure 7.1.7: ShiftRows() module at all stages

The initial AddRoundKey() function consumed 975 cycles without compiler optimization and 83 cycles with compiler optimization. After the optimization techniques were applied, the function required 705 clock cycles without compiler optimization and 52 cycles with compiler optimization. Thus AddRoundKey() displayed a performance gain in execution speed of 27% without compiler optimization and 37% with compiler optimization. Figure 7.1.8 presents the execution time of AddRoundKey() at all stages of code optimization.

Figure 7.1.8: AddRoundKey() module at all stages
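The performance gains quoted for SubBytes(), ShiftRows() and AddRoundKey() follow from the before and after cycle counts as (before - after) / before. The sketch below only illustrates that arithmetic; the function name is an assumption and does not appear in the thesis code.

```c
/* Percentage reduction in clock cycles relative to the initial version.
 * Returns 0.0 when the baseline is zero to avoid dividing by zero. */
double speed_gain_percent(unsigned long before, unsigned long after)
{
    if (before == 0)
        return 0.0;
    return 100.0 * ((double)before - (double)after) / (double)before;
}
```

For example, speed_gain_percent(927, 684) gives roughly 26.2 for SubBytes() without compiler optimization, and speed_gain_percent(2974, 1453) gives roughly 51.1 for ShiftRows().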


Figure 7.1.9 presents the execution time of MixColumns() at all stages of code optimization.

Figure 7.1.9: MixColumns() module at all stages

Figure 7.1.10 presents the execution time of encrypt() at all stages of code optimization.

Figure 7.1.10: Encrypt() module at all stages

Figure 7.1.11 shows the execution time of decrypt() at all stages of code optimization.

Figure 7.1.11: Decrypt() module at all stages


The stages that were optimized by the compiler are illustrated in Figure 7.1.12 through Figure 7.1.18.

Figure 7.1.12: Main() function for compiler optimization

Figure 7.1.13: Encrypt() function for compiler optimization


Figure 7.1.14: Decrypt() function for compiler optimization

Figure 7.1.15: SubBytes() function for compiler optimization


Figure 7.1.16: ShiftRows() function for compiler optimization

Figure 7.1.17: MixColumns() function for compiler optimization


Figure 7.1.18: AddRoundKey() function for compiler optimization

Figure 7.1.19: Speed performance of various modules

Figure 7.1.19 shows the speed performance gained for the AES under the different compiler options. With compiler option ‘0’, the optimizations yielded a 20% performance gain for encrypt() and an 11% performance gain for decrypt(). ShiftRows() was optimized better than all the other sub-functions. MixColumns() was the critical path in the code and was the least optimized function, because a look-up table was used for the GF multiplication. The MixColumns() data for ‘Space’ and ‘3-Space’ were not obtained because the compiler inlined the function under those options. Table 7.1.3 presents the speed performance gain in percentage for each function.

Table 7.1.3: Speed performance of various modules

Percentage Speed Performance (in %)
                         0           3       Space     3-Space
main( )           12.66153    9.656237    6.958739    7.235912
encrypt( )        20.09067    16.24511    10.56437    10.91894
decrypt( )        11.30789    9.575368    7.163731    7.325545
SubBytes( )       26.21359    47.31183    21.36752    22.68908
ShiftRows( )      51.14324    30.20679    28.47025    29.64213
MixColumns        1.405526     3.65209          NA          NA
AddRoundKey( )    27.69231    37.48634     22.3301    23.80952

7.2 Discussion

This research started with the idea of implementing a cryptographic algorithm on a DSP chip. The idea expanded from the initial concept of the Data Encryption Standard (DES) [SASR01] on the DSP 56824 to its present form, the AES on the StarCore. The AES was implemented on the StarCore using the CodeWarrior IDE. The initial inclination was to put the code into the DSP memory. Since a secure and reliable system was the goal from the beginning of this endeavor, care was taken over situations in which the system might be attacked. Different kinds of side-channel attacks were studied, and the system was made robust against them by attending to details such as constant-time implementation and key deletion after the session. Even though the system was strengthened, it cannot be denied that successful attacks can still be made on it. In the context of smart attacks, the following security issues should be kept in mind while implementing a cryptographic system:

- Formulate the system to detect various types of attacks.
- Turn off the power to the internal RAM when system tampering is detected.
- Always store the keys in volatile memory.
- Change the keys for every session.
- Shield the device with metal to prevent the leakage of high-frequency pulses.
- Implement constant-time algorithms.
- Use double-sized blocks with complement data to defeat power attacks.

7.2.1 8-bit Platforms

The use of AES is imminent on 8-bit platforms such as smart cards. Smart cards are very vulnerable to side-channel cryptanalysis because of inherent weaknesses such as their dependence on an external clock and their susceptibility to optical probing. To reduce memory use on 8-bit platforms, key expansion should be performed every time encryption or decryption is performed. This is because these devices are generally asynchronous in nature. Since encryption or decryption is performed on only a relatively small amount of data, the key should be generated and expanded each time the cryptographic module is called. Look-up tables consume a lot of memory. Therefore, they should be generated by suitable algorithms before encryption or decryption is performed. This might increase the code size, but it improves overall system performance by reducing the memory that is occupied while dormant.

7.2.2 32/64-bit Platforms

Devices with wider data buses and larger processing word lengths are generally synchronous in nature. If the data to be encrypted or decrypted is continuous, key scheduling should be performed outside the encryption and decryption modules. This avoids repeating the key schedule by moving it out of the continuous routines.

7.2.3 Optimization

Many optimization metrics apply to embedded systems, such as:

- Production cost
- Execution speed
- Memory size
- Data throughput
- Power consumption
- Robustness

When dealing with cryptosystems, the added issue of security affects all of the above metrics and becomes the top priority in evaluating the system. The difference between a poorly designed and a perfectly designed cryptographic system cannot be assessed until an attack is made on both of them. The only way to make systems more robust is to learn from previous attacks and to explore the weaknesses of present systems by performing new kinds of attacks. Thus, the concept can be summarized as ‘the real security of a system cannot be assessed until it is broken’.
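Section 7.2.1 suggests generating look-up tables algorithmically before encryption or decryption instead of storing them. As a rough sketch of that idea for the GF(256) multiplication tables, the code below builds log and antilog tables at start-up. It assumes the generator {03} and the AES reduction polynomial 0x11b; the names gf_init_tables(), gf_mul(), log_table and alog_table are illustrative and are not taken from the thesis code.

```c
#include <stdint.h>

/* Tables filled in at run time (512 bytes of RAM) rather than stored as constants. */
static uint8_t log_table[256];
static uint8_t alog_table[256];

/* Multiply by {02} in GF(2^8), reducing modulo the AES polynomial 0x11b. */
static uint8_t xtime(uint8_t v)
{
    return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1b : 0x00));
}

/* Build alog_table[i] = {03}^i and its inverse mapping log_table. */
void gf_init_tables(void)
{
    uint8_t v = 1;
    int i;

    for (i = 0; i < 255; i++) {
        alog_table[i] = v;
        log_table[v] = (uint8_t)i;
        v = (uint8_t)(v ^ xtime(v));   /* v = v * {03} */
    }
    alog_table[255] = alog_table[0];   /* {03}^255 wraps around to 1 */
    log_table[0] = 0;                  /* log of 0 is undefined; never used */
}

/* Multiply two elements of GF(2^8) using the generated tables. */
uint8_t gf_mul(uint8_t a, uint8_t b)
{
    if (a == 0 || b == 0)
        return 0;
    return alog_table[(log_table[a] + log_table[b]) % 255];
}
```

After calling gf_init_tables() once, gf_mul(0x57, 0x83) yields 0xc1, the worked multiplication example given in the AES specification. Note that the zero check is a data-dependent branch, so a strictly constant-time variant would need to handle zero operands differently.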


CHAPTER 8
CONCLUSION AND RECOMMENDATIONS FOR FUTURE WORK

8.1 Conclusion

The AES was implemented on the StarCore using the CodeWarrior IDE, and various speed optimization techniques were applied. Different side-channel attacks were studied and proposals were made to counter them. The system’s security was given greater importance than speed optimization. The programming was divided into stages and the results were observed at each stage. Each sub-module of encrypt() was optimized and its clock cycles were recorded. The relevant observations of this research are:

- The compiler was able to optimize simple code effectively.
- Code optimization techniques boosted the compiler’s ability to optimize the code further.
- In-line functions are better suited to smaller functions.
- Function call parameters should be replaced by memory pointers.
- Deleting the round keys and the main key at the termination of the program improves security.
- Critical paths should be optimized more than less critical paths.
- Probing attacks should be defeated by physical and radiation shielding.
- Constant-time implementations should be considered to avoid timing and power attacks.

8.2 Recommendations for Future Work

The study and analysis of cryptosystems is an ongoing effort in which new attacks are constantly discovered. Future work building on this research should address the following topics:

- An extensive study must be made of side-channel attacks.
- Architectural advantages must be explored with consideration for portability.
- Experiments should be conducted to study the effects of various types of attacks.
- New methods such as algorithm switching must be implemented.
- Key scheduling must be made more robust.
- Attack detection must be incorporated into the system.
- Interrupt priorities should be assigned appropriately, with the highest concern for the security of the system.
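One of the observations in Section 8.1 is that the round keys and the main key should be deleted when the program terminates. A minimal sketch of such a wipe is shown below. The function name secure_zero() is hypothetical, and the volatile-qualified pointer is a common technique, assumed here, for discouraging the compiler from removing stores to a buffer that is never read again.

```c
#include <stddef.h>

/* Overwrite a key buffer with zeros before it goes out of use.
 * Writing through a volatile pointer keeps the stores from being
 * treated as dead code by an optimizing compiler. */
void secure_zero(void *buf, size_t len)
{
    volatile unsigned char *p = (volatile unsigned char *)buf;

    while (len--)
        *p++ = 0;
}
```

In code shaped like Appendix A, such a routine would be called on the round key array and the main key at the end of a session, for example secure_zero(rk, sizeof rk).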


REFERENCES

[BS96] Bruce Schneier: Applied Cryptography, Protocols, Algorithms and Source Code in C, John Wiley & Sons, Inc., 1996.

[DR01] J. Daemen and V. Rijmen: AES Proposal Rijndael, National Institute of Standards and Technology, July 2001.

[G99] B. Gladman: Input and Output Block Conventions for AES Encryption Algorithms, AES Round 2 public comment, June 6, 1999.

[GLIPV02] Guido Bertoni, Luca Breveglieri, Israel Koren, Paolo Maistri, Vincenzo Piuri: On the Propagation of Faults and Their Detection in a Hardware Implementation of the Advanced Encryption Standard, The IEEE International Conference on Application-Specific Systems, Architectures, and Processors (ASAP'02), July 17-19, 2002, San Jose, California.

[GLIPV03] Guido Bertoni, Luca Breveglieri, Israel Koren, Paolo Maistri, Vincenzo Piuri: Error Analysis and Detection Procedures for a Hardware Implementation of the Advanced Encryption Standard, IEEE Transactions on Computers 52(4), 492-505 (2003).

[K01] F. Koeune: Careful design and integration of cryptographic primitives with contributions to timing attack, padding schemes and random number generators, Ph.D. thesis, UCL, July 2001.

[KSWH98] J. Kelsey, B. Schneier, D. Wagner, and C. Hall: Cryptanalytic Attacks on Pseudorandom Number Generators, Fast Software Encryption, Fifth International Workshop Proceedings (March 1998), Springer-Verlag, 1998, pp. 168-188.

[RL00] Rainer Leupers: Code Optimization Techniques for Embedded Processors - Methods, Algorithms, and Tools, Kluwer Academic Publishers, 2000.

[SASR01] C. Sanchez-Avila and R. Sanchez-Reillo: The Rijndael Block Cipher (AES Proposal): A Comparison with DES, 35th IEEE International Conference on Security Technology, London (United Kingdom), 16-19 October 2001, pp. 229-234.

[SJBW96] Sanjaya Kumar, James H. Aylor, Barry W. Johnson, Wm. A. Wulf: The Codesign of Embedded Systems - A Unified Hardware/Software Representation, Kluwer Academic Publishers, 1996.


BIBLIOGRAPHY

Krishnendu Chakrabarty, Vikram Iyengar, Anshuman Chandra: Test Resource Partitioning for System-on-a-chip, Kluwer Academic Publishers, 2002.

Juan Carlos Lopez, Roman Hermida and Walter Geisselhardt: Advanced Techniques for Embedded Systems Design and Test, Kluwer Academic Publishers, 1998.

Keith Tizzard: C for Professional Programmers, Ellis Horwood Limited, 1986.


APPENDICES


APPENDIX A

Optimized C Code for the AES

/* Code Optimization techniques for encryption and decryption */
#include

typedef unsigned char word8;
typedef unsigned int word32;

/* The tables Logtable and Alogtable are used to perform
   multiplications in GF(256) */
word8 Logtable[256] = {
      0,   0,  25,   1,  50,   2,  26, 198,  75, 199,  27, 104,  51, 238, 223,   3,
    100,   4, 224,  14,  52, 141, 129, 239,  76, 113,   8, 200, 248, 105,  28, 193,
    125, 194,  29, 181, 249, 185,  39, 106,  77, 228, 166, 114, 154, 201,   9, 120,
    101,  47, 138,   5,  33,  15, 225,  36,  18, 240, 130,  69,  53, 147, 218, 142,
    150, 143, 219, 189,  54, 208, 206, 148,  19,  92, 210, 241,  64,  70, 131,  56,
    102, 221, 253,  48, 191,   6, 139,  98, 179,  37, 226, 152,  34, 136, 145,  16,
    126, 110,  72, 195, 163, 182,  30,  66,  58, 107,  40,  84, 250, 133,  61, 186,
     43, 121,  10,  21, 155, 159,  94, 202,  78, 212, 172, 229, 243, 115, 167,  87,
    175,  88, 168,  80, 244, 234, 214, 116,  79, 174, 233, 213, 231, 230, 173, 232,
     44, 215, 117, 122, 235,  22,  11, 245,  89, 203,  95, 176, 156, 169,  81, 160,
    127,  12, 246, 111,  23, 196,  73, 236, 216,  67,  31,  45, 164, 118, 123, 183,
    204, 187,  62,  90, 251,  96, 177, 134,  59,  82, 161, 108, 170,  85,  41, 157,
    151, 178, 135, 144,  97, 190, 220, 252, 188, 149, 207, 205,  55,  63,  91, 209,
     83,  57, 132,  60,  65, 162, 109,  71,  20,  42, 158,  93,  86, 242, 211, 171,
     68,  17, 146, 217,  35,  32,  46, 137, 180, 124, 184,  38, 119, 153, 227, 165,
    103,  74, 237, 222, 197,  49, 254,  24,  13,  99, 140, 128, 192, 247, 112,   7,
};

word8 Alogtable[256] = {
      1,   3,   5,  15,  17,  51,  85, 255,  26,  46, 114, 150, 161, 248,  19,  53,
     95, 225,  56,  72, 216, 115, 149, 164, 247,   2,   6,  10,  30,  34, 102, 170,
    229,  52,  92, 228,  55,  89, 235,  38, 106, 190, 217, 112, 144, 171, 230,  49,
     83, 245,   4,  12,  20,  60,  68, 204,  79, 209, 104, 184, 211, 110, 178, 205,
     76, 212, 103, 169, 224,  59,  77, 215,  98, 166, 241,   8,  24,  40, 120, 136,
    131, 158, 185, 208, 107, 189, 220, 127, 129, 152, 179, 206,  73, 219, 118, 154,
    181, 196,  87, 249,  16,  48,  80, 240,  11,  29,  39, 105, 187, 214,  97, 163,
    254,  25,  43, 125, 135, 146, 173, 236,  47, 113, 147, 174, 233,  32,  96, 160,
    251,  22,  58,  78, 210, 109, 183, 194,  93, 231,  50,  86, 250,  21,  63,  65,
    195,  94, 226,  61,  71, 201,  64, 192,  91, 237,  44, 116, 156, 191, 218, 117,
    159, 186, 213, 100, 172, 239,  42, 126, 130, 157, 188, 223, 122, 142, 137, 128,
    155, 182, 193,  88, 232,  35, 101, 175, 234,  37, 111, 177, 200,  67, 197,  84,
    252,  31,  33,  99, 165, 244,   7,   9,  27,  45, 119, 153, 176, 203,  70, 202,
     69, 207,  74, 222, 121, 139, 134, 145, 168, 227,  62,  66, 198,  81, 243,  14,
     18,  54,  90, 238,  41, 123, 141, 140, 143, 138, 133, 148, 167, 242,  13,  23,
     57,  75, 221, 124, 132, 151, 162, 253,  28,  36, 108, 180, 199,  82, 246,   1,
};

word8 S[256] = {
     99, 124, 119, 123, 242, 107, 111, 197,  48,   1, 103,  43, 254, 215, 171, 118,
    202, 130, 201, 125, 250,  89,  71, 240, 173, 212, 162, 175, 156, 164, 114, 192,
    183, 253, 147,  38,  54,  63, 247, 204,  52, 165, 229, 241, 113, 216,  49,  21,
      4, 199,  35, 195,  24, 150,   5, 154,   7,  18, 128, 226, 235,  39, 178, 117,
      9, 131,  44,  26,  27, 110,  90, 160,  82,  59, 214, 179,  41, 227,  47, 132,
     83, 209,   0, 237,  32, 252, 177,  91, 106, 203, 190,  57,  74,  76,  88, 207,
    208, 239, 170, 251,  67,  77,  51, 133,  69, 249,   2, 127,  80,  60, 159, 168,
     81, 163,  64, 143, 146, 157,  56, 245, 188, 182, 218,  33,  16, 255, 243, 210,
    205,  12,  19, 236,  95, 151,  68,  23, 196, 167, 126,  61, 100,  93,  25, 115,
     96, 129,  79, 220,  34,  42, 144, 136,  70, 238, 184,  20, 222,  94,  11, 219,
    224,  50,  58,  10,  73,   6,  36,  92, 194, 211, 172,  98, 145, 149, 228, 121,
    231, 200,  55, 109, 141, 213,  78, 169, 108,  86, 244, 234, 101, 122, 174,   8,
    186, 120,  37,  46,  28, 166, 180, 198, 232, 221, 116,  31,  75, 189, 139, 138,
    112,  62, 181, 102,  72,   3, 246,  14,  97,  53,  87, 185, 134, 193,  29, 158,
    225, 248, 152,  17, 105, 217, 142, 148, 155,  30, 135, 233, 206,  85,  40, 223,
    140, 161, 137,  13, 191, 230,  66, 104,  65, 153,  45,  15, 176,  84, 187,  22,
};

word8 Si[256] = {
     82,   9, 106, 213,  48,  54, 165,  56, 191,  64, 163, 158, 129, 243, 215, 251,
    124, 227,  57, 130, 155,  47, 255, 135,  52, 142,  67,  68, 196, 222, 233, 203,
     84, 123, 148,  50, 166, 194,  35,  61, 238,  76, 149,  11,  66, 250, 195,  78,
      8,  46, 161, 102,  40, 217,  36, 178, 118,  91, 162,  73, 109, 139, 209,  37,
    114, 248, 246, 100, 134, 104, 152,  22, 212, 164,  92, 204,  93, 101, 182, 146,
    108, 112,  72,  80, 253, 237, 185, 218,  94,  21,  70,  87, 167, 141, 157, 132,
    144, 216, 171,   0, 140, 188, 211,  10, 247, 228,  88,   5, 184, 179,  69,   6,
    208,  44,  30, 143, 202,  63,  15,   2, 193, 175, 189,   3,   1,  19, 138, 107,
     58, 145,  17,  65,  79, 103, 220, 234, 151, 242, 207, 206, 240, 180, 230, 115,
    150, 172, 116,  34, 231, 173,  53, 133, 226, 249,  55, 232,  28, 117, 223, 110,
     71, 241,  26, 113,  29,  41, 197, 137, 111, 183,  98,  14, 170,  24, 190,  27,
    252,  86,  62,  75, 198, 210, 121,  32, 154, 219, 192, 254, 120, 205,  90, 244,
     31, 221, 168,  51, 136,   7, 199,  49, 177,  18,  16,  89,  39, 128, 236,  95,
     96,  81, 127, 169,  25, 181,  74,  13,  45, 229, 122, 159, 147, 201, 156, 239,
    160, 224,  59,  77, 174,  42, 245, 176, 200, 235, 187,  60, 131,  83, 153,  97,
     23,  43,   4, 126, 186, 119, 214,  38, 225, 105,  20,  99,  85,  33,  12, 125,
};

word32 RC[30] = {
    0x00, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b,
    0x36, 0x6c, 0xd8, 0xab, 0x4d, 0x9a, 0x2f, 0x5e, 0xbc, 0x63,
    0xc6, 0x97, 0x35, 0x6a, 0xd4, 0xb3, 0x7d, 0xfa, 0xef, 0xc5};

#define MAXBC 8
#define MAXKC 8
#define MAXROUNDS 14

static word8 shifts[5][4] = {
    0, 1, 2, 3,
    0, 1, 2, 3,
    0, 1, 2, 3,
    0, 1, 2, 4,
    0, 1, 3, 4};

static int numrounds[5][5] = {
    10, 11, 12, 13, 14,
    11, 11, 12, 13, 14,
    12, 12, 12, 13, 14,
    13, 13, 13, 13, 14,
    14, 14, 14, 14, 14};

int BC, KC, ROUNDS;

word8 mul(word8 a, word8 b) {
    /* multiply two elements of GF(256)
       required for MixColumns and InvMixColumns */
    if (a && b)


        return Alogtable[(Logtable[a] + Logtable[b])%255];
    else
        return 0;
}

void AddRoundKey(word8 a[4][MAXBC], word8 rk[4][MAXBC]) {
    /* XOR corresponding text input and round key input bytes */
    int i;
    for (i=0; i<4; i++) {
        a[i][0] ^= rk[i][0];
        a[i][1] ^= rk[i][1];
        a[i][2] ^= rk[i][2];
        a[i][3] ^= rk[i][3];
    }
}

void SubBytes(word8 a[4][MAXBC], word8 box[256]) {
    /* Replace every byte of the input by the byte
       at that place in the non-linear S-box */
    int i;
    for (i=0; i<4; i++) {
        a[i][0] = box[a[i][0]];
        a[i][1] = box[a[i][1]];
        a[i][2] = box[a[i][2]];
        a[i][3] = box[a[i][3]];
    }
}

void ShiftRows(word8 a[4][MAXBC]) {
    /* Row 0 remains unchanged
       The other three rows are shifted a variable amount */
    word8 tmp[MAXBC];
    int i;
    for (i=1; i<4; i++) {
        tmp[0] = a[i][(0 + i) % BC];
        tmp[1] = a[i][(1 + i) % BC];
        tmp[2] = a[i][(2 + i) % BC];
        tmp[3] = a[i][(3 + i) % BC];
        a[i][0] = tmp[0];
        a[i][1] = tmp[1];
        a[i][2] = tmp[2];
        a[i][3] = tmp[3];
    }
}

void InvShiftRows(word8 a[4][MAXBC]) {
    /* Row 0 remains unchanged
       The other three rows are shifted a variable amount */
    word8 tmp[MAXBC];
    int i;
    for (i=1; i<4; i++) {
        tmp[0] = a[i][(0 + 4 - i) % BC];
        tmp[1] = a[i][(1 + 4 - i) % BC];
        tmp[2] = a[i][(2 + 4 - i) % BC];
        tmp[3] = a[i][(3 + 4 - i) % BC];
        a[i][0] = tmp[0];
        a[i][1] = tmp[1];
        a[i][2] = tmp[2];
        a[i][3] = tmp[3];
    }
}

void MixColumns(word8 a[4][MAXBC]) {
    /* Mix the four bytes of every column in a linear way */
    word8 b[4];
    word8 temp1[4], temp2[4];
    int j;
    for (j=0; j< BC; j++) {
        b[0] = mul(2,a[0][j]);
        temp1[0]= mul(3,a[1][j]);
        temp2[0]= a[2][j] ^ a[3][j];
        temp2[0]^= temp1[0];
        b[0] ^= temp2[0];

        b[1] = a[0][j] ^ a[3][j];
        temp1[1]= mul(2,a[1][j]);
        temp2[1]= mul(3,a[2][j]);
        temp2[1]^= temp1[1];
        b[1] ^= temp2[1];

        b[2] = a[0][j] ^ a[1][j];
        temp1[2]= mul(2,a[2][j]);
        temp2[2]= mul(3,a[3][j]);
        temp2[2]^= temp1[2];
        b[2] ^= temp2[2];

        b[3] = mul(3,a[0][j]);
        temp1[3]= a[1][j] ^ a[2][j];
        temp2[3]= mul(2,a[3][j]);
        temp2[3]^= temp1[3];
        b[3] ^= temp2[3];

        a[0][j] = b[0];
        a[1][j] = b[1];
        a[2][j] = b[2];
        a[3][j] = b[3];
    }
}

void InvMixColumns(word8 a[4][MAXBC]) {
    /* Mix the four bytes of every column in a linear way
       This is the opposite operation of MixColumns */
    word8 b[4];
    word8 temp0[3], temp1[3], temp2[3], temp3[3];
    int j;
    for (j=0; j< BC; j++) {
        b[0] = mul(0xe,a[0][j]);
        temp0[0]= mul(0xb,a[1][j]);
        temp0[1]= mul(0xd,a[2][j]);
        temp0[2]= mul(0x9,a[3][j]);
        temp0[1]^=temp0[0];
        b[0] ^=temp0[2];
        b[0] ^=temp0[1];

        b[1] = mul(0x9,a[0][j]);
        temp1[0]= mul(0xe,a[1][j]);
        temp1[1]= mul(0xb,a[2][j]);
        temp1[2]= mul(0xd,a[3][j]);
        temp1[1]^=temp1[0];
        b[1] ^=temp1[2];
        b[1] ^=temp1[1];

        b[2] = mul(0xd,a[0][j]);
        temp2[0]= mul(0x9,a[1][j]);
        temp2[1]= mul(0xe,a[2][j]);
        temp2[2]= mul(0xb,a[3][j]);
        temp2[1]^=temp2[0];
        b[2] ^=temp2[2];
        b[2] ^=temp2[1];

        b[3] = mul(0xb,a[0][j]);
        temp3[0]= mul(0xd,a[1][j]);
        temp3[1]= mul(0x9,a[2][j]);
        temp3[2]= mul(0xe,a[3][j]);
        temp3[1]^=temp3[0];
        b[3] ^=temp3[2];
        b[3] ^=temp3[1];

        a[0][j] = b[0];
        a[1][j] = b[1];
        a[2][j] = b[2];
        a[3][j] = b[3];
    }
}

int KeyExpansion (word8 k[4][MAXKC], word8 W[MAXROUNDS+1][4][MAXBC]) {
    /* Calculate the required round keys */
    int i, j, t, RCpointer = 1;
    word8 tk[4][MAXKC];

    for (j=0; j< KC; j++)
        for (i=0; i<4; i++)
            tk[i][j] = k[i][j];
    t=0;
    /* Copy values into round key array */


    for (j=0; (j

    return 0;
}

int Decrypt (word8 a[4][MAXBC], word8 rk[MAXROUNDS+1][4][MAXBC]) {
    int r;
    /* To decrypt: apply the inverse operations of the encrypt routine,
     * in opposite order
     * (AddRoundKey is equal to its inverse)
     * (the inverse of SubBytes with table S is SubBytes with the inverse table of S)
     * (the inverse of Shiftrows is Shiftrows over a suitable distance)
     */

    /* First the special round:
       without InvMixColumns
       with extra AddRoundKey */
    AddRoundKey(a, rk[ROUNDS]);
    SubBytes(a, Si);
    InvShiftRows(a);

    // ROUNDS-1 ordinary rounds
    for ( r=ROUNDS-1; r > 0; r--) {
        AddRoundKey(a,rk[r]);
        InvMixColumns(a);
        SubBytes (a,Si);
        InvShiftRows(a);
    }

    // End with the extra key addition
    AddRoundKey(a,rk[0]);
    return 0;
}

int main() {
    int i, j;
    word8 a[4][MAXBC], rk[MAXROUNDS+1][4][MAXBC], sk[4][MAXKC];

    // AES block length allowed is 128
    // BC=4 for 128-bit plaintext
    BC = 4;

    /* The KC value is changed to set the key length.
       KC=4 for 128-bit key,
       KC=6 for 192-bit key
       KC=8 for 256-bit key */
    KC = 4;
    {
        ROUNDS = numrounds[KC-4][BC-4];
        // For the real system plaintext and
        // key is given by an external application
        for ( j=0; j

Figure A.1: Function call tree




Catalog record

Control number: 001441478
Identifiers: E14-SFE0000151; (OCoLC)54018230; AJM5918; SFE0000151
Call number: TK145
Author: Kandi, Jayavardhan R.
Title: Embedded cryptography [electronic resource] : an analysis and evaluation of performance and code optimization techniques for encryption and decryption in embedded systems / by Jayavardhan R. Kandi.
Published: [Tampa, Fla.] : University of South Florida, 2003.
Thesis (M.S.E.E.)--University of South Florida, 2003.
Includes bibliographical references.
Text (Electronic thesis) in PDF format.
System requirements: World Wide Web browser and PDF reader.
Mode of access: World Wide Web.
Title from PDF of title page.
Document formatted into pages; contains 84 pages.
Adviser: Moreno, Wilfrido
Keywords: co-synthesis; dsp; rijndael; aes; starcore.
Subject: Dissertations, Academic -- USF -- Electrical Engineering -- Masters.
Series: USF Electronic Theses and Dissertations.
URL: http://digital.lib.usf.edu/?e14.151