xml version 1.0 encoding UTF-8 standalone no
record xmlns http:www.loc.govMARC21slim xmlns:xsi http:www.w3.org2001XMLSchema-instance xsi:schemaLocation http:www.loc.govstandardsmarcxmlschemaMARC21slim.xsd
leader nam Ka
controlfield tag 001 001469401
003 fts
006 m||||e|||d||||||||
007 cr mnu|||uuuuu
008 040524s2004 flua sbm s000|0 eng d
datafield ind1 8 ind2 024
subfield code a E14-SFE0000303
035
(OCoLC)55731088
9
AJR1155
b SE
SFE0000303
040
FHM
c FHM
090
TK7885
1 100
Bilhanan, Anuleka.
0 245
High level synthesis of an image processing algorithm for cancer detection
h [electronic resource] /
by Anuleka Bilhanan.
260
[Tampa, Fla.] :
University of South Florida,
2004.
502
Thesis (M.S.Cp.E.)--University of South Florida, 2004.
504
Includes bibliographical references.
516
Text (Electronic thesis) in PDF format.
538
System requirements: World Wide Web browser and PDF reader.
Mode of access: World Wide Web.
500
Title from PDF of title page.
Document formatted into pages; contains 95 pages.
520
ABSTRACT: There is a crucial need for real time detection and diagnosis in digital mammography. To date, most computer aided analysis applications are software driven and normally require long processing times. Digital filtering is often the initial stage in processing mammograms for both automated detection and tissue characterization, which relies on Fourier analysis. In this research the main objective is to lay the groundwork for converting software driven mammography applications to hardware implementations by using Application-Specific Integrated Circuits (ASICs). The long-term goal is to increase processing speed. This research focuses on achieving the main objective by using one specific mammographic image processing application for demonstration purposes. ASICs offer high performance at the price of high development costs and are suitable for real time diagnosis. In this research, we develop a behavioral VHDL model of a specific filtering algorithm. Automatic Design Instantiation System (AUDI)8, a high level synthesis tool is used to automatically synthesize an RTL design from the model. A floating point behavioral component library is developed to support the synthesis of the filtering algorithm. The work shows that the hardware output is identical to the software driven output at when considering eight-bit accuracy and shows only rounding errors at higher storage capacities.
590
Co-adviser: Katkoori, Srinivas
Co-adviser: Heine, John
653
behavioral synthesis.
AUDI.
ASIC.
Fourier transform.
deconvolution.
690
Dissertations, Academic
z USF
x Computer Engineering
Masters.
773
t USF Electronic Theses and Dissertations.
4 856
u http://digital.lib.usf.edu/?e14.303
PAGE 1
High Level Synthesis Of An Image Processing Algorithm For Cancer Detection by Anuleka Bilhanan A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Engineering Department of Compute Science and Engineering College of Engineering University of South Florida Co-Major Professor: Srinivas Katkoori, Ph.D. Co-Major Professor: John Heine, Ph.D. Murali Varanasi, Ph.D. Date of Approval: March 29, 2004 Keywords: Deconvolution, Fourier Transform, ASIC, AUDI, Behavioral Synthesis. Copyright 2004 Anuleka Bilhanan
PAGE 2
DEDICATION To my beloved parents and sister who believed in me.
PAGE 3
ACKNOWLEDGEMENTS First of all I would like to thank my professors, Dr. Katkoori who boosted my frame of mind every time I discussed my problems with him, making me a better personality; and Dr. Heine who has motivated me and affirmed that I learn Fourier transforms which I will be always grateful for. I am fortunate to have a very experienced professor like Dr.Varanasi on my committee. I am glad to have had a chance to work with the VLSI team especially Ranga and Hari who have constantly been there to provide me technical support. I appreciate Narendar, Andrew and Suvodeep for their helping tendency, and Alok and Daita for patiently listening to me during my dull days. Among the people at ISRD, I extend my thanks to Madhu and Mugdha who have supported and encouraged me. I am very indebted to my dear friends Rajesh, Ashok and Tinku who have always been there, constantly cheering me till the end.
PAGE 4
i TABLE OF CONTENTS LIST OF FIGURES........................................................................................................iii ABSTRACT....................................................................................................................v CHAPTER 1 INTRODUCTION ..................................................................................1 1.1 Hardware Implementation Using ASICs...............................................................2 1.2 ASICs in Medical Imaging....................................................................................3 1.3 Overview of Proposed Work.................................................................................4 1.4 Contribution and Organization of the Thesis.........................................................5 CHAPTER 2 BACKGROUND AND RELATED WORK ............................................6 2.1 ASIC Design Flow................................................................................................6 2.2 High-Level Synthesis............................................................................................8 2.3 Mammography...................................................................................................12 2.4 Full-Field Digital Mammography.......................................................................13 2.5 The Fourier Transform........................................................................................15 2.5.1 Two Dimensional Fourier Transform....................................................15 2.5.2 Filtering Images in the Frequency Domain...........................................16 2.6 Pre-whitening Technique...................................................................................17 2.6.1 Image Modeling...................................................................................17 2.7 IDL Implementation of the Deconvolution Algorithm........................................19 2.8 Concluding Remarks..........................................................................................22 CHAPTER 3 HIGH LEVEL MODELLING AND SYNTHESIS OF..........................24 IMAGE PROCESSING ALGORITHM...............................................24 3.1 Deconvolution Algorithm..................................................................................24 3.1.1 Fourier Transform Module...................................................................25 3.1.1.1 One Dimensional Fourier Transform.......................................26 3.1.2 Octave Ring Power Function................................................................30 3.1.2.1 Octave Power Measurement Function.....................................31
PAGE 5
ii 3.1.2.2 Line Fitting Function..............................................................32 3.1.3 Inverse Fourier Transform Function.....................................................33 3.2 The IEEE 754 Floating-Point Standard..............................................................34 3.3 VHDL Model of Components ............................................................................35 3.3.1 Floating Point Addition/Subtraction.....................................................36 3.3.2 Floating Point Multiplication................................................................37 3.3.3 Floating Point Division.........................................................................39 3.3.4 Sine Function.......................................................................................40 3.3.5 Cosine Function...................................................................................41 3.3.6 Log Function........................................................................................42 3.3.7 Power Function....................................................................................44 3.3.8 Negate Function...................................................................................45 3.3.9 Real to Integer Function.......................................................................45 3.3.10 Integer to Real Function.......................................................................45 3.4 High Level Synthesis Framework......................................................................46 CHAPTER 4 EXPERIMENTAL RESULTS..............................................................48 4.1 IDL Results.......................................................................................................48 4.2 Behavioral Model Results..................................................................................48 4.3 Test Results of Components...............................................................................52 CHAPTER 5 CONCLUSIONS AND FUTURE RESEARCH....................................64 REFERENCES..............................................................................................................65 APPENDIX A.Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…..Â…..70 APPENDIX B.Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…Â…..Â…..81
PAGE 6
iii LIST OF FIGURES Figure 1.1 RTL Implementation of Deconvolution Algorithm...................................4 Figure 2.1 ASIC Design Flow...................................................................................8 Figure 2.2 High Level Synthesis Flow.....................................................................10 Figure 2.3 RT Level Design Model given by AUDI System....................................11 Figure 2.4 State Machine of a Controller in AUDI..................................................12 Figure 2.5 Two Dimensional Fourier Transform.....................................................16 Figure 2.6 Flow Diagram of the Deconvolution Algorithm......................................20 Figure 2.7 A Raw FFDM Mammogram..................................................................21 Figure 2.8 Stages during Deconvolution of ROI......................................................22 Figure 3.1 High Level Diagram of Deconvolution Algorithm..................................25 Figure 3.2 Two Dimensional Discrete Fourier Transform........................................26 Figure 3.3 One Dimensional Discrete Fourier Transform........................................27 Figure 3.4 One Dimensional Discrete Fourier Transform Block..............................28 Figure 3.5 CDFG of the One Dimensional FT Function..........................................29 Figure 3.6 Octave Ring Power Function..................................................................30 Figure 3.7 CDFG of the Octave Power Measurement Function...............................31 Figure 3.8 CDFG of the Line Fitting Function.........................................................32 Figure 3.9 CDFG of the Convolution Function........................................................33 Figure 3.10 CDFG of the Byte Scaling Function.......................................................34 Figure 3.11 IEEE Format of Floating Point Numbers................................................35 Figure 3.12 Functional Block of a Floating Point Adder/Subtractor...........................37 Figure 3.13 Functional Block of a Floating Point Multiplier......................................38 Figure 3.14 Functional Block of a Floating Point Divider..........................................39 Figure 3.15 Functional Block of a Sine Component..................................................40 Figure 3.16 Functional Block of a Logarithm Unit....................................................43 Figure 3.17 Functional Block of a Power Calculation Component.............................44
PAGE 7
iv Figure 3.18 High Level Synthesis..............................................................................47 Figure 4.1 Test Image 1...........................................................................................49 Figure 4.2 Test Image 2...........................................................................................50 Figure 4.3 Histogram Comparison of Image 1.........................................................51 Figure 4.4 Histogram Comparison of Image 2.........................................................51 Figure 4.5 Floating Point Adder Waveform.............................................................53 Figure 4.6 Floating Point Subtractor Waveform......................................................54 Figure 4.7 Floating Point Multiplier Waveform.......................................................55 Figure 4.8 Floating Point Divider Waveform...........................................................56 Figure 4.9 Sine Waveform......................................................................................57 Figure 4.10 Cosine Waveform...................................................................................58 Figure 4.11 Log Function Waveform........................................................................59 Figure 4.12 Power Function Waveform.....................................................................60 Figure 4.13 Real to Integer Waveform......................................................................61 Figure 4.14 Integer to Real Waveform......................................................................62 Figure 4.15 Negate Function Waveform....................................................................63
PAGE 8
v HIGH LEVEL SYNTHESIS OF AN IMAGE PROCESSING ALGORITHM FOR CANCER DETECTION Anuleka Bilhanan ABSTRACT There is a crucial need for real time detection and diagnosis in digital mammography. To date, most computer aided analysis applications are software driven and normally require long processing times. Digital filtering is often the initial stage in processing mammograms for both automated detection and tissue characterization, which relies on Fourier analysis. In this research the main objective is to lay the groundwork for converting software driven mammography applications to hardware implementations by using Application-Specific Integrated Circuits (ASICs). The long-term goal is to increase processing speed. This research focuses on achieving the main objective by using one specific mammographic image processing application for demonstration purposes. ASICs offer high performance at the price of high development costs and are suitable for real time diagnosis. In this research, we develop a behavioral VHDL model of a specific filtering algorithm. Automatic Design Instantiation System (AUDI)8, a high level synthesis tool is used to automatically synthesize an RTL design from the model. A floating point behavioral component library is developed to support the synthesis of the filtering algorithm. The work shows that the hardware output is identical to the software driven output at when considering eight-bit accuracy and shows only rounding errors at higher storage capacities.
PAGE 9
1 CHAPTER 1 INTRODUCTION Demand for digital image processing techniques has increased drastically in the past few decades, especially in the fields of biomedical engineering, computer tomography, automatic control, robotics, speech and communication, nuclear medicine and defense. Common image processing operations include un-sharp masking, noise suppression, distortion suppression, compression, histogram analysis, and thresholding that are used in everyday applications to sharpen, smooth, enhance edges, filter noise and scale images [1]. Some techniques fall either under image enhancement or image restoration. Restoration techniques are used to repair distorted images, and enhancement techniques are used to improve the visual appearance of the image. Processes such as filtering may be applied for both enhancement and restoration purposes. Although these techniques are widely applied, they are computationally expensive, especially when applied to 2D images with large dimensions. Most importantly, the performance times are too slow in many cases for real-time processing. Due to the two-dimensional nature of most image processing applications, parallel and distributed approaches can be applied. Many image processing operations such as filtering involve performing the same calculation repeatedly on different parts of the image [2]. This makes these operations most suitable for a hardware implementation for the following reasons: the identical and independent operations performed on a large number of pixels enable parallelism; simple operations performed on each pixel involve a considerable loop overhead in software, which can be reduced by including minimal circuitry in hardware.
PAGE 10
2 Due to the characteristics of the image processing techniques mentioned above, hardware may outperform general purpose microprocessor implementations to a great extent at reasonable cost. This speedup is achieved with: Parallelism in space replicating hardware by executing several identical functions simultaneously; Parallelism in time through pipelining executing several different stages of the computation at once; and Reduction in overhead due to loop count and memory accesses. Although it is difficult to achieve these benefits over a variety of imaging techniques, parallelization is beneficial to a certain set of functions. Highly optimized parallel algorithms for processing real-time data can be efficiently implemented through the use of dedicated custom hardware. 1.1 Hardware Implementation Using ASICs Generally the obvious choice for implementing such dedicated applications would be an application specific integrated circuit (ASIC) using VLSI technology [3]. ASICs possess predominant advantages such as small chip size, low power consumption, and a low cost per unit for large series [4]. Although the increasing complexity of the systems can lead to a substantial increase in both chip area and design time of the ASIC, its high speed and performance make it advantageous for specific applications. An application specific integrated circuit is a collection of logic and memory circuits on a single silicon die. The application of ASICs range widely in a variety of everyday consumer products such as digital cameras, networking products, automobiles, personal digital assistants, video games, and computer workstation chip sets. ASICs, in general, are microchips designed for end-users to perform a particular function in order to meet the specific requirements of their application. ASICs are advantageous when implementing fixed function applications, as its design can be optimized for cost and processing performance.
PAGE 11
3 The design is implemented in a single silicon die by mapping the functions to a set of pre-designed and verified logic circuits ranging from simple gates such as inverters, NORs, NANDs, flip-flops, and latches, to complex structures such as adders, counters, and static memory arrays. Advanced ASICs can take advantage of highly complex predesigned circuits (cores) such as microprocessors and peripheral component interconnect (PCI) controllers. 1.2 ASICs in Medical Imaging Imaging for medical application systems such as Magnetic Resonance (MR), Computed Tomography (CT), Ultra-Sound (US), and Positron Emission Tomography (PET) are in constant need of higher levels of technology available through semiconductor solutions. Imaging applications such as X-Ray mammography and CT scanners demand higher resolution, 3D imaging, and faster image-capture. The MR technology requires transmission with very fast pulse sequence capabilities and also a continuous demand for improvement in image quality. Fourier transforms, a very fundamental operation in many image processing systems, requires the ability to process data in parallel with the help of multiple processing elements and multiple data access from memory for higher performance. Such imaging techniques compel the need for increased processing speeds and quick information retrieval with reduced size for ease of use. A section of the medical field that has conventionally used ASIC products for its custom logic needs is medical imaging. In spite of the high development cost associated with ASIC technology, the high performance requirements and high degree of parallelism in certain subsystems make ASICs an optimal solution [5]. For instance, the image processor subsystem in CT scanner, which usually takes a huge amount of data for each image and reduces them to 512 x 512 or 1024 x 1024 pixel output, is one that needs an advanced ASIC technology [6]. In these cases, FPGAs turn out to be computationally intensive. In order to achieve the computational performance ASIC gate or standard cell technology is required. Therefore, an ASIC technology not only reduces product cost but also increases performance and improves reliability.
PAGE 12
4 1.3 Overview of Proposed Work This work involves implementing a widely employed image processing technique, which is referred to as deconvolution, on an ASIC (Figure 1.1). The existing implementation is coded in software using the Interactive Data Language (IDL) and its graphical environment, which include several prewritten procedures that are specifically geared to image processing applications. Even such advanced programs do not meet the real time requirements especially while processing high-resolution (often large in dimension) images. In order to improve performance a hardware implementation of the deconvolution algorithm using ASICs is proposed. A behavioral description of the algorithm modeled in VHDL and validated using Cadence NC Launch. The model is then processed using Automatic Design Instantiation System (AUDI), a high level synthesis system that synthesizes a register-transfer level design from the given behavioral description. A component library to handle floating point data and a memory implementation has been integrated with the high level synthesis system. Figure 1.1 RTL Implementation of Deconvolution Algorithm AUDI Algorithm (VHDL model) Library RTL Design (Architecture)
PAGE 13
5 1.4 Contribution and Organization of the Thesis The long-term goals are to implement real time image processing in digital mammography, which includes lesion detection, classification as well as real time breast cancer risk assessment. For these to be succe ssful, a customized ASIC must be designed with significant timing improvements. The following are the milestones reached during the course of this thesis: A high level modeling of the deconvolution algorithm [7] using VHDL has been achieved, with a good precision, and aids in implementing further enhancements with minor modifications. The design has been synthesized using AUDI [8], and validated at registertransfer (RT) level. The AUDI system has been extended to handle floating point data. The AUDI system has been extended to instantiate memories and appropriate control. A floating point (IEEE 754 single precision) component library for AUDI has been implemented. The organization of the rest of the thesis is given below. Chapter 2 presents a brief overview of the required background image processing information. It provides information on high level synthesis, the general ASIC design flow, and a detailed description of the deconvolution algorithm. Chapter 3 presents a module by module description of the VHDL model of the algorithm with an architectural description of all components implemented. The first part of Chapter 4 reports IDL results for several test images that are compared to the results of the behavioral VHDL model. The testbench results of each of the components are also provided followed by the RTL simulation results. Chapter 5 draws the conclusions of the thesis and provides thoughts on future work for improving and extending this work.
PAGE 14
6 CHAPTER 2 BACKGROUND AND RELATED WORK In this chapter a brief introduction of the necessary technical background is provided. This is essential material that helps the reader understand the objectives of this research. This chapter provides an overview of the ASIC design flow, and high-level synthesis. It highlights the importance of mammography, digital mammography (DM), computer applications in mammography and relevant image processing techniques such as Fourier analysis are highlighted. 2.1 ASIC Design Flow Designing an ASIC involves the following basic steps: 1. Design entry : The function expected to be performed by the IC is given as a behavioral specification. This design is provided as a schematic entry or as highlevel description languages (HDLs) such as VHDL [9,10,11] and Verilog [12]. VHDL and Verilog are hardware description languages in which designs can be represented at different levels of abstraction including behavioral descriptions, structural descriptions, and logical descriptions. 2. Logic synthesis : This uses an HDL (VHDL or Verilog), and a logic synthesis tool to produce a netlist (description of the logic cells and their connections). 3. System partitioning : A large system is divided into ASIC-sized pieces. 4. Pre-layout simulation : Design functionality is checked for correctness. This early verification helps eliminate malfunction due to faulty design. 5. Floorplanning : The blocks of the netlist are compactly arranged to fit within a given chip area.
PAGE 15
7 6. Placement and Routing : The routing of interconnections is done among various cells and blocks of the design. During placement, the overall area of the IC expands to provide the wiring area, requiring rearrangements of blocks for an optimal floorplan. 7. Extraction: The resistance and capacitance of the interconnects are extracted. 8. Post-layout simulation : Before fabricating the masks and proceeding with manufacture of the ASIC circuit, a final verification of the ASIC is performed to ensure it works with the added loads of the interconnect. This simulation is performed using detailed and accurate simulation tools that are tuned to the foundry's process and thus provide the final verification of performance. The logical design consists of steps 1 through 4, and the physical design comprises steps 5 through 9 although certain stages relate to both. For instance, system partitioning might be considered as either logical or physical design as it involves both factors [13].
PAGE 16
8 Figure 2.1 ASIC Design Flow 2.2 High-Level Synthesis High level synthesis (HLS) is the process of automatically generating hardware circuits from a high-level input description. The functionality of a system is specified precisely by the designer and the synthesis software transforms it into an equivalent structure that is optimized according to a set of high level user constraints. In other words, high-level synthesis is defined as the method of transforming a behavioral (algorithmic) description of a design into a register transfer level (RTL) description. The specification captures the functionality of the system in a high level language, such as VHDL or verilog. The inputs Physical design Prelayout S imulation 4 Routing 7 Placement 6 Floor Planning 5 Postlayout simulation 9 System partitioning 3 Logic synthesis 2 Desi gn entry 1 Circuit extraction 8 Logical design Layout Start
PAGE 17
9 for this synthesis are the behavioral specification, design constraints, optimization function, and a library of components at RT level. The output of high-level synthesis is a RTL implementation of given behavioral specification in graphical form comprising of a structural datapath and a behavioral controller. A set of transforms are then applied to this graph and the design parameters such as speed, area, and power dissipation are optimized. The entire flow of the operations occurring in high level synthesis is shown in Figure 2.2. The major operations that are carried out during high level synthesis are operation scheduling, resource allocation, operation scheduling, datapath allocation, control allocation, module binding, resource binding, and clock selection. Given a behavioral specification, dataflow analysis checks for loop unrolling and parallelism. The operation scheduling assigns each operation in the design to a time step in which it will be executed without violating data dependencies. Resource allocation determines the logic types involved (e.g., adder, register, etc.,) and the number of each of these resources that should be included in the design. For each resource allocation, multiple clock periods are selected. The cost function is evaluated using a resource and a clock period. The clock period with the best cost is used for better resource allocation. Resource binding determines which resources should be used to implement a specific operation. Operation scheduling achieves performance to cost trade-offs and clocking strategy. During data path allocation, the required operator and register are allocated. These resources are interconnected in such a way that the interconnection and hardware resources are minimized. Following this, the controller is implemented depending on the control type, physical modules and specification of these module parameters.
PAGE 18
10 Figure 2.2 High Level Synthesis Flow Automatic Design Instantiation System (AUDI) is a behavioral synthesis tool [8,14] capable of synthesizing a RTL design entirely in VHDL from high level behavioral descriptions. The behavioral level descriptions are usually represented as control data flow graphs (DFG). The scheduling algorithms implemented in AUDI are force-directed scheduling (FDS) [15], simultaneous scheduling, allocation and mapping algorithm (SAM) as well as simpler algorithms, name ly, as-soon-as-possible (ASAP) and as-lateas-possible (ALAP). The main goal of FDS is to reduce hardware by balancing concurrency. During this scheduling, every independent operation is scheduled during an Behavioral Specification Dataflow Analysis Operation Scheduling Module Allocation Module Binding Control Implementation & Datapath Datapath + Controller
PAGE 19
11 iteration i.e., an operator used in iteration is reused in other iterations to follow. The time frame for each operator is calculated such that unused operator can be reused in the following iterations. Distribution graphs for the operators are drawn to check and ensure concurrencies of similar operators. Allocation and mapping are done using clique partitioning heuristic [16]. Clique partitioning is a technique which can achieve maximum sharing possible between the components [17]. The first step of clique partitioning is to find a pair of connected nodes in the graph that has maximum number of common neighbors. The common neighbors are considered to find the vertex with the maximum number of common neighbors. The priority to the interconnects is given according to their weights. This heuristic is used in functional unit and register mapping. During functional unit mapping, the compatibility graph of the operations is given as the input to the clique partitioning heuristic. In the case of register mapping, analysis is done prior to mapping. Edges of the data flow graph are considered to be compatible if they have non-overlapping lifetimes. Sharing of functional units and registers can be implemented using multiplexers. The organization of the design of a controller and data-path is shown in Figure 2.3. Figure 2.3 RT Level Design Model given by AUDI System Datapath Controller Inputs Reset Outputs Start Flags Control Signals Finish Clock
PAGE 20
12 Figure 2.4 State Machine of a Controller in AUDI The controller is a finite state Moore machine. It is responsible for generating appropriate control signals for enabling registers, writing to registers, and select signals for interconnect units. The controller and data-path units are controlled by the same clock, a common reset, and communicate with each other via flags and control signals. The reset signal is activated before any active computation is performed so that the registers in datapath and controller are cleared. This is done during the idle state where the functional units are prepared for computation. When the inputs are given to the datapath unit, a start signal is asserted and the controller latches the primary inputs into the registers. The controller uses the first half of the clock cycle for computing the signals using the functional units and then uses these control signals to manage the datapath during the second half of the clock cycle. When the computation is done, the finish signal is given as output and the system goes back to idle state. This has been illustrated in the controller state machine in Figure 2.4 [18]. 2.3 Mammography Breast cancer is the second leading cause of cancer deaths in women in the United States today. One out of eight women develop breast cancer in their lifetime [19]. Screening mammography has proven to reduce the breast cancer mortality rate by about 30% especially in women of ages between 40-74 years [20,21]. Planar X-ray projection imaging is the most common form of mammography, where the projection (image) is captured with a screen film combination. Although mammography may provide many State 0 State 1 Finish IDLE Latch Primary Inputs 1 clock cycle Process Inputs Latch Primary Outputs Start State n
PAGE 21
13 benefits, there are some limitations and inherent error rates associated with the quality of the image, which lead to both false negative interpretations and false positive readings. About 12-30% of the cancer cases are missed in screening mammograms [22], as they were not visible or because of technical problems with the image due to improper positioning and interpretation errors. Likewise images of poor contrast due to high proportions of dense breast tissue also decrease the reading performance. Thus, a large number of suspicious findings on mammograms are found to be non-cancerous after a biopsy or ultrasound sonogram. Of some 700,000 biopsies performed each year, 75 percent are benign. These difficulties are largely due to the similarities between normal and abnormal breast tissues and limited film contrast [23]. In order to address some of the inherent limitations in mammography, many researchers have introduced computer-assisted methods for image analysis. Initially these methods were applied to digitized film images. Some of the most common computer applications include fully automated detection methods, abnormality classification, and enhancement. More recently, there is a newer imaging modality termed full field digital mammography (FFDM), where the image is acquired in digital form without film [24]. 2.4 Full-Field Digital Mammography Here we discuss some of the benefits that FFDM provides, since our intended applications will be on this type of data. The reader interested in film-based DM applications is referred to [25]. FFDM has one obvious advantage that it is film-less and the data are ready for electronic manipulation, soft-copy display, and archiving from its inception. Moreover, digital detectors may provide improved detection because of higher efficiency of absorption of the x-ray photons, a linear response over a wide range of radiation intensities, and low system noise. Since image acquisition and display are separated, each can be optimized independently, which may improve visualization and detection of the
PAGE 22
14 lesion [26]. Also, image-processing techniques have shown to improve visualization of details within digital mammograms [27]. Unlike film based data, digital imaging also offers improved storage, transmission, retrieval, and display of images in multiple formats. Thus, digital mammography has the potential to improve breast cancer screening performance. The Full-field digital mammography (FFDM) [28] system used in this research is the Senographe 2000D designed by GE. This is a flat panel detector that provides a field size of 19 x 23 cm with automatic exposure control through the detector along with a removal grid for magnification [29]. The image is displayed on a high-resolution monitor and can be manipulated using a number of image processing tools. This design of the FFDM system allows the development of new medical applications such as CAD, telemammography, tomosynthesis, and contrast medium imaging. The FFDM images used in our research are of size 1914 x 2294 pixels with a 100-micron pixel size and 16bit dynamic range. Over the past decade, several investigators have shown that Computer-aided diagnosis (CAD) software has the potential to increase the effectiveness of screening procedures by assisting the radiologist in interpreting the images and providing a second opinion thus improving diagnostic accuracy [30]. CAD systems use computerized algorithms for identifying suspicious regions of interest. It is proposed as an adjunct to mammography to decrease errors in perception (failure to see an abnormality). With the FFDM data and CAD applications, the images can be manipulated in real time on a workstation to promote detection and diagnosis of micro-calcifications and subtle lesions, even in mammographically dense breasts. It is important to note that real-time processing must be implemented rapidly if the computer-aided diagnosis (CAD) methods are to be incorporated into the clinical environment. Although there are many CAD innovations in literature today, most of them are still in experimental and testing stages, and are still not in the commercial arena [31, 32].
PAGE 23
15 2.5 The Fourier Transform In many of the CAD applications, digital filtering is often the first step in the processing chain. For instance, in the automated detection of calcifications, band-pass filtering is applied to isolate the spatial frequencies relative to localized abnormalities. This suppresses the low-frequency background and makes the abnormalities relatively more pronounced. Other filtering operations are used for viewing enhancement of soft copy display [33]. 2.5.1 Two Dimensional Fourier Transform In this section, an overview of Fourier analysis is provided. Here we focus on the mechanics of the transform, convolution techniques, and deconvolution methods. The Fourier transform (FT) in two dimensions is a simple extension of the one-dimensional case. The discrete two-dimensional Fourier transform (DFT) for an N x M pixelsized image is expressed as ) / 2 exp( ) / 2 exp( ) ( 1 ) (1 0 1 0N jkv M jlu k l f MN v u FM x N y ÂŠÂŠ =ÂŠ = ÂŠ = (2.1) ) / 2 exp( ) / 2 exp( ) ( ) (1 0 1 0N jvk M jul v u F k l fM u N v =ÂŠ = ÂŠ =, (2.2) where l and k are spatial frequency indices, f( l,k ) is the raw image, 1 ÂŠ = j and capital letters denote the Fourier domain. The two-dimensional transform can be separated into a series of one-dimensional transforms. First each row (or column) is transformed creating an intermediate Â“imageÂ”, which is followed by transforming each column (or row). This corresponds to performing N x M one-dimensional FTs. A pictorial representation of the process is provided below in Figure 2.5.
PAGE 24
16 Figure 2.5 Two Dimensional Fourier Transform The Discrete Fourier transform (DFT) may be considered as a sampled version of the continuous FT. The Fast Fourier Transform (FFT) is a rapid way to perform the DFT based on factoring when the image size is a power of two. It essentially changes a N x N operation into NlogN operation [34,35], where N is the number of elements in the transform. This can be applied only when the dimensions are a power of 2. However, when the dimensions are not a power of 2, we can either (1) zero-pad the image which will lead to unnecessary increases in image size or (2) just apply DFT, which is time consuming for large data sets. Since there are no guarantees that mammograms are Â‘power of twoÂ’ sizes, this processing in efficiency must be addressed. 2.5.2 Filtering Images in the Frequency Domain Filtering may be viewed in two equivalent ways: (1) as an operation in the spatial domain where a filtering kernel is convolved with the image; or (2) as a multiplication in the frequency domain where the respective transforms of the image and kernel are multiplied y x y u v u Image, f (l.k) Intermediate F (l,k) FT FT
PAGE 25
17 followed by Fourier inversion. The latter is often implemented more quickly than the former and is therefore the preferred operation if speed is a criterion. This is essentially the crux of the problem. If real time processing is desired, the time required to perform the FTs of large data sets must be reduced. In one dimension, the convolution integral is defined as ÂŠÂŠ = = du u x h u n x h x n x im ) ( ) ( ) ( ) ( ) (, (2.3) where n is the image (signal), h is an arbitrary filter kernel, and im is the resulting filtered version of n. The two dimensional extension is given by dv du v y u x h v u n y x h y x n y x imÂŠ ÂŠ= = ÂŠ ÂŠ. ) ( ) ( ) ( ) ( ) ( (2.4) The equivalent operation in the Fourier domain is a simple multiplication ) ( ) ( ) Im(y x y x y xf f H f f N f f =. (2.5) 2.6 Pre-whitening Technique A specific algorithm is discussed below in fine detail. We will use this example to realize the implementation of our ASIC. The developments below, although involved, really reduce to simple filtering process. Coding this example and testing may be considered as a test case. If successful, we will develop more elaborate algorithms. 2.6.1 Image Modeling As stated by Heine et al [36], a mammogram may be considered as an outcome of a linear filtering process, which is best described in the frequency domain as ) ( ) ( ) Im(y x y x y xf f N f f H f f =, (2.6) where ) (y xf f are two dimensional coordinates in the frequency domain, ) Im( is the inverse Fourier transform of the mammogram, ) (y xf f H is the transfer function in the
PAGE 26
18 frequency domain, and ) (y xf f N is the FT of the input random field [35]. As the form of ) Im(y xf f is known, once the filter is estimated, Equation 2.6 can be solved for) (y xf f N. Performing an inverse FT on this gives the input field in image-domain. This method is a deconvolution process and it is a pre-whitening technique that eliminates the low frequency background. It is important to get an approximate estimate of the filter. This is measured from the power spectrum of the image,2| ) Im( | f, which behaves as a 2/ 1 f process, where f is a radial frequency domain variable. The value characterizes the filter and a larger implies that the image is more irregular [35]. The Octave Spectral Analysis discussed in [7] addresses a method for finding This is based on integrating over rings on the power spectrum of the raw image. The concentric rings represent the division of the frequency plane of the image into octaves. The width of these octaves when in a radial coordinate system is such that each lower frequency ringÂ’s width measures half the width of its next higher frequency neighbor. The ) power total ( log2 calculated for each octave when plotted results in a linear plot provided the data are in the f / 1model. The data are plotted from the highest frequency octave to the lowest (Octaves 1 through 5). The obtained slope value, m of this line is related to value by the expression: 2 2 ÂŠ = m This relation is discussed in detail in the appendix A of [7]. The image formation model expressed in the image domain is given by ) ( ) ( ) ( y x h y x n y x im = (2.7) where ) ( y x im is the mammogram in image domain consisting of noise, ) ( y x n and filter, ) ( y x h.The corresponding Fourier view is given by ) ( ). ( ) Im(y x y x y xf f H f f N f f =, (2.8) where N is the noise and H is the filter.
PAGE 27
19 As indicated in section 2.6.1 the filter function may be approximated with the model of the filter,) (y xf f H given by ] f [f 1 ~ H2 y 2 x2+ = (2.9) The Fourier Domain Deconvolution takes the following form ) ( / ) Im( ) (y x y x y xf f H f f f f N =. (2.10) Taking inverse FT of this image gives the noise field )) ( / ) (Im( ) (1 y x y xf f H f f FT y x nÂŠ=, (2.11) which is the input for other stages of processing. 2.7 IDL Implementation of the Deconvolution Algorithm The technique discussed in section 2.6 has been implemented in Interactive Data Language (IDL), which is a high level software with extended data analysis and visualization routines. The main algorithm that will be implemented on the ASIC is illustrated in Figure 2.6. The input to the system is a 32-bit raw image of size 512 x 512 pixels. The resulting output is the deconvolved or pre-whitened image. In the work presented here a 512 x 512 ROI is used for demonstration purposes. The rectangular ROI shown in Figure 2.7 results from an automated software driven search algorithm that finds the largest rectangular region that can be inscribed within the breast region. This algorithm will be incorporated into the ASIC in the near future. The following are the sequence of steps involved in the deconvolution algorithm: A region of interest (ROI) is taken from a mammogram. A sample ROI is shown in figure 2.8(a). Fourier transform the image (ROI) and characterize the power spectrum by integrating over octave rings and measuring log2 (power) for each of the 5 octaves shown in figure 2.8 (b).
PAGE 28
20 Determine the filter parameter, by linear regression. Create the filter ) (y xf f Hof the form f / 1as described in equation 2.9. A general example of a high pass filter is shown in figure 2.8.(c) Deconvolve (multiplication) the FT of image with the filter as shown in Figure 2.6 and apply inverse FT. An example of the deconvolved image is shown in Figure 2.8 (d). Figure 2.6 Flow Diagram of the Deconvolution Algorithm Raw image im (x, y) Fourier Transform Spectral Characterization Filter / 1 f Filtered image ) ( y x f f N Deconvolved Image n (x, y) Inverse Fourier Transform Im (fx, fy)
PAGE 29
21 Figure 2.7 A Raw FFDM Mammogram
PAGE 30
22 (a) Raw Image (b) Shifted Octave (c) Filter (d) Deconvolved Image Figure 2.8 Stages during Deconvolution of ROI 2.8 Concluding Remarks This chapter covered the detailed description of deconvolution algorithm implemented using IDL. An introduction to a high level synthesis system, AUDI, was provided. The scope of this work includes implementing the above algorithm into the ASIC. The process involved implementing the algorithm into C programming language and then
PAGE 31
23 converting to Behavioral VHDL. Chapter 3 gives a detailed note on the VHDL implementation of the proposed algorithm and its high level synthesis.
PAGE 32
24 CHAPTER 3 HIGH LEVEL MODELLING AND SYNTHESIS OF IMAGE PROCESSING ALGORITHM The high level model and the control data flow graph (CDFG) of the image processing algorithm discussed in the previous chapter, is presented in this chapter. Several floating point behavioral VHDL components have been developed to implement the image processing algorithm. These components are instantiated by AUDI in the synthesized RTL design. The details of the algorithms of the components are presented in detail. This is followed by a third section that gives the design flow using which the unit and the components are implemented and tested. 3.1 Deconvolution Algorithm This section details on the architecture for the process of deconvolution of a FFDM image described in Chapter 2. An overview of the entire architecture is given using Figure 3.1. The main modules of this algorithm are the fourier-transform unit, octave ring power module, filtering and inverse FT module. A Fourier-transformed image is given as an input to the octave ring module. It calculates the Â‘Â’ value of the resulting filter from the power spectrum of the image using dmap, which is an image with 5 concentric rings where each inner ringÂ’s width is half the thickness of its adjacent outer ring.. The filtering module then convolves the raw image with a high-pass filter to produce the noise component in the frequency domain which is converted to image-domain by inverse FT. The Â‘Â’ given here represents the power to which the f / 1 filter is raised [7] and has been discussed in section 2.6. The following sections describe each of these modules in detail.
PAGE 33
25 Figure 3.1 High Level Diagram of Deconvolution Algorithm 3.1.1 Fourier Transform Module The raw images in our case are 512 x 512 images in pgm format that stores pixel values as 8-bit integers between 0 and 255. The input to the FT module is given in the form of complex array of 32 bit real and imaginary parts. The pixel values from the image file are mapped to real parts of the complex array and the imaginary parts are set to 0 initially. A 2D FT is applied by performing a N point one dimensional FT in row major order and then performing an N-point one dimensional FT in column major order on the intermediate result as shown in Figure 3.2 [37]. As each of the N points are used several times to perform a FT, it is considered advantageous to transpose the matrix before performing the second set of FT. Therefore, a row wise transformation is performed on the original matrix and another row-wise transformation is done on the transpose of the ) (y xf f H IFT Filter Octave Ring Analysis ) ( y x f f im Fourier Transform dmap ) ( y x n
PAGE 34
26 resulting intermediate matrix. This results in the Fourier-transformed image. Thus, all the instances/occurrences in this transformation can be formed by generating instances of two basic components namely: One dimensional FT Transposition block. The transposition block simply performs a transpose on the N x N array given out by the row transformed output from the FT block. It is also used to transpose the final column transformed output from the second one dimensional FT block. Figure 3.2 Two Dimensional Discrete Fourier Transform The use of multiple one dimensional FT components allows optimization by enabling parallelism in the design. Each one dimensional FT component will process a different set of data. Each component has a set of attributes through which functions and behaviors are implemented upon instantiation. Below, we take a detailed look at the one dimensional FT procedure. 3.1.1.1 One Dimensional Fourier Transform Basically, the FT changes an N-point input signal in spatial domain into two N-point output signal. While the input contains the amplitude of the signal measured in imagedomain, the output signals contain the amplitudes of the sine and cosine components (Figure 3.3). The spatial domain signals represented as ) (x im produces the frequency domain output represented as) Im(xf. The real part of ) Im(xf is amplitudes of cosine waves ) ( Im_xf R, while imaginary part ) ( Im_xf I are amplitudes of the sine waves, 1 x N 1-D FT (on rows) 32 32 3 N x N Image Block FT 1 x N N x N FT 1 x N 1 x N 1-D FT (on columns) Transpose rows to columns FT Coefficients (N x N 2-D) 32 32 32
PAGE 35
27 denoting complex arrays. These sine and cosine components are the basis functions in a FT. The basis functions are generated from ) ( Im_xf R = cos(arg) (3.1) ) ( Im_xf I = sin(arg), (3.2) where arg = N dir k / 2 of kth pixel in row and dir indicating forward or inverse FT. Figure 3.3 One Dimensional Discrete Fourier Transform The entire process of computing a two dimensional FT is illustrated in detail using the flowchart given in Figure 3.4. The input is given in the form of complex array with the pixel values loaded into the real part and imaginary parts loaded with zeroes initially. At a time one row from the 2D image is loaded. For each pixel k in row, the argument arg is determined. Then, for every other element in row, the cosine and sine basis functions are calculated and the transformed values Im _R and Im_I are calculated. The value of dir determines whether forward or inverse FT is to be performed. After every row is completed, the row-transformed image is obtained. N-1 N-1 Time Domain Frequency Domain im [ ] 0 N-1 0 0 FT IFT Im_R [ ] Cosine wave amplitude Sine wave amplitude Im_I [ ]
PAGE 36
28 Figure 3.4 One Dimensional Discrete Fourier Transform Block Reading every row of ima ge, im im (x,y) For every k th element in row find arg = 2*pi*k*dir/N For every other i th element in row fin d Cosine(arg*i) and Sine(arg*i) Calculate d_R and d_I d_R = d_R/P d_I = d_I/P Transpose of matrix Overwrite each complete d row in main-array Im (f x ,f y ) If dir =1
PAGE 37
29 The architecture for the one dimensional FT is discussed below. This module evaluates Equations 2.1 and 2.2. The architecture of the one dimensional FT module is shown in Figure 3.5. The four multiply operations calculate the intermediate values required to compute the real and imaginary. The mean is calculated using the summation of the FT, using adders in the feedback loops. The computation of both imaginary and real parts of the Fourier-transformed image can operate in parallel. Figure 3.5 CDFG of the One Dimensional FT Function arg (k) dir i b_R 1 0 1 b_I Im_R Im_I dir Cosine im_I 32 8 32 32 32 32 32 32 32 im_R N Sine * * + + / N / + 0 1 =1
PAGE 38
30 There are several components namely sine, cosine, divider, adder, and multiplier inside this unit that are elaborated in detail in the later section of this chapter. 3.1.2 Octave Ring Power Function This function computes the power, Â‘Â’ of the filter that is necessary for the deconvolution procedure. The FT image) (y xf f His given as the input. The power spectrum is calculated from ) (y xf f H by integrating over five octaves. The log2 of each power is calculated. The set of values are then line-fitted, so that a slope, m can be derived. The linear relation here is the log2(power) plotted on the vertical axis against the octave number plotted on the horizontal axis, which is just labeled 1 through 5. The Â‘Â’ value is calculated from the slope value according to equation in [7]. This procedure is further split into two parts as: octave power measurement function line fitting function Figure 3.6 Octave Ring Power Function Im (f x, f y) Power spectrum of image Power calculation on 5 frequency octaves Log 2 (power) Line fit to determine slope
PAGE 39
31 3.1.2.1 Octave Power Measurement Function The octave power measurement function loads a dmap image, which is a mask with 5 concentric rings where each inner ringÂ’s width is half the thickness of its adjacent outer ring. The kval gets loaded with the octave number. There compare operation ensures that the octave indexes are within range. The power spectrum is calculated by squaring the absolute value of the Fourier-transformed image. The power calculation component is used to perform this. The values within the same octave region are (integrated) summed together. This summation is performed using an adder in the feedback loop. The log component gives the log10 of the five powers measured. The divider converts the log10 to log2. Figure 3.7 CDFG of the Octave Power Measurement Function Im_I + + l og 2 (pow) 0 Im_R Pow Log / kval 5 2 2 Pow < 0 1 Log 2 Power (kval)
PAGE 40
32 3.1.2.2 Line Fitting Function The line fitting module takes log-powers as input and performs linear fitting of the points and calculates the slope, m. The compare operation is used to get the absolute value of the slope. The Â‘ Â’ value is then calculated from the slope using the relation, = (m+2)/2 [7]. The components log and power of the octave power function that have been introduced in this section are discussed later in this chapter. Figure 3.8 CDFG of the Line Fitting Function i t __ Pow sc log 2 (pow) Real 2 / Negate + + / 2 2 st2 m 0 < 0 1 +
PAGE 41
33 3.1.3 Inverse Fourier Transform Function This function performs the actual filtering operation. The filter, d which is a radial distance map, is raised to the power that is received from the octave ring power module described in Section 3.1.2. The Fourier-transformed image from the 2D FT module is filtered using the filter, d. As the inputs are all in the frequency domain, the filtering is done by pixel-wise multiplication of the two inputs. Both the real and imaginary parts can be filtered independently. For ease of understanding, the deconvolution step has been merged in this function. The resulting filtered values are given as input to the 2D-FT module with direction as Â‘-1Â’, implying inverse FT. This results in the filtered image back in the image-domain. The obtained pixel values are real numbers with a wide range. Therefore, scaling these values is preferred for a presentable image. Figure 3.9 CDFG of the Convolution Function The last function is the byte-scaling function. As the operations performed on the real parts are identical to those performed on the imaginary parts, most of the discussion covers the transformation of real part. The byte-scaling function takes in the real-valued d 32 32 Power 0.5 Power 32 32 * 32 Im_I Im_R 32 32 32 im_Rout im_Iout IFT 32
PAGE 42
34 deconvolved image values and scales them within a given range. We convert them to 8bit integer values between 0 and 255. The minimum and maximum values of the input are first computed. The minimum value determination is similar; the compare is a less than operation instead of greater than. The byte-scale function shown in Figure 3.10 takes the minimum and maximum values and scales the image down to 8 bit values. The CDFG below is nothing but a representation of the equationb mx y + = ) (. The m and b values are first determined. Then, the image entered is scaled. Figure 3.10 CDFG of the Byte Scaling Function 3.2 The IEEE 754 Floating-Point Standard The IEEE 754 is a standard stipulated for representing binary floating-point arithmetic. The standard specifies four formats for representation. The four formats are singleprecision (32 bits), double-precision (64 bits), single-extended precision (43 bits), and maxval 255 32 32 __ * im out (bytescaled) minval __ minval 255 mval Im_R out Negate +
PAGE 43
35 double-extended precision (>=80 bits). The floating point modules provided in this section are IEEE 754 compliant. Floating point numbers are usually a multiple of the size of a word [38]. In our architecture, we widely use the IEEE single precision format which is a 32-bit floating point format that can express numbers from -3.4e38 to 3.4e38. The 32-bit floating point factoring consists of 1-bit sign, 8-bit exponent, and 23-bit mantissa as shown in Figure 3.11. 31 30 23 22 0 S Exponent (8 bits) Significand (23 bits) Figure 3.11 IEEE Format of Floating Point Numbers Of the 32 bits, bit 31 represents the sign bit, bits 23 through 30 comprise the exponent portion and the remaining bits 0 through 22 are the significand bits. The value of the factoring is given by, E sM 2 * ) 1 (ÂŠ, where s is the sign bit value (1 means a negative value), E is the 127-biased integer exponent value ranging from 2-127 to 2+128(0 to 255) and M is the 23-bit mantissa with a hidden 1 [38]. The binary mantissa bits are used to construct the fractional mantissa value by the Equation 3.3 : = ÂŠ ÂŠ+ =23 1 ) 23 (2 1i i im M (3.3) where. 2 ... 2 2 2 2 123 0 5 19 4 20 3 21 1 22 ÂŠ ÂŠ ÂŠ ÂŠ ÂŠ+ + + + + + = m m m m m M 3.3 VHDL Model of Components This section discusses the implementation of the mathematical components that were implemented to add to the library. All these components are designed to work with 32-bit floating point arithmetic. A detailed discussion of these components has been given below. The floating point adder, subtractor, multiplier and divider have been adapted
PAGE 44
36 from the UCI High Level Synthesis benchmark suite [39]. The following references have some of the best explanations of trigonometric functions [40,41] implemented in this section. 3.3.1 Floating Point Addition/Subtraction This unit performs addition or subtraction of two floating point numbers. The inputs to the unit are two input operands (sign, exponent, and mantissa), a mode opcode indicating operation as add, subtract or idle, and a clock. The outputs are the result (sign, exponent and mantissa) and a set of flags indicating special cases of zero, positive and negative infinities [42]. The type of operation defines whether addition or subtraction should be done on the operands. The idle mode is used to maintain the previous result at the output. The adder/subtractor unit is designed in such a way that they can accept exponents as biased-127 mode, that is the actual exponent is 127 less than the representation. The mantissa is represented in the hidden 1 representation. So, the set of values that the operands can take is sign bit 1 or 0, exponent value 0 to 255, and mantissa part is a combination of 1Â’s and 0Â’s except all 23 bits as 0Â’s. The unit is validated to perform both typical test cases and boundary conditions. The steps involved in a floating point addition/ subtraction are as follows: The decimal point of the inputs should be aligned properly to perform addition or subtraction. This is done by shifting the significand of the smaller number to the right until the exponent value matches with that of the larger number. As the exponent value of both the numbers is same, addition or subtraction is done on the two significand values. The result is then corrected by shifting so that it takes the normalized form. The modified exponent is checked for overflow or underflow. The sign of the result is obtained from the sign bit logic depending on the sign of the operands and the operation involved. The functional block of a floating point adder/subtractor is shown in Figure 3.12.
PAGE 45
37 Figure 3.12 Functional Block of a Floating Point Adder/Subtractor 3.3.2 Floating Point Multiplication This unit multiplies two floating point numbers. The inputs to the unit are two input operands (sign, exponent, and mantissa for each operand), an opcode indicating multiply stage or idle stage, and a clock. The outputs are the result (sign, exponent and mantissa) and a set of flags indicating special cases of zero, positive and negative infinities and nota-number [42]. The multiply operation computes the outputs and the idle operation maintains the previous result at the output. The adder/subtractor unit is designed in such a way they can accept exponents as biased-127 mode, that is the actual exponent is 127 less than the representation. The mantissa is represented in the hidden 1 representation. The set of values that the operands can take is sign bit 1 or 0, exponent value 0 to 255 and mantissa part is the any combination of 1Â’s and 0Â’s except all 23 bits as 0Â’s. The unit is validated to perform both typical test cases and boundary conditions. S1 E1 M1 S2 E2 M2 Compare Shift E by D 8 8 D S out E out M out 8 Sign Logic Normalize 24 bit Add/Sub 00 00 Add/ Sub Op M sign
PAGE 46
38 The following are the sequence of steps followed while performing multiplication: The exponents of both the numbers are summed. The bias/exponent offset ) 2 (1 ÂŠn is subtracted from the sum to get the new exponent. The significands are multiplied using and integer multiplier to get the product. The resulting significand is normalized if required, by shifting to the right and increasing the exponent, such that the MSB of the mantissa is 1. This normalized value is checked for overflow and underflow. The significand is then rounded to the number of bits. The sign of the product depends on the original operands. The product is positive of both the operands are same else the product is negative. The functional block of a floating point multiplier is shown in Figure 3.13. Figure 3.13 Functional Block of a Floating Point Multiplier The data information and the instruction set are similar to the floating point adder/subtractor. The model has been tested for both straightforward and boundary conditions. It has also been tested for operands with positive and negative infinity, and not-a-number boundary conditions. S1 E1 M1 S2 E2 M2 S out E out M out Add Sub Offset (2 n 1 ) XNO R Integer multiply 23 23 8 8 8 8 8 1 23
PAGE 47
39 3.3.3 Floating Point Division This unit divides two floating point numbers. The input and output represented using the same conditions as the floating point adder and multiplier units. The division of two floating point numbers involves the following steps: The dividend mantissa is divided by the mantissa of the divisor using a fixed point divider. The exponent of divisor is subtracted from that of the dividend. The resulting significant is normalized if required by shifting to the right and increasing the exponent, such that the MSB of the mantissa is 1. The normalized value is checked for overflow and underflow. The sign of the result depends on the sign of the two input operands. The product is positive of both the operands are same else the product is negative. This is performed using an xnor gate. The significand is then rounded to the number of bits. Figure 3.14 Functional Block of a Floating Point Divider S out E out M out Normalize Unit Subtract Exponent Integ er Divider Divide by Zero Unit 8 8 8 23 25 23 8 23 S2 E2 M2 S1 E1 M1 Divisor Dividend Quotient
PAGE 48
40 3.3.4 Sine Function In this component the sine of the number given in radians is computed. The sine function for any given numeric x is given by ... 7 5 3 ) (7 5 5+ ÂŠ + ÂŠ = x x x x x Sin (3.4) First the given number is checked for the quadrant it belongs. The routine decomposes the x value into a value between 0 and /2. The sign value also gets altered corresponding to the quadrant in which x belongs. The following formulae are used to decompose the x values. ) ( ) ( ) ( ) 2 ( ) ( ) ( ) ( ) ( x Sin x Sin x Sin x Sin x Sin x Sin x Sin x Sin ÂŠ = = + ÂŠ = + ÂŠ = ÂŠ (3.5) Figure 3.15 Functional Block of a Sine Component xx 1 2a Sign * / + + + 2 k 1 Loop k-1 times Signum x value kfac
PAGE 49
41 The series of the given numeric is calculated using the Equation 3.4. The accuracy variable, k determines the number of iterations to perform and is also reused in calculating the factorial. In the case of the sine function the register k is initialized to 3 as, the power function and factorial computation use this value. The k variable is incremented by two for each iteration, as the series uses only odd powers. The result from the previous step is looped back to current iteration enabling summation in the series. There is accuracy logic in the component that stops the feedback once the present result equals the already computed result from previous iteration. Then the result is multiplied by the sign that was determined during the quadrant decomposition. The input and output of the Sine component is a 32 bit floating point number (IEEE 754 format). The intermediate values that are used in the component are of type real. 3.3.5 Cosine Function This component computes the cosine of the given number in radians. The cosine function for any given numeric x is given by ..... 6 4 2 1 ) (6 4 2+ ÂŠ + ÂŠ = x x x x Cos (3.6) The cosine function is similar to the sine other than changes that are mentioned below by Equation 3.7. The given value, x is decomposed into a value between 0 and /2 using the formulae given below. ) ( ) ( ) ( ) 2 ( ) ( ) ( ) ( ) ( x Cos x Cos x Cos x Cos x Cos x Cos x Cos x Cos ÂŠ ÂŠ = = + ÂŠ = + = ÂŠ (3.7) The series is calculated using the Equation 3.6. In cosine, the variable k is initialized to 2 as, the power function and factorial computation use even power and factorial values. The k variable is incremented by 2 during every cycle. The result from the previous step is stored in a register and looped back to current iteration enabling summation in the series. There is accuracy logic in the component that stops the feedback once the present result equals the already computed result from previous iteration. Then the result is
PAGE 50
42 multiplied by the sign that was determined during the quadrant decomposition. The block diagram of a cosine function is given in Figure 3.15. 3.3.6 Log Function The log function calculates the natural logarithm of any given number x The log function is given by the equation ....) 7 5 3 ( 2 ) 2 ln( ) ln(7 5 3+ + + + ÂŠ = za za za za M x (3.8) First, the input is checked if it lies between 0 and 1. If the input is between this range, the logarithmic value of x 1is calculated and is multiplied by -1 to give the result. If the input number x is greater than 1, then the M value that satisfies the relation xM> 2 is determined. In Equation 3.8, ) 1 /( ) 1 ( z z za + ÂŠ = where ) * 2 /( M x z =
PAGE 51
43 Figure 3.16 Functional Block of a Logarithm Unit xx xx 1 z i * / / + + 1 m 1 2 ln2 2 x in 1 + 1 1 / Sign * 2 / * ln n newln 2 1 xx x
PAGE 52
44 3.3.7 Power Function The power function calculates the power value by raising a real number x to the power a where a is a real number. The input x is checked whether it is equal to zero. If x is equal to zero, then output is zero. If x is zero and the exponent value a is less than zero then the result is not valid. Figure 3.17 Functional Block of a Power Calculation Component The natural logarithmic value of x is found. The result is multiplied by a to give the power function a z that is given by the Equation 3.9 == a k k ak a z z0! ) log( (3.9) l Log * z a i / z + a a *
PAGE 53
45 3.3.8 Negate Function The negate function is a simple function which is used to change the sign of the input. The input to the negate component is a 32 bit IEEE 754 number. The 32nd bit denotes the sign of the number. The sign bit is checked for the negating the given number. i.e. if the sign bit is 1 then it is changed to zero and vice versa. 3.3.9 Real to Integer Function The Real to integer conversion function is used to convert a 32 bit I EEE 754 floa ting point number to a 20 bit unsigned integer. This component is broken down into two subcomponents. One of the components is used to convert a 32 bit IEEE 754 format number to a decimal and the second sub-component is used to convert the decimal to an unsigned 20 bit integer. The exponent part is calculated from bits 23 to 30 by performing a binary to integer conversion. The actual exponent is then calculated by subtracting the bias of 127 and takingonentexp 2 The mantissa part is calculated by multiplying the bits 22 to 0 with values from 2-1 to 2-23. If the sign bit is Â‘0Â’ then the resultant decimal value is given as the product of the mantissa part and the exponent part else the result is negated. The decimal value is changed to an unsigned 20-bit integer by dropping the sign bit and changing to binary. 3.3.10 Integer to Real Function The integer to real function is used to convert a 20 bit integer to a 32 bit IEEE 754 format number. It is divided into two sub-functions. The first function is used to convert the 20 bit integer to a decimal value. This is done by binary to integer conversion of bits 0 to19. If the decimal number is less than zero then the sign bit is set to Â‘0Â’ else Â‘1Â’. The decimal number is normalized to base-2 scientific notation. This means that we must factor it into
PAGE 54
46 a number in the range (1 <= n < 2) and a power of 2. The power is added with 127 to find the exponent value. The binary representation of the exponent value is stored in exponent bits of the result. The fraction part is written in binary form the mantissa (bits 22 to 0). All these functions have been tested extensively, with a set of test vectors with boundary conditions. The modules also have been tested using test-benches with normal and boundary conditions. The test results have been included in Chapter 4. 3.4 High Level Synthesis Framework The design structure involves a comprehensive high-level synthesis (HLS) system (AUDI) that provides a mapping of an absolute behavioral description into a synthesizable register-transfer level (RTL) implementation. The design methodology shown in Figure 3.18 describes how the RT level is obtained from a set of specifications provided by the end user like the size of the input image, nature and size of the convolution filter, speed requirements etc. High level synthesis permits various ways to explore design flow by providing constraints like area, throughput or power. The VHDL RTL design has been simulated to validate the correctness of the design. The VHDL design in RT Level is simulated using several test images. These output images are compared to the IDL output images. The design flow can be organized into the following steps: The design entry and requirements are provided in the form of an IDL algorithm. A behavioral specification of the design is provided in behavioral VHDL. The VHDL code is converted into an AUDI Intermediate Format (AIF), using a translator. AUDI synthesizes the aif code into a structural design consisting of a datapath and controller that instantiate a component library of known structural elements. The component library has the floating point units such as adder/subtractor, multiplier, divider, sine, cosine, natural log, power, etc. Depending on requirements, different algorithms are used to make the design as small and/or as fast as possible, depending on what the design is being optimized for.
PAGE 55
47 The RTL design is simulated using the Cadence NCLaunch simulator and the correctness of design is verified. The VHDL simulator compiles this code into a format that can be checked using test vectors. Waveforms are used to check the validity of the design. The output images are compared with those of IDL. Figure 3.18 High Level Synthesis Output Image IDL implementation of Algorithm VHDL Behavioral model VHDL to AIF AUDI Data Path Controller VHDL Simulator Component Library image FP Add/Sub FP Multiplier FP Divider Log Sine Cosine Power
PAGE 56
48 CHAPTER 4 EXPERIMENTAL RESULTS The deconvolution algorithm discussed in Chapter 2 has been tested on several images using the Behavioral model and the RT Level design and compared to actual results obtained from IDL. This chapter also includes component testing using test-benches. Finally the RTL simulation results are verified. The test-benches of the deconvolution algorithm consist of the raw image, dmap, and d (filter ) The raw image is in 8 bit pgm format that has values 0-255. The dmap is created as a mask to calculate the power in the specific octaves. In our case, there are 5 octaves. The filter, d used here is a characterized high pass filter raised to a power, These files vary with the size of the raw image. 4.1 IDL Results We use IDL 5.5 for our experiments. Irrespective of the size of the raw input image, the dimage and dmap are generated every time. The input images are pgm files and the output image which is originally real is byte-scaled between 0 and 255. This helps compare the results easily and also view images in a particular format. 4.2 Behavioral Model Results In the behavioral VHDL model, the dimage and dmap are created for every raw input. As the test cases used here are of the same size, these images are retained for use with other test images. So they are given as input to the model. The behavioral VHDL code has been verified and simulated using CADENCE Nclaunch simulator. The output images
PAGE 57
49 generated by the behavioral VHDL code are compared with output images generated by IDL. Figures 4.1(a) shows a raw ROI. The corresponding output images from IDL and VHDL are shown in Figures 4.1(b) and (c) respectively. A similar comparison for a second test image is shown in Figure 4.2. The raw image is shown in Figure 4.2 (a) and figures 4.2 (b) and (c) show the resultant IDL and VHDL outputs. (a) Raw ROI (b) IDL output (c) VHDL output Figure 4.1 Test Image 1
PAGE 58
50 (a) Raw ROI (b) IDL output (c) VHDL output Figure 4.2 Test Image 2
PAGE 59
51 (a) IDL result (b) HLS result Figure 4.3 Histogram Comparison of Image 1 (a) IDL result (b) HLS result Figure 4.4 Histogram Comparison of Image 2 A comparison of the IDL and VHDL outputs from Figures 4.1 and 4.2 are made using histograms. The histograms show the pixel distribution of the images. Figure 4.3(a) and 4.3(b) show the histograms of images in Figures 4.1(b) and 4.1(c) respectively. Similarly, Figure 4.4(a) and 4.4(b) show the histogram of images in Figures 4.2 (b) and 4.2(c). Analysis of these histograms shows that the hardware output is identical to the software driven output when considering eight-bit accuracy and shows only rounding errors at higher storage capacities.
PAGE 60
52 4.3 Test Results of Components The basic components, which have been discussed in Section 3.3, are validated. Testbenches have been written for each of the components. These validated components are instantiated using AUDI. Each waveform shows the inputs to the component, the output from the component, and the expected result. Floating point adder is discussed in figure 4.5. The sign, exponent and mantissa for input1, input2 and result are given separately for easier understanding. Similarly the floating point subtractor, multiplier and a divider are discussed in figure 4.6, 4.7 and 4.8 respectively. The sine function discussed in figure 4.9, is validated using an expected output signal. The cosine function is similar to the sine function and is given in figure 4.10. Sine and Cosine function have been used in one dimensional FT calculation. The log function is given in figure 4.11. The waveform of a power function is shown in figure 4.12. This function gives invalid output when both input1 and input2 are negative simultaneously. Waveforms of other components such as Real to integer converter, integer to real converter and negate functions have been shown in figure 4.13, 4.14 and 4.15 respectively. All the functions have been cross checked by comparing the result with the expected result.
PAGE 61
53 Figure 4.5 Floating Point Adder Waveform
PAGE 62
54 Figure 4.6 Floating Point Subtractor Waveform
PAGE 63
55 Figure 4.7 Floating Point Multiplier Waveform
PAGE 64
56 Figure 4.8 Floating Point Divider Waveform
PAGE 65
57 Figure 4.9 Sine Waveform
PAGE 66
58 Figure 4.10 Cosine Waveform
PAGE 67
59 Figure 4.11 Log Function Waveform
PAGE 68
60 Figure 4.12 Power Function Waveform
PAGE 69
61 Figure 4.13 Real to Integer Waveform
PAGE 70
62 Figure 4.14 Integer to Real Waveform
PAGE 71
63 Figure 4.15 Negate Function Waveform
PAGE 72
64 CHAPTER 5 CONCLUSIONS AND FUTURE RESEARCH In this thesis, we have successfully modeled an image processing algorithm for cancer detection, specifically in mammograms. The deconvolution algorithm has been implemented at behavioral level and validated using several test images. The working behavioral model has been synthesized using AUDI which gives a datapath and controller. The design has been simulated at RT Level and accuracy measures have been made for results from a few sample mammograms using Nclaunch simulator. The AUDI component library has been extended to handle floating point computations. These library components have been validated and implemented at RT-level. The research presented here will lead to easier implementation of other data intensive image processing algorithms with minimal enhancements of AUDI. An enhancement may be done to eliminate the data transfer problem resulting from large images by implementing pipelining in the design. Further extension to this thesis can be done by implementing this algorithm at the layout level. This will provide a completely portable image processing device that satisfies the real time needs of the quickly emerging medical imaging field at better speeds without a tradeoff in accuracy. Another suggestion for future research would be implementing the design using FPGAs, which support reconfigurable designs
PAGE 73
65 REFERENCES [1] I. T. Young, J. J. Gerbrands, and L. J. Van Vliet, Â“Fundamentals of Image Processing,Â” CRC Press LLC. ISBN 90 75691 01 7, 2000. [2] W. K. Pratt, Â“Digital Image Processing,Â” Wiley, 1991. [3] N. H. E. Weste, and K. Eshraghian, Â“Principles of CMOS VLSI design: a systems perspective,Â” Addison-Wesley Longman Publishing Co., Inc.,1985. [4] L. Wanhammar, Â“DSP Integrated Circuits,Â” Academic Press, Pages: 2 Â– 5, 1999. [5] A. De Gloria, P. Faraboschi, M. Olivieri, and E. Guidetti, Â“ASIC and board design of a high performance parallel architecture,Â” Euro ASIC Proceedings, 1992. [6] M. A. Wu, and T.F. Brennan, Â“ASIC applications in diagnostic imaging systems,Â” ASIC Seminar and Exhibit, Third Annual IEEE Pro ceedings, Pages T/5.1-T/5.4, September, 1990. [7] J. J. Heine, and R. P. Velthuizen, Â“Spectral analysis of full field digital mammography data,Â” Med Phys., May 2002. [8] AUDI homepage Â– http://vcapp.csee.usf.edu/~katkoori/kkweb/audi.html [9] P. J. Ashenden, Â“The Designer's guide to VHDL,Â” Morgan Kaufmann Publishers, Inc., 1995. [10] S. Mazor, and P. Langstraat, Â“A Guide to VHDL,Â” Kluwer Academic Publications, 1992. [11] J. R. Armstrong, Â“Chip-Level Modeling with VHDL,Â” Prentice-Hall, Inc., 1989. [12] D. E. Thomas, and P. Moorby, Â“The Verilog Hardware Description Language,Â” Kluwer Academic Publications, 1991.
PAGE 74
66 [13] M. J. S. Smith, Â“Application-Specific Integrated Circuits,Â” Addison-Wesley, Pages 16-18, 1997. [14] C. Gopalakrishnan, and S. Katkoori, Â“Behavioral Synthesis of Datapaths with Low Leakage Power,Â” IEEE International Symposium on Circuits and Systems, Volume 4 Pages 699-702, May 2002. [15] S. Gupta, and S. Katkoori, Â“Force-directed Scheduling for Dynamic Power Minimization,Â” IEEE Computer Society Annual Symposium on VLSI, Pages: 68 Â– 73, 2003. [16] C. Gopalakrishnan, and S. Katkoori, Â“Resource Allocation and Binding for Low Leakage Power,Â” 16th International Conference on VLSI Design, Pages: 297 Â– 302, Jan. 2003. [17] C. Tseng, and D.Siewiorek, Â“FACET: A Procedure for the Automated Synthesis of Digital Systems,Â” In Proceedings of 20th DAC, pages 566-572, June 1983. [18] C. Gopalakrishnan, Â“High Level Techniques for leakage Power Estimation and Optimization in VLSI ASICs,Â” Ph.D. dissertation, USF, Tampa, 2003. [19] American Cancer Society, Â“Breast Cancer Facts and Figures,Â” 1999-2000. Atlanta, GA, 2000. [20] R. Ferrini, E. Mannino, E. Ramsdell, and L. Hill, Â“Screening mammography for breast cancer: American College of Preventive Medicine practice policy statement,Â” Am J Prev Med 1996. [21] L. Tabar, A. Gad, L.H. Holmberg, U. Ljungquist, C. Fagerberg, and L. Baldetorp, Â“Reduction in mortality from breast cancer after mass screening with mammography,Â” Randomized trial from the Breast Cancer Screening Work Group of the Swedish National Board of Health and Welfare, 829-32, Lancet 1985. [22] J. J. Heine, and P. Malhotra,. Â“Mammographic tissue, breast cancer risk, serial image analysis, and digital mammography. II. Serial breast tissue change and related temporal influences,Â” Acad Radiol 2002. [23] E. Sickles, Â“Periodic mammographic follow-up of probably benign lesions: Results in 3,184 consecutive cases,Â” Radiology, 179(2):463-8. 215(2):554-62.
PAGE 75
67 [24] S. Muller, Â“Full-field digital mammography designed as a complete system,Â” European Journal of Radiology, 25Â–34, 1997.. [25] H. C. Burrell, D. M. Sibbering, A. R. Wilson, et al., Â“Screening interval breast cancers: mammographic features and prognostic factors,Â” Radiology 199:811Â–817, 1996. [26] A. S. Feig, and M. J. Yaffe, Â“Digital mammography,Â” RadioGraphics 18, 893Â–901. 1998. [27] E. D. Pisano, et al., Â“RadiologistsÂ’ preferences for digital mammographic display,Â” Radiology,:216:820-830, 2000. [28] E. D. Pisano, Â“Current status of full-field digital mammography,Â” Radiology 2000, 214:26Â–28, 2000. [29] GE Medical Systems, Senographe 2000D http://www.gemedicalsystemseurope.com/euen/rad/whc/products/cr_2000d_04.html [30] M. L. Giger, Â“Computer-aided diagnosis in radiology,Â” Acad Radiol 2001. [31] T. W. Freer, and M. J. Ulissey, Â“Screening mammography with computer-aided detection: prospective study of 12,860 patients in a community breast centre,Â” Radiology; 220:781Â—786, 2001. [32] J. J. James, Â“The current status of digital mammography,Â” Clinical Radiology, 59 (1):1-10, January 2004. [33] M. Orhan Altan, Â“Digital Image Processing Techniques Photogrammetry IIdigital photogrammetry,Â” page 1-14 [34] A. Ganapathiraju, J. Hamaker, and J.P.A. Skjellum, Â“Contemporary view of FFT algorithms.Â” [35] J. W. Cooley, and J. W. Tukey, Â“An Algorithm for Machine Computation of Complex Fourier Series,Â” Math. Comp., vol. 19, pp. 297-301, April 1965.
PAGE 76
68 [36] J. J. Heine, S. R. Deans, and R. P. Velthuizen, "On the statistical nature of mammograms," Medical Physics 26, 2254-2265, 1999. [37 ] O. Brigham, Â“The Fast Fourier Transform and its Applications,Â” Prentice Hall, NJ, 1988. [38] J. L. Hennessy, and D.A. Patterson, Â“Computer Architecture Â– a quantitative approach,Â” Morgan Kaufman Publishers, 1990. [39 ] UCI Benchmark Suite, http://ftp.ics.uci.edu/pub/hlsynth/HLSynth95 [40] I. Koren, Â“Computer Arithmetic Algorithms,Â” Second Edition, A. K. Peters, Natick, MA, 2002. [41] B. Parhami, Â“Computer Arithmetic Algorithms and Hardware Designs,Â” 2000. [42] P.R. Panda, and N.D. Dutt, Â“High Level Synthesis Design Repository," Technical Report 95-04, University of California, Irvine, 1995.
PAGE 77
69 APPENDICES
PAGE 78
70 APPENDIX A ----Sine of 32-bit IEEE 754; Result is 32 bit library IEEE; use IEEE.STD_LOGIC_1164.all; use IEEE.STD_LOGIC_arith.all; use IEEE.std_logic_textio.all; use STD.textio.all; entity sine is port (input:in std_logic_vector(31 downto 0); output: out std_logic_vector(31 downto 0)); end sine; architecture arch_sine of sine is BEGIN sine_full:process(input) variable acc:real := 200.0; constant pi: real := 3.1415926535897932384626433; variable xx,Pim2,pid2,value,x: real:=0.0; variable exp_value,mant,mant_value,xreal:real:=0.0; variable div2,signum,za,kfac,newza:real:=1.0; variable k: real:= 2.0; variable sign: real:= -1.0; variable bias: integer := 127; variable i,j,exp,exponent_final:integer:=0; variable texp,pow2: integer :=1; variable frac_part:std_logic_vector(31 downto 0); variable b_exp,i_exp :integer :=0; variable fraction1,tman:real:=0.0; variable my_line: LINE; BEGIN if(input(23)='1') then exp := 1; end if; for i in 24 to 30 loop pow2:=pow2*2; if (input(i)='1')then exp:= exp + pow2; end if; end loop; exponent_final:= exp bias; exp_value:= 2.0 ** exponent_final;
PAGE 79
71 Appendix A (Continued) --Breaking the Mantissa part for i in 22 downto 0 loop div2:=div2/2.0; if (input(i)='1')then mant:= mant + div2; end if; end loop; mant_value:= 1.0 + mant; --Final Computation if (input(31)='1') then xreal:=-1.0 mant_value exp_value; elsif (input(31)='0') then xreal:=mant_value exp_value; end if; -Sine Calculation Pim2:=2.0*pi; pid2:=pi/2.0; if xreal< 0.0 then xreal:=-xreal; signum:=-1.0; end if; if xreal>= Pim2 then div2:=real(integer(xreal/Pim2)); xreal:=xreal-(Pim2*div2); end if; if xreal < 0.0 then signum:=-1.0*signum; xreal:=-xreal; end if; while xreal > pi loop xreal:=xreal-pi; signum:=-1.0*signum; end loop; while xreal > pid2 loop xreal:=pi-xreal; end loop; za:=xreal; k:=3.0; while k < acc loop xx:=1.0; for j in 0 to integer(k)-1 loop xx:=xx*xreal; end loop;
PAGE 80
72 Appendix A (Continued) kfac:=kfac*(k-1.0)*k*sign; newza:=za+(xx/kfac); if newza = za then k:=acc; end if; za :=newza; k :=k+2.0; end loop; value:=za*signum; -Sine Calculation Ends if (value<=0.0) then value:=-value; frac_part(31):='1'; else frac_part(31):='0'; end if; -Change Real to Std logic if (value>=2.0) then while value>= 2.0 loop value:=value/2.0; b_exp:= b_exp+1; end loop; elsif(value<1.0 and value>0.0) then while value< 1.0 loop value:=value*2.0; b_exp:= b_exp-1; end loop; elsif(value<2.0 and value>=1.0) then b_exp:=0; else b_exp:=-127; end if; i_exp:=b_exp+127; fraction1:= value-1.0; --converting fraction to IEEE mantissa part tman:=1.0; for j in 22 downto 0 loop tman:=tman/2.0; if (fraction1>=tman)then fraction1:= fraction1-tman; frac_part(j):='1'; else frac_part(j):='0';
PAGE 81
73 Appendix A (Continued) end if; end loop; --converting to binary for i in 23 to 30 loop texp:=i_exp/2; if ((i_exp-(texp*2))=0)then frac_part(i):='0'; else frac_part(i):='1'; end if; i_exp:=i_exp/2; end loop; output<=frac_part; END PROCESS; END arch_sine; ----Cosine of 32-bit IEEE 754; Result is 32 bit library IEEE; use IEEE.STD_LOGIC_1164.all; use IEEE.STD_LOGIC_arith.all; entity cosine is port (input:in std_logic_vector(31 downto 0); output: out std_logic_vector(31 downto 0)); end cosine; architecture arch_cosine of cosine is BEGIN cosine_full:process(input) variable acc:real := 200.0; constant pi: real := 3.1415926535897932384626433; variable xx,Pim2,pid2,value,fraction1,tman: real:=0.0; variable exp_value,mant,mant_value,xreal:real:=0.0; variable div2,signum,za,kfac,newza:real:=1.0; variable k: real:= 2.0; variable sign: real:= -1.0; variable bias: integer := 127; variable i,j,exp,exponent_final:integer:=0; variable texp,pow2: integer :=1; variable b_exp,i_exp :integer :=0; variable frac_part:std_logic_vector(31 downto 0):="00000000000000000000000000000000";
PAGE 82
74 Appendix A (Continued) BEGIN --Cosine Function Starts Pim2:=2.0*pi; pid2:=pi/2.0; if xreal < 0.0 then xreal:=-xreal; end if; if xreal>= Pim2 then div2:=real(integer(xreal/Pim2)); xreal:=xreal-(Pim2*div2); end if; while xreal > pi loop xreal:=xreal-pi; signum:=-1.0*signum; end loop; while xreal > pid2 loop xreal:=pi-xreal; signum:=-signum; end loop; while k < acc loop --accuracy iterations=20 xx:=1.0; for j in 0 to integer(k)-1 loop xx:=xx*xreal; end loop; kfac:=kfac*(k-1.0)*k*sign; newza:=za+(xx/kfac); if newza = za then k:=acc; end if; za :=newza; k :=k+2.0; end loop; value:=za*signum; output <= frac_part; END PROCESS; END arch_cosine;
PAGE 83
75 Appendix A (Continued) ----Log of 32-bit IEEE 754; Result is 32 bit library IEEE; use IEEE.STD_LOGIC_1164.all; use IEEE.STD_LOGIC_arith.all; entity logfn_stdlogic is port (input:in std_logic_vector(31 downto 0); output:out std_logic_vector(31 downto 0)); end logfn_stdlogic; architecture arch_logfn_stdlogic of logfn_stdlogic is BEGIN log_fn_stdlogic:process(input) constant pi: real := 3.1415926535897932384626433; constant ln2:real:= 0.69314718055994530941723212; variable div2,xx,sign:real:=1.0; variable z,zeta,zetasup2,newln,m,n,ln,fraction1,tman:real:=0.0; variable exp_value,mant,mant_value,xreal:real:=0.0; variable texp,pow2,j:integer:=1; variable acc:integer:=200; variable bias: integer := 127; variable b_exp,i_exp,i,exp,exponent_final:integer:=0; variable frac_part:std_logic_vector(31 downto 0); BEGIN -Main logarithm calculation if xreal>0.0 and xreal<1.0 then xreal:=1.0/xreal; sign:=-1.0; end if; if xreal >= 1.0 then while xx<=xreal loop xx:=xx*2.0; m:=m+1.0; end loop; m:=m-1.0; z:=xreal/(xx/2.0); zeta:=(1.0-z)/(1.0+z); n:=zeta; ln:=zeta; zetasup2:=zeta*zeta; while j
PAGE 84
76 Appendix A (Continued) xx:=sign*(m*ln2-2.0*newln); if newln = ln then j:=acc; end if; ln:=newln; j:=j+1; end loop; end if; output<=frac_part; END PROCESS; END arch_logfn_stdlogic; ----Power of two 32-bit IEEE 754(x^a); Result is 32 bit library IEEE; use IEEE.STD_LOGIC_1164.all; use IEEE.STD_LOGIC_arith.all; use IEEE.std_logic_textio.all; use STD.textio.all; entity powflt_stdlogic is port (input1:in std_logic_vector(31 downto 0); input2:in std_logic_vector(31 downto 0); output:out std_logic_vector(31 downto 0)); end powflt_stdlogic; architecture arch_powflt_stdlogic of powflt_stdlogic is BEGIN powflt_full:process(input1,input2) constant pi:real:= 3.1415926535897932384626433; constant ln2:real:= 0.69314718055994530941723212145818; variable bias: integer := 127; variable pow2,j:integer:=1; variable i,r,b_exp,i_exp,exp,exponent_final:integer:=0; variable acc:integer:=200; variable div2,xx,sign,za,kfac,l,aa:real:=1.0; variable lz,newza,zlg,zeta,zetasup2,newln,ln,m,n:real:=0.0; variable exp_value,mant,mant_value,xreal,areal:real:=0.0; variable fraction1,tman:real:=0.0; variable frac_part:std_logic_vector(31 downto 0):="00000000000000000000000000000000"; variable my_line: LINE;
PAGE 85
77 Appendix A (Continued) BEGIN if xreal=0.0 then za:=0.0; elsif areal=0.0 then za:=1.0; else if xreal>0.0 and xreal<1.0 then xreal:=1.0/xreal; sign:=-1.0; end if; if xreal >= 1.0 then while xx<=xreal loop xx:=xx*2.0; m:=m+1.0; end loop; m:=m-1.0; zlg:=xreal/(xx/2.0); zeta:=(1.0-zlg)/(1.0+zlg); n:=zeta; ln:=zeta; zetasup2:=zeta*zeta; while j
PAGE 86
78 Appendix A (Continued) output<=frac_part; END PROCESS powflt_full; END arch_powflt_stdlogic; ----Negation of 32-bit IEEE 754; Result is 32 bit library IEEE; use IEEE.STD_LOGIC_1164.all; use IEEE.STD_LOGIC_arith.all; use IEEE.std_logic_textio.all; use STD.textio.all; use IEEE.NUMERIC_STD. all ; entity negate_stdlogic is port (input:in std_logic_vector(31 downto 0); output: out std_logic_vector(31 downto 0)); end negate_stdlogic; architecture arch_negate_stdlogic of negate_stdlogic is BEGIN negate_full:process(input) variable xneg:std_logic_vector(31 downto 0):="00000000000000000000000000000000"; variable i:integer:=0; variable my_line: LINE; BEGIN if(input(31)='1') then xneg(31):= '0'; else xneg(31):='1'; end if; for i in 30 downto 0 loop xneg(i):=input(i); end loop; output<=xneg; END PROCESS; END arch_negate_stdlogic;
PAGE 87
79 Appendix A (Continued) ---Toint : convert s 32 bit IEEE 754 to a 20 bit integer library IEEE; use IEEE.STD_LOGIC_1164.all; use IEEE.STD_LOGIC_arith.all; use IEEE.NUMERIC_STD. all ; entity toint is port (input:in std_logic_vector(31 downto 0); output: out std_logic_vector(19 downto 0)); end toint; architecture arch_toint of toint is BEGIN process(input) variable bias: integer := 127; variable exp_value,mant,mant_value,xreal:real:=0.0; variable exp,exponent_final,temp:integer:=0; variable div2:real:=1.0; variable pow2: integer :=1; variable i,r:integer:=0; variable binval:std_logic_vector(19 downto 0); BEGIN temp:=integer(abs(xreal)); for i in 0 to 19 loop r:=temp/2; if ((temp-(r*2))=0)then binval(i):='0'; else binval(i):='1'; end if; temp:=temp/2; end loop; output<=binval; END PROCESS; END arch_toint;
PAGE 88
80 Appendix A (Continued) ---Toreal : convert s 20-bit unsigned integer to 32-bit IEEE 754 library IEEE; use IEEE.STD_LOGIC_1164.all; use IEEE.STD_LOGIC_arith.all; use IEEE.NUMERIC_STD. all ; entity toreal is port (input:in std_logic_vector(31 downto 0); output: out std_logic_vector(31 downto 0)); end toreal; architecture arch_toreal of toreal is BEGIN to_real:process(input) variable bias: integer := 127; variable frac_part,temp_input :std_logic_vector(31 downto 0); variable decinum_1,b_exp,i_exp :integer :=0; variable tval,texp,bytwo:integer:=1; variable decinum_real,fraction1,tman:real:=0.0; BEGIN if(input(0)='1') then decinum_1 := 1; end if; for j in 1 to 19 loop tval:=tval*2; if(input(j)='1') then decinum_1:= decinum_1+tval; end if; end loop; --Decimal to 32 bit real decinum_real:=real(decinum_1); if (decinum_real<0.0) then decinum_real:=-decinum_real; frac_part(31):='1'; else frac_part(31):='0'; end if; output<=frac_part; END PROCESS; END arch_toreal;
PAGE 89
81 APPENDIX B ----Deconvolution Algorithm : Behavioral VHDL code PACKAGE fft_pkgin512 is constant band : integer := 5; constant MAX : integer := 262144; constant P : integer := 512; constant PI : real := 3.1415926535; type maxarray is array (0 to MAX-1) of real; type bandintarray is array (0 to BAND) of integer; type maxintarray is array (0 to MAX-1) of integer; type bandarray is array (1 to BAND) of integer; type band1array is array (0 to BAND) of real; type marray is array (0 to P-1) of real; END fft_pkgin512; library IEEE; use IEEE.STD_LOGIC_1164.all; use IEEE.STD_LOGIC_arith.all; USE work.fft_pkgin512.all; USE IEEE.math_real.all; USE std.standard.all; entity mainfftin512 is PORT ( im_R_in : in integer; dmap_in : in integer; d_in : in integer; oarr_out : out integer); end mainfftin512; architecture arch_fft512 of mainfftin512 is BEGIN main_fftin512:process variable im_R,im_I,d,im_ps : maxarray; variable i,j,k,nn,base,count,kval,sc,tmpd, dir,x1,x2: integer; variable dmap : maxintarray; variable m,st2,maxval,minval,mval,bval,alpha,power,x3,rp: real:=0.0; variable const_R,const_I,arg : real:=0.0; variable t : bandarray; variable logpow,powr : band1array; variable c_R,c_I : maxarray; variable d_R, d_I,e_R,e_I : marray;
PAGE 90
82 Appendix B (Continued) BEGIN for i in 0 to MAX-1 loop im_R(i) := real(im_R_in); dmap(i) := dmap_in; d(i) := real(d_in); end loop; for i in 0 to band LOOP powr(i):=0.0; end LOOP; for i in 0 to MAX-1 LOOP im_I(i):=0.0; end LOOP; --Forward FT starts --FT begins dir:=1; rp:=real(P); for i in 0 to P-1 LOOP base:=P*i; count :=0; for j in base to base+P-1 LOOP e_R(count):=im_R(j); e_I(count):=im_I(j); count:=count+1; end LOOP; for j in 0 to P-1 LOOP arg:= -((2.0*PI)*(real(dir)*real(j)))/rp; d_R(j):= 0.0; d_I(j):= 0.0; for nn in 0 to P-1 LOOP x3 := (arg*real(nn)); const_R:=cos(x3); const_I:=sin(x3); d_R(j):=d_R(j)+ ((e_R(nn) const_R) (e_I(nn) const_I)); d_I(j):=d_I(j)+ ((e_R(nn) const_I) + (e_I(nn) const_R)); end LOOP; if dir = 1 then d_R(j) := d_R(j)/rp; d_I(j) := d_I(j)/rp; end if; end LOOP; k:=base;
PAGE 91
83 Appendix B (Continued) for count in 0 to P-1 LOOP im_R(k):=d_R(count); im_I(k):=d_I(count); k:=k+1; end LOOP; --FT ends end LOOP; for i in 0 to P-1 LOOP base:=P*i; for j in 0 to P-1 LOOP x1 := base+j; x2 := ((P*j)+i); c_R(x2):=im_R(x1); c_I(x2):=im_I(x1); end LOOP; end LOOP; --FT starts for i in 0 to P-1 LOOP base:=P*i; count:=0; for j in base to base+P-1 LOOP e_R(count):=c_R(j); e_I(count):=c_I(j); count:=count+1; end LOOP; for j in 0 to P-1 LOOP arg:= -(2.0*PI*real(dir)*real(j))/rp; d_R(j):= 0.0; d_I(j):= 0.0; for nn in 0 to P-1 LOOP x3 := (arg*real(nn)); const_R:=cos(x3); const_I:=sin(x3); d_R(j):=d_R(j)+ ((e_R(nn) const_R) (e_I(nn) const_I)); d_I(j):=d_I(j)+ ((e_R(nn) const_I) + (e_I(nn) const_R)); end LOOP; if dir = 1 then d_R(j) := d_R(j)/rp; d_I(j) := d_I(j)/rp; end if; end LOOP; k:=base;
PAGE 92
84 Appendix B (Continued) for count in 0 to P-1 LOOP c_R(k):=d_R(count); c_I(k):=d_I(count); k:=k+1; end LOOP; end LOOP; for i in 0 to P-1 LOOP base:=P*i; for j in 0 to P-1 LOOP x1 := base+j; x2 := ((P*j)+i); im_R(x2):=c_R(x1); im_I(x2):=c_I(x1); end LOOP; end LOOP; --FT ends --Forward FT ends for i in 0 to P-1 LOOP for j in 0 to P-1 LOOP x1 := ((P*i)+j); im_ps(x1) := ((im_R(x1)**2)+(im_I(x1)**2)); end LOOP; end LOOP; for i in 0 to P-1 LOOP for j in 0 to P-1 LOOP x1 := ((P*i)+j); kval := dmap(x1); if kval <= 5 then powr(kval):=powr(kval)+im_ps(x1); end if; end LOOP; end LOOP; for i in 1 to band LOOP powr(i) := powr(i) 4.0; logpow(i) := log(powr(i))/log(2.0); end LOOP; sc := 3; for i in 1 to band LOOP t(i) := i-sc; m := m + real(t(i))*logpow(i); st2 := st2+ (real(t(i)*t(i))); end LOOP;
PAGE 93
85 Appendix B (Continued) m := m/st2; if m < 0.0 then m := -m; end if; alpha := (m +2.0)/2.0; power := alpha*0.5; for i in 0 to P-1 LOOP for j in 0 to P-1 LOOP x1 := ((P*i)+j); d(x1) := d(x1)**power; end LOOP; end LOOP; for i in 0 to P-1 LOOP for j in 0 to P-1 LOOP x1 := ((P*i)+j); im_R(x1) := (im_R(x1))*(d(x1)); im_I(x1) := (im_I(x1))*(d(x1)); end LOOP; end LOOP; --Inverse FT starts --IFT begins dir:=-1; for i in 0 to P-1 LOOP base:=P*i; count :=0; for j in base to base+P-1 LOOP e_R(count):=im_R(j); e_I(count):=im_I(j); count:=count+1; end LOOP; for j in 0 to P-1 LOOP arg:= -((2.0*PI)*(real(dir)*real(j)))/rp; d_R(j):= 0.0; d_I(j):= 0.0; for nn in 0 to P-1 LOOP x3 := (arg*real(nn)); const_R:=cos(x3); const_I:=sin(x3); d_R(j):=d_R(j)+ ((e_R(nn) const_R) (e_I(nn) const_I)); d_I(j):=d_I(j)+ ((e_R(nn) const_I) + (e_I(nn) const_R)); end LOOP; end LOOP;
PAGE 94
86 Appendix B (Continued) k:=base; for count in 0 to P-1 LOOP im_R(k):=d_R(count); im_I(k):=d_I(count); k:=k+1; end LOOP; --IFT ends end LOOP; for i in 0 to P-1 LOOP base:=P*i; for j in 0 to P-1 LOOP x1 := base+j; x2 := ((P*j)+i); c_R(x2):=im_R(x1); c_I(x2):=im_I(x1); end LOOP; end LOOP; --IFT starts for i in 0 to P-1 LOOP base:=P*i; count:=0; for j in base to base+P-1 LOOP e_R(count):=c_R(j); e_I(count):=c_I(j); count:=count+1; end LOOP; for j in 0 to P-1 LOOP arg:= -(2.0*PI*real(dir)*real(j))/rp; d_R(j):= 0.0; d_I(j):= 0.0; for nn in 0 to P-1 LOOP x3 := (arg*real(nn)); const_R:=cos(x3); const_I:=sin(x3); d_R(j):=d_R(j)+ ((e_R(nn) const_R) (e_I(nn) const_I)); d_I(j):=d_I(j)+ ((e_R(nn) const_I) + (e_I(nn) const_R)); end LOOP; end LOOP; k:=base; for count in 0 to P-1 LOOP c_R(k):=d_R(count); c_I(k):=d_I(count);
PAGE 95
87 Appendix B (Continued) k:=k+1; end LOOP; end LOOP; --IFT ends for i in 0 to P-1 LOOP base:=P*i; for j in 0 to P-1 LOOP x1 := base+j; x2 := ((P*j)+i); im_R(x2):=c_R(x1); im_I(x2):=c_I(x1); end LOOP; end LOOP; --Inverse FT ends maxval := 0.0; minval := 0.0; for i in 0 to MAX-1 LOOP if im_R(i) > maxval then maxval := im_R(i); elsif im_R(i) <= minval then minval := im_R(i); end if; end LOOP; mval := 255.0/(maxval-minval); bval := -(mval*minval); for i in 0 to MAX-1 LOOP tmpd := integer(mval*im_R(i)+bval); oarr_out <= tmpd; end LOOP; END PROCESS; END arch_fft512;