Statistical Idealities and Expected Realities in the Wavelet Techniques Used for Denoising

by

Eric Jan D. DeNooyer

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Arts, Department of Mathematics, College of Arts and Sciences, University of South Florida

Major Professor: Catherine Bénéteau, Ph.D.
Arthur A. Danielyan, Ph.D.
Yuncheng You, Ph.D.

Date of Approval: April 12, 2010

Keywords: expected value, filters, Fourier series, mean square error, wavelet shrinkage

Copyright 2010, Eric Jan D. DeNooyer

Acknowledgements

I owe a great deal of gratitude and thanks to one person in particular, my devoted mother, Greetje H. Roos Brown. She, more than any other person in my life, has given me a great deal of love and patience. She took the time to understand me when I needed to be understood. She remained faithful to my potential even when my progress in life seemed uncertain. I also need to give her a tremendous amount of credit for being one of the most emotionally honest and principled people I know. It is by these qualities, I believe, that her children grew up to become the self-respecting individuals they are today. Most importantly, it allowed me to find the faith that honesty and goodness are things that truly can be believed in. Through all of this, I have become a stronger and more patient person, one who is better able to endure those difficult times. Mom, I thank you, and I love you.

I want to give special thanks to my advisor, Dr. Catherine Bénéteau. Her patient support and help as we worked together on this thesis will always be greatly appreciated. I will take from this experience one important value that she consistently reinforced throughout: always endeavor to truly understand the depth of what we are studying. Dr. Bénéteau, thank you for making me think more critically.
I am also grateful for all the enjoyable learning experiences that I have had with the mathematics professors here at the University of South Florida (USF). I took my first math class here as a non-degree-seeking student back in 1995. At that time, I was refreshing math skills that I had not practiced for six years (since receiving my undergraduate degree in Mechanical Engineering from Rensselaer Polytechnic Institute in Troy, New York). The class that I took back then was Calculus I with Dr. Yuncheng You. Dr. You suggested even then that I should think about working towards a career in teaching mathematics at a community college. After finishing my non-degree goals at USF in 1996, it would be another nine years before I returned to USF, in 2005, ready to see the wisdom of Dr. You's vision through. Upon my return, the first class that I took was Bridge to Abstract Mathematics with Dr. Arthur A. Danielyan. It seems quite fateful that I would invite these two professors to be committee members on my thesis. I thank them kindly for their willingness to give their valuable time towards this last step in helping me realize an important professional accomplishment.

Finally, I would like to thank Dr. Anya Sebastien and the rest of the administration at St. Petersburg College's Seminole Campus in Seminole, Florida, for offering me that first opportunity to teach preparatory mathematics in a college environment back in 2004. It opened the door for a renewed chance at achieving something I truly can be proud of.
Table of Contents

List of Tables  iii
List of Figures  iv
Abstract  vi
1 Preliminaries: Mathematical Spaces, Transformations, and the Fourier Series  1
  1.1 Mathematical Spaces  1
  1.2 Transforming Data  3
  1.3 The Fourier Series  5
2 Wavelets and Their Origins  14
  2.1 Alfréd Haar and the Theory of the Systems of Orthogonal Functions  14
  2.2 Jean Morlet: Looking for Oil, and Finding Wavelets  15
  2.3 Yves Meyer & Stéphane Mallat: Fostering the Father of a Theory  24
  2.4 Wavelets as Filters: An Introduction to Signal Processing  27
3 The Statistics of Noise, Error, & Estimation  41
  3.1 Statistics: The Essentials  41
  3.2 Noise: A Statistical Characterization  53
  3.3 Estimation: Decisions that Consider the Statistical Errors of Loss & Risk  55
4 Denoising, Thresholding, and an Idealistic Oracle  62
  4.1 Denoising: An Overview  62
  4.2 Lowpass Filtering: Muffling the Noise  63
  4.3 Highpass Filtering: The Highlights, and Less Noise Too  66
  4.4 Wavelet Shrinkage Part I: Of Ideals, Oracles, & Great Expectations  67
  4.5 Wavelet Shrinkage Part II: A Tale of Two Errors  86
  4.6 Illustrative Example: Denoising a Corrupted Sine Function  92
  4.7 Summary & Conclusions  101
References  103
Bibliography  105
Appendices  109
  Appendix A: Selected Mathematica Code  110

List of Tables

4.1 Wavelet shrinkage thresholding functions  69
4.2 Comparison of ideal and non-ideal thresholding decisions  74
4.3 MSE calculations for various individual signal strengths, g_i  77

List of Figures

1.1 A graph of the sawtooth function, a periodic linear function (p = 2)  11
1.2 The Fourier series approximations of the sawtooth function (n = 1, 3, 5)  13
2.1 The wavelet series approximations of the sawtooth function (n = −1, 0, 1)  22
2.2 Modulus plots of the Haar lowpass and highpass filters  30
2.3 Comparing an original picture to some of its lowpass and highpass data  39
2.4 Complete image before and after a Haar wavelet transformation  39
3.1 Probability distribution function of the standard normal distribution  52
3.2 Linear-log plots of some colored noise spectral power densities  53
4.1 Noise level reductions of the Haar lowpass filter  65
4.2 Noise before and after a Haar wavelet transformation  67
4.3 MSE(λ) from ideal thresholding, normalized to noise level σ  71
4.4 Signal vector g_i = 1.5 overlaid with a distribution of noisy observations y_i  76
4.5 Comparison of expected to ideal thresholding at λ = σ  78
4.6 Effects on expected MSE when changing threshold: λ = 0.80, 1.00, 1.20  79
4.7 Comparison of two differently defined oracles  81
4.8 Comparison of oracle 2's ideal to oracle 1 and previous expectations  82
4.9 Oracle 2 overlaid with the expectations of a hard thresholding rule, λ = σ  83
4.10 Close-up view of oracle 2 hard thresholding, and two g_i's, g_j < 2σ and g_k > 2σ  83
4.11 Effects of an increasing λ on different strengths of g_i's  85
4.12 Close-up view of Figure 4.6, showing the effects of different λ's on a g_i  85
4.13 The highpass data of j and k will be thresholded by the oracle  91
4.14 The highpass data of j and k will not be thresholded by the oracle  91
4.15 y = sin x, 0 ≤ x ≤ 2π  92
4.16 PDF of the sine function  94
4.17 PDFs of various noisy sine functions  95
4.18 The denoising of a corrupted sine function, before and after  96
4.19 Finding the best λ for denoising a corrupted sine function with noise σ  97
4.20 The lowpass and highpass data of a clean sine function  98
4.21 Plots of the MSE for various levels of noise as λ varies from 0 to 5  99
4.22 PDF of the cosine function: the expected value of the highpass data  100

Statistical Idealities and Expected Realities in the Wavelet Techniques Used for Denoising

Eric Jan D. DeNooyer

ABSTRACT

In the field of signal processing, one of the underlying enemies in obtaining a good quality signal is noise.
The most common examples of signals that can be corrupted by noise are images and audio signals. Since the early 1980's, a time when wavelet transformations became a modernly defined tool, statistical techniques have been incorporated into processes that use wavelets with the goal of maximizing signal-to-noise ratios. We provide a brief history of wavelet theory, going back to Alfréd Haar's 1909 dissertation on orthogonal functions, as well as its important relationship to the earlier work of Joseph Fourier (circa 1801), which brought about that famous mathematical transformation, the Fourier series. We demonstrate how wavelet theory can be used to reconstruct an analyzed function, ergo, that it can be used to analyze and reconstruct images and audio signals as well. Then, in order to ground the understanding of the application of wavelets to the science of denoising, we discuss some important concepts from statistics. From all of these, we introduce the subject of wavelet shrinkage, a technique that combines wavelets and statistics into a thresholding scheme that effectively reduces noise without doing too much damage to the desired signal. Subsequently, we discuss how the effectiveness of these techniques is measured, both in the ideal sense and in the expected sense. We then look at an illustrative example of the application of one technique. Finally, we analyze this example more generally, in accordance with the underlying theory, and make some conclusions as to when wavelets are an effective technique for increasing a signal-to-noise ratio.

Chapter One
Preliminaries: Mathematical Spaces, Transformations, and the Fourier Series

1.1 Mathematical Spaces

The most basic notion of a space comes from our three-dimensional view of the world we live in. It is from these views that we can begin to understand that a space, in the mathematical sense, is not just a loose set of objects, but a set of objects with structure.
One can construct a description of a very simple space by using the rudimentary Euclidean objects of points. Beyond this, it is quite impressive to realize the sheer number and complexities of the many different spaces that can be imagined, or rather discovered. So we quickly move through the hierarchies and head towards a mathematical space where the objects contained in it are functions.

We can start off structuring a set of k points by ordering them under the Cartesian coordinate system. These points can be equivalently considered to be vectors, i.e., v = (v_1, v_2, ..., v_k), each v_j ∈ ℝ. The familiar operations of vector addition and scalar multiplication are defined, which gives us a vector space over ℝ (equivalently, a linear space). We also define the important inner product operation.

Definition 1.1. Let v = [v_1, v_2, ..., v_k]^T and w = [w_1, w_2, ..., w_k]^T be two vectors in ℝ^k. The inner product v · w is defined by

  v · w = Σ_{j=1}^{k} v_j w_j.

With this inner product, we now have an inner product space. From here, we can define the idea of a norm for that space.

Definition 1.2. Let v be a vector in ℝ^k. The norm of v is given by

  ‖v‖ = (v · v)^{1/2} = ( Σ_{j=1}^{k} v_j² )^{1/2}.

With this norm, we now have the classic Euclidean space. Further, this norm allows us to define the concept of distance between two points, ‖v − w‖, giving us the added characterization of qualifying a Euclidean space as a metric space.

We now move on from using simple points as our objects and work our way into including functions as our objects. To evolve a structured space for functions, we need to introduce the concept of a σ-algebra.

Definition 1.3. A collection M of subsets of a set X is called a σ-algebra in X if M has the following properties [18, p. 8]:
(i) X ∈ M.
(ii) If A ∈ M, then A^c ∈ M, where A^c is the complement of A relative to X.
(iii) If A = ⋃_{n=1}^{∞} A_n and A_n ∈ M for n = 1, 2, 3, ..., then A ∈ M.

If M is a σ-algebra of subsets of X, then the pair (X, M) represents a measurable space. If a subset A of X is also an element of the σ-algebra M, then A is called a measurable set. If a function f is a mapping from a measurable space X into a measurable space Y such that the pre-image f^{−1}(V) is a measurable set in X for every measurable set V in Y, then f is called a measurable function.

The conditions above give us a space that is measurable, but how do we measure the size of sets in M? We next define a special function, which we call a measure, that relates a nonnegative number to any measurable set. That related number indicates the size of the measured set.

Definition 1.4. A measure μ is a nonnegative function defined [18, p. 16] on a σ-algebra M satisfying μ(∅) = 0 and

  μ( ⋃_{i=1}^{∞} A_i ) = Σ_{i=1}^{∞} μ(A_i)   (1)

for any sequence {A_i} of disjoint measurable sets in M. The property of μ in (1) is referred to as countable additivity. If the σ-algebra M has such a measure μ defined on it, then the triple (X, M, μ) is called a measure space.

In a way that is analogous to how we defined the inner product between two vectors, we can also define an inner product between two measurable functions:

  ⟨f, g⟩ = ∫_X f ḡ dμ.

Given this inner product, we can now define [18, p. 65] an L² norm of a measurable function f as:

  ‖f‖₂ = ( ∫_X |f|² dμ )^{1/2},

provided that ‖f‖₂ < ∞. This norm gives us the function space L²(X). This space is important to our discussion of denoising, since the statistical measures that we will be using later are defined in terms of L² metrics.

1.2 Transforming Data

Transformations stand out for their ability to preserve a problem's underlying structure while at the same time simplifying that problem's analysis and manipulation. One example of a simplification would be to transform the graphic representation of logarithmically dependent data from a linear scale to a logarithmic scale.
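Definitions 1.1 and 1.2 are easy to experiment with numerically. The following is a minimal sketch of my own (not code from the thesis), using plain Python lists for vectors in ℝ^k:

```python
import math

def inner(v, w):
    # Definition 1.1: v . w = sum of v_j * w_j (real case)
    return sum(vj * wj for vj, wj in zip(v, w))

def norm(v):
    # Definition 1.2: ||v|| = sqrt(v . v)
    return math.sqrt(inner(v, v))

v = [3.0, 4.0]
w = [4.0, -3.0]
print(inner(v, w))   # 0.0 -- v and w are orthogonal
print(norm(v))       # 5.0
# The induced metric: distance ||v - w|| = sqrt(1 + 49) ~ 7.0711
print(norm([vi - wi for vi, wi in zip(v, w)]))
```

The same pattern carries over to the function-space setting of L²(X), with the sum replaced by an integral.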
In such an example, we are essentially changing the domain of the graph. For a base-10 logarithm, that would mean mapping 1 to 0 (i.e., 10⁰), 10 to 1 (10¹), 100 to 2 (10²), etc. This really points to the central purpose of transformations, and that is to represent a particular problem in the most appropriate basis. We do not have to limit ourselves to just bases for exponents; we also have the bases for vector spaces.

Definition 1.5. Let V be a linear space, and f_1, f_2, ..., f_n ∈ V. If f_1, f_2, ..., f_n span V (meaning every f in V can be expressed as a linear combination of f_1, f_2, ..., f_n), and if f_1, f_2, ..., f_n are linearly independent (meaning the equation c_1 f_1 + c_2 f_2 + ... + c_n f_n = 0 is possible only if c_1 = c_2 = ... = c_n = 0), then we say that the elements f_1, f_2, ..., f_n form a basis for V [1, pp. 157–158].

With such a basis, we can represent every f ∈ V uniquely as a linear combination f = c_1 f_1 + c_2 f_2 + ... + c_n f_n. The coefficients c_1, c_2, ..., c_n are called the coordinates of f with respect to the basis B = (f_1, f_2, ..., f_n). The vector [c_1, c_2, ..., c_n]^T in ℝ^n is called the B-coordinate vector of f, denoted [f]_B. A transformation T that transforms any f into its B-coordinate vector, T(f) = [c_1, c_2, ..., c_n]^T, is called a B-coordinate transformation, T: V → ℝ^n. The B-coordinate transformation is invertible: T^{−1}([c_1, c_2, ..., c_n]^T) = c_1 f_1 + c_2 f_2 + ... + c_n f_n = f [1, pp. 157–158]. So we can recover our original function's representation by an inverse transformation. If our basis is orthonormal (meaning ‖f_i‖ = 1 for all i, and f_i · f_j = 1 if i = j and 0 otherwise [1, p. 187]), then our transformation will preserve the magnitude of the original function (e.g., length, energy).

For the purpose of this discussion, when we process a signal, we should look upon our signal as an original function to be transformed. If we think about the familiar ideas of digitally taking a picture or recording audio, we are, in a sense, transforming a real-world representation into a numerical representation (our eyes and brain do something similar). Once we have recorded our signal, we hopefully have the best representation of the signal possible. Sometimes, though, perhaps because of noise, we do not get the best signal. That is where the mathematics and statistics of transformations and data manipulation can make measurable improvements.

The process of taking our original function (our signal) and transforming its representation from one domain into another is called the analysis of the signal. Its complement, the process of taking the function back into its original domain by inverse transformation, is called the synthesis of the signal. A particularly famous example of analyzing and synthesizing functions through mathematical transformations is the Fourier series. We will make a formal presentation of this method in the next section. Before we do, we would like to briefly conclude this section with a discussion of how the Fourier series makes things easier for us. The Fourier series has the ability to take an arbitrarily complicated function and transform its representation into sums of much simpler functions. Additionally, as will be shown at the end of Chapter 2, Fourier transforms allow us to equivalently replace the complicated operation of convolution in one domain with the simple operation of multiplication in the other. It is by such mathematical transformations that we can accomplish beneficial mathematical manipulations of data that would otherwise be tremendously difficult, if not impossible.

1.3 The Fourier Series

Every text investigated for this thesis invariably precedes the exposition of wavelets with a discussion of the Fourier series. The Fourier series is part of a family of transformations of a similar name, the Fourier transforms. The proper name "Fourier" honors the French scholar Joseph Fourier (1768–1830), who gets much of the credit for starting what would eventually develop into harmonic analysis and functional analysis [10, p. 1]. The actual focus of Fourier's work that would bring about his series was the propagation of heat. He began his work on this subject in 1801. His first attempt to publish a monograph on the matter was rejected in 1807. It would not be until 1822 that Fourier would get a more refined work published under the title "Théorie analytique de la chaleur" (translated, "Analytical theory of heat"). What Fourier presented eventually became the method of decomposing periodic phenomena into a series of harmonic (oscillatory) terms.

Before we present the formal definition of a Fourier series, some preliminary definitions and results need to be stated. We have already mentioned that Fourier series are helpful for describing phenomena of a periodic nature. We now define the notion of a periodic function.

Definition 1.6. A complex-valued function f defined on ℝ is said to be a periodic function with period p if f(x + p) = f(x) for all x ∈ ℝ.

The best examples of periodic functions are the trigonometric functions sine and cosine. We now define the concept of a trigonometric polynomial.

Definition 1.7. A trigonometric polynomial is a function that can be written as

  f(x) = a_0 + Σ_{n=1}^{N} (a_n cos nx + b_n sin nx),

where x is real and a_0, ..., a_N, b_1, ..., b_N are complex [17, p. 185].

We now quickly refer to a famous result from the Swiss mathematician Leonhard Euler (1707–1783). Euler's formula, e^{ix} = cos x + i sin x, allows us to transform a trigonometric polynomial into a finite sum of complex exponential functions. The formula itself will not be proved here, but it will be used just below to give some trigonometric identities. Let x be a real number, i the imaginary unit, and e the base of the natural logarithm. Then [17, p. 182]

  cos x = (e^{ix} + e^{−ix}) / 2,  sin x = (e^{ix} − e^{−ix}) / (2i).

These identities allow us to rewrite a trigonometric polynomial in the more convenient form

  f(x) = Σ_{n=−N}^{N} c_n e^{inx}.   (2)

It should be obvious that these polynomials are periodic functions with a period of 2π. Now consider the integral

  (1/2π) ∫_{−π}^{π} e^{imx} e^{−inx} dx,

where m and n are integers. Using the additive rule of exponents, and then applying Euler's formula to substitute out the exponential functions, we get

  (1/2π) ∫_{−π}^{π} e^{i(m−n)x} dx = (1/2π) ∫_{−π}^{π} [cos((m−n)x) + i sin((m−n)x)] dx.

Considering two cases, m − n = 0 and m − n ≠ 0, integration shows us that

  (1/2π) ∫_{−π}^{π} e^{i(m−n)x} dx = 1 if m = n, and 0 if m ≠ n.   (3)

Next, if we multiply both sides of (2) by e^{−inx} and then integrate both sides, we get

  (1/2π) ∫_{−π}^{π} f(x) e^{−inx} dx = (1/2π) ∫_{−π}^{π} Σ_{m=−N}^{N} c_m e^{imx} e^{−inx} dx.

Interchanging the integral and sum, we then have

  (1/2π) ∫_{−π}^{π} f(x) e^{−inx} dx = Σ_{m=−N}^{N} c_m (1/2π) ∫_{−π}^{π} e^{i(m−n)x} dx.   (4)

Using the previous result from (3) gives us the opportunity to simplify (4), allowing us to conclude that

  c_n = (1/2π) ∫_{−π}^{π} f(x) e^{−inx} dx   (5)

for |n| ≤ N. If |n| > N, the integral in (5) is 0. We can slightly adjust the earlier definition of the inner product to this result, and state that c_n = ⟨f, e^{inx}⟩ with respect to the normalized measure dx/2π.

We do not have to limit ourselves to analyzing functions that only have period p = 2π. We can easily adapt this to work for any periodic function g(t) with an arbitrary period p = 2S. Let x = πt/S, and f(x) = g(t) = g(Sx/π).
Then f has period 2π, with t = S corresponding to x = π. By substitution we get the following:

  c_n = (1/2S) ∫_{−S}^{S} g(t) e^{−inπt/S} dt.   (6)

Definition 1.8. A trigonometric series is a series of the form

  Σ_{n=−∞}^{∞} c_n e^{inπt/S},   (7)

where t is a real number [17, p. 186]. (Note: unless there are assumptions made about the function g, the series may not converge.)

Definition 1.9. If g is an integrable function on [−S, S], the numbers c_n defined by (6) for all integers n are called the Fourier coefficients of g.

Definition 1.10. The series presented in (7), formed with the Fourier coefficients defined in (6), is called the Fourier series of g.

Together, the two formulas presented in (6) and (7) respectively represent the complementary aspects of the analysis and synthesis of a function. They are the essence of understanding the Fourier series. If a Fourier analysis of a function g is applicable, then we can synthesize the same function from its Fourier series (meaning the series converges). It should be mentioned that not all functions surrender to such an analysis. First, the function in question needs to be integrable under some definition of integrability (Riemann, Lebesgue, etc.). Second, despite a function g being integrable, the convergence of its Fourier series is never a certainty. In any regard, for the purposes of this brief introduction, it will be enough to say that g is well behaved if it is at least a piecewise continuous function with a finite number of jump discontinuities on the interval [−S, S] [21, p. 110].

An important aspect to note about the Fourier series is that it provides us with a set of complex exponential functions {e_n(t) = p^{−1/2} exp(in2πp^{−1}t)} that form a complete orthonormal basis for the space L²[−p/2, p/2], where p is the period of the periodic function being analyzed. Thus, the analysis of the function, which gives us the Fourier coefficients, is in fact a projection of the given function onto the space spanned by those basis functions via an inner product integral. One should also see that this family of orthonormal basis functions is entirely related to a single complex exponential function, e^{it}, via dilations. Simply multiplying the t occurring in the exponent of e^{it} by a factor of n gives us a new basis function of a different frequency, e^{int}. Remember, in the Fourier series, n is an integer. Such a fact is only natural in the analysis of a periodic function, since we want the cycles of our basis functions to be in phase (i.e., in sync) with the period of our analyzed function. However, restricting our analysis to only the integers leaves out an infinite number of frequencies in between. For example, if our domain is time, and our period is one second, then n = 1 gives us a basis function working at 1 Hertz, n = 2 gives us 2 Hertz, etc. To analyze all the frequencies, including those in between 1 Hertz and 2 Hertz, we do have the more complete tool, the Fourier transform.

Definition 1.11. The Fourier transform of a function g ∈ L¹(ℝ) is defined by

  ĝ(ω) = ∫_{−∞}^{∞} g(t) e^{−iωt} dt.   (8)

Theorem 1.1. If ĝ is the Fourier transform of g ∈ L¹(ℝ), then

  g(t) = (1/2π) ∫_{−∞}^{∞} ĝ(ω) e^{iωt} dω

at every point of continuity of g [13, p. 660].

Understandably, the Fourier transform looks quite similar to the Fourier series. This transformation is typically used in the analysis of phenomena that are measured with respect to their relationships with time and frequency. For example, as we experience a signal g(t), we do so in the time domain.
If we analyze that signal with (8), we transform that representation into the frequency domain. This method, as will be explained later, does have its limitations. So, for this discussion, the theory of Fourier analysis will not be developed any further than this. We will make some important use of what we do have in our next chapter on wavelets. To conclude this first chapter, we now present a popular function that is commonly used to illustrate an application of the Fourier series. This function will be referred to again as a tie-in to our understanding of wavelets.

Example 1.1. Application of the Fourier series to the sawtooth function (also called the sawtooth wave). Let g be the function defined by

  g(t) = 2( t/p − ⌊ t/p + 1/2 ⌋ ),

where ⌊·⌋ is the floor function. This function is periodic with period p. It ranges between −1 and 1. If we set the period p to be 2, then it has the same phase as the sine function. Its graph in the following figure makes it clear why it is called the sawtooth function.

Figure 1.1. A graph of the sawtooth function, a periodic linear function (p = 2).

For the future purposes of this discussion, it will be convenient for us to use the period p = 2. Then it can be noted that S = 1, and that we will analyze our sawtooth function on the interval [−1, 1], where it can be defined by g(t) = t. Using that fact, we apply (6) to compute the Fourier coefficients of our periodic sawtooth function:

  c_n = (1/2) ∫_{−1}^{1} t e^{−inπt} dt = (1/2) ∫_{−1}^{1} t [cos(nπt) − i sin(nπt)] dt.   (9)

Using several additional facts, namely that g(t) = t and sine are odd functions, that cosine is an even function, and that our integral is symmetric on [−1, 1], we can simplify (9) to

  c_n = −i ∫_{0}^{1} t sin(nπt) dt.   (10)

Setting u = t and dv = sin(nπt) dt, we use integration by parts, ∫ u dv = uv − ∫ v du, to solve (10):

  c_n = −i [ −t cos(nπt)/(nπ) |₀¹ + (1/(nπ)) ∫₀¹ cos(nπt) dt ]
      = −i [ −cos(nπ)/(nπ) + sin(nπt)/(nπ)² |₀¹ ],

where n is a nonzero integer.
Since sin(nπ) = 0 and cos(nπ) = (−1)^n for all integers n, we get

  c_n = i(−1)^n / (nπ)

for all nonzero integers n. In determining the coefficient for n = 0, we can apply (9) in its complex exponential form and determine that

  c_0 = (1/2) ∫_{−1}^{1} t dt = 0,

giving us the following Fourier coefficients:

  c_0 = 0, and c_n = i(−1)^n/(nπ) for n ≠ 0,

which provides us with the following Fourier series for g:

  g(t) = Σ_{n≠0} [ i(−1)^n/(nπ) ] e^{inπt}.   (11)

The Fourier coefficients c_n in (11) do not converge absolutely, but Kammler [11, p. 43] demonstrates that this series does converge for t ∈ ℝ. Knowing that the series does converge allows us to do a little algebraic manipulation of the terms. We group them in such a way that we can simplify this Fourier series using the earlier trigonometric identities:

  g(t) = Σ_{n=1}^{∞} [ i(−1)^n/(nπ) ] (e^{inπt} − e^{−inπt}) = (2/π) Σ_{n=1}^{∞} (−1)^{n+1} sin(nπt)/n.   (12)

We now have the simplest expression for the Fourier series of the sawtooth function. If we define a sequence of partial sums based on (12), we can illustrate (see Figure 1.2) how the given Fourier series converges to the sawtooth function.

Figure 1.2. The Fourier series approximations of the sawtooth function: (a) partial sum, n = 1; (b) partial sum, n = 3; (c) partial sum, n = 5.

In the next chapter, we will develop the basic theoretical background of wavelets. Chapter 3 will cover some basic statistical concepts necessary for this discussion. Then Chapter 4 will be the crux of the discussion, analyzing the key principles that validate the use of wavelets in denoising corrupted signals. The last section in Chapter 4 will summarize our conclusions.
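The convergence pictured in Figure 1.2 can be checked numerically. Here is a short sketch of my own (the thesis's appendix uses Mathematica; this is an illustrative Python stand-in), assuming the simplified real form of the series, g(t) ≈ (2/π) Σ (−1)^{n+1} sin(nπt)/n:

```python
import math

def sawtooth_partial_sum(t, N):
    # Partial sum of the sawtooth Fourier series with period p = 2,
    # using the real form (2/pi) * sum_{n=1}^{N} (-1)^(n+1) sin(n*pi*t)/n.
    return (2.0 / math.pi) * sum(
        (-1) ** (n + 1) * math.sin(n * math.pi * t) / n
        for n in range(1, N + 1)
    )

# Away from the jump discontinuity at t = 1, the partial sums approach
# g(t) = t; at t = 0.5 the N = 1000 sum is within about 0.001 of 0.5.
for N in (1, 3, 5, 1000):
    print(N, sawtooth_partial_sum(0.5, N))
```

Because the coefficients decay only like 1/n, convergence is slow, and near the jump at t = 1 the partial sums overshoot (the Gibbs phenomenon), which is visible in Figure 1.2.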
Chapter 2
Wavelets and Their Origins

2.1 Alfréd Haar and the Theory of the Systems of Orthogonal Functions

Fourier series aside, researchers trace the origins of wavelets back to Alfréd Haar (1885–1933), a Hungarian mathematician who studied under David Hilbert at the University of Göttingen in Germany. In July 1909, he proposed in his inaugural dissertation a sequence of functions as an example of a countable orthonormal system for the space of square-integrable functions on the real line. Similar to the Fourier series, Haar's representation was a series of terms involving coefficients multiplied by their respective basis functions. The coefficients are determined by calculating the following inner product integral:

  c = ∫ f(t) χ(t) dt.   (1)

The integral here is presented in its indefinite form for the moment, while we continue to discuss the historical development of the wavelet transform. Haar's thesis work would later be published in 1910 in the German journal Mathematische Annalen under the title "Zur Theorie der orthogonalen Funktionensysteme" [8]. Translated, it means "On the Theory of the Systems of Orthogonal Functions." But nowhere in the article will one find the term "wavelet". That term would come much later, in 1982, out of the research of Jean Morlet (1931–2007), a French engineer who developed the wavelet transform to study the layering of sediments in the geophysics of oil exploration.

2.2 Jean Morlet: Looking for Oil, and Finding Wavelets

Jean Morlet's problem was to try to find the influence of each sedimentary layer when reflected acoustic waves generated at the earth's surface were recorded. Since some waves get trapped inside a layer and others do not, the sedimentary influence could be found in the different instantaneous frequencies reflected off the different layers [10, p. 179]. To bridge the development of wavelets from Haar through Morlet, one has to mention the work of another Hungarian scholar, Dennis Gabor (1900–1979).
He had a paper published in 1946, "Theory of Communication" [6], that described a new method of analyzing signals in which time and frequency play symmetrical parts. Gabor was trying to overcome a problem with Fourier analysis. Its inherent problem is that the analysis of signals in the time domain produces results that exist purely in the frequency domain. These frequencies are treated as being constant over time, so a Fourier transform cannot determine the particular point in time at which a particular frequency is occurring. It only determines to what degree a particular frequency exists in the entire global picture of the data. This makes the Fourier series exceptionally weak at detecting discontinuities in data, as well as working poorly with phenomena that do not exhibit regular periodicity.

What Gabor did to try to overcome this was to modulate the signal with a window function before performing the Fourier transform. This effectively adapted the traditional Fourier transform by placing a recurring window around subsets of the data to be analyzed. Gabor's transform uses a Gaussian function for its windowing function [10]. That technique creates uniform windows, where each window is associated with a wave shape of invariant envelope with a carrier of variable frequency [7]. Doing this allows the influence of frequencies occurring outside the window to be excluded from the calculations. Today, Gabor's transform is known as a special case of the "short-time Fourier transform" (STFT). The end result gives us a time-frequency analysis of the signal.

The essence of the underlying problem here, as described by the German physicist Werner Heisenberg (1901–1976), is the "uncertainty principle." This principle states that certain pairs of physical properties (in our concern, time and frequency) cannot both be known to arbitrary precision.
Heisenberg realized this princi ple in 1926 out of his research on the mathematical foundations of quantum mechanics (he would later be awarded the Nobel Prize for Physics in 1932 for the creation of quantum mechanics). Initially Morlet used Gabor's Fourier based transform, but found th e results unsatisfactory What Morlet thought of next was to change what was to be invariant. He decided that it was the wave shape that should be invariant (instead of the wave s envelope) which would give uniform resolution to the entire plane [ 7 ] To accomplish this he would adapt the sampling rate to the frequency. This effectively creates a changing time scale producing a dilation of the wave shape. The end result gives us a time scale analysis of the signal (as opposed to time frequency). Thi s new idea still respected the constraints imposed by Heisenburg's uncertainty principle but provide d much greater flexibility in balancing them So, w hat Morlet had developed was the f irst discrete wavelet transform. His initial work was published in 19 82 in a two part article title d Wave propagation and sampling theory [ 14 & 15 ] However, the mathematical soundness of his creation was still a question to him. To find an answer for this he worked with Alex Grossman, a theoretical physicist at the Ce ntre de Physique Theorique in Marseille, France. What Grossman and his team at Marseille did from here was to take Morlet's discrete use of dilations and extrapolated them into the continuous realm, where all dilations were possible. T hat is where theore tical introductions to continuous wavelets begin. PAGE 26 17 Definition 2 1. 
A wavelet basis of L²(ℝ) is a family of functions

$$\psi_{a,b}(t) \;=\; \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{t-b}{a}\right), \qquad a,b \in \mathbb{R},\; a \neq 0, \tag{1}$$

where a is a dilation parameter defining the scale transformation, and b is a translation parameter defining the shift (location) transformation. In other words, this family of functions consists of all the translations and re-scales of a single function ψ. A colloquial note here about wavelets: ψ is called the mother wavelet, and all other ψ_{a,b}'s are called the daughter wavelets. You should notice that the wavelet's translation parameter mentioned above is something that the Fourier series does not have. This highlights an important difference between using wavelets as basis functions versus the complex exponentials (of the Fourier series). Unlike complex exponential functions, wavelet functions do not continue on indefinitely (hence the name "wavelet," as compared to "wave"). Thus, this "localized" wave needs to be translated to other areas of the domain in order to generate the analyzed function. The localization is the key to the power of the wavelets, and it brings about many of their benefits over the Fourier series. As mentioned, the wavelet representation presented in (1) is of a continuous nature. However, there is much redundancy to be found in an analysis that is done continuously. There is, though, a critical sampling rate that we do need to achieve. The foundation of this sampling rate can be interpreted from the Sampling Theorem, which is stated below without proof. (Note: the literature acknowledges many people with regard to the development of this theorem: D. Gabor, V. A. Kotelnikov, Karl Küpfmüller, Harry Nyquist, H. Raabe, Claude E. Shannon, E. T. Whittaker, and J. M. Whittaker.)

Theorem 2.1 (The Sampling Theorem) [19]. If a function f(t) contains no frequencies higher than W hertz, it is completely determined by giving its ordinates at a series of points 1/(2W) seconds apart.
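The Sampling Theorem can be checked numerically. The sketch below is illustrative only (the signal, bandwidth, and sample window are choices of mine, not from the text): it samples cos(2π·3t), which contains no frequencies above W = 4 Hz, at the critical spacing 1/(2W) = 1/8 s, then rebuilds an off-grid point with a truncated Whittaker-Shannon interpolation sum.

```python
import numpy as np

W = 4.0                                # assumed bandwidth in Hz
T = 1 / (2 * W)                        # sample spacing 1/(2W) from the theorem
n = np.arange(-160, 161)               # finitely many samples around t = 0
samples = np.cos(2 * np.pi * 3 * n * T)

def reconstruct(t):
    """Truncated Whittaker-Shannon sum: f(t) ~ sum_n f(nT) sinc((t - nT)/T)."""
    return np.sum(samples * np.sinc((t - n * T) / T))

t0 = 0.3                               # a point between sample instants
print(reconstruct(t0), np.cos(2 * np.pi * 3 * t0))
```

The truncation of the infinite sum leaves a small residual error, so the two printed values agree only approximately; with infinitely many samples the theorem guarantees exact agreement.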
Thus, if we sample our signal at twice the rate of the highest frequency present, then we can completely reconstruct the entire signal. In our discrete wavelet transformations this necessitates the idea of the dyadic (binary) scale. Letting the dilation parameter be a = 2⁻ʲ and the translation parameter be b = 2⁻ʲk, with j, k ∈ ℤ [23, pp. 79-80], then gives us our discrete wavelet basis:

$$\psi_{j,k}(t) \;=\; 2^{j/2}\,\psi\!\left(2^{j}t - k\right), \qquad j,k \in \mathbb{Z}.$$

The dyadic scale effectively partitions the domain into intervals of equal length at each level j [23, p. 115]. Each successively finer scale divides the intervals from the previous (and coarser) scale in half. For example, if one scale's partitions are {..., [0,1), [1,2), ...}, then the next finer scale's partition would be {..., [0,½), [½,1), [1,1½), [1½,2), ...}. We can carry these finer and finer scales on to infinity, but for all practical applications we will reach an upper (lower) limit to what we need in detecting the highest (lowest) frequencies. So, a wavelet basis together with some coefficients derived from an analyzed function gives us the wavelet series representation:

$$f(t) \;=\; \sum_{j \in \mathbb{Z}} \sum_{k \in \mathbb{Z}} g_{jk}\, \psi_{j,k}(t).$$

Going back to Alfréd Haar's original wavelet, we can now present a discrete Haar wavelet basis:

$$\psi(t) \;=\; \begin{cases} 1, & 0 \le t < \tfrac{1}{2},\\ -1, & \tfrac{1}{2} \le t < 1,\\ 0, & \text{otherwise}, \end{cases} \qquad \psi_{j,k}(t) \;=\; 2^{j/2}\,\psi(2^{j}t - k).$$

Using our original indexing procedure, we have a = 2⁻ʲ. So, if a = 1 then j = 0. Then, for the purposes of having a convenient index, our mother wavelet will actually be indexed as ψ_{0,0} instead of as ψ_{1,0}. This wavelet is defined on the interval [0,1). Since we now have the interval on which these basis functions are defined, we can determine the coefficients for Haar's wavelet series. Remember, this is a projection of the analyzed function onto the space spanned by this orthonormal basis. We calculate the inner product integral of f and ψ_{j,k}:

$$g_{jk} \;=\; \langle f, \psi_{j,k} \rangle \;=\; 2^{j/2}\left[\int_{2^{-j}k}^{2^{-j}(k+\frac{1}{2})} f(t)\,dt \;-\; \int_{2^{-j}(k+\frac{1}{2})}^{2^{-j}(k+1)} f(t)\,dt\right]. \tag{2}$$

This equation should be recognizable as Haar's representation shown in (1) at the beginning of this chapter. Then, by taking F as the anti-derivative (primitive) of f, we get

$$g_{jk} \;=\; 2^{j/2}\left[2F\!\left(2^{-j}\!\left(k+\tfrac{1}{2}\right)\right) - F\!\left(2^{-j}k\right) - F\!\left(2^{-j}(k+1)\right)\right].$$

And so we can now analyze a function with respect to the Haar wavelet series.
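Because the Haar wavelet ψ_{j,k} is simply ±2^{j/2} on two half-intervals, the inner-product integral above is easy to check numerically. The sketch below is illustrative (the test function f(t) = t and the integration grid are my choices): a midpoint Riemann sum over the support of ψ_{j,k} reproduces the value the anti-derivative formula gives.

```python
import numpy as np

def haar_psi(u):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    return np.where((0 <= u) & (u < 0.5), 1.0, -1.0) * ((0 <= u) & (u < 1.0))

def haar_coeff(f, j, k, n=100_000):
    """Midpoint-rule approximation of g_{jk} = <f, psi_{j,k}> over the
    support [2^{-j} k, 2^{-j}(k + 1)) of psi_{j,k}."""
    a, b = k / 2.0**j, (k + 1) / 2.0**j
    t = np.linspace(a, b, n, endpoint=False) + (b - a) / (2 * n)
    return np.sum(f(t) * 2.0**(j / 2) * haar_psi(2.0**j * t - k)) * (b - a) / n

# For f(t) = t with F(t) = t^2/2, the anti-derivative formula gives
# g_{0,0} = 2 F(1/2) - F(0) - F(1) = 2(1/8) - 0 - 1/2 = -1/4.
print(haar_coeff(lambda t: t, 0, 0))   # ≈ -0.25
```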
So then, let's apply this new technique to our earlier example from the first chapter of our discussion: the sawtooth function.

Example 2.1. Let f be the sawtooth function, f(t) = t on [−1, 1), extended periodically with period 2. Without loss of generality we will work with the anti-derivative F(t) = t²/2. Further, it should be noted that our integrals above need to be applied on intervals where the function is continuous. So, since the sawtooth function has discontinuities at (..., −3, −1, 1, 3, ...), we do need to break up any analyzed interval that includes any discontinuities. This would occur for any j < 0. Before we go any further about this issue, we can at least determine the coefficients of the analyzed sawtooth function for j ≥ 0:

$$g_{jk} \;=\; 2^{j/2}\left[2F\!\left(2^{-j}\!\left(k+\tfrac{1}{2}\right)\right) - F\!\left(2^{-j}k\right) - F\!\left(2^{-j}(k+1)\right)\right] \;=\; -\,\frac{2^{-3j/2}}{4}.$$

You can see that in this case the coefficients do not depend on any translation parameter k; they depend only on the dilation parameter j. Now, so far, we know this much about the sequence of coefficients: {g_{jk}}_{j≥0} = {−1/4, −2^{−3/2}/4, −2^{−3}/4, ...}. Then what about the coefficients for j < 0? For j = −1 we are working on intervals [2k, 2(k+1)), and so the coefficient g_{−1,k} is determined by the following integral:

$$g_{-1,k} \;=\; 2^{-1/2}\left[\int_{2k}^{2k+1} f(t)\,dt \;-\; \int_{2k+1}^{2k+2} f(t)\,dt\right] \;=\; 2^{-1/2}\left[\tfrac{1}{2} - \left(-\tfrac{1}{2}\right)\right] \;=\; \frac{1}{\sqrt{2}}.$$

Next, we go on to j = −2, where the interval we are working on is [4k, 4(k+1)), and so the coefficient g_{−2,k} is determined by the following integral:

$$g_{-2,k} \;=\; 2^{-1}\left[\int_{4k}^{4k+2} f(t)\,dt \;-\; \int_{4k+2}^{4k+4} f(t)\,dt\right] \;=\; 2^{-1}\,[\,0 - 0\,] \;=\; 0,$$

since each half of the interval covers a full period of the sawtooth, over which f integrates to zero. Going lower than j = −2 only produces the same result. So our wavelet series for the sawtooth function can be represented as

$$f(t) \;=\; \frac{1}{\sqrt{2}} \sum_{k \in \mathbb{Z}} \psi_{-1,k}(t) \;-\; \sum_{j=0}^{\infty} \sum_{k \in \mathbb{Z}} \frac{2^{-3j/2}}{4}\,\psi_{j,k}(t).$$

As we did similarly with the Fourier series representation of the sawtooth at the end of chapter one, we can define a partial sum of this sequence, call it f_n(t), where we give the infinite summation involving the index j a finite upper limit of n instead (note: if n < 0, then the infinite summation involving j becomes zero). The next figure illustrates how a Haar wavelet series of the sawtooth function converges.

(a) Haar wavelet series partial sum, n = −1. (b) Haar wavelet series partial sum, n = 0. (c) Haar wavelet series partial sum, n = 1.
Figure 2.1. The wavelet series approximations of the sawtooth function (n = −1, 0, 1).
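The claim that the coefficients do not depend on the translation parameter k once j ≥ 0 can be spot-checked numerically. This is a sketch (the integration grid is an arbitrary choice of mine): the coefficients for several values of k at a fixed level j all come out equal, because each support interval lies inside one linear piece of the sawtooth.

```python
import numpy as np

def sawtooth(t):
    return np.mod(t + 1, 2) - 1          # f(t) = t on [-1, 1), period 2

def haar_psi(u):
    return np.where((0 <= u) & (u < 0.5), 1.0, -1.0) * ((0 <= u) & (u < 1.0))

def coeff(j, k, n=100_000):
    """Midpoint-rule approximation of <f, psi_{j,k}> for the sawtooth."""
    a, b = k / 2.0**j, (k + 1) / 2.0**j
    t = np.linspace(a, b, n, endpoint=False) + (b - a) / (2 * n)
    return np.sum(sawtooth(t) * 2.0**(j / 2) * haar_psi(2.0**j * t - k)) * (b - a) / n

print([round(coeff(0, k), 6) for k in range(-2, 3)])   # all equal: -0.25
```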
What if our function was not oscillating around the mean value of zero? What if our sawtooth function instead ranged between 0 and 2 instead of −1 to 1? Then f on the interval [−1, 1) will be defined by f(t) = t + 1, and the primitive is F(t) = t²/2 + t. We now calculate the coefficients of this new sawtooth function. First, for j ≥ 0:

$$g_{jk} \;=\; 2^{j/2}\left[2F\!\left(2^{-j}\!\left(k+\tfrac{1}{2}\right)\right) - F\!\left(2^{-j}k\right) - F\!\left(2^{-j}(k+1)\right)\right] \;=\; -\,\frac{2^{-3j/2}}{4}.$$

So, these coefficients have not changed; the added constant cancels out of the difference of integrals. Next, what about j = −1? That coefficient has not changed either. Finally, looking at the coefficient for j = −2, we will get the same coefficient there as well if the following integral is zero:

$$\int_{4k}^{4k+2} 1\,dt \;-\; \int_{4k+2}^{4k+4} 1\,dt \;=\; 2 - 2 \;=\; 0.$$

So that didn't change either, and we can see that this new sawtooth function has the exact same wavelet series representation as our original sawtooth function. Clearly this is not going to be a good approximation of our new function. So, we need something more to work with in order to analyze functions that do not only oscillate around zero.

2.3 Yves Meyer & Stéphane Mallat: Fostering the Father of a Theory

This brings us to the next piece of wavelet theory. There is a companion function to the wavelet function ψ called the scaling function φ. One can think of the scaling function as taking the average of the analyzed function on an interval. It can be looked upon as replacing all wavelets existing at frequencies below its own detail level j. The idea of a scaling function grew out of the work of Yves Meyer, a mathematician at the École Polytechnique in Paris, France (finally a pure mathematician enters the fray), and Stéphane Mallat, who, at the time, was a Ph.D. candidate in electrical engineering in Philadelphia. In 1985 Meyer had postulated the existence of a scaling function in order to complete his theoretical construction of a wavelet basis that could be extended to a multi-dimensional setting. With this development, Mallat soon would connect wavelet theory to similar algorithms already used in his field.
These included the pyramidal algorithms of computer vision, the subband coding schemes of signal processing, and the perfect reconstruction quadrature mirror filters. Mallat and Meyer would eventually collaborate to develop the formalism that includes this important idea of a scaling function. That formalism is called a Multiresolution Analysis [10, pp. 183-185].

Definition 2.2. A Multiresolution Analysis (MRA) decomposes the entire function space into a sequence of closed subspaces {V_j}_{j∈ℤ} in L²(ℝ). These subspaces satisfy:

(i) a nested hierarchy, V_j ⊂ V_{j+1}, where the spaces (a) have a trivial intersection, ∩_{j∈ℤ} V_j = {0}, and (b) a dense union in L²(ℝ);
(ii) the V spaces are self-similar, f(t) ∈ V_j ⟺ f(2t) ∈ V_{j+1}; and
(iii) V_j has an orthonormal basis {φ_{j,k}}_{k∈ℤ}.

These functions φ_{j,k} are called the scaling functions associated with V_j. Since V_j ⊂ V_{j+1}, the function φ_{j,k}(t) ∈ V_j can be represented as a linear combination of functions from V_{j+1},

$$\varphi_{j,k}(t) \;=\; \sum_{n \in \mathbb{Z}} h_n\, \varphi_{j+1,n}(t),$$

for some coefficients h_n, n ∈ ℤ [22, p. 52]. There is an interesting relationship to develop here between the scaling function and the wavelet function. Given a sequence of subspaces that satisfy the properties of an MRA, we can define another subspace based on the difference between two successive subspaces of the MRA: W_j is the orthogonal complement of V_j in V_{j+1}, so that V_{j+1} = V_j ⊕ W_j. It so happens that the wavelet basis for L²(ℝ) described earlier in (1) provides an orthonormal basis of W_j [22, p. 57]. In turn, with ψ_{j,k}(t) ∈ W_j ⊂ V_{j+1}, we can actually derive the wavelet function ψ_{j,k} from the scaling functions φ_{j+1,n}:

$$\psi_{j,k}(t) \;=\; \sum_{n \in \mathbb{Z}} g_n\, \varphi_{j+1,n}(t)$$

for some coefficients g_n, n ∈ ℤ. The scaling function is called the "father wavelet."
For example, in Haar's discrete wavelet basis we define the scaling function to be

$$\varphi(t) \;=\; \begin{cases} 1, & 0 \le t < 1,\\ 0, & \text{otherwise}, \end{cases} \qquad \varphi_{j,k}(t) \;=\; 2^{j/2}\,\varphi(2^{j}t - k).$$

Projecting the analyzed function onto the space generated by this orthonormal basis, again by taking the inner product:

$$h_{jk} \;=\; \langle f, \varphi_{j,k} \rangle \;=\; 2^{j/2} \int_{2^{-j}k}^{2^{-j}(k+1)} f(t)\,dt.$$

Taking F as the anti-derivative (primitive) of f, then

$$h_{jk} \;=\; 2^{j/2}\left[F\!\left(2^{-j}(k+1)\right) - F\!\left(2^{-j}k\right)\right].$$

We now have a much more convenient theory to work with: a double family of orthogonal projection operators [10, p. 183]. Therefore an analyzed function can now be synthesized by the following series:

$$f(t) \;=\; \sum_{k \in \mathbb{Z}} h_{nk}\, \varphi_{n,k}(t) \;+\; \sum_{j \ge n} \sum_{k \in \mathbb{Z}} g_{jk}\, \psi_{j,k}(t)$$

for some coefficients h_{nk} and g_{jk}. We can now go back to that altered sawtooth function example and complete its analysis using our new scaling function. Looking at j = −2, we are working on the interval [4k, 4(k+1)). Discontinuities exist here, and so we calculate the coefficient h_{−2,k} by the following integral:

$$h_{-2,k} \;=\; 2^{-1} \int_{4k}^{4k+4} f(t)\,dt \;=\; 2^{-1}\,(4) \;=\; 2,$$

since the interval covers two full periods of the altered sawtooth, each contributing an integral of 2. Applying this to the scaling function representation, we get

$$h_{-2,k}\, \varphi_{-2,k}(t) \;=\; 2 \cdot 2^{-1} \;=\; 1 \quad \text{on } [4k, 4(k+1)),$$

which is the average value of our altered sawtooth function.

2.4 Wavelets as Filters: An Introduction to Signal Processing

We just discussed how the scaling function effectively takes the average of the function over its analyzed interval, [2⁻ʲk, 2⁻ʲ(k+1)). If we take this observation and then look at the companion wavelet function, we should also see that a similar use of averaging is going on, but with a twist. What the wavelet function essentially does is to find the "difference" between the respective average values of the function on its two adjacent subintervals, [2⁻ʲk, 2⁻ʲ(k+½)) and [2⁻ʲ(k+½), 2⁻ʲ(k+1)). These companion ideas of "average" and "difference" will be important concepts to appreciate as we develop our rationale for connecting wavelet theory to filter theory. So far, the use we've made of the Haar wavelet has really been in the analysis of piecewise continuous functions.
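The claim that the scaling coefficient recovers the average value can be verified numerically. A sketch (the integration grid is my choice): for the altered sawtooth at j = −2, k = 0, the interval [0, 4) holds two full periods with average value 1, and the inner product ⟨f, φ_{−2,0}⟩ = 2^{−1} · 4 · 1 = 2.

```python
import numpy as np

def f(t):
    return np.mod(t + 1, 2)      # the altered sawtooth: t + 1 on [-1, 1), period 2

j, k, n = -2, 0, 400_000
a, b = k / 2.0**j, (k + 1) / 2.0**j            # the support interval [0, 4)
t = np.linspace(a, b, n, endpoint=False) + (b - a) / (2 * n)
# phi_{j,k} is the constant 2^{j/2} on [a, b), so the inner product is a plain integral.
h_jk = np.sum(f(t) * 2.0**(j / 2)) * (b - a) / n
print(h_jk)                      # ≈ 2.0
```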
We can easily make an analogous jump to discrete applications, and do so now. To begin, consider a typical discrete sequence of numbers {x_k}. You already know that to find the average of two numbers x_i and x_j you simply take their sum and divide by two. This is analogous to the scaling function's averaging effect, and can be written equivalently as

$$a \;=\; \frac{x_i + x_j}{2}.$$

Continuing the analogy, we can relate x_i and x_j to the two subintervals [2⁻ʲk, 2⁻ʲ(k+½)) and [2⁻ʲ(k+½), 2⁻ʲ(k+1)), and then a simple association would suggest that the analogy of the wavelet function applied to x_i and x_j would look like

$$d \;=\; \frac{x_i - x_j}{2}.$$

So then, what does this offer us? What we have actually done is to transform the two original numbers x_i and x_j into two new numbers, and in doing so we really have not lost any information about the two original numbers. For we can still recover x_i and x_j from the two new numbers perfectly:

$$x_i \;=\; a + d, \qquad x_j \;=\; a - d.$$

We can express this process equivalently through linear algebra. First, let's take a simple look at the initial transformation for the case of a two-element vector [x_i, x_j]ᵀ:

$$\begin{bmatrix} a \\ d \end{bmatrix} \;=\; \begin{bmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\[2pt] \tfrac{1}{2} & -\tfrac{1}{2} \end{bmatrix} \begin{bmatrix} x_i \\ x_j \end{bmatrix},$$

and then look at the inverse transformation that brings us back to the original vector:

$$\begin{bmatrix} x_i \\ x_j \end{bmatrix} \;=\; \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} a \\ d \end{bmatrix}.$$

We include the factor 1/√2 in order to preserve energy. What we have here is called the Haar Transform. We now define it formally for the simple case of a 2-element vector.

Definition 2.3. Let x = [x₁, x₂]ᵀ be a 2-element vector. The Haar transform of x is given [12] by

$$y \;=\; \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$

What does this transformation of the original numbers offer us? The transformed numbers tell us something about how much the value from x_i to x_j has changed. If x_i and x_j are really close to the same value, then their resulting difference (analogous to the wavelet function) will end up being very close to zero.
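Definition 2.3 is small enough to verify directly. A minimal sketch (the sample vector is an arbitrary choice): with the 1/√2 normalization the transform matrix is orthogonal, so its transpose inverts it and energy is preserved.

```python
import numpy as np

H2 = np.array([[1.0, 1.0],
               [1.0, -1.0]]) / np.sqrt(2)   # Haar transform of a 2-element vector

x = np.array([7.0, 3.0])
y = H2 @ x            # scaled average and scaled difference
x_back = H2.T @ y     # H2 is orthogonal, so H2.T undoes it

print(y, x_back)
```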
As for its companion, the average (analogous to the scaling function), this new number will obviously be very close to both of the two original numbers. On the other hand, if there is a large difference between x_i and x_j, then of course their difference will be a large value, and their average will no longer be near either of the two original numbers. In essence, through the use of these two calculations we can say that the original numbers are being filtered for two opposing qualities: stability versus instability. If you don't quite see this filtering comparison, then let's think again about the two scenarios described in the preceding paragraph, stability and instability, but only at their respective extremes. First we consider the two extremes for the case of the averaging filter. If we take two numbers that are exactly the same (i.e., stable), then their resulting average will be the exact same value as well, and thus the one number will pass through the averaging filter unchanged. If we next take two unequal numbers through the averaging filter, then the further apart they are, the more divergent their average value will be from their two original values. So the averaging filter better preserves the values that have a low rate of change between them, and thus we call it a lowpass filter. On the other hand, when we consider the two extremes for the case of the difference filter, we get something else. When we take the two equal numbers, their resulting difference is zero; thus the difference filter will produce a zero as well. When we take the two unequal numbers through the difference filter, the further apart those numbers are (i.e., unstable), the larger the filter's resulting value will be. So the difference filter gives larger values for the higher rates of change, and we thus call it a highpass filter.

There is another way to illustrate these highpass and lowpass filtering concepts, and that is to use the coefficients of the two filtering functions as the coefficients in two respective finite-length Fourier series. First we describe the general representation. Given a finite sequence of coefficients {h_k}, k = 0, 1, 2, ..., N, we construct a finite-length Fourier series from these coefficients as follows:

$$H(\omega) \;=\; \sum_{k=0}^{N} h_k\, e^{-ik\omega}.$$

Then, in the case of the Haar transform, its lowpass filter, whose coefficients are {1/√2, 1/√2}, will give us the following finite-length Fourier series:

$$H(\omega) \;=\; \frac{1}{\sqrt{2}}\left(1 + e^{-i\omega}\right).$$

For Haar's highpass filter, those coefficients are {1/√2, −1/√2}, and they give us

$$G(\omega) \;=\; \frac{1}{\sqrt{2}}\left(1 - e^{-i\omega}\right).$$

Remember that these functions are 2π-periodic. In Figure 2.2 we plot the modulus of these two functions on the interval [−π, π].

(a) Haar lowpass filter. (b) Haar highpass filter.
Figure 2.2. Modulus plots of the Haar lowpass and highpass filters.

What these plots show us is that the functions |H(ω)| and |G(ω)| are the frequency response functions of the lowpass and highpass filters, respectively. You can see that |H(ω)|/√2 is near 1 when ω is a low frequency (near 0), and |G(ω)|/√2 is near 1 when ω is a high frequency (near −π or π). We now formally define the Haar transform filters [21, p. 166].

Definition 2.4. The filter h = (h₀, h₁) = (1/√2, 1/√2) is a lowpass filter called the Haar filter, and g = (g₀, g₁) = (1/√2, −1/√2) is a highpass filter called the Haar wavelet filter.

The filters we've looked at here are based on the Haar transform. The coefficients used to create these filters are referred to as filter taps. With only a finite number of taps in these examples, such filters are considered to be finite impulse response (FIR) filters.
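The lowpass/highpass behavior seen in the modulus plots can be confirmed by evaluating the two frequency responses at a few values of ω. A sketch (the sign convention in the exponent is an assumption; the modulus is unaffected by it):

```python
import numpy as np

def H(w):
    """Haar lowpass frequency response, taps (1/sqrt(2), 1/sqrt(2))."""
    return (1 + np.exp(-1j * w)) / np.sqrt(2)

def G(w):
    """Haar highpass frequency response, taps (1/sqrt(2), -1/sqrt(2))."""
    return (1 - np.exp(-1j * w)) / np.sqrt(2)

for w in (0.0, np.pi / 2, np.pi):
    print(f"w = {w:.3f}   |H| = {abs(H(w)):.4f}   |G| = {abs(G(w)):.4f}")
```

At ω = 0 the lowpass response peaks at √2 while the highpass response vanishes, and the roles reverse at ω = π; note also that |H(ω)|² + |G(ω)|² = 2 for every ω, a hallmark of a quadrature mirror pair.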
The concept of filtering helps us to theoretically connect wavelet theory to the field of signal processing, which is the dominant area for the application of discrete wavelets. A signal, in general, is a time-varying or spatially varying quantity, such as an audio signal or a photographic image. These signals can be codified into streams and/or arrays of quantified data. Thus, we can look at an arbitrary bi-infinite sequence {s_j} as a signal representation {..., s_{−2}, s_{−1}, s_0, s_1, s_2, ...}, or equivalently as an infinite vector [..., s_{−2}, s_{−1}, s_0, s_1, s_2, ...]ᵀ. Earlier, when we illustrated how the Haar transform operates as a combination of lowpass and highpass filters, we used the trivial example of a two-element vector. In order to get a more robust use out of these filters, we will need to understand a special binary operation called convolution.

Definition 2.5 (Convolution) [21, p. 128]. Let h and x be two bi-infinite sequences. Then the convolution product y of h and x, denoted h ∗ x, is the bi-infinite sequence y = h ∗ x whose n-th component is given by

$$y_n \;=\; \sum_{k=-\infty}^{\infty} h_k\, x_{n-k}.$$

The definition requires two bi-infinite sequences. In general, there is no guarantee that this series converges. We will mostly be dealing with FIR filters, and therefore will not be worried about convergence. We can easily make a bi-infinite sequence out of any FIR filter by adding zeros on either side of the coefficients, {..., 0, 0, h₀, h₁, ..., h_N, 0, 0, ...}. We can then apply any FIR filter onto a signal {s_j} by using the convolution operation. What we will get from that is the following sequence, which represents a filtered signal:

$$y_n \;=\; \sum_{k=0}^{N} h_k\, s_{n-k}.$$

You'll notice that the sequence {s_j} appears to be processed in reverse order. That comes from how convolution has been defined.
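Definition 2.5 translates directly into code for FIR filters, where the sum is finite. The sketch below (the filter and signal are arbitrary illustrative choices) implements y_n = Σ_k h_k x_{n−k} for finite, zero-padded sequences and checks it against NumPy's built-in convolution:

```python
import numpy as np

def convolve_fir(h, x):
    """y_n = sum_k h_k x_{n-k}, treating h and x as zero-padded bi-infinite
    sequences; only indices 0..len-1 of each are nonzero."""
    y = np.zeros(len(h) + len(x) - 1)
    for n in range(len(y)):
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                y[n] += h[k] * x[n - k]
    return y

h = np.array([0.5, 0.5])               # a simple averaging (lowpass) filter
x = np.array([1.0, 3.0, 5.0, 7.0])
print(convolve_fir(h, x))              # agrees with np.convolve(h, x)
```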
When n = 0, x_{n−k} = x_{−k}, which clearly shows how the bi-infinite sequence {x_j} is processed from +∞ to −∞ as k goes from −∞ to +∞. We can translate this into linear algebra, but we will reverse the taps of {h_k} instead of the original signal {s_j}: each row of the (bi-infinite) filtering matrix contains the reversed taps [h_N ⋯ h₁ h₀], shifted one column to the right of the row above it, so that the n-th row of the product reproduces y_n = Σ_k h_k s_{n−k}. So, we now have a convention for processing a large signal through a filter. We do need to develop this some more (remember, there are two filters to deal with), but before we do, there is an important result that characterizes convolution in the Fourier domain. That result is the convolution theorem.

Theorem 2.2 (Convolution Theorem) [21, pp. 133-134]. Let h, x, and y be bi-infinite sequences with y = h ∗ x. Let H(ω), X(ω), and Y(ω) denote the Fourier series of h, x, and y, respectively. Then Y(ω) = H(ω)X(ω).

This result allows us to substitute the complicated operation of convolution with a simple multiplication in the Fourier domain. It provides us with an important tool in the design of filters. Before we conclude this chapter, there is one more aspect of signal processing we need to finalize. The convolution we just explained above was only demonstrated for one filter. You should remember, though, that the Haar transform works as two combined filters, a lowpass and a highpass. We do need them both, and there is a way to combine these filters into a single process using the block matrix arithmetic of linear algebra. Let {h_k} be the taps of a lowpass filter, and {g_k} be the taps of a highpass filter. Then the translation of this process into linear algebra stacks the lowpass convolution matrix H on top of the highpass convolution matrix G into a single block matrix applied to the signal S. In the case of the Haar transform (and remember that we're reversing the taps), we get a process whose lowpass rows produce (s_{j−1} + s_j)/√2 and whose highpass rows produce (s_j − s_{j−1})/√2 for every index j. When we think about inverting this transform, remember our goal is the perfect reconstruction of the original signal {s_j}. Looking carefully above, it can be noticed that we actually have more information than we really need in order to accomplish this.
Observe that for any s_j above we have the four outputs (s_{j−1}+s_j)/√2, (s_j+s_{j+1})/√2, (s_j−s_{j−1})/√2, and (s_{j+1}−s_j)/√2. That means we can reconstruct s_j in two ways:

$$s_j \;=\; \frac{1}{\sqrt{2}}\left[\frac{s_{j-1}+s_j}{\sqrt{2}} + \frac{s_j - s_{j-1}}{\sqrt{2}}\right] \qquad \text{or} \qquad s_j \;=\; \frac{1}{\sqrt{2}}\left[\frac{s_j+s_{j+1}}{\sqrt{2}} - \frac{s_{j+1} - s_j}{\sqrt{2}}\right].$$

We only need to choose one of them to make this work. That is why we are going to downsample our processing by truncating the matrix. To accomplish the truncation we will get rid of every other row in the Haar transformation matrix. What we have then is the Haar Wavelet Transformation generalized to N dimensions. We now formalize its definition just below.

Definition 2.6 (The Haar Wavelet Transformation) [21, p. 166]. For N an even positive integer, the Haar wavelet transformation is given by the N × N matrix

$$W_N \;=\; \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 & & & & \\ & & \ddots & & & \\ & & & & 1 & 1 \\ -1 & 1 & & & & \\ & & \ddots & & & \\ & & & & -1 & 1 \end{bmatrix},$$

with N/2 lowpass rows above N/2 highpass rows, each row shifted two columns from the previous one. Then, if S_N represents a one-dimensional signal matrix of length N, where N is an even positive integer, W_N S_N is the transformed signal. It is important to note here that the resulting matrix has two blocks in it. The upper block is a result of the lowpass filter and is considered to be the lowpass data. The lower block is a result of the highpass filter and is considered to be the highpass data. The concise mathematical notation to represent this is as follows. Let the one-dimensional signal be represented by the matrix S = [..., s_{−2}, s_{−1}, s_0, s_1, s_2, ...]ᵀ, let W be the matrix of the entire wavelet transformation, and let H and G be the respective lowpass and highpass blocks of W. Then our processed signal matrix, call it V, will be the result of the following block matrix multiplication:

$$V \;=\; W S \;=\; \begin{bmatrix} H \\ G \end{bmatrix} S \;=\; \begin{bmatrix} HS \\ GS \end{bmatrix}.$$

The inverse of our transformation works as follows: since W is orthogonal, W⁻¹ = Wᵀ. We can represent this concisely as

$$S \;=\; W^{T} V \;=\; \begin{bmatrix} H^{T} & G^{T} \end{bmatrix} \begin{bmatrix} HS \\ GS \end{bmatrix} \;=\; H^{T}HS + G^{T}GS.$$

This is all satisfactory for a one-dimensional signal, but what do we do for the case of a two-dimensional signal, like a digital photograph?
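The downsampled matrix of Definition 2.6 is easy to build and test for perfect reconstruction. A sketch (the sample signal and the sign convention on the highpass rows are my choices; any consistent convention leaves the matrix orthogonal):

```python
import numpy as np

def haar_matrix(N):
    """N x N Haar wavelet transformation matrix (N even): N/2 lowpass rows
    stacked over N/2 highpass rows, each row shifted two columns over
    (this shift is the downsampling)."""
    r = 1 / np.sqrt(2)
    W = np.zeros((N, N))
    for i in range(N // 2):
        W[i, 2 * i:2 * i + 2] = [r, r]            # lowpass: scaled averages
        W[N // 2 + i, 2 * i:2 * i + 2] = [-r, r]  # highpass: scaled differences
    return W

W = haar_matrix(8)
s = np.array([2.0, 4.0, 8.0, 8.0, 6.0, 4.0, 2.0, 0.0])
v = W @ s              # first 4 entries: lowpass block; last 4: highpass block
s_back = W.T @ v       # W is orthogonal, so its transpose inverts it
print(np.allclose(W @ W.T, np.eye(8)), np.allclose(s_back, s))
```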
We could use the wavelet transformation W on the two-dimensional data in just the same way, but it would only filter the data through its columns. It would not filter any of the data through its second dimension, meaning it would not analyze how the data is changing along each row. We can still accomplish this by including one additional step of matrix multiplication. We only need to multiply the other side of S by the transpose of W to get the rows filtered. We can thus define a 2-dimensional Haar Wavelet Transform.

Definition 2.7. Let S represent a two-dimensional signal matrix, and let W be the matrix of the entire wavelet transformation. Then the 2-dimensional Haar Wavelet Transform of S is given by

$$V \;=\; W S W^{T}.$$

Looking at the block matrix arithmetic details of this 2-dimensional transform, we have:

$$V \;=\; \begin{bmatrix} H \\ G \end{bmatrix} S \begin{bmatrix} H^{T} & G^{T} \end{bmatrix} \;=\; \begin{bmatrix} H S H^{T} & H S G^{T} \\ G S H^{T} & G S G^{T} \end{bmatrix}.$$

This is the matrix of our fully transformed two-dimensional signal. It is important to note that a two-dimensional transformation produces four blocks instead of two. The one block in the upper left is a single lowpass block, whereas the other three are considered to be all highpass blocks. To illustrate graphically what the differences between lowpass and highpass data amount to, we will give an example shortly of a photograph transformed by the Haar wavelet. First, it needs to be noted that we've taken some liberties in adjusting the data for these illustrations. In the rendering of any image we are required to stay within the limits of the grayscale format [0, 255]. But the Haar transform's normalizing factor makes the potential range of the lowpass data become [0, 510], and the highpass data, [−255, 255]. So, in order to fit this data into a range of [0, 255], the lowpass values have been cut in half and the highpass values were increased by 255 before cutting those in half as well. A particular point to be aware of is that what used to be a value of zero in the highpass block is now 127.
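Definition 2.7 can likewise be checked on a tiny matrix. A sketch (the 4 × 4 matrix is a stand-in for an image, and the highpass sign convention is an assumption): the transform produces the four blocks described above, and multiplying by the transposes recovers S exactly.

```python
import numpy as np

def haar_matrix(N):
    r = 1 / np.sqrt(2)
    W = np.zeros((N, N))
    for i in range(N // 2):
        W[i, 2 * i:2 * i + 2] = [r, r]
        W[N // 2 + i, 2 * i:2 * i + 2] = [-r, r]
    return W

S = np.arange(16.0).reshape(4, 4)    # a tiny stand-in for a grayscale image
W = haar_matrix(4)
V = W @ S @ W.T                      # filters the columns, then the rows
# The top-left 2x2 block of V is the lowpass data: each entry is the average
# of a 2x2 pixel block times 2 (one factor sqrt(2) from each side).
S_back = W.T @ V @ W                 # invert the 2-D transform
print(np.allclose(S_back, S))
```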
In any regard, these manipulations allow us to really appreciate the relative changes occurring between adjacent values. In Figure 2.3 we present a picture of a clown before and after transformation. In (a) we have the original picture (160 × 160 pixels). In (b) we have the lowpass block of the clown data (80 × 80 pixels). Notice how averaging the adjacent data points (pixel values) has blurred the image. In (c) we show one of the highpass blocks (80 × 80 pixels). Here it is visually dramatic what calculating the differences between adjacent pixel values is doing. It is mostly preserving the locations where the picture has large amounts of contrast between adjacent pixels, i.e., where our eyes would see edges (for example, the eyeglasses).

(a) Original image. (b) Lowpass data. (c) Highpass data.
Figure 2.3. Comparing an original picture to some of its lowpass and highpass data.

In Figure 2.4, we show the entire picture before and after transformation. It can be seen that only one block has lowpass data, whereas the other three all have highpass data. To invert the transformation, we simply calculate Wᵀ V W = S.

(a) Original image. (b) Haar-transformed image.
Figure 2.4. Complete image before and after a Haar wavelet transformation.

So, we now have the basic mechanics for using discrete wavelet transformations in the field of signal processing. These techniques by themselves are not our main goal: there is a point to this signal processing other than breaking down and then perfectly reconstituting a signal. In between the forward and the inverse transformations we are going to make some changes to the processed signal, with the intention of trying to improve it in some idealized way. In this discussion our stated goal is to denoise a corrupted signal. The denoising will be accomplished in between the two transformations, through the use of statistics.
In the next chapter we introduce the necessary statistical concepts used in the art of denoising.

Chapter 3. The Statistics of Noise, Error & Estimation

3.1 Statistics: The Essentials

When we use statistics to analyze data, we're not really trying to be objectively accurate in a traditional mathematical way. Rather, we are trying to impose a system of estimation that is somewhat subjective, yet still quite accurately defined by mathematics. A simple explanation would be to say that we are "trying to estimate the truth." Unlike purely physical phenomena that follow an exact law of nature to an n-th degree of accuracy, more complicated events, whether physical or man-made, tend to have too many unknown dependencies to come up with an exact answer. Under such circumstances we'll be happy to have, however uncertain, decent information on what the majority of data might look like. Statistical techniques offer us the tools to accomplish this. To start with, two prominent descriptive measures used in statistics are the estimations of the data's central tendency (e.g., mean, median, and mode) and the estimations of the data's degree of dispersal (e.g., variance and standard deviation). We will develop these notions through the conventional mathematical model of randomness: the probability space.

Definition 3.1. Let Ω be a set (which we think of as the set of all possible outcomes of an "experiment"), let S be a σ-algebra of subsets of Ω (any set A ∈ S is a set of events), and let P be a measure on (Ω, S) such that P(Ω) = 1 (such a measure is called a probability measure). Then the triple (Ω, S, P) is called a probability space [16, pp. 3, 7, 8].

A probability space is effectively a measure space where its measure, the probability measure P, is a measure with total measure equal to one. In order to make a probability space conducive to mathematical analysis, we often need to define functions that assign a numerical value to our experimental events.
Such functions are called random variable s Th at label may actually sound a little confusing at first wi th the word 'variable' in t here and the word 'function' not. But i t does make sense if we think about how functions are typically defined by a mathematical expression that includes a variable It just happens that the variable in this kind of function is a random event, not a numb er. Definition 3.2. Let be the set of all possible outcomes, and S be the algebra of valued measurable function X is called a random variable (RV) [ 16 p. 41] A relevant example of a RV for this discussion is connected to the familiar idea of taking a photograph. In t his process of recording an image digitally, our camera assigns numerical values to the physical events of color and location In particular, each specific pixel will be assigned some value that indicates the intensity of its light source The light will be measured relative to some scale For example, a grayscale image has pixel intensities ranging from 0 to 25 5 where 0 is black and 25 5 is white. Theorem 3.1. The random variable X defined on the probability space S P ) induces a probability space ( B Q ) by means of the correspondence : Q ( B ) = P { X 1 ( B )} = P { : X ( ) B } for all B B We write Q = PX 1 and call it the distribution of X [ 16 p. 43] PAGE 52 43 Random variables may be d iscrete functions, like in the camera example, or they may be functions that map their results continuously over some range. In either case we will have varying d egrees of dispersal around some central tendency In order to characteriz e th e se dispersals we introduce the distribution function Definition 3.3. A real valued function F defined on ( ) that is non decreasing, right continuous, and satisfies F ( ) = 0 and F (+ ) = 1 is called a distribution function (DF) [ 16 p. 4 4 ] Definition 3. 
4. Let X be a random variable defined on (Ω, S, P). Define a point function F(·) on ℝ by

F(x) = Q(−∞, x] = P{ω : X(ω) ≤ x} for all x ∈ ℝ.

The function F is called the distribution function of the RV X [16, pp. 44-45]. What a distribution function accomplishes is to measure how probable some set of events is, each event having been related by the random variable X to a real number. Now, there are many non-decreasing, right-continuous, real-valued functions that range from 0 to 1. In order to understand the connection between such a function F and the probability it is associated with, we examine two different situations, discrete and continuous. For discrete situations we have a probability mass function P{X = x_i}, and for continuous situations we have a probability density function f(x).

Definition 3.5. A random variable X defined on (Ω, S, P) with distribution function F is said to be a discrete random variable if there exists a countable set E ⊆ ℝ such that P{X ∈ E} = 1. The collection of numbers {p_i} satisfying P{X = x_i} = p_i ≥ 0 for all x_i ∈ E, and Σ_i p_i = 1, is called the probability mass function (PMF) of the RV X. The DF F of X is given [16, pp. 48-49] by

F(x) = Σ_{x_i ≤ x} p_i.

Definition 3.6. A random variable X defined on (Ω, S, P) with distribution function F is said to be a continuous random variable if there exists a nonnegative function f(x) such that for every real number x we have

F(x) = ∫_{−∞}^{x} f(t) dt,

i.e., F is absolutely continuous. The function f is called the probability density function (PDF) of the random variable X [16, p. 50].

Theorem 3.2. Let X be an RV of the continuous type with PDF f. Then for every Borel set B,

P{X ∈ B} = ∫_B f(t) dt

[16, p. 50]. If F is absolutely continuous and f is continuous at x, we have F′(x) = f(x).

Given an RV X with a known distribution, we can easily incorporate X into a function to create a function of a random variable, Y = f(X). We then can extend the theory of distributions to determine the distribution of Y.

Theorem 3.3. Let X be defined on (Ω, S, P) with known distribution F, and let f be a Borel-measurable function on ℝ. Then f(X) is also an RV, and the distribution of the RV Y = f(X) is determined [16, p. 57].

When we consider the signals that we will be working with (digital photographs, audio signals), what we have is a collection of several random variables, which, itself, is an RV, X = (X_1, X_2, ..., X_n). We can describe the probabilities of the various possibilities of the RV X with a joint distribution function F and a joint probability density function f.

Definition 3.7. Let X_1, X_2, ..., X_n be n random variables. We say that these random variables have a joint distribution [16, pp. 103, 106] if there exists a nonnegative function P{X_1 = x_i, X_2 = x_j, ..., X_n = x_l} = p_{i,j,...,l} (discrete case), or f(t_1, t_2, ..., t_n) (continuous case), such that for all (x_1, x_2, ..., x_n) ∈ ℝⁿ,

F(x_1, x_2, ..., x_n) = Σ_{x_i ≤ x_1} ... Σ_{x_l ≤ x_n} p_{i,j,...,l}  (discrete case), or

F(x_1, x_2, ..., x_n) = ∫_{−∞}^{x_1} ... ∫_{−∞}^{x_n} f(t_1, ..., t_n) dt_n ... dt_1  (continuous case).

The function F is called the joint distribution function of X = (X_1, X_2, ..., X_n), p_{i,j,...,l} the joint probability mass function, and f(t_1, t_2, ..., t_n) the joint probability density function of X.

There are some theoretical conditions that need to be placed on F for it to truly be a joint distribution function. Two of them look similar to the conditions in the earlier definition of a DF for a single RV X.

Theorem 3.4. A function F(x_1, x_2, ..., x_n) is the joint DF of some n-dimensional RV if and only if F is non-decreasing and right continuous with respect to all arguments x_1, x_2, ..., x_n and satisfies the following conditions:

(a) F(−∞, x_2, ..., x_n) = F(x_1, −∞, ..., x_n) = ... = F(x_1, x_2, ..., −∞) = 0 for all x_k,
(b) F(+∞, +∞, ..., +∞) = 1,
(c) for every (x_1, x_2, ..., x_n) ∈ ℝⁿ and all ε_i > 0, i = 1, 2, ..., n, the mixed difference of F over the box with sides (x_i, x_i + ε_i] is nonnegative; in the case n = 2 this reads

F(x_1 + ε_1, x_2 + ε_2) − F(x_1 + ε_1, x_2) − F(x_1, x_2 + ε_2) + F(x_1, x_2) ≥ 0.

For the proof of the case of two variables, see Rohatgi [16, pp. 103-105] and Tucker [20, p. 26].

There is one more general kind of distribution to mention, and that is the case where we have a joint distribution of an n-dimensional multiple RV X, but we would like to identify the DF of only one of the RVs (maybe more, but fewer than n). Such a distribution is called a marginal distribution.

Definition 3.8. Let X_1, X_2, ..., X_n be n random variables, and suppose that p_{i,j,...,l} is the joint probability mass function (in the discrete case) or f(x_1, x_2, ..., x_n) is the joint probability density function (in the continuous case) for these RVs. We say that the random variable X_m, 1 ≤ m ≤ n, has a marginal distribution if there exists a nonnegative function p_m(x_k) or f_m(x_m), defined [16, pp. 108-111] by summing the joint PMF, or integrating the joint PDF, over all the other variables,

f_m(x_m) = ∫_{−∞}^{∞} ... ∫_{−∞}^{∞} f(x_1, ..., x_n) dx_1 ... dx_{m−1} dx_{m+1} ... dx_n,

such that for all x_m ∈ ℝ,

F_m(x_m) = F(+∞, ..., +∞, x_m, +∞, ..., +∞).

The function F_m is called the marginal distribution function of X_m, the function p_m is called the marginal probability mass function, and f_m(x_m) is called the marginal probability density function. (Note: these definitions can be generalized to a d-dimensional marginal distribution, 1 ≤ d ≤ n − 1.)

When it comes to the noise that we will be dealing with, we will be treating it as a random variable. Additionally, though, we will be assuming that the probable strength of any one instance of noise will not influence the strength of any other instance of noise. The way we say this statistically is to call these random variables independent.

Definition 3.9. Let F(x_1, x_2) be the joint DF of (X_1, X_2), and let F_1(x_1) and F_2(x_2) be the marginal DFs of X_1 and X_2, respectively. Then we say X_1 and X_2 are independent [16, p. 119] if and only if

F(x_1, x_2) = F_1(x_1) F_2(x_2) for all (x_1, x_2) ∈ ℝ².

The definition of independence can be generalized to a sequence {X_n} of RVs, meaning that for every n = 2, 3, 4, ..., the RVs X_1, X_2, ..., X_n are independent.
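Definition 3.9's factorization can be checked empirically. A minimal sketch, assuming two independent standard normal samples (all parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x1 = rng.normal(size=n)          # X1 ~ N(0, 1)
x2 = rng.normal(size=n)          # X2 ~ N(0, 1), drawn independently of X1

def ecdf(sample, x):
    """Empirical distribution function: the fraction of the sample <= x."""
    return np.mean(sample <= x)

# Independence: F(x1, x2) = F1(x1) * F2(x2) at every point (x1, x2).
a, b = 0.5, -0.3
joint = np.mean((x1 <= a) & (x2 <= b))   # empirical joint DF F(a, b)
product = ecdf(x1, a) * ecdf(x2, b)      # empirical marginal product F1(a) F2(b)
print(joint, product)                    # nearly equal for independent RVs
```

For dependent variables (say x2 = x1), the two printed numbers would differ noticeably at points like (0.5, −0.3).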
One other statistical quality we can assume about noise is that any instance of noise is identically distributed, which we can combine with independence to describe random variables that are both independent and identically distributed.

Definition 3.10. We say that the RVs X_1 and X_2 are identically distributed if X_1 and X_2 have the same DF [16, p. 123], that is,

F_{X_1}(x) = F_{X_2}(x) for all x ∈ ℝ.

Definition 3.11. We say that {X_n} is a sequence of independent, identically distributed (i.i.d.) RVs with common law L(X) if {X_n} is an independent sequence of RVs and the distribution of X_n (n = 1, 2, ...) is the same as that of X [16, p. 123].

By themselves, these distribution functions and probability mass/density functions give us the ability to determine the likelihood of any set of events that we can measure. But this is not all that we can measure with these functions. We can now also calculate the idea of an expected value. The simplest example of an expected value is our common notion of arithmetic mean (i.e., average), but there are many more mathematical expectations that can be calculated. If we are given a random variable X, its probability mass (or density) function {p_i} (or f), and a function g, then we can calculate such expected values as E(X²) or E(g(X)). First, let's formalize the concept of expected value with a definition.

Definition 3.12. Let X be a discrete random variable with probability mass function p_i = P{X = x_i}, i = 1, 2, .... If Σ_i |x_i| p_i < ∞, then we say that the expected value of X exists [16, p. 69] and write

E(X) = Σ_i x_i p_i.

Similarly, let X be a continuous random variable with probability density function f. If ∫ |x| f(x) dx < ∞, then we say that the expected value of X exists [16, p. 70] and write

E(X) = ∫_{−∞}^{∞} x f(x) dx.

We can see that for discrete RVs, the expected value is the probability-weighted sum of the possible values. For continuous RVs with a density function, it is the probability-density-weighted integral of the possible values.
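A small sketch of Definition 3.12 in both cases (the fair-die example is illustrative, not from the text):

```python
import numpy as np

# Discrete RV: a fair six-sided die.  E(X) = sum of x_i * p_i.
values = np.arange(1, 7)
pmf = np.full(6, 1 / 6)
ex_discrete = np.sum(values * pmf)            # 3.5

# Continuous RV: standard normal.  E(X) = integral of x * f(x) dx,
# approximated here by a Riemann sum on a fine symmetric grid.
x = np.linspace(-10.0, 10.0, 200_001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
ex_continuous = np.sum(x * f) * dx            # ~ 0 by symmetry
print(ex_discrete, ex_continuous)
```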
Now, the definition above was written in the context of finding the expected value of X. If we instead wanted to find the expected value of X², with density function f, we would need to calculate

E(X²) = ∫_{−∞}^{∞} x² f(x) dx.

And if we wanted to find the expected value of g(X), where g is an arbitrary function, then we would calculate

E(g(X)) = ∫_{−∞}^{∞} g(x) f(x) dx,

which is the inner product of f and g, ⟨f, g⟩. If we add the results of two or more random variables, then we have one kind of function of several random variables. When we consider how the Haar transform works, we note that we will be adding two random variables at a time, as well as multiplying them by a constant normalizing factor. Thus we will need to know some properties of expected values under basic mathematical operations.

Theorem 3.5 (Addition Properties of Expected Value) [21, p. 507]. Let X_1 and X_2 be two random variables, and let a and b be arbitrary real constants. Then

(a) E(a) = a,
(b) E(aX_1 + b) = aE(X_1) + b,
(c) E(X_1 + X_2) = E(X_1) + E(X_2).

More generally, if X_1, X_2, ..., X_n are n random variables, then

(d) E(X_1 + X_2 + ... + X_n) = E(X_1) + E(X_2) + ... + E(X_n).

Of special importance are the expectations of Xⁿ, where n is a positive integer. E(Xⁿ) is called the nth moment of X about the origin [16, p. 72], and the 1st moment gives us a good estimate of the data's central tendency, μ = E(X). We also need a way of determining the character of the data's range of dispersal around its central tendency. In addressing that, we will first mention the general concept of a moment of order k about a point c, E(X − c)^k [16, p. 79]. Then, if we take c = E(X) = μ, we get what we need, the central moment of order k, E(X − μ)^k. Of particular importance is the central moment of order 2, which is called the variance.

Definition 3.13.
If E(X²) exists, we call E(X − μ)² the variance of X, and we write σ² = Var(X). The quantity σ, the square root of the variance, is called the standard deviation of X.

We need to mention here some of the properties of variance. These properties will be important to the understanding and appreciation of how we use the Haar transform to accomplish denoising.

Theorem 3.6 (Properties of Variance) [21, p. 510]. Let X be an RV, let X_1, X_2, ..., X_n be n independent RVs, let a and b be arbitrary real constants, and let μ = E(X). Then

(a) Var(aX + b) = a² Var(X),
(b) Var(X) = E(X²) − μ²,
(c) Var(X_1 + X_2 + ... + X_n) = Var(X_1) + Var(X_2) + ... + Var(X_n).

It was mentioned a little while ago that there are many probability distributions to consider. One distribution familiar to most is the normal distribution (also commonly referred to as a Gaussian distribution). This particular distribution is tremendously important to our discussion, particularly to our study of noise and the measurement of the errors caused by noise.

Definition 3.14. A random variable X is said to have a normal distribution with mean parameter μ and variance parameter σ² if its PDF is given by

f(x) = (1 / (σ√(2π))) e^{−(x−μ)² / (2σ²)}.

If we set μ = 0 and σ = 1, then we have the standard normal distribution. You can see that

∫_{−∞}^{∞} f(x) dx = 1.

If a random variable X is normally distributed with mean μ and variance σ², then we write X ~ N(μ, σ²). Given a normally distributed random variable X, we can, for convenience, define a new random variable Z = (X − μ)/σ, which is a function of the original RV X. We can then say that the distribution function F of an N(0, 1) random variable is given by

F(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−t²/2} dt.

Figure 3.1 shows the probability density function f (the familiar bell curve) of a normally distributed random variable Z, Z ~ N(0, 1).

Figure 3.1. Probability density function of the standard normal distribution.
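A quick simulation (hypothetical parameters) confirms two of the variance properties above and the standardization Z = (X − μ)/σ:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000
mu, sigma = 2.0, 3.0
x = rng.normal(loc=mu, scale=sigma, size=n)   # X ~ N(2, 9)

# Theorem 3.6 (a): Var(aX + b) = a^2 Var(X); the difference should vanish.
a, b = 4.0, 7.0
check_a = np.var(a * x + b) - a**2 * np.var(x)

# Theorem 3.6 (b): Var(X) = E(X^2) - mu^2.
check_b = np.var(x) - (np.mean(x**2) - np.mean(x)**2)

# Standardizing: Z = (X - mu) / sigma behaves like N(0, 1).
z = (x - mu) / sigma
print(check_a, check_b, z.mean(), z.std())
```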
The importance of the normal distribution is that it is the most commonly used tool for the statistical analysis of complex phenomena, because any random variable that is the sum of a large number of independent factors is likely to be approximately normally distributed. This distribution provides us with a very simple model for an otherwise unmanageable problem, which is why it is frequently used as a model for noise.

3.2 Noise: A Statistical Characterization

It should be no surprise that not all noises are created equal. However, it is interesting to note that there are various statistical characterizations for the different types of noise. One such attribute that scientists give to noise is the description of its spectral density. Noise does not necessarily affect the entire frequency spectrum of a given signal uniformly. It may actually have a varying amount of intensity at the different frequencies. If we consider the noise to be operating as a random variable, then this means the noise is not identically distributed from frequency to frequency.

One of the spectral density characterizations involves assigning a color to noise. In Figure 3.2 we selected three different noise colors for graphical representation. Pink noise has a spectral density that is proportional to the inverse of the frequency, meaning it has a stronger intensity at the lower frequencies. On the other hand, blue noise has a spectral density that is in direct proportion to the frequency, giving a stronger effect at the higher frequencies. Finally, with what may be a familiar reference to some, we also have white noise. White noise pervades the entire frequency spectrum with equal effect. This distinguishes it as the noise most difficult to remove.

Figure 3.2. Linear-log plots of some colored noise spectral power densities.

Here is a quick note about the decibel (dB) calculations involved in Figure 3.2.
The decibel formula is given by

dB = 10 log₁₀(P₁ / P₀),

where P₁/P₀ is a ratio that compares one power value, P₁, to some reference power value, P₀. Decibels are a dimensionless measurement. They do not indicate the actual power that could be measured by some absolute unit of measurement (like watts or volts). For example, consider the pink noise shown in our graph. It has been representatively defined by the formula

dB(ν) = 10 log₁₀(ν₀ / ν), with ν₀ = 50 Hz.

This returns a value of 0 dB at ν = 50 Hz, making our reference power P₀ equal to whatever the power of the noise may be at 50 Hz. If we go up one octave to ν = 100 Hz, then we have P at 100 Hz equal to half of P at 50 Hz: that is a loss of 3.01 dB between the two frequencies. In general, the spectral power density for pink noise, given some reference frequency ν₀, decreases inversely with the frequency, a drop of about 3 dB per octave. Here is one final note about the graph: for the purpose of conciseness, we have plotted against log₂(ν/ν₀), which effectively makes ν₀ = 1 Hz the reference frequency.

In addition to the noise's spectral density, we also need to consider one more statistical characterization to be able to work with noise mathematically. Again, since noise operates like a random variable, we can describe the range and dispersal of the varying occurrences of noise with a probability density function. To be clear, this is a different consideration than the noise's spectral density. We are now talking about characterizing how the instances of noise vary in intensity around any arbitrary point on the frequency spectrum. It may sound similar to the idea of spectral density, but it's not.
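A minimal check of the 3 dB-per-octave figure discussed above (the 50 Hz reference follows the text; the helper names are illustrative):

```python
import math

def db(p1, p0):
    """Decibel comparison of power p1 against reference power p0."""
    return 10 * math.log10(p1 / p0)

# Pink noise: spectral power density proportional to 1/frequency,
# normalized so that the power at the 50 Hz reference is 1.
def pink_power(freq, ref_freq=50.0):
    return ref_freq / freq

# Going up one octave (50 Hz -> 100 Hz) halves the power.
loss_per_octave = db(pink_power(100.0), pink_power(50.0))
print(loss_per_octave)   # about -3.01 dB per octave
```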
So let's contrast this a little more with some statistical language. Spectral density describes the noise's changing "variance", i.e., strength, at a specific point on the frequency spectrum. But the PDF for the same noise describes its generic dispersion, i.e., the shape of its curve for any instance of noise, no matter where it happens on the frequency spectrum. This second characterization essentially gives our noise a two-dimensional description.

Fit for our purposes, the most common PDF characterization given to noise is Gaussian (i.e., normally distributed). This should be understandable, since normal distributions, as described earlier, are the preferred tool for the analysis of events resulting from a large number of independent factors. If it were otherwise, meaning there was something systematic to our noise (like a motor running in the background), then we could likely characterize the noise more closely with a more suitable PDF (like a Cauchy distribution, for example). In any regard, the noise we are going to work with will be assumed to be Gaussian white noise.

3.3 Estimation: Decisions that Consider the Statistical Errors of Loss & Risk

Noise can easily be thought of as causing errors in our attempts at measuring the true signal. We will eventually present a technique that reduces the effects of noise-induced error. Before we do, we first examine components of two underlying processes, estimation and decision, that will form the basis of our technique.

When we observe (i.e., record) a signal, whether an audio signal or a digital photograph, we are observing a large set of data points. Each one of those observed data points will be mapped to a numerical value, which, with all accuracy, has only one 'true' value.
A deviation in our recording of the true value is called an observational error. Let θ represent the perfect vector resulting from the sampling of our true signal, let z represent the noise vector, z_i ~ N(0, 1), that is impacting our true signal, and let σ represent the noise level (i.e., the strength of the noise). Then we can represent the actual observations of our signal with the following model:

x = θ + σz,

where x is the observation vector. Each vector component θ_i in θ is one sampled parameter of the true signal. The set of all admissible values of the parameters makes a parameter space, denoted Θ. The goal in our denoising technique will be to come up with the best estimate for each θ_i, which in turn gives us the best estimate for θ. We will denote the respective estimates of θ_i and θ as θ̂_i and θ̂. To make these estimates, we will need to make decisions that are based not only on our observations, x, but also on a well-reasoned decision function.

Definition 3.15. Let Θ be a parameter space and let {F_θ} be a family of PDFs (or PMFs). Further, let (X_1, X_2, ..., X_n) be the RV obtained from sampling the parameter space n times, and let (x_1, x_2, ..., x_n) be the observed sample point. Finally, let A be a set, which we think of as the set of all decisions that may be taken based on those observations. Then a decision function δ is a mapping δ : ℝⁿ → A that relates observations of X to prescribed decisions in A, δ(X) ∈ A [16, p. 424].

Our decisions won't be perfect. There will still be some degree of error in our estimates. If we can measure the effectiveness of our decisions, then we will be able to determine which particular decision function works best. To accomplish that, we are going to measure what is called the loss, a numerical value that indicates how wrong the result from the decision, δ(x), is compared to the perfect vector θ. There are many different formulations for calculating a loss, and each one of them is associated with its own loss function.

Definition 3.16.
Let A be an arbitrary space of decisions. Then a loss function L is a mapping L : Θ × A → ℝ⁺ that relates any decision about a parameter to a nonnegative real number. The resulting value is an indication of the amount of loss coming from that decision, L(θ, δ(x)) [16, p. 425].

One loss function to consider is the absolute difference loss function,

L(θ, δ(x)) = |θ − δ(x)|.

Another loss function is the quadratic loss function,

L(θ, δ(x)) = (θ − δ(x))².

For any estimation process, we need to realize that our losses will not always be the same every time. Consider our noise, for instance. Most of the time, the loss due to noise will be close to zero, but sometimes that noise will cause bigger losses. For our discussion, we've assumed that the noise will be normally distributed. Recalling the statistical concept of expected value, we can define an expectation for our losses. The expected loss on θ_i will be called the risk of the estimator.

Definition 3.17. Let D be a class of decision functions δ : ℝⁿ → A, and let L be a loss function defined on Θ × A. Then a risk function R is a mapping R : Θ × D → ℝ⁺ that relates a decision function δ and an unknown parameter θ_i to a nonnegative real number. The resulting value is an indication of the expected loss coming from that decision,

R(θ, δ) = E[L(θ, δ(X))]

[16, p. 424]. A risk that is of particular interest to us is the one associated with the quadratic loss function.

Definition 3.18. Let θ = [θ_1, θ_2, ..., θ_n]^T be the true function vector of a sampled signal, let θ̂ be its estimator, and let the quadratic loss (θ_i − θ̂_i)² be the loss function for each θ_i. Then the mean square error of θ̂ is defined to be [16, p. 354] the expected loss given the actual samplings θ_i, i = 1, 2, ..., n:

MSE(θ̂) = E‖θ − θ̂‖² = E[ Σ_{i=1}^{n} (θ_i − θ̂_i)² ].

The quantity ‖θ − θ̂‖² is known as the total quadratic loss (or ℓ² loss). The mean square error of the estimator with respect to the estimated parameter is the second moment of the measured loss, essentially the "average of the square of the error." Taking the square root of the MSE yields the root mean square error (RMSE), which is analogous to a standard deviation.

These definitions of loss and risk are quite theoretical. Meaning, we need to appreciate that coming up with the true and exact values requires much real knowledge that we don't actually have. In the case of our corrupted signal, we do not know where the true signal is exactly, nor do we know how strong any particular instance of noise is. Thus our estimation of θ is going to be based on a prescribed decision function δ that is partly operating on our observations x, i.e., θ̂ = δ(x). That means in our larger discussion of denoising we will eventually come to an idealized decision function that provides, by some measure, the optimal denoising technique. As we formulate this decision function, we will base it on some well-reasoned theoretical considerations. To evaluate this reasoning, we will need some statistical structure. We would like our decision function to be valid for all parameters θ_i in Θ (or at least for all θ_i in a family of parameters). Obviously, our concern will be the minimization of error. The particular error that we will work to minimize, considering all the θ_i, is the mean square error. That means we are working to minimize the expected risk.

Definition 3.19. Let Θ be a parameter space and θ = [θ_1, θ_2, ..., θ_n]^T be a true but unknown value of a sampled parameter. Assume the model x = θ + σz, where x is the observation of the sampled parameter and σz is the error on that observation. Further, let δ(x) = θ̂ be the estimator of θ that is dependent on the observation x, and finally, let MSE(θ̂) be the risk on θ̂. Then the expected risk of δ is defined [4, p. 425; 9, p. 1200] to be:

R(θ, δ) = E‖δ(x) − θ‖².

This is the basic problem of decision theory: to find the particular decision function δ that minimizes, in some measure, the expected risk over all θ_i. For us to determine which particular δ will be our optimal decision function, we do need one more thing, a criterion called the principle of minimax.

Definition 3.20. Given a space of possible decisions A, a class of decision functions D, and a loss function L, the principle of minimax is to choose δ* ∈ D so that

(1)  sup_θ R(θ, δ*) ≤ sup_θ R(θ, δ) for all δ in D.

Such a rule, if it exists, is called a minimax (decision) rule [16, p. 425]. Since our problem is one of estimation, we call a δ* satisfying (1) a minimax estimator of θ_i. Essentially, what we are looking to find is the particular rule whose worst outcome is least among all the worst outcomes possible of any rule. We are not necessarily getting the best outcome. We are just assuring ourselves that, when we do encounter the worst possible outcome, we've chosen a rule that minimizes the error we would get from it.

So, if we were to use the absolute difference loss function, then it would be the observed median that would minimize our risk.

Theorem 3.7. Let f(θ_i) denote the PDF of θ_i, and let the loss function be given by L(θ_i, m) = |θ_i − m|. Then the estimator which minimizes the expected value of the loss function is the median of the distribution of θ_i [9, p. 31].

Proof: Let m be an arbitrary parameter in Θ. Then

(2)  E|θ_i − m| = ∫_{−∞}^{m} (m − θ_i) f(θ_i) dθ_i + ∫_{m}^{∞} (θ_i − m) f(θ_i) dθ_i.

Using integration by parts, ∫u dv = uv − ∫v du, to evaluate the integrals, we take the antiderivatives of f to be either F(θ_i) or F(θ_i) − 1, as appropriate (i.e., ∫f(θ_i) dθ_i = F(θ_i) + C). Because (2) is a convex function of m, if we simply take its derivative with respect to m,

(d/dm) E|θ_i − m| = F(m) − (1 − F(m)) = 2F(m) − 1,

set that result equal to zero, and then solve for F(m), we get F(m) = 1/2: the m that minimizes the absolute difference loss is the median.
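The two loss functions, and Theorem 3.7's median result, can be checked numerically. A sketch with simulated data (all parameters illustrative):

```python
import numpy as np

def absolute_loss(theta, d):
    return np.abs(theta - d)

def quadratic_loss(theta, d):
    return (theta - d) ** 2

rng = np.random.default_rng(2)
data = rng.normal(size=1001)       # odd-sized sample of a parameter near 0

# Grid-search the decision m minimizing the total absolute-difference loss.
grid = np.linspace(-1, 1, 2001)
totals = [np.sum(absolute_loss(data, m)) for m in grid]
m_best = grid[np.argmin(totals)]
print(m_best, np.median(data))     # the minimizer sits at the sample median
```

Repeating the grid search with `quadratic_loss` instead lands the minimizer at the sample mean, which is the content of the next theorem.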
If we were to use the quadratic loss function, then it would be the observed mean that would minimize our quadratic loss.

Theorem 3.8. Let f(θ_i) denote the PDF of θ_i, and let the loss function be given by L(θ_i, c) = (θ_i − c)². Then the estimator which minimizes the expected value of the loss function is the mean of the distribution of θ_i [9, pp. 28-29].

Proof: We are seeking the value c that minimizes

(3)  E(θ_i − c)² = ∫_{−∞}^{∞} (θ_i − c)² f(θ_i) dθ_i.

Because (3) is a convex function of c, if we simply take the derivative of (3) with respect to c,

(d/dc) E(θ_i − c)² = 2c − 2E(θ_i),

set that result equal to zero, and then solve for c, we get the result that the mean of θ_i is the value that minimizes the quadratic loss.

We've now laid out the underlying statistical structure that we will use to evaluate our denoising techniques. So we head on to the next chapter and towards the main focus of this discussion, denoising.

Chapter 4
Denoising, Thresholding and an Idealistic Oracle

4.1 Denoising: An Overview

We are now ready to look at the actual mechanics of denoising. We assume the following model for our observed signal:

x = θ + σz,

where x is the observation vector, θ is the perfect vector from sampling our true signal, and z is the noise vector (with σ being the level of the noise). Component-wise, each x_i = θ_i + σz_i, i = 1, ..., n. Remembering that z_i ~ N(0, 1), thus E(z_i²) = 1, it should be obvious that the mean square error between our observations and the true signal will be

E(x_i − θ_i)² = σ² for each i, so that E‖x − θ‖² = nσ².

We are going to try to reduce this error by a denoising technique that incorporates wavelets into its process. Our signal will essentially be the function that we are going to analyze.
There are many different wavelet transforms that one can use to perform the analysis, but for this discussion we are going to stick with the rather simple Haar wavelet transform, so as to easily illustrate some other important details that we want to point out.

In its raw form, there is not much we can do to a noisy signal directly. The process of denoising is a multi-step program that begins with the transformation of the signal, via the Haar wavelet transform, into its lowpass and highpass components (as illustrated back in Chapter 2). This first step does not do too much to denoise the signal. It will be explained shortly that the lowpass portion of the data, by itself, is a denoised version of the original data, but that alone is not a very sophisticated result. Sophistication comes in the next step, when we look at the highpass data and apply a process that we have not even mentioned yet. That process is called "wavelet shrinkage" and incorporates the technique of thresholding. We will explain this in a short while, too. It is after this that we can reconstitute the signal, via a synthesis of the modified highpass and unmodified lowpass, into a newly revised estimate of the true signal. There is an intricate balancing act of choices being made here. We are soon going to see how the choice to threshold, or not to threshold, is a choice between two kinds of error. We start the explanation by looking at how the lowpass filter accomplishes its noise reduction.

4.2 Lowpass Filtering: Muffling the Noise

When we are sampling a signal impacted by noise, x = θ + σz, we are, in one sense, taking a sample of the noise.
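As a quick numerical preview of the averaging effect (hypothetical parameters), the standard deviation of the mean of n i.i.d. samples is σ/√n:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma, n_trials, n_samples = 1.0, 200_000, 4

# Average n_samples iid N(0, sigma^2) draws, many times over.
noise = sigma * rng.normal(size=(n_trials, n_samples))
averages = noise.mean(axis=1)
print(averages.std())   # about sigma / sqrt(4) = 0.5
```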
In order to illustrate how lowpass filtering muffles this noise, we are going to, for the moment, consider a true signal that is zero everywhere. That is, θ_i = 0 for all i, thus x = σz. Then, given a normally distributed noise with mean μ = 0 and variance σ², we have random variables X_i ~ N(0, σ²). We can then use the properties of variance (mentioned in Chapter 3) to see what happens to the expected error from noise after lowpass filtering.

Before we look at Haar's specific lowpass algorithm, let's first look at what happens to the noise-induced variance on the calculated average of n samples of i.i.d. RVs X_i ~ N(0, σ²). Remember, variance has the properties Var(aX_i + b) = a² Var(X_i) and Var(ΣX_i) = Σ Var(X_i) for independent RVs. Thus,

Var((1/n) Σ_{i=1}^{n} X_i) = (1/n²) · nσ² = σ²/n.

If the variance σ² was associated with Gaussian white noise, then a noise level of σ would be cut in half after four samples. Now let's look at the algorithm for the lowpass portion of the Haar wavelet transform. Remember that the Haar transform involves multiplying by 1/√2, so as to normalize the transformation. Using j instead of n to indicate the iterative level, the lowpass portion of the transformed data for the first, second, and third iterations is

s_{1,k} = (x_{2k−1} + x_{2k}) / √2,
s_{2,k} = (x_{4k−3} + x_{4k−2} + x_{4k−1} + x_{4k}) / (√2)²,
s_{3,k} = (x_{8k−7} + ... + x_{8k}) / (√2)³.

So, in general, the lowpass portion of the transformed data after j iterations is

s_{j,k} = 2^{−j/2} Σ_{i=2^j(k−1)+1}^{2^j k} x_i.

Then, given a noise-related variance of σ² on our blank signal, we can determine that in the lowpass portion of the Haar-transformed data the variance due to noise is

Var(s_{j,k}) = 2^{−j} · 2^j σ² = σ².

So, it looks like the noise is unchanged. That is an illusion, due in part to how an orthonormal transform preserves distances. But it must be remembered that at each iteration of the Haar wavelet transform, the range of the data in the lowpass portion, compared to the range of data on the original domain, has been increased by a factor of √2. Thus, the new noise level, relative to the scale of the original, is lower:

σ_j = σ / √(2^j),

where j is the iterative level.
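The relative decrease can be observed numerically. A sketch running the Haar lowpass step on pure noise (hypothetical parameters):

```python
import numpy as np

def haar_lowpass(x):
    """One Haar lowpass step: pairwise sums scaled by 1/sqrt(2)."""
    return (x[0::2] + x[1::2]) / np.sqrt(2)

rng = np.random.default_rng(5)
sigma = 1.0
s = sigma * rng.normal(size=2**16)   # a blank signal: pure N(0, sigma^2) noise

for j in range(1, 5):
    s = haar_lowpass(s)
    absolute = s.std()               # unchanged: the transform is orthonormal
    relative = s.std() / 2**(j / 2)  # relative to the 2^(j/2) growth in scale
    print(j, absolute, relative)     # relative level ~ sigma / sqrt(2**j)
```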
So each iterative level j involves 2^j independent, identically distributed data points. Since j = log₂ 2^j, we can rewrite the previous equation as

σ_n = σ / √n,

where n = 2^j for j = 0, 1, 2, 3, .... Thus, if σ is the level of normal noise impacting our blank signal, then the relative noise σ_j on the lowpass portion decreases with each iterative level j by a factor of √2. The following figure graphs the noise-reducing effect of the lowpass filter (the line relates n as a real number; the bold points relate j as an integer).

Figure 4.1. Noise level reductions of the Haar lowpass filter.

It should be clear that putting a real signal back into our model will not change the effective error the noise causes. Then, if we square σ_j, we will get the MSE(θ̂) that exists in the lowpass portion of the data. Obviously, the comparative noise level in the lowpass data is less than the original noise level:

σ² / 2^j < σ²,

j being the number of iterative levels taken with the Haar transform.

4.3 Highpass Filtering: The Highlights, and Less Noise Too

Now let's discuss what the highpass portion of the Haar wavelet transform does with the same noise. Take a look at the highpass portion of the data after one iteration, N = 1:

d_{1,k} = (x_{2k−1} − x_{2k}) / √2.

If we go back to discussing just the noise, x = σz, and remember that the noise is normally distributed about a mean μ = 0, then we should understand that any X_i is equally likely to be negative as it is positive. This fact was equivalently true for the lowpass algorithm as well. What this effectively means is that, with regard to noise, these algorithms are really indistinguishable. The normalized noise will be at the same level in the highpass as it is in the lowpass. That does not mean that the relative MSE(θ̂) in the highpass has been reduced as it was in the lowpass.
When it comes to real signal data, the highpass is not a muffler on the original signal like the lowpass was. If we go back to the discussion we had at the end of Section 2.3, we'll recall that the difference between the highpass and lowpass filters is that highpass filtering preserves the differences between data points, whereas the lowpass takes the averages between data points. If we remember how bland the highpass block appeared (most of the highpass values being close to zero), it should be clear that if we were to compare the highpass data to the original image, it would have a much larger MSE(θ̂).

In any regard, all the lowpass and highpass blocks of the Haar transform return the same independent, identically distributed noise (mean μ = 0 and a noise level equal to the original noise level σ). Further, the noise level in the highpass data has also had the same 'relative' decrease that the lowpass had (a factor of √2 at each iterative level). Figure 4.2 illustrates some noise before and after a Haar transformation. We've taken a neutral gray image, color value 127 (on a grayscale of 0 to 255), and added noise of level σ = 32 to it. We need to note, considering the limitations in rendering grayscale images, that we've taken the same liberties we did before with the clown photo in adjusting the range of the transformed data, so as to best communicate what is going on. You should agree that the noise does appear to be reduced in all four blocks of the transformed data. It has actually been cut in half, to σ/2.

(a) Original noise  (b) Haar-transformed noise
Figure 4.2. Noise before and after a Haar wavelet transformation.

We move now onto the next section, where we finally introduce the all-important process of wavelet shrinkage.

4.4 Wavelet Shrinkage Part I: Of Ideals, Oracles & Great Expectations
Now that we understand how noise generally behaves after a Haar transformation, and how lowpass filtering reduces noise, the final piece of the process, from a denoising perspective, is to explain how we are going to reduce this error in the highpass data. The method that we will use here is called wavelet shrinkage. The method was developed in large part by Stanford University professor David Donoho in collaboration with his colleague, professor Iain Johnstone, as well as many others. The 1992 early work of Donoho and Johnstone, "Ideal spatial adaptation by wavelet shrinkage" [4], was the primary paper referenced in the development of this thesis.

We are now going to adapt our notation to reflect the current context of working with the transformed data. We assume the following signal model for our highpass data:

y = g + σz,

where y is the highpass observation vector, g is the perfect highpass vector from sampling our true signal, and z is the noise vector, z_i ~ N(0, 1). Note that this transformed noise has the same variance σ² as the original noise. Component-wise, each y_i = g_i + σz_i, i = 1, ..., n.

The rationale for calling the method "wavelet shrinkage" is that, (a), the data that will be manipulated, the highpass data, are the coefficients that resulted from applying the wavelet algorithm, and, (b), these wavelet coefficients are going to be shrunk down in magnitude, which is the real action that actually accomplishes the denoising. How much these wavelet coefficients will be shrunk, and in what manner they will be shrunk, is determined by a thresholding rule (i.e., a decision function/rule, δ). Thresholding rules can be defined in many ways. There is hard thresholding, soft thresholding, non-negative garrote thresholding, and firm thresholding. These four thresholding rules are defined and graphed in Table 4.1.
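The four rules can be sketched as follows. These are the standard forms from the wavelet-shrinkage literature, standing in for the formulas of Table 4.1 (λ is the threshold; the firm rule uses a pair λ₁ < λ₂):

```python
import numpy as np

def hard(y, lam):
    """Keep-or-kill: coefficients at or below lam in magnitude go to zero."""
    return np.where(np.abs(y) > lam, y, 0.0)

def soft(y, lam):
    """Kill small coefficients and pull survivors toward zero by lam."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def garrote(y, lam):
    """Non-negative garrote: survivors shrink by lam^2 / y (less for big y)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        shrunk = y - lam**2 / y
    return np.where(np.abs(y) > lam, shrunk, 0.0)

def firm(y, lam1, lam2):
    """Firm: zero below lam1, untouched above lam2, linear in between."""
    a = np.abs(y)
    mid = np.sign(y) * lam2 * (a - lam1) / (lam2 - lam1)
    return np.where(a <= lam1, 0.0, np.where(a <= lam2, mid, y))

y = np.array([-3.0, -0.5, 0.2, 1.0, 2.5])
print(hard(y, 1.0))
print(soft(y, 1.0))
```

Note how soft thresholding leaves a gap-free range of outputs, while hard thresholding jumps from 0 straight to values beyond λ, which is exactly the artifact issue discussed next.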
Let the variable y_i represent the value of the wavelet coefficient, and λ the value where the thresholding rule takes effect. All of the rules, in some manner, take the wavelet coefficients and reduce their magnitude by some nonnegative amount (maybe zero). Their subtle differences need to be appreciated. For example, both hard and soft thresholding will take a wavelet coefficient y_i to zero if |y_i| ≤ λ. But if |y_i| > λ, then where hard thresholding leaves y_i alone, soft thresholding decreases y_i's magnitude by λ. Such differences may be significant in some cases. For example, a hard thresholding treatment of a photograph may cause artifacts. Having such a blunt dropoff will exclude the smaller detail coefficients from the data. Consequently, the softer changes existing in the picture will tend to be homogenized by the hard thresholding process, leaving a gap in the range of potential values for those coefficients. The transitions that remain will thus jump from nothing to something, looking harsher than before. Soft thresholding won't do this. That does not mean that hard thresholding does not have its applications.

Table 4.1: Wavelet shrinkage thresholding functions (hard, soft, non-negative garrote, and firm) [5, p. 2].

Each technique has its strengths and weaknesses, which need to be evaluated considering its intended use. For the purposes of this discussion, hard thresholding will be the rule that we will use, since it's very easy to apply. To be concise, the rule simply states that we are either keeping or killing the observed wavelet coefficient y_i:

δ_λ(y_i) = y_i if |y_i| > λ,  and  δ_λ(y_i) = 0 if |y_i| ≤ λ.

What should be clear in all of these techniques is that the smaller coefficients are being shrunk more so than the larger coefficients. This is a key fact to appreciate. For if we have noise corrupting our signal, it is in the killing of these small wavelet coefficients that we have the best chance of improving our signal. Why?
To answer that we have to really appreciate the subtleties of (a) what the highpass data tends to look like, and (b) how noise is affecting our data in general. Remembering our clown photograph from earlier, we understand that the majority of the highpass data is near zero. Those zeros represent areas where there is much homogeneity in our color. If we lost these small wavelet coefficients, we would really have a difficult time noticing any difference. We really should care mostly about keeping the larger wavelet coefficients. That is where the stark changes in the details hit our eyes. When it comes to the noise, we need to remember that regardless of whether it is the highpass data or the lowpass data, it is still the same Gaussian white noise. We refer again to our original signal model, x = μ + ε. Without a signal, a noisy observation has mean μ = 0. With a signal, it has mean μ = g_i. So, if most of our transformed g_i in the highpass data is near zero, it should be easy to connect the fact that most of the noise in the highpass data will have a mean near zero as well. This means that most of the noise in the highpass data is very likely to be eliminated by thresholding. If we can select the right λ, then we should be able to kill more bad noise than good data between −λ and λ. The question then is: "What should the value of λ be?"

Our quadratic loss function using the hard thresholding rule will become

MSE(ĝ_i) = E[(ĝ_i − g_i)²], where ĝ_i = δ_λ^H(y_i)

is our estimator of g_i. We recall that the value of the noise variance, σ², after the transformation, has not changed in its absolute measure (only in its relative measure, σ²/2^j). So we should understand that if we do nothing, our MSE for any particular g_i will just be σ². Therefore, when we choose to threshold, we should do so only if we are going to make improvements to that error of σ², not make things worse.
Then, in an ideal way, we need the following hard thresholding rule, for some λ:

δ(y_i) = y_i if |g_i| > λ, and 0 if |g_i| ≤ λ.

This means that our mean square error ideally will be

MSE_ideal(ĝ_i) = min(g_i², σ²).

Thus the ideal λ that we should use, call it λ*, is the actual noise level σ. Figure 4.3 graphs this ideal relationship, with g_i normalized to the noise σ.

Figure 4.3: MSE from ideal thresholding, normalized to the noise level σ.

To achieve this level of quality, though, such an ideal minimization of our MSE requires complete knowledge of the true highpass signal g. If we knew that, we wouldn't need to denoise at all. This is why we call this ideal decision function an oracle. Since we really don't have an oracle, it would be good to figure out what our expected error will be. For a particular true highpass component g_i (fixed), and all its possible observations y_i, we will have, if we do not threshold, the following mean square error when observing g_i:

MSE(y_i) = ∫ (y_i − g_i)² φ(y_i) dy_i = σ²,

where the probability distribution function φ of y_i is determined by the Gaussian white noise (noise level σ, mean g_i):

φ(y) = (1/(σ√(2π))) exp(−(y − g_i)²/(2σ²)).

When using the hard thresholding rule, the calculation has three parts: (a) y_i < −λ, where the squared error at any point is (g_i − y_i)²; (b) −λ ≤ y_i ≤ λ, inclusive, where the squared error at any point is g_i²; and (c) y_i > λ, where the squared error at any point is again (g_i − y_i)². Letting t_i = y_i − g_i, where t_i represents the observation y_i's relative position to the mean g_i, we can represent this as:

MSE(δ_λ^H(y_i)) = ∫_{−∞}^{−λ−g_i} t² φ₀(t) dt + g_i² ∫_{−λ−g_i}^{λ−g_i} φ₀(t) dt + ∫_{λ−g_i}^{+∞} t² φ₀(t) dt,

where φ₀ is the N(0, σ²) density. This calculation of MSE assumes no "oracular" knowledge of the true g_i. An oracle would decide to keep or kill the observation not on the basis of the observed y_i, but on the basis of knowing how strong the true g_i is. In the typical non-ideal circumstance, there may be instances where the hard thresholding rule kills the observation y_i even though the true signal component g_i was above the threshold λ. Therefore, the above representation of MSE is a value in the expected sense.
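The gap between this expected MSE and the oracle's ideal min(g_i², σ²) can be checked numerically. The following Monte Carlo sketch is my own (with σ = 1 and λ = σ); the function name is an illustration, not notation from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_mse_hard(g, lam, sigma=1.0, n=2_000_000):
    # Monte Carlo estimate of E[(hard(y) - g)^2] for y ~ N(g, sigma^2).
    y = g + sigma * rng.normal(size=n)
    est = np.where(np.abs(y) > lam, y, 0.0)   # hard thresholding rule
    return float(np.mean((est - g) ** 2))

for g in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]:
    exp_mse = expected_mse_hard(g, lam=1.0)
    ideal = min(g * g, 1.0)                   # the oracle's min(g^2, sigma^2)
    print(f"g = {g:.1f}: expected {exp_mse:.2f}, ideal {ideal:.2f}")
```

The printed expected values always sit above the corresponding ideal values, within Monte Carlo error.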
To actually have oracular knowledge of g_i would achieve the following ideal when calculating the MSE for a particular g_i. Since we have set the ideal threshold λ = σ, then either |g_i| ≤ σ or |g_i| > σ. This simply means that the ideal MSE for a particular g_i is:

MSE_ideal(ĝ_i) = g_i² if |g_i| ≤ σ, and σ² if |g_i| > σ; that is, MSE_ideal(ĝ_i) = min(g_i², σ²).

This indicates that the oracle will, under all circumstances, never produce an ideal MSE > σ². In fact, if the entire true highpass signal g is stronger than σ for all instances i, then the ideal MSE will be exactly equal to σ². If there are some instances i where the signal g_i is weaker than the noise, then the ideal MSE will definitely be less than σ².

The differences between the ideal and the expected MSE can be seen mathematically by noting the particular regions of integration where these calculations are actually different under the two specific circumstances of the hard thresholding rule. When |g_i| ≤ λ, the oracle always kills, but the hard thresholding rule keeps those observations with |y_i| > λ; on that region the ideal calculation accumulates g_i² where the expected calculation accumulates t_i², and the ideal total comes out smaller. When |g_i| > λ, the oracle always keeps, but the hard thresholding rule kills those observations with |y_i| ≤ λ; on that region the ideal calculation accumulates t_i² where the expected calculation accumulates g_i², and again the ideal total comes out smaller. Otherwise, the expected and ideal accumulate the same squared errors. Table 4.2 quickly outlines a comparison of what makes for "ideal" results and "non-ideal" results.

True Signal | Observation | Decision | Loss | Comment
|g_i| ≤ λ | |y_i| ≤ λ | kill | g_i² | ideal
|g_i| ≤ λ | |y_i| > λ | keep | (y_i − g_i)² | non-ideal
|g_i| > λ | |y_i| > λ | keep | (y_i − g_i)² | ideal
|g_i| > λ | |y_i| ≤ λ | kill | g_i² | non-ideal

Table 4.2: Comparison of ideal and non-ideal thresholding decisions.

Shortly we will present a major conclusion that characterizes the expected MSE for all g_i. First, we want to add a technique that gives us an additional mathematical perspective when looking at these ideal and non-ideal situations. We would like to be able to calculate the expected MSE for a distinct subset of probable y_i observations.
Be aware that, for the moment, we are focusing more on y_i than g_i. To correctly calculate the expected MSE over a certain reduced domain, one needs to apply a statistical technique called conditional probability.

Definition 4.1. Let (Ω, S, P) be a probability space, and let B ∈ S with P(B) > 0. For an arbitrary A ∈ S we shall write

P(A | B) = P(A ∩ B) / P(B)

and call the quantity so defined the conditional probability of A given B. Conditional probability remains undefined when P(B) = 0 [16, p. 28].

Basically, we are treating the population that exists in the reduced domain of B as if it were 100% of the population. So far, we've been using calculations that consider the entire population to be existing between −∞ and +∞. Over a subset of this population, we'll need to adjust our calculations by a specific factor so that they treat the analyzed population existing between two limits as if it were everything. So we simply divide our regular result by the actual percentage of the population that exists between those two limits. So, mathematically, let

k = ∫_a^b φ(y_i) dy_i

be the percent of the population that exists between a and b. If we now incorporate k into our calculating of the expected MSE between a and b, we'll get the correct value (otherwise we would only be getting the share of the total squared error that exists between a and b):

MSE_{(a,b)}(ĝ_i) = (1/k) ∫_a^b (ĝ_i − g_i)² φ(y_i) dy_i.

For example, for a zero-mean observation in the domain (−3.5σ, −1.5σ), k = Φ(−1.5) − Φ(−3.5) ≈ 0.0666.

We now present one of our major examples that illustrates all the different perspectives on the expected MSE, further highlighting the differences between the ideal and non-ideal results.

Example 4.1. Let g_i = 1.5σ. Then the oracle would ideally keep this g_i regardless of y_i. Realistically, we may have the result that the observed y_i is less than λ, and so would be non-ideally thresholded. Figure 4.4 illustrates this pictorially with a noise distribution overlaying the signal vector. The dark gray areas indicate the region where y_i observations would be thresholded. Figure 4.
4: Signal vector g_i = 1.5σ, overlaid with a distribution of noisy observations y_i.

Letting λ = σ, then for g_i = 1.5σ the expected mean square error of ĝ_i is:

MSE(ĝ_i) ≈ 1.25σ².

So our expected MSE for g_i = 1.5σ is more than σ², which the oracle would never let happen (the ideal MSE being σ², since |g_i| > λ). This is the reality of expectation. If we instead use conditional probability to look at the individual ideal and non-ideal sub-regions, we would see results more representative of the theory. In the ideal thresholding region, |y_i| > λ, the MSE is 0.810496σ². That is less than both σ² and g_i² = 2.25σ². Looking where the thresholding is not ideal, |y_i| ≤ λ, the MSE is exactly 2.25σ². Table 4.3 provides a summary of this and similar results for various values of g_i.

True Signal | Oracle? | ideal MSE | expected MSE | expected MSE, |y_i| ≤ λ | vs. | expected MSE, |y_i| > λ
2.5σ | keep | 1.00σ² | 1.16σ² | 6.25σ² | > | 0.80σ²
2.0σ | keep | 1.00σ² | 1.24σ² | 4.00σ² | > | 0.73σ²
1.5σ | keep | 1.00σ² | 1.25σ² | 2.25σ² | > | 0.81σ²
1.0σ | kill | 1.00σ² | 1.11σ² | 1.00σ² | < | 1.21σ²
0.5σ | kill | 0.25σ² | 0.90σ² | 0.25σ² | < | 1.99σ²
0.0σ | kill | 0.00σ² | 0.80σ² | 0.00σ² | < | 2.53σ²

Table 4.3: MSE calculations for various individual signal strengths g_i.

We now present one of our primary results: an illustration of how the expected MSE performs in comparison to the oracle's ideal. We start by generally restating the equation used in Example 4.1, but now for an arbitrary g_i:

MSE(ĝ_i) = ∫_{−∞}^{−λ−g_i} t² φ₀(t) dt + g_i² ∫_{−λ−g_i}^{λ−g_i} φ₀(t) dt + ∫_{λ−g_i}^{+∞} t² φ₀(t) dt, evaluated at λ = σ.   (1)

Figure 4.5 graphs this result (normalizing the g_i's to an arbitrary noise level σ). Note that the oracle's ideal MSE, as defined, will always outperform the expected result for any g_i. The dotted lines have been added in as a representation of a continuation of the respective parts of the oracle's hard thresholding rule.

Figure 4.5: Comparison of expected MSE to ideal thresholding at λ = σ.

It is interesting to see that for those g_i's near zero and near 2σ, our expectations are far from ideal. But near 1σ and greater than 4σ, they are pretty close to ideal.
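The g_i = 1.5σ row of Table 4.3, including the two conditional values, can be reproduced by simulation. This is my own check (σ = 1), not code from the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, lam, g = 1.0, 1.0, 1.5
y = g + sigma * rng.normal(size=4_000_000)

kept = np.abs(y) > lam                          # hard rule keeps these observations
sq_err = np.where(kept, (y - g) ** 2, g ** 2)   # squared error of the hard estimate

print(np.mean(sq_err))          # overall expected MSE, approx 1.25
print(np.mean(sq_err[~kept]))   # conditional on |y| <= lam: exactly g^2 = 2.25
print(np.mean(sq_err[kept]))    # conditional on |y| > lam: approx 0.81
```

Dividing by the population share of each region is done implicitly here by averaging only over the samples that fall in it, which is exactly the conditional-probability adjustment of Definition 4.1.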
We also would like to point out that this particular graph was based on choosing our threshold value λ to be equal to that theoretical ideal of the noise strength, λ = σ being the best. Then a question to ask is: what would happen if we changed our λ to a supposedly non-ideal value like 1.2σ or 0.8σ?

We begin to answer this question by first generalizing the equation in (1) further, but now for arbitrary λ as well as g_i:

MSE(ĝ_i) = ∫_{−∞}^{−λ−g_i} t² φ₀(t) dt + g_i² ∫_{−λ−g_i}^{λ−g_i} φ₀(t) dt + ∫_{λ−g_i}^{+∞} t² φ₀(t) dt.   (2)

Figure 4.6 graphs this result for λ = 1.2σ and 0.8σ. You can notice that if we increase λ, we increase the expected error for those g_i's with a strength above approximately 0.7σ. The same increase in λ decreases the error for those g_i's weaker than 0.7σ. In the other direction, if we decrease λ, we decrease our errors on g_i stronger than 0.7σ, but increase the error on those weaker than 0.7σ. You can extend these trends to where you would increase λ to ∞ and kill everything (MSE = (1/n) Σ_{i=1}^n g_i²), or decrease λ to 0 and keep everything (MSE = σ²). This brings up another question. Is the noise level σ truly the ideal threshold to set λ to?

Figure 4.6: Effects on the expected MSE when changing the threshold λ: λ = 0, 0.8σ, 1.0σ, 1.2σ.

Ultimately, the performance of a particular hard thresholding rule will depend on two things: (a) the PDF of the g_i's in the signal you are trying to improve, and (b) the overall strength of your signal in comparison to the noise. The implication of (b) should be clear. The stronger your signal is, the lower you should set your threshold λ, in order to gain the best (i.e., lowest) overall expected MSE. The implication of (a) is not so clear, and will be discussed later. Before we leave this section, we would like to parse the analysis of the errors a little bit more. We do so through the use of a more advanced oracle, one which was conceived by the author of this thesis. Whereas the original oracle presented does know where each component of the true signal is, it does not know the strength of any particular instance of noise.
It only knows the general expected variance of the noise. We can design a stronger oracle than this by adding knowledge of each individual noise instance. We would then have the following oracle, O₂, thresholding rule:

δ^{O₂}(y_i) = y_i if ε_i² ≤ g_i², and 0 if ε_i² > g_i²,   (3)

where ε_i = y_i − g_i, so that giving a λ really is no longer necessary. Figure 4.7 illustrates, via the graph of a sine function, what is going on with the new oracle O₂ and how it compares to the original oracle O₁ (oracle O₁ is given a λ = σ).

Figure 4.7: Comparison of two differently defined oracles. (a) The newly defined oracle O₂'s keep and kill zones. (b) The original oracle O₁'s keep and kill zones.

We can calculate a new ideal MSE using our oracle O₂ (note: the following calculation is valid for all g_i ≥ 0; for g_i < 0, we would need to change the signs on the bounds of integration that involve g_i):

MSE_{O₂}(ĝ_i) = ∫_{−g_i}^{g_i} t² φ₀(t) dt + g_i² (1 − ∫_{−g_i}^{g_i} φ₀(t) dt) = E[min(ε_i², g_i²)].   (4)

We can then include this result with our earlier graphs of ideal and expected MSE. You can see in Figure 4.8 how ideally the new oracle O₂ performs in comparison to oracle O₁.

Figure 4.8: Comparison of oracle O₂'s ideal MSE to oracle O₁ and previous expectations.

Next, we take oracle O₂ and overlay the expected decisions of a hard thresholding rule set to λ = σ (see Figure 4.9). We can delineate zones where the expected thresholding of the y_i observations would give us a good result (ideal), and other zones where it would give us a bad result (not ideal).

Figure 4.9: Oracle O₂ overlaid with the expectations of a hard thresholding rule, λ = σ.

Let us now take a close-up look at two g_i's; see Figure 4.10. One will be g_j such that 2g_j < λ, and the other one will be g_k such that 2g_k > λ.

Figure 4.10: Close-up view of oracle O₂ hard thresholding, and two g_i's, 2g_j < λ and 2g_k > λ.

Looking closely at these two g_i's, think about how increasing the threshold λ will affect our expected MSE calculations. Does increasing the threshold improve the error for a particular g_i or not?
To answer that question, consider the two calculations of the expected MSE (one for each g_i), each of which is partitioned into the five regions depicted in Figure 4.10. We should notice that for g_k, where 2g_k > λ, an increase in the threshold λ trades "good" error for "bad" on one end (increasing error) and trades "bad" error for "good" on the other (decreasing error). We should also notice that once the threshold jumps past 2g_i (the case of g_j), any increase in λ only reduces the error.

In Figure 4.11 we present a graph of how the MSE changes in response to an increasing λ. When λ = 0, it is understandable that the MSE for any g_i is equal to σ², the noise. What we found is that for any g_i > 0.7096σ, increasing λ first increases the MSE up to some peak amount, and then decreases it down to a value equal to g_i² (a complete quadratic loss of that g_i).

Figure 4.11: Effects of an increasing λ on different strengths of g_i's.

In Figure 4.12 we take a close-up look at one of our earlier graphs (Figure 4.6, which demonstrated how the expected MSE is affected by a changing λ). Looking at a few marked locations for a g_i = 0.77σ, we point out the evidence of an increasing and then decreasing error. Note how the error increases as λ increases from 0.0 to 1.0σ, but then decreases as λ increases from 1.0σ to 1.2σ (the peak actually occurs near λ = 0.99σ).

Figure 4.12: Close-up view of Figure 4.6, showing the effects of different λ's on a g_i = 0.77σ.

The implication of all this is that thresholding cannot help any g_i that is stronger than σ. For any g_i less than or equal to σ but greater than 0.7096σ, thresholding does reduce the error, but only if λ is sufficiently large. For anything below 0.7096σ, any λ will reduce the error. We'd like to mention that the determination of "0.7096" was made in Mathematica. Through recursive calculations, we looked for the g_i whose maximum MSE did not exceed 1.0σ². No theoretical assessment of the exact value has been made.
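Equation (2) can also be evaluated in closed form using the standard normal CDF Φ and density φ, via the truncated second-moment identity ∫_a^b t²φ(t) dt = Φ(b) − Φ(a) + aφ(a) − bφ(b). The following sketch is my own (σ = 1), not code from the thesis; it reproduces the rise-then-fall behavior of the error described above for g_i = 0.77σ.

```python
import math

def Phi(x):
    # Standard normal CDF.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    # Standard normal density.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def mse_hard(g, lam):
    # Expected MSE of hard thresholding at threshold lam, for y ~ N(g, 1).
    a, b = -lam - g, lam - g                      # kill region in terms of t = y - g
    p_kill = Phi(b) - Phi(a)
    m2_kill = p_kill + a * phi(a) - b * phi(b)    # E[t^2 ; a < t < b]
    return g * g * p_kill + (1.0 - m2_kill)

print(mse_hard(1.5, 0.0))   # lam = 0 keeps everything: exactly sigma^2 = 1
print(mse_hard(1.5, 1.0))   # approx 1.25, matching Table 4.3
print(mse_hard(0.77, 1.0))  # rises slightly above 1 near the peak ...
print(mse_hard(0.77, 1.2))  # ... and falls again as lam grows
```

Scanning this function over λ for a grid of g values is a quick way to replicate the kind of search behind Figure 4.11.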
That this value is near 0.7071 ≈ 1/√2 is a curiosity. These particulars are being pointed out to bring across the idea that finding the best λ amounts to balancing the amount of "bad" error you would like to kill against the "good" error you would rather keep. As mentioned before, where the best λ may fall will be determined by the PDF of the particular signal's g_i's. We will demonstrate this soon in Section 4.6, but first we would like to present one more perspective on the theoretical ideals of wavelet shrinkage.

4.5 Wavelet Shrinkage, Part II: A Tale of Two Errors

We concluded the last section with a discussion of how increasing the value of the hard threshold λ can be looked upon as a trade-off between good errors and bad errors. We should be careful to realize that this was occurring strictly within the limited context of the highpass data. To fully assess the entire process, we also need to consider the errors that exist within the lowpass data as well. We recall that the noise level in the lowpass data is σ²/2. If we were to replace our entire data set with just the lowpass data, then we would have reduced our noise down to σ²/2. We also recall from our first discussion on wavelets that the entire original data set, including all the noise, is recoverable by a synthesis via an inverse Haar transform. That, of course, would only return us to the original noise level σ². In between these two extremes lies the compromise of thresholding that we just introduced in the last section.

In David Donoho's paper, he refers to thresholding as "selective wavelet reconstruction". When we synthesize our modified data set back together, we do so by only using some of the highpass wavelet coefficients, i.e., not thresholding those values. In the other locations, where we do threshold the wavelet coefficients to zero, we are simply using the lowpass scaling coefficients to fill things in.
Our selections will be based on our decision function, using some thresholding criterion. Intuitively, one can see that we are either accepting the original noise error by not thresholding, or accepting the reduced noise error of the lowpass data by thresholding, i.e., σ² or σ²/2. This brings us to an interesting observation. We cannot actually get our MSE any lower than what the lowpass data can already offer. That implies our MSE is going to end up somewhere between σ² and σ²/2. This is quite contrary to the idea that reducing our error would be the only goal here. What we need to realize is that a successful denoising process balances the desire for noise reduction against the need for information preservation. If we simply wanted to get the most noise reduction, we would just iterate the lowpass algorithm down to one data point, giving us the single value that represents the mean value of all the original data points. Theorem 3.8 suggests that this value would give us the least amount of MSE overall, but we surely understand that one data point cannot really tell us anything.

So, what value should we select for our thresholding value, λ? Based on our earlier discussion of an oracle, the ideal thresholding value should be the noise level σ. Let's analyze this suggestion from the perspective of the lowpass data. First, we review what we have so far with regards to the error from noise in the original and transformed data. Given true data components μ_i and μ_j, we have the following MSEs:

MSE(x_i) = E[(x_i − μ_i)²] = σ², and likewise MSE(x_j) = σ².

We know that if we do not threshold, we would just get MSE(x_i) again. Then the question we come to now is this: what would our error be on μ_i if we do threshold? Killing the highpass coefficient means reconstructing both x_i and x_j from the lowpass average (x_i + x_j)/2, so to answer that we will need to evaluate the following:

E[((X_i + X_j)/2 − μ_i)²].

We do not want the above error to be any worse than our original σ². In order to see what situation would cause this to exceed σ², we evaluate the following integral:

E[((X_i + X_j)/2 − μ_i)²] = ∫∫ ((x_i + x_j)/2 − μ_i)² f(x_i, x_j) dx_i dx_j,

where f(x_i, x_j) is the joint PDF for two i.i.d.
normal RVs, X_i ~ N(μ_i, σ²) and X_j ~ N(μ_j, σ²). We start off by expanding the expression of the value being evaluated for expectation:

((X_i + X_j)/2 − μ_i)² = X_i²/4 + X_j²/4 + X_iX_j/2 − μ_iX_i − μ_iX_j + μ_i².

Next, because we have the independence of X_i and X_j, we can split their joint PDF and evaluate the integral term by term. Most of the work in evaluating this integral will be pretty straightforward. All of the integrals of the individual terms will jump out right away as being the expected value of that term. Many of those results we can use just as they are. Two of the terms, though, will require us to use a property that we stated back in Theorem 3.6: that Var(X) = E(X²) − E(X)². This will allow us to use the equality E(X²) = σ² + E(X)² in a substitution (recalling that Var(X) = σ²). We now evaluate, for purposes of limited exposition, just two of the terms. Afterwards, we'll state the entire result. The first term we evaluate is

E(X_i²/4) = (σ² + μ_i²)/4.

The next term we evaluate is:

E(X_j²/4) = (σ² + μ_j²)/4.

We now present the final result of resolving the entire integral:

E[((X_i + X_j)/2 − μ_i)²] = σ²/2 + (μ_i − μ_j)²/4.

This last quantity is greater than σ² when |μ_i − μ_j| is greater than σ√2. If μ_i − μ_j = 0, then this value achieves a minimum of σ²/2, which happens to be the exact variance of the noise in the lowpass data. Thus

σ²/2 ≤ E[((X_i + X_j)/2 − μ_i)²], with equality when μ_i = μ_j.

This implies that the oracle can reduce the error by thresholding whenever |g| ≤ σ, where g = (μ_i − μ_j)/√2 is the highpass coefficient. Thus the oracle chooses λ = σ as its ideal threshold. The two situations of thresholding and not thresholding are presented pictorially in Figures 4.13 and 4.14, respectively.

Figure 4.13: The highpass data of μ_i and μ_j will be thresholded by the oracle.

Figure 4.14: The highpass data of μ_i and μ_j will not be thresholded by the oracle.

Within the framework of thresholding the highpass data, the oracle minimizes MSE at λ = σ. Setting λ any lower will only give us more μ_i's with an error equal to σ² instead of something less.
Setting it any higher will actually give us errors greater than σ². In the end, the oracle will give us an MSE somewhere between σ²/2 and σ². We must remember, though, that these results are ideal because they are the oracle's results. They are not actually the expected result. We illustrate this issue with our example in the next section.

4.6 Illustrative Example: Denoising a Corrupted Sine Function

We are now going to demonstrate our denoising techniques by trying to clean up a corrupted sine function. First, we discuss some of the probabilistic aspects of the sine function. We consider the standard sine function, f(x) = sin x, defined on the unit circle, 0 ≤ x < 2π, with −1 ≤ sin x ≤ 1. We treat the domain of sine, X, as a random variable that is uniformly distributed on 0 to 2π:

f_X(x) = 1/(2π), 0 ≤ x < 2π.

Then sin X is a function of the RV X: f(X) = Y = sin X. The DF of Y is given by:

F_Y(y) = P(sin X ≤ y).

Figure 4.15: y = sin x, 0 ≤ x ≤ 2π.

There are two sub-regions in the domain where sin X satisfies the DF above for any y; see Figure 4.15. Also, we need to consider two different circumstances for the y's: (a) when they are negative, and (b) when they are non-negative. In both cases we find:

F_Y(y) = 1/2 + (1/π) arcsin y, −1 ≤ y ≤ 1.

Applying Theorem 3.2, we determine that the probability density function of sin X is:

f_Y(y) = 1/(π√(1 − y²)), −1 < y < 1.   (1)

Figure 4.16 graphs the result from (1).

Figure 4.16: PDF of the sine function.

Then the variance of the sine function is calculated as follows:

Var(sin X) = ∫_{−1}^{1} y² / (π√(1 − y²)) dy.

Letting y = sin u, then u = sin⁻¹ y and y² = sin² u = (1 − cos 2u)/2, and we get

Var(sin X) = (1/π) ∫_{−π/2}^{π/2} (1 − cos 2u)/2 du.

Letting v = 2u, then dv = 2 du, and we conclude that

Var(sin X) = 1/2.

Next, we consider the sine function in Gaussian white noise. Let y(x) = sin(x) + ε(x), where the noise ε ~ N(0, σ²) with PDF φ(y). The probability density function g(y) of this process is the convolution integral of the individual density functions [2, p.
61]. We set the bounds to be −1 to 1, since the PDF of the sine function is zero outside these bounds:

g(y) = ∫_{−1}^{1} φ(y − x) · 1/(π√(1 − x²)) dx.

We let x = cos u. Then dx = −sin u du and cos⁻¹ x = u; thus cos⁻¹(−1) = π and cos⁻¹(1) = 0. Also, x² = cos² u = 1 − sin² u; thus 1 − x² = sin² u. Substituting these equalities into the equation above, we get:

g(y) = (1/π) ∫_0^π φ(y − cos u) · (sin u / √(sin² u)) du.

Because sin u ≥ 0 on [0, π], we can simplify:

g(y) = (1/π) ∫_0^π φ(y − cos u) du.

In Figure 4.17 we graph this result for various noise levels σ.

Figure 4.17: PDFs of various noisy sine functions.

We now come to our feature example, where we take a discrete sampling of a noisy sine function (n = 256) and denoise it through our process of using the Haar wavelet transformation, the hard thresholding rule, and λ set to the theoretical ideal of the noise level σ. Figure 4.18 shows the before and after of the noisy sine function.

Figure 4.18: The denoising of a corrupted sine function, before and after. (a) Noisy sine function, MSE = σ² = 0.010. (b) Denoised sine function, MSE = 0.006.

What about other values of λ? Is σ really the best λ? We now take the previous example and vary λ for various levels of noise, looking for the one value that gives us the minimum MSE for each noise level. The results are graphed in Figure 4.19.

Figure 4.19: Finding the best λ for denoising a corrupted sine function with noise.

What we notice is that for low levels of noise, the best λ for denoising is below the noise level σ. Then, as the noise level increases, the necessary λ increases relatively as well. It jumps above the noise level, to the point where the noise gets so strong that λ needs to annihilate the entire highpass data. How can we explain this? If we go back to our earlier result of the expected MSE, we can determine what the expected results for various λ's should be. To accomplish this, we take the inner product integral of the function that describes the expected MSE (for each λ) and the function that describes the density of the highpass data. That brings us to a question.
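Before turning to that question, the experiment of Figure 4.18 above can be sketched end to end. This is my own minimal one-level version (the full experiment may iterate the transform over several levels), with n = 256 and σ² = 0.010 as in the figure.

```python
import numpy as np

rng = np.random.default_rng(5)
n, sigma = 256, 0.1                      # sigma^2 = 0.010
t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
clean = np.sin(t)
noisy = clean + sigma * rng.normal(size=n)

# One level of the orthonormal Haar transform (analysis).
low = (noisy[0::2] + noisy[1::2]) / np.sqrt(2)
high = (noisy[0::2] - noisy[1::2]) / np.sqrt(2)

# Hard-threshold the highpass coefficients at lam = sigma.
high = np.where(np.abs(high) > sigma, high, 0.0)

# Inverse Haar transform (synthesis).
denoised = np.empty(n)
denoised[0::2] = (low + high) / np.sqrt(2)
denoised[1::2] = (low - high) / np.sqrt(2)

print(np.mean((noisy - clean) ** 2))     # approx 0.010
print(np.mean((denoised - clean) ** 2))  # smaller
```

Because the clean highpass coefficients of a densely sampled sine are tiny compared to the noise, nearly every kill removes noise energy, and the MSE drops.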
What is the PDF of the highpass data of a sine function? In Figure 4.20 we show the graph of the lowpass and highpass data of a clean sine function. What you should notice is that the highpass data looks like a graph of a cosine function, only with a much smaller amplitude. It is a reasonable assumption that the highpass algorithm operates like the derivative on our original data (which is a function). We just need to recall that the highpass algorithm calculates the differences between two adjacent data points. Taking that knowledge, along with the basic understanding of how the derivative of a continuous function is the result of a limiting process on the difference between close neighbors, we should feel comfortable making this assumption.

Figure 4.20: The lowpass and highpass data of a clean sine function. (a) Lowpass data. (b) Highpass data (assumed to behave as cosine). (c) Lowpass and highpass combined.

If so, then we can say that the highpass data is described by the PDF of the cosine function. It so happens that the cosine function has the same PDF as the sine function (we only need to adjust it for the smaller amplitude). We are now ready to calculate the inner product of these two functions for various λ's and noise levels σ. A few of these results are plotted in Figure 4.21, along with corresponding plots of how the oracle would perform under a changing λ (dashed lines). The only assumption we make on the highpass data, so as to make the observations easy to see, is that the amplitude of the cosine function is 1.
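The claim that sin X and cos X share the same distribution, with DF F(y) = 1/2 + arcsin(y)/π and variance 1/2, is easy to verify by sampling. This is my own check, not part of the thesis.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 2.0 * np.pi, size=2_000_000)

for y in (np.sin(x), np.cos(x)):
    print(np.var(y))                       # both approx 1/2
    for q in (-0.9, 0.0, 0.5):
        emp = np.mean(y <= q)              # empirical CDF
        thy = 0.5 + np.arcsin(q) / np.pi   # arcsine-law CDF
        print(q, emp, thy)
```

Both the sine and cosine samples match the arcsine-law CDF, confirming that only the amplitude needs adjusting when carrying the PDF over to the highpass data.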
Figure 4.21: Plots of the MSE for various levels of noise, as λ varies from 0 to 4.

What we notice now is that the best λ for those noise levels below σ² = 1/2 is zero, and that the best λ for those noise levels above σ² = 1/2 is ∞. This is not quite in agreement with our experimental results from earlier, though it is somewhat close. In our experiment, the best λ does stay relatively low up until the point that the noise reaches the variance of the highpass data. After that point, it does eventually take off to completely annihilate the highpass data. If we were to actually run our experiment for longer periods (i.e., more cycles of the sine function), our results would eventually approach our theoretical expected result. For what we have actually created here is a stochastic process. It's just that we've only done one run of the process. If we do ever longer runs of the experiment, then the stochastic process will approach its theoretical expected value.

There is another observation to make here. It looks like our λ should never be set to the theoretical ideal of σ. What we need to realize is that the PDF of the cosine function, see Figure 4.22(a), is very heavy on the values that are close to its amplitude. Otherwise, it's very light, especially near zero. If we then look at how the expected MSE graph behaves, see Figure 4.22(b), we should realize that we need the PDF of the highpass data to be heavier towards the values that are below 0.71σ in order to have a net reduction of the MSE.

Figure 4.22: (a) PDF of the cosine function. (b) The expected MSE of the highpass data.

One final observation to make about this illustrative example is that once the variance of the noise (i.e., the noise level σ²) is equal to the variance of the signal's highpass data, σ² = Var(cos X) = 1/2, then the highpass signal has essentially been lost to the noise.
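The inner-product calculation behind Figure 4.21 can be sketched as follows. This is my own version, averaging the closed-form expected MSE of equation (2) over the arcsine density via the substitution g = sin θ, with the same amplitude-1 assumption made above. It confirms that with weak noise (σ² < 1/2) keeping everything beats killing everything, while with strong noise (σ² > 1/2) the reverse holds.

```python
import math

def Phi(x):
    # Standard normal CDF.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):
    # Standard normal density.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def mse_hard(g, lam, sigma):
    # Expected MSE of hard thresholding for y ~ N(g, sigma^2).
    a, b = (-lam - g) / sigma, (lam - g) / sigma
    p_kill = Phi(b) - Phi(a)
    m2_kill = p_kill + a * phi(a) - b * phi(b)
    return g * g * p_kill + sigma * sigma * (1.0 - m2_kill)

def total_mse(lam, sigma, m=4000):
    # Average over the arcsine density of highpass values, using g = sin(theta).
    thetas = (-0.5 * math.pi + (k + 0.5) * math.pi / m for k in range(m))
    return sum(mse_hard(math.sin(t), lam, sigma) for t in thetas) / m

print(total_mse(0.0, 0.3), total_mse(5.0, 0.3))  # weak noise: keep-all wins
print(total_mse(0.0, 1.0), total_mse(5.0, 1.0))  # strong noise: kill-all wins
```

At λ = 0 the total is exactly σ², and at a very large λ it approaches the variance of the highpass data, 1/2; which of the two is smaller flips at σ² = 1/2.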
4.7 Summary & Conclusions

Through the course of researching for this discussion, there have been many interesting realizations made about the topics of statistics, wavelets, and their applications. The sheer volume and depth of knowledge that is available to read on these subjects is quite astounding. The major observations made here are that ideals, general expectations, and stochastic processes do not necessarily lead to the same conclusions.

In Section 4.4 we demonstrated, through equation (1) and Figure 4.5, how the theoretical ideal MSE is often far lower than the generally expected MSE when we applied the hard thresholding rule onto the highpass data. In equation (2) and Figure 4.6 we extended this result to observe that varying our choice of threshold λ will affect the expected MSE in varying ways, depending on the particular g_i being evaluated. In equation (3) and Figure 4.7 we introduced a new type of oracle for the simple purpose of illustrating how changing λ will affect the expected MSE calculation for any particular g_i (see Figures 4.10, 4.11, and 4.12). The larger point of all this was to communicate the idea that the effectiveness of "wavelet shrinkage", and the best threshold λ to use, will be dependent upon the nature of the true signal's highpass data (i.e., the probability density function of a signal's highpass wavelet coefficients).

In Section 4.5 we validated the original oracle's theoretical choice of the best λ being the noise level σ. We followed up on this in Section 4.6 with an experimental investigation. In that section we took a simple sine wave, corrupted it with some noise, and then applied our wavelet techniques to denoise it. Figure 4.18 illustrated one experimental run, showing that our denoising techniques can reduce the MSE. Then, in Figure 4.19, we extended this single result after numerous experimental runs of varying noise and thresholding levels, to demonstrate where the best λ actually existed.
What we found out was that rarely was σ the best λ to use. We then moved on to investigate whether the calculations of expected MSE for a denoised sine function would suggest a best λ to use. Those results were presented in Figure 4.21, and indicated that the best λ was either 0 or ∞, depending on the noise level. Like the oracle's theoretical ideal, these expected calculations did not agree with our experimental results either. This led to the hypothesis that, for the given conditions of the experiment, the deeper complexities of a "stochastic process" need to be considered in determining a theoretical best λ.

For future research considerations, one promising direction to take this discussion in is towards the challenge of obtaining better algorithms for determining our thresholding values λ. For example, can we simply look at a noisy signal's data set, and use the probability density of the highpass raw data in a deterministic way? Beyond that, we acknowledge that this discussion only touches the surface of what researchers have come up with in the field of denoising and wavelets. There are many other techniques that incorporate wavelets besides the Haar wavelet transform and hard thresholding. In the papers by David Donoho (a central figure in this field), three comprehensive techniques are mentioned: RiskShrink [4, p. 440], VisuShrink [4, p. 444], and SUREShrink [3, pp. 1203-1208]. Further, it is clear that to fully appreciate these papers will require a substantive study of "decision theory", in particular the decision theory topic of "minimax". What was also observed, in the referenced research papers for this discussion, was frequent mention of certain important function spaces (beyond the fundamental ones of Lebesgue and Hilbert). In Brani Vidakovic's text, "Statistical Modeling by Wavelets" [22, pp. 36-37], three function spaces are listed: Sobolev, Hölder, and Besov. In one of David Donoho's papers [3], a fourth is also identified, Triebel.
Finally, it is clear that an appreciation of the "stochastic process" will be quite important to advancing this discussion further. Most of our experiences in life occur one event at a time. Thus, we understand that normal expectations are typically not the norm.

References

[1] O. Bretscher, Linear Algebra with Applications, 3rd ed., Pearson Prentice Hall, Upper Saddle River, NJ, 2005.
[2] N. P. Cheremisinoff, Practical Statistics for Engineers and Scientists, Technomic Publishing Company, Lancaster, PA, 1987.
[3] D. L. Donoho and I. M. Johnstone, Adapting to Unknown Smoothness via Wavelet Shrinkage, Journal of the American Statistical Association, 90 (1995), no. 432, 1200-1224.
[4] D. L. Donoho and I. M. Johnstone, Ideal Spatial Adaptation by Wavelet Shrinkage, Biometrika, 81 (1994), no. 3, 425-455.
[5] W. Fourati, F. Kammoun, and M. S. Bouhlel, Medical Image Denoising Using Wavelet Thresholding, Journal of Testing and Evaluation, 33 (2005), no. 5, 1-6.
[6] D. Gabor, Theory of Communication, Journal of the Institution of Electrical Engineers, 93 (1946), Part III, no. 26, 429-457.
[7] P. Goupillaud, Geophysicists: Jean P. Morlet, The SEG (Society of Exploration Geophysicists) Virtual Museum, 2006, Web site, http://www.mssu.edu/seg-vm/bio_jean_p__morlet.html, 7 Apr. 2010.
[8] A. Haar, Zur Theorie der orthogonalen Funktionensysteme (Erste Mitteilung), Mathematische Annalen, 69 (1910), no. 3, 331-371.
[9] T. N. Herzog, Introduction to Credibility Theory, 3rd ed., ACTEX Publications, Winsted, CT, 1999.
[10] J. Kahane and P. Lemarié-Rieusset, Fourier Series and Wavelets, Gordon and Breach Publishers, Amsterdam, 1995.
[11] D. W. Kammler, A First Course in Fourier Analysis, Prentice Hall, Upper Saddle River, NJ, 2000.
[12] N. Kingsbury, The Haar Transform, The Connexions Project, 2005, Web site, http://cnx.org/content/m11087/latest/, 9 Apr. 2010.
[13] J. N. McDonald and N. A. Weiss, A Course in Real Analysis, Academic Press, San Diego, CA, 1999.
[14] J. Morlet, G. Arens, E. Fourgeau, and D. Giard, Wave propagation and sampling theory, Part I: Complex signal and scattering in multilayered media, GEOPHYSICS, 47 (1982), no. 2, 203-221.
[15] J. Morlet, G. Arens, E. Fourgeau, and D. Giard, Wave propagation and sampling theory, Part II: Sampling theory and complex waves, GEOPHYSICS, 47 (1982), no. 2, 222-236.
[16] V. K. Rohatgi and A. K. Md. E. Saleh, An Introduction to Probability and Statistics, 2nd ed., Wiley-Interscience, New York, NY, 2001.
[17] W. Rudin, Principles of Mathematical Analysis, 3rd ed., McGraw-Hill, New York, NY, 1976.
[18] W. Rudin, Real and Complex Analysis, 3rd ed., WCB/McGraw-Hill, Boston, MA, 1987.
[19] C. E. Shannon, Communication in the Presence of Noise, Proceedings of the IRE, 37 (1949), no. 1, 10-21.
[20] H. G. Tucker, A Graduate Course in Probability, Academic Press, New York, NY, 1967.
[21] P. J. Van Fleet, Discrete Wavelet Transformations: An Elementary Approach with Applications, Wiley-Interscience, Hoboken, NJ, 2008.
[22] B. Vidakovic, Statistical Modeling by Wavelets, Wiley-Interscience, New York, NY, 1999.
[23] D. F. Walnut, An Introduction to Wavelet Analysis, Birkhäuser, Boston, MA, 2002.
Appendices

Appendix A: Selected Mathematica Code

Figure 1.2: The Fourier series approximations of the sawtooth function (n = 1, 3, 5).

FourierSawtooth1 = Plot[-(2/Pi)*Sum[Sin[k*Pi*t]*(-1)^k/k, {k, 1, 1}], {t, -12/Pi, 12/Pi}]
FourierSawtooth3 = Plot[-(2/Pi)*Sum[Sin[k*Pi*t]*(-1)^k/k, {k, 1, 3}], {t, -12/Pi, 12/Pi}]
FourierSawtooth5 = Plot[-(2/Pi)*Sum[Sin[k*Pi*t]*(-1)^k/k, {k, 1, 5}], {t, -12/Pi, 12/Pi}]

Figure 2.1: The wavelet series approximations of the sawtooth function (n = -1, 0, 1).

HaarSawtoothNeg1 = Plot[.5*SquareWave[t/2], {t, -12/Pi, 12/Pi}]
HaarSawtoothZero = Plot[.5*SquareWave[t/2] +
  Sum[-2^(-j - 2)*SquareWave[(2^j)*t], {j, 0, 0}], {t, -12/Pi, 12/Pi}]
HaarSawtoothOne = Plot[.5*SquareWave[t/2] +
  Sum[-2^(-j - 2)*SquareWave[(2^j)*t], {j, 0, 1}], {t, -12/Pi, 12/Pi}]

Figure 2.4: Complete image before and after a Haar wavelet transformation.

<< DiscreteWavelets`DiscreteWavelets`
gray = ImageNames[ImageType -> GrayScale, ListThumbnails -> True];
clown = ImageRead[gray[[6]]];
clown1 = Take[clown, {41, 200}, {1, 160}];
ImagePlot[clown1]
{clownrows, clowncols} = Dimensions[clown1];
b = {Sqrt[2]/2, Sqrt[2]/2};
d = Table[0, {clownrows - 2}];
f = Join[b, d];
h = Table[RotateRight[f, 2k], {k, 0, Length[f]/2 - 1}];
a = {Sqrt[2]/2, -Sqrt[2]/2};
c = Table[0, {clownrows - 2}];
e = Join[a, c];
g = Table[RotateRight[e, 2l], {l, 0, Length[e]/2 - 1}];
haar = ArrayFlatten[{{h}, {g}}];
clown1transformed = haar.clown1.Transpose[haar];
clown1blur = Take[clown1transformed, {1, 80}, {1, 80}];
adjustclown1blur = .5*clown1blur;
gray127quarter = Table[255, {i, 1, 80}, {j, 1, 80}];
clown1hd = Take[clown1transformed, {81, 160}, {1, 80}];
adjustclown1hd = .5*(clown1hd + gray127quarter);
clown1vd = Take[clown1transformed, {1, 80}, {81, 160}];
adjustclown1vd = .5*(clown1vd + gray127quarter);
clown1dd = Take[clown1transformed, {81, 160}, {81, 160}];
adjustclown1dd = .5*(clown1dd + gray127quarter);
clownHWT = ArrayFlatten[{{adjustclown1blur, adjustclown1vd},
  {adjustclown1hd, adjustclown1dd}}];
imageclownHWTA = ImagePlot[clownHWT]

Figure 3.2: Linear log plots of some colored noise spectral power densities.

pink = LogLinearPlot[10*Log[10, 10^2/(2x)], {x, 100, 22000}];
blue = LogLinearPlot[10*Log[10, 2x/10^6], {x, 100, 22000}];
white = LogLinearPlot[10*Log[10, 1/(2^5)], {x, 100, 22000}];
colornoise = Show[pink, blue, white]

Figure 4.2: Noise before and after a Haar wavelet transformation.

<< DiscreteWavelets`DiscreteWavelets`
gray127 = Table[127, {i, 1, 160}, {j, 1, 160}];
gray127quarter = Table[255, {i, 1, 80}, {j, 1, 80}];
nd = NormalDistribution[0, 1];
SeedRandom[];
noise = Table[Random[nd], {i, 1, 160}, {j, 1, 160}];
sigma = 32; (*enter a choice for noise level sigma*)
sigmanoise = sigma*noise;
noisygray127 = gray127 + sigmanoise;
ImagePlot[noisygray127]
noisygray127transformed = haar.noisygray127.Transpose[haar];
noisygray127blur = Take[noisygray127transformed, {1, 80}, {1, 80}];
adjustnoisygray127blur = .5*noisygray127blur;
noisygray127hd = Take[noisygray127transformed, {81, 160}, {1, 80}];
adjustnoisygray127hd = .5*(noisygray127hd + gray127quarter);
noisygray127vd = Take[noisygray127transformed, {1, 80}, {81, 160}];
adjustnoisygray127vd = .5*(noisygray127vd + gray127quarter);
noisygray127dd = Take[noisygray127transformed, {81, 160}, {81, 160}];
adjustnoisygray127dd = .5*(noisygray127dd + gray127quarter);
noisygray127HWT = ArrayFlatten[{{adjustnoisygray127blur, adjustnoisygray127vd},
  {adjustnoisygray127hd, adjustnoisygray127dd}}];
imagenoisygray127HWT = ImagePlot[noisygray127HWT]

Table 4.3: MSE calculations for various individual signal strengths, g_i.

sigma = 1; (*enter a choice for noise level sigma*)
lambda = sigma; (*set threshold lambda*)
g = 2.5*sigma; (*enter a choice for "g" to evaluate*)
1/(sigma*Sqrt[2*Pi])*NIntegrate[1/Exp[t^2/(2*sigma^2)], {t, a, b}]
1/(sigma*Sqrt[2*Pi])*
  (NIntegrate[t^2/Exp[t^2/(2*sigma^2)], {
t, -Infinity, -g - lambda}] +
   NIntegrate[t^2/Exp[t^2/(2*sigma^2)], {t, -g + lambda, Infinity}])/
  (1/(sigma*Sqrt[2*Pi])*
    (NIntegrate[1/Exp[t^2/(2*sigma^2)], {t, -Infinity, -g - lambda}] +
     NIntegrate[1/Exp[t^2/(2*sigma^2)], {t, -g + lambda, Infinity}]))

Figure 4.5: Comparison of expected to ideal thresholding at λ = σ.

sigma = 1; (*enter a choice for noise level sigma*)
lambda = sigma; (*set threshold lambda*)
Plot[1/(sigma*Sqrt[2*Pi])*
  (NIntegrate[t^2/Exp[t^2/(2*sigma^2)], {t, -Infinity, -lambda - g}] +
   NIntegrate[g^2/Exp[t^2/(2*sigma^2)], {t, -lambda - g, lambda - g}] +
   NIntegrate[t^2/Exp[t^2/(2*sigma^2)], {t, lambda - g, Infinity}]),
  {g, 0, 5}]

Figure 4.8: Comparison of oracle 2's ideal to oracle 1 and previous expectations.

sigma = 1; (*enter a choice for noise level sigma*)
Plot[1/(sigma*Sqrt[2*Pi])*
  (NIntegrate[g^2/Exp[t^2/(2*sigma^2)], {t, -Infinity, -g}] +
   NIntegrate[t^2/Exp[t^2/(2*sigma^2)], {t, -g, g}] +
   NIntegrate[g^2/Exp[t^2/(2*sigma^2)], {t, g, Infinity}]),
  {g, 0, 5}]

Figure 4.11: Effects of an increasing λ on different strengths of g_i's.

sigma = 1; (*enter a choice for noise level sigma*)
g = 0.55*sigma; (*enter a choice for "g" to plot*)
Plot[1/(sigma*Sqrt[2*Pi])*
  (NIntegrate[t^2/Exp[t^2/(2*sigma^2)], {t, -Infinity, -g - lambda}] +
   NIntegrate[g^2/Exp[t^2/(2*sigma^2)], {t, -g - lambda, -g + lambda}] +
   NIntegrate[t^2/Exp[t^2/(2*sigma^2)], {t, -g + lambda, Infinity}]),
  {lambda, 0, 5*g}]

Figure 4.17: PDFs of various noisy sine functions.

sigma = 0.1; (*enter a choice for noise level sigma*)
minrange = -8; (*enter lower limit of evaluation*)
maxrange = 8; (*enter upper limit of evaluation*)
Table[1/(sigma*Pi*Sqrt[2*Pi])*
  Integrate[Exp[-(y - Cos[x])^2/(2*sigma^2)], {x, 0, Pi}],
  {y, minrange + 0.01/2, maxrange - 0.01/2, 0.01}]

Figure 4.18: The denoising of a corrupted sine function, before and after.
<< DiscreteWavelets`DiscreteWavelets`
cleansine = Table[Sin[2Pi*(2x - 1)/(2*256)], {x, 1, 256}];
CleanSampleSine = ListLinePlot[cleansine];
nd = NormalDistribution[0, 1];
SeedRandom[];
noise = Table[Random[nd], {k, 1, 256}];
sigma = 0.1; (*enter choice of noise level sigma*)
sigmanoise = sigma*noise;
noisysine = cleansine + sigmanoise;
NoisySampleSine = ListPlot[noisysine];
noisysinetransformed = HWT1D1[noisysine];
noisysinetransformedlist = WaveletVectorToList[noisysinetransformed];
noisysinelowpass = First[noisysinetransformedlist];
noisysinehighpass = Flatten[Drop[noisysinetransformedlist, 1]];
lambda = sigma;
denoisedsinehighpass = Map[ShrinkageFunction[#, lambda] &, noisysinehighpass];
denoisedtransformedsine = Join[noisysinelowpass, denoisedsinehighpass];
denoisedsine = IHWT1D1[denoisedtransformedsine];
DenoisedSampleSine = ListPlot[denoisedsine];
Show[CleanSampleSine, NoisySampleSine]
Show[CleanSampleSine, DenoisedSampleSine]

Figure 4.19: Finding the best λ for denoising a corrupted sine function with noise.

<< DiscreteWavelets`DiscreteWavelets`
cleansine = Table[Sin[2Pi*(2x - 1)/(2*256)], {x, 1, 256}];
cleansinetransformed = HWT1D1[cleansine];
setdetail = .001; (*set increments for table*)
MSE = Table[
  Table[
    Norm[
      cleansinetransformed -
      Join[
        First[WaveletVectorToList[HWT1D1[cleansine + sigma*noise]]],
        Map[ShrinkageFunction[#, lambda] &,
          Flatten[Drop[WaveletVectorToList[HWT1D1[cleansine + sigma*noise]], 1]]]]
    ]^2/256,
    {sigma, 0, 0.1, setdetail}],
  {lambda, 0, 0.3, setdetail}];
{rows, cols} = Dimensions[MSE];
MSEp = Partition[Flatten[Transpose[MSE]], rows];
BestLambda = Flatten[Table[
  Min[Position[
    Flatten[Drop[Drop[MSEp, k], k - (cols - 1)]],
    Min[Drop[Drop[MSEp, k], k - (cols - 1)]]]] - 1,
  {k, 0, cols - 1}]];
LambdaAnnihilates = Table[
  Min[Position[
    Flatten[Drop[Drop[MSEp, k], k - (cols - 1)]],
    Max[Drop[
      Flatten[Drop[Drop[MSEp, k], k - (cols - 1)]],
      Min[Position[
        Flatten[Drop[Drop[MSEp, k], k - (cols
 - 1)]],
        Min[Drop[Drop[MSEp, k], k - (cols - 1)]]]] - 1]]]] - 1,
  {k, 0, cols - 1}];
LambdaEqualSigma = Table[y, {y, 0, cols - 1}];
BestLambdaPlot = ListLinePlot[{BestLambda, LambdaAnnihilates, LambdaEqualSigma}]

Figure 4.20: The lowpass and highpass data of a clean sine function.

<< DiscreteWavelets`DiscreteWavelets`
cleansine = Table[Sin[2Pi*(2x - 1)/(2*256)], {x, 1, 256}];
cleansinetransformed = HWT1D1[cleansine];
cleansinetransformedlist = WaveletVectorToList[cleansinetransformed];
cleansinelowpass = First[cleansinetransformedlist];
cleansinehighpass = Flatten[Drop[cleansinetransformedlist, 1]];
ListLinePlot[cleansinelowpass]
ListLinePlot[cleansinehighpass]
ListLinePlot[cleansinetransformed]

Figure 4.21: Plots of the MSE for various levels of noise as λ varies from 0 to 5.

Plot[1/(sigma*Sqrt[2*Pi])*
  NIntegrate[1/(Pi*Sqrt[amplitude^2 - g^2])*
    (NIntegrate[t^2/Exp[t^2/(2*sigma^2)], {t, -Infinity, -g - lambda}] +
     NIntegrate[g^2/Exp[t^2/(2*sigma^2)], {t, -g - lambda, -g + lambda}] +
     NIntegrate[t^2/Exp[t^2/(2*sigma^2)], {t, -g + lambda, Infinity}]),
    {g, -amplitude, amplitude}],
  {lambda, 0, 5.2}]
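As a cross-check on the Mathematica experiments above, the same one-level Haar denoising pipeline can be sketched in ordinary Python. This is our own stand-in for the DiscreteWavelets` package, not a translation of it; all names here are ours. It applies one level of the 1-D Haar transform, hard-thresholds the highpass coefficients at λ = σ, and inverts the transform.

```python
import math, random

S = math.sqrt(2.0) / 2.0  # Haar filter coefficient, sqrt(2)/2

def haar_forward(x):
    """One level of the 1-D Haar transform: returns (lowpass, highpass)."""
    low  = [S * (x[2*i] + x[2*i + 1]) for i in range(len(x) // 2)]
    high = [S * (x[2*i] - x[2*i + 1]) for i in range(len(x) // 2)]
    return low, high

def haar_inverse(low, high):
    """Invert one level of the Haar transform."""
    x = []
    for a, d in zip(low, high):
        x.append(S * (a + d))
        x.append(S * (a - d))
    return x

def denoise(noisy, lam):
    """Hard-threshold the highpass data, then reconstruct."""
    low, high = haar_forward(noisy)
    high = [h if abs(h) > lam else 0.0 for h in high]  # hard thresholding
    return haar_inverse(low, high)

# Corrupt a sampled sine wave with N(0, sigma^2) noise and denoise it.
rng = random.Random(0)
sigma = 0.1
clean = [math.sin(2 * math.pi * (2*x - 1) / 512) for x in range(1, 257)]
noisy = [c + rng.gauss(0.0, sigma) for c in clean]
denoised = denoise(noisy, lam=sigma)

mse = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)
print(round(mse(clean, noisy), 5), round(mse(clean, denoised), 5))
```

Consistent with the discussion in Section 4.7, the improvement from λ = σ on this smooth signal is modest, and on some runs the threshold barely helps; the orthonormality of the transform, however, guarantees perfect reconstruction when no thresholding is applied.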
Abstract: In the field of signal processing, one of the underlying enemies in obtaining a good quality signal is noise. The most common examples of signals that can be corrupted by noise are images and audio signals. Since the early 1980s, a time when wavelet transformations became a modernly defined tool, statistical techniques have been incorporated into processes that use wavelets with the goal of maximizing signal-to-noise ratios. We provide a brief history of wavelet theory, going back to Alfréd Haar's 1909 dissertation on orthogonal functions, as well as its important relationship to the earlier work of Joseph Fourier (circa 1801), which brought about that famous mathematical transformation, the Fourier series. We demonstrate how wavelet theory can be used to reconstruct an analyzed function, ergo, that it can be used to analyze and reconstruct images and audio signals as well. Then, in order to ground the understanding of the application of wavelets to the science of denoising, we discuss some important concepts from statistics. From all of these, we introduce the subject of wavelet shrinkage, a technique that combines wavelets and statistics into a "thresholding" scheme that effectively reduces noise without doing too much damage to the desired signal. Subsequently, we discuss how the effectiveness of these techniques is measured, both in the ideal sense and in the expected sense. We then look at an illustrative example of the application of one technique. Finally, we analyze this example more generally, in accordance with the underlying theory, and make some conclusions as to when wavelets are an effective technique for increasing a signal-to-noise ratio.