USF Libraries
USF Digital Collections

Vector flow model in video estimation and effects of network congestion in low bit-rate compression standards


Material Information

Title:
Vector flow model in video estimation and effects of network congestion in low bit-rate compression standards
Physical Description:
Book
Language:
English
Creator:
Ramadoss, Balaji
Publisher:
University of South Florida
Place of Publication:
Tampa, Fla.
Publication Date:

Subjects

Subjects / Keywords:
h.263
gradient vector flow
video compression
deformable super quadrics
congestion control
video segmentation
medical imaging
network behavior
Dissertations, Academic -- Electrical Engineering -- Masters -- USF   ( lcsh )
Genre:
government publication (state, provincial, territorial, dependent)   ( marcgt )
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Summary:
ABSTRACT: The use of digitized information is rapidly gaining acceptance in bio-medical applications. Video compression plays an important role in the archiving and transmission of different digital diagnostic modalities. The present scheme of video compression for low bit-rate networks is not suitable for medical video sequences. The instability is the result of block artifacts resulting from the block-based DCT coefficient quantization. The possibility of applying deformable motion estimation techniques to make the video compression standard (H.263) more adaptable for bio-medical applications was studied in detail. The study of the network characteristics and the behavior of various congestion control mechanisms was used to analyze the complete characteristics of existing low bit-rate video compression algorithms. The study was conducted in three phases. The first phase involved the implementation and study of the present H.263 compression standard and its limitations. The second phase dealt with the analysis of an external force for active contours which was used to obtain estimates for deformable objects. The external force, which is termed Gradient Vector Flow (GVF), was computed as a diffusion of the gradient vectors associated with a gray-level or binary edge map derived from the image. The mathematical aspect of a multi-scale framework based on a medial representation for the segmentation and shape characterization of anatomical objects in medical imagery was derived in detail. The medial representations were based on a hierarchical representation of linked figural models such as protrusions, indentations, neighboring figures and included figures, which represented solid regions and their boundaries. The third phase dealt with the vital parameters for effective video streaming over the Internet, in particular the bottleneck bandwidth, which gives the upper limit for the speed of data delivery from one end point to the other in a network. If a codec attempts to send data beyond this limit, all packets above the limit will be lost. On the other hand, sending under this limit will clearly result in suboptimal video quality. During this phase the packet-drop-rate (PDR) performance of TCP(1/2) was investigated in conjunction with a few representative TCP-friendly congestion control protocols (CCP). The CCPs were TCP(1/256), SQRT(1/256) and TFRC (256), with and without self-clocking. The CCPs were studied when subjected to an abrupt reduction in the available bandwidth. Additionally, the investigation studied the effect on the drop rates of TCP-compatible algorithms of changing the queuing scheme from Random Early Detection (RED) to DropTail.
Thesis:
Thesis (M.S.E.E.)--University of South Florida, 2003.
Bibliography:
Includes bibliographical references.
System Details:
System requirements: World Wide Web browser and PDF reader.
System Details:
Mode of access: World Wide Web.
Statement of Responsibility:
by Balaji Ramadoss.
General Note:
Title from PDF of title page.
General Note:
Document formatted into pages; contains 76 pages.

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001441466
oclc - 54018070
notis - AJM5906
usfldc doi - E14-SFE0000139
usfldc handle - e14.139
System ID:
SFS0024835:00001




Full Text

PAGE 1

VECTOR FLOW MODEL IN VIDEO ESTIMATION AND EFFECTS OF NETWORK CONGESTION IN LOW BIT-RATE COMPRESSION STANDARDS by BALAJI RAMADOSS A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering Department of Electrical Engineering College of Engineering University of South Florida Major Professor: Wilfrido A. Moreno, Ph.D. James T. Leffew, Ph.D. Wei Qian, Ph.D. Date of Approval: October 16, 2003 Keywords: h.263, compression, deformable super quadrics, video segmentation, medical imaging, network behavior Copyright 2003 Balaji Ramadoss

PAGE 2

Dedication
This thesis is dedicated to my parents and teachers for their continuous support and motivation.

PAGE 3

Acknowledgments
I would like to express my sincere thanks and appreciation to Dr. Wilfrido A. Moreno for his guidance and tutelage throughout this endeavor. I would also like to sincerely thank Dr. James T. Leffew and Dr. Wei Qian for being available with time and useful advice as committee members. My heart-felt personal thanks to Dr. Wilfrido A. Moreno for supporting and sponsoring this research venture.

PAGE 4

Table of Contents
List of Tables iv
List of Figures v
Abstract vii
Chapter 1 Introduction 1
1.1 Objectives 4
Chapter 2 Video Compression 5
2.1 H.263: An Example 8
2.2 H.263 Encoder 9
2.2.1 Motion Estimation and Compensation 10
2.2.2 Discrete Cosine Transform (DCT) 10
2.2.3 Quantization 11
2.2.4 Entropy Encoding 11
2.2.5 Frame Store 12
2.3 H.263 Decoder 12
2.3.1 Entropy Decoder 13
2.3.2 Rescale 13
2.3.3 Inverse Discrete Cosine Transform 13
2.3.4 Motion Compensation 13
2.4 Block-Matching Motion Compensation 14

PAGE 5

2.4.1 Frame Based Block-Matching Motion Compensation 14
2.4.1.1 Fixed Size Block Matching (FSBM) 14
2.4.1.2 Variable Size Block Matching (VSBM) 16
2.4.2 Object Based Block-Matching Motion Compensation 20
2.5 Implementation 21
Chapter 3 Deformable Model and Shape Analysis 23
3.1 Parametric Snake Model 25
3.2 Example and Behavior of Traditional Snakes 27
Chapter 4 Gradient Vector Flow 30
4.1 Snake: Introduction 30
4.1.1 Edge Map 31
4.2 Mathematical Model 32
4.3 Numerical Implementation 34
4.4 MATLAB Results 36
Chapter 5 Medical Applications 38
5.1 Deformable Super Quadrics 38
5.2 Medial Representation of Objects 42
5.3 Single-Figure Description via M-Rep 42
5.4 Discretizing Figural Segments 44
Chapter 6 Compressed Video Over Networks 49
6.1 Effect of Congestion 49
6.1.1 Congestion Mechanisms: Environment of the Study 50
6.2 Congestion Control Mechanisms 52

PAGE 6

6.2.1 Drop Rate Analysis 52
6.2.2 Observations 54
6.3 Behavior Analysis of Slowly Responsive Congestion Control Algorithms 55
6.3.1 Long Term Fairness 55
6.3.2 TCP (Reno) 56
6.3.3 TCP (New Reno) 57
6.3.4 TCP (Vegas) 58
Chapter 7 Results and Further Work 60
7.1 Conclusion 60
7.2 Possible Future Work 61
References 62
Bibliography 65

PAGE 7

List of Tables
Table 1.1: Multimedia Data Types and Uncompressed Storage Space Requirements 2
Table 2.1: Compressed Video Results 21

PAGE 8

List of Figures
Figure 1.1: Overview of The Coder/Decoder and Network Under Study 3
Figure 2.1: Video Coder/Decoder Module 8
Figure 2.2: H.263 Encoder 9
Figure 2.3: H.263 Decoder 12
Figure 2.4: FSBM Example 16
Figure 2.5: VSBM Example 18
Figure 2.6: FSBM Block Structure 19
Figure 2.7: VSBM Block Structure 20
Figure 3.1: U Shaped Object 27
Figure 3.2: Iterations and Convergence 28
Figure 4.1: Test Image With its Traditional Forces 36
Figure 4.2: Iterations 36
Figure 4.3: U Shaped Test Object Undergoing GVF Iterations 37
Figure 4.4: Iterations on the Block Test Image and its Convergence 37
Figure 5.1: Examples of Slab Like and Tubular Figures 44
Figure 5.2: 2D Medial M Represents a Double Tangency of a Circle to The Boundary 45
Figure 5.3: An End Atom is a Medial Atom With an Additional Component 45

PAGE 9

Figure 5.4: Two Consecutive Frames Where The Image Has Undergone Changes 47
Figure 5.5: Iterations and Convergence of Frame 1 48
Figure 5.6: Iterations and Convergence of Frame 2 48
Figure 6.1: Network Topology 50
Figure 6.2: Drop Rate For Slowcc Algorithm Using The RED Queuing Scheme TCP (SACK) 52
Figure 6.3: Drop Rate For Slowcc Algorithm Using The DropTail Queuing Scheme TCP (SACK) 53
Figure 6.4: Drop Rate For Slowcc Algorithm Using The RED Queuing Scheme TCP (Vegas) 55
Figure 6.5: Throughput of TCP and TFRC 56
Figure 6.6: Throughput of TCP and SQRT 56
Figure 6.7: Throughput of TCP and TCP (1/8) 56
Figure 6.8: Throughput of TCP and TCP (1/8) 57
Figure 6.9: Throughput of TCP and SQRT 57
Figure 6.10: Throughput of TCP and TFRC 57
Figure 6.11: Throughput of TCP and TFRC 58
Figure 6.12: Throughput of TCP and SQRT 59
Figure 6.13: Throughput of TCP and TCP (1/8) 59

PAGE 10

VECTOR FLOW MODEL FOR VIDEO ESTIMATION AND THE EFFECTS OF NETWORK CONGESTION IN LOW BIT-RATE COMPRESSION
Balaji Ramadoss
ABSTRACT
The use of digitized information is rapidly gaining acceptance in bio-medical applications. Video compression plays an important role in the archiving and transmission of different digital diagnostic modalities. The present scheme of video compression for low bit-rate networks is not suitable for medical video sequences. The instability is the result of block artifacts resulting from the block-based DCT coefficient quantization. The possibility of applying deformable motion estimation techniques to make the video compression standard (H.263) more adaptable for bio-medical applications was studied in detail. The study of the network characteristics and the behavior of various congestion control mechanisms was used to analyze the complete characteristics of existing low bit-rate video compression algorithms. The study was conducted in three phases. The first phase involved the implementation and study of the present H.263 compression standard and its associated limitations. The second phase dealt with the analysis of an external force for active contours, which was used to obtain estimates for deformable objects. The external force, which is termed Gradient Vector Flow (GVF), was computed as a diffusion of the gradient vectors

PAGE 11

associated with a gray-level or binary edge map derived from the image. The mathematical aspect of a multi-scale framework based on a medial representation for the segmentation and shape characterization of anatomical objects in medical imagery was derived in detail. The medial representations were based on a hierarchical representation of linked figural models such as protrusions, indentations, neighboring figures and included figures, which represented solid regions and their boundaries. The third phase dealt with the vital parameters for effective video streaming over the Internet, in particular the bottleneck bandwidth, which gives the upper limit for the speed of data delivery from one end point to the other in a network. If a codec attempts to send data beyond this limit, all packets above the limit will be lost. On the other hand, sending under this limit will clearly result in suboptimal video quality. During this phase the packet-drop-rate, (PDR), performance of TCP(1/2) was investigated in conjunction with a few representative TCP-friendly Congestion Control Protocols, (CCP). The CCPs were TCP(1/256), SQRT(1/256) and TFRC (256), with and without self-clocking. The CCPs were studied when subjected to an abrupt reduction in the available bandwidth. Additionally, the investigation studied the effect on the drop rates of the TCP-compatible algorithms of changing the queuing scheme from Random Early Detection (RED) to DropTail.

PAGE 12

CHAPTER 1 INTRODUCTION
We have to do the best we can. This is our sacred human responsibility. - Albert Einstein
The growing demand for data sharing, offshore development and remote services calls for better and more accurate multimedia compression and transmission techniques. The development of compression algorithms has mainly focused on the general demand for entertainment multimedia and other applications, which require cost reduction. The need for high quality, error-free video streaming for bio-medical applications has evolved rapidly and is market driven. The use of digitized information is rapidly gaining acceptance in bio-medical applications. Video compression plays an important role in the archiving and transmission of different digital diagnostic modalities. Various compression schemes such as JPEG and MPEG are usually applied to telephone conferencing, cable video transmission and other non-medical applications. The JPEG and MPEG compression schemes are not suitable for medical video sequences such as angiograms. The JPEG and MPEG compression schemes suffer from instability due to block artifacts resulting from the block-based DCT coefficient quantization. The image

PAGE 13

quality degrades severely with consecutive frame processing due to the accumulation of errors across frames. Despite rapid progress in mass-storage density, processor speeds and digital communication system performance, demand for data storage capacity and data transmission bandwidth continues to outstrip the capabilities of available technologies. To appreciate the need for compression and coding of the individual signals that constitute the multimedia experience, Table 1.1 presents data for a few typical multimedia data types and the resulting uncompressed storage space requirements. The numbers indicate the qualitative transition from simple text to full-motion video data and demonstrate the need for compression. Additionally, the data presented in Table 1.1 clearly illustrate the need for large storage spaces for the image, audio and video signals.
Table 1.1: Multimedia Data Types and Uncompressed Storage Space Requirements
Multimedia Data | Size/Duration | Bits/Pixel or Bits/Sample | Uncompressed Size
A page of text | 11" x 8.5" | Varying resolution | 16-32 Kbits
Telephone quality speech | 1 sec | 8 bps | 64 Kbits
Grayscale image | 512 x 512 | 8 bpp | 2.1 Mbits
Color image | 512 x 512 | 24 bpp | 6.29 Mbits
Medical image | 2048 x 1680 | 12 bpp | 41.3 Mbits
SHD image | 2048 x 2048 | 24 bpp | 100 Mbits
Full-motion video | 640 x 480, 10 sec | 24 bpp | 2.21 Gbits
This research presents the development of a compression scheme for bio-medical video sequence coding based on deformable motion estimation. The present state of technology, the prevalent video compression standard, (H.263), was implemented.
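The uncompressed sizes in Table 1.1 follow directly from width x height x bit depth (x frame count for video). The short Python sketch below is added purely as an illustration of that arithmetic; it is not part of the thesis code, and the helper name is made up for the example.

```python
def uncompressed_bits(width, height, bits_per_pixel, frames=1):
    """Raw storage required for an image or image sequence, in bits."""
    return width * height * bits_per_pixel * frames

# Color image, 512 x 512 at 24 bpp: about 6.29 Mbits (Table 1.1)
print(uncompressed_bits(512, 512, 24) / 1e6, "Mbits")

# Full-motion video, 640 x 480 at 24 bpp, 30 frames/s for 10 s: about 2.21 Gbits
print(uncompressed_bits(640, 480, 24, frames=30 * 10) / 1e9, "Gbits")
```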

PAGE 14

In addition, the possibility of applying deformable motion estimation techniques, termed Gradient Vector Flow, (GVF), was introduced to the video compression standard H.263. The introduction of GVF made the H.263 compression standard more adaptable for bio-medical applications, which were studied in detail. The study of the network characteristics and the behavior of various congestion control mechanisms was used to analyze the complete characteristics of the existing low bit-rate video compression algorithms. The importance of this study can be better appreciated if the effect of congestion on a network, specifically a low bit-rate network, is understood. Compressed video communication for bio-medical applications was studied in great detail. The discussion of this research is divided into an analysis of the three elements central to the need for better compression for bio-medical video. The three elements are depicted in Figure 1.1.
Figure 1.1: Overview of The Coder/Decoder and Network Under Study

PAGE 15

1.1 Objectives
This research work was conducted in three phases:
- Definition and implementation of the compression scheme in H.263
- Implementation of Deformable Motion Estimation
- Behavior analysis of slowly responsive congestion control algorithms
The first phase was the implementation of the existing H.263 algorithm, which included a measure of the compression achieved by the algorithm. The second phase dealt with the analysis of other means of motion estimation. During the second phase a Gradient Vector Flow, (GVF), model was developed for deformable motion estimation. The third phase dealt with the network's behavior during congestion and its impact on network traffic.

PAGE 16

CHAPTER 2 VIDEO COMPRESSION
MPEG, H.261, and H.263 are three closely related codecs for motion video. They are all international, nonproprietary standards. MPEG is an International Standard of the International Organization for Standardization, (ISO), while H.261 and H.263 are Recommendations of the International Telecommunications Union, (ITU). MPEG is intended for playback of movies from digital storage media while the other two codecs are intended for teleconferencing. MPEG is an acronym that stands for Moving Picture Experts Group, which is the name of the ISO committee that developed it. All three codecs are based on the discrete cosine transform, (DCT), predicted frames and motion estimation. A predicted frame is essentially a difference frame, which is the difference between the current input frame and the previously encoded and reconstructed frame. The difference should be small over most of the frame area except around the edges of moving objects and where new objects are introduced into the frame. A predicted frame is termed a P-frame; another name is inter-frame. A frame that is encoded independently of other frames is termed an intra-frame or I-frame.

PAGE 17

Motion estimation estimates the translational motion of objects in the current frame relative to the previous frame. This allows the encoder to reduce the energy in the frame difference by moving pixels around in order to simulate object motion. Such action incurs a cost since the encoder must insert a small amount of motion information in the compressed data so that the decoder can reproduce the pixel shuffling exactly. Both MPEG and H.263 enhance prediction and motion estimation by using bidirectionally-predicted or B-frames. One can think of a B-frame as the average of two P-frames that use previous and future input frames as predictors for the current input frame. Generally speaking, a B-frame can be about one third the size of a P-frame. Obviously, the use of B-frames implies out-of-order encoding since the encoder can only encode a B-frame after encoding the requisite previous and future frames. Note that a B-frame never predicts another B-frame; only I-frames and P-frames are used to predict B-frames. One feature that H.261 and H.263 share that MPEG does not possess is support for a variable frame rate within a video sequence. This is important in teleconferencing for two reasons. First, the bit rate in a teleconference can be very low, so an encoder must be able to lower the frame rate to maintain reasonable visual quality. Second, the encoder must be able to adjust to sudden changes in video content in real time without warning. For example, at a scene change the first compressed frame tends to be large. However, with variable-frame-rate encoding the encoder can encode the frame and then skip a few input frames before encoding the next frame. In fact, the human eye will not see motion in the video for a while after the scene change.

PAGE 18

Historically, MPEG-1 was derived from H.261 and JPEG. H.263 is based on H.261 and MPEG-1 and adds some enhancements of its own. On the other hand, MPEG has some enhancements that are not present in H.261 or H.263. All other things being equal, at bit rates of 1 Mbps or above, MPEG-1 video will look better than the same content encoded with either H.261 or H.263. At the common rate of approximately 1.2 Mbps, with a frame resolution of 352x240 and a frame rate of 30 fps, the visual quality is comparable to or better than that from an analog VCR. When operated at the limit, these codecs can produce visual artifacts similar to those for JPEG. These include blocks and artifacts near the edges of objects. Such artifacts can be common in low bit-rate teleconferencing and at unexpected scene changes. H.261 will be the most sensitive to such problems, followed by H.263 and then MPEG. In low bit-rate teleconferencing, 128 kbps and below, H.261 and H.263 video might look better than MPEG video since the first two codecs can vary the frame rate within a video sequence. H.263 can run at lower bit rates than H.261. H.263 can also run at higher bit rates and support larger frames, which are up to 4 times larger in each dimension. However, if an H.263 encoder does not use its optional modes, its output should be comparable to that from a similar H.261 encoder. The ability to reduce these blocks and artifacts is the most important characteristic of the algorithm developed during this research when it is used in bio-medical applications.

PAGE 19

2.1 H.263: An Example
The H.263 standard supports video compression for videoconferencing and video telephony applications. The H.263 standard is published by the International Telecommunications Union, (ITU). Figure 2.1 provides a macro-level representation of the H.263 system [1].
Figure 2.1: Video Coder/Decoder Module
Videoconferencing and video telephony have a wide range of applications that include:
- Desktop and room-based conferencing
- Video over the Internet and over telephone lines
- Surveillance and monitoring

PAGE 20

- Telemedicine (medical consultation and diagnosis at a distance)
- Computer-based training and education
In each case video, and perhaps audio, information is transmitted over telecommunication links, which include networks, telephone lines, ISDN and radio. Video has a high "bandwidth" that requires many bytes of information per second. Therefore, these applications require video compression or video coding technology in order to reduce the bandwidth before transmission.
2.2 H.263 Encoder
A block diagram of the H.263 encoder is presented in Figure 2.2.
Figure 2.2: H.263 Encoder

PAGE 21

2.2.1 Motion Estimation and Compensation
The first step in reducing the bandwidth is to subtract the previously transmitted frame from the current frame. This action leaves only the difference or residue for encoding and transmission. Therefore, areas of the frame that do not change, such as the background, are not encoded. Further reduction is achieved by attempting to estimate where areas of the previous frame will occur in the current frame and compensate for the movement, which is termed motion estimation and compensation. The motion estimation module compares each 16x16 pixel macro-block in the current frame with its surrounding area in the previous frame and attempts to find a match. The matching area is moved into the current macro-block position by the motion compensator module. Then the motion compensated macro-block is subtracted from the current macro-block. If the motion estimation and compensation process is efficient, the remaining or "residual" macro-block should only contain a small amount of information [2].
2.2.2 Discrete Cosine Transform (DCT)
The DCT transforms a block of pixel or residual values into a set of "spatial frequency" coefficients. This is analogous to transforming a time domain signal into a frequency domain signal using a Fast Fourier Transform. The DCT operates on a 2-dimensional block of pixels rather than a 1-dimensional signal and is particularly good at "compacting" the energy of the block of values into a small number of coefficients. This

PAGE 22

means that only a few DCT coefficients are required to recreate a recognizable copy of the original block of pixels [3].
2.2.3 Quantization
In a typical block of pixels, most of the coefficients produced by the DCT are close to zero. The quantizer module reduces the precision of each coefficient so that the near-zero coefficients are set to zero and only a few significant non-zero coefficients are left. This action is performed practically by dividing each coefficient by an integer scale factor and truncating the result. It is important to realize that the quantizer "throws away" information [3].
2.2.4 Entropy Encoding
An entropy encoder, such as a Huffman encoder, replaces values that occur frequently with short binary codes and replaces values that occur infrequently with longer binary codes. The entropy encoding in H.263 is based on such a technique and is used to compress the quantized DCT coefficients. The result is a sequence of variable-length binary codes. These codes are combined with synchronization and control information such as the motion "vectors", which are required to reconstruct the motion-compensated reference frame in order to form the encoded H.263 bit stream.
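As a rough illustration of the transform and quantization stages described above, and of the rescale and inverse transform performed by the decoder in Section 2.3, the following Python sketch applies an orthonormal 2-D DCT to an 8x8 residual block, quantizes it with a single integer scale factor and reconstructs the block. It is a sketch only: H.263 defines its own quantizer behavior and entropy coding, which are not reproduced here, and the random residual values and the q_scale value are invented for the example.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (n x n)."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def encode_block(block, q_scale):
    """Forward 2-D DCT followed by coarse quantization (divide and truncate)."""
    c = dct_matrix(block.shape[0])
    coeffs = c @ block @ c.T              # energy compacted into a few coefficients
    return np.trunc(coeffs / q_scale)     # the quantizer "throws away" information

def decode_block(levels, q_scale):
    """Rescale and inverse DCT, as in the decoder's Rescale and IDCT stages."""
    c = dct_matrix(levels.shape[0])
    return c.T @ (levels * q_scale) @ c   # inverse of an orthonormal transform

residual = np.random.default_rng(0).normal(0.0, 5.0, (8, 8))  # example residual block
levels = encode_block(residual, q_scale=8)
reconstructed = decode_block(levels, q_scale=8)
print("non-zero quantized coefficients:", np.count_nonzero(levels))
print("max reconstruction error:", np.abs(residual - reconstructed).max())
```

Because the fractional part is discarded when the coefficients are quantized, the reconstructed block is close to, but not identical to, the original; this is the source of the block artifacts discussed later in this chapter.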

PAGE 23

2.2.5 Frame Store
The current frame must be stored so that it can be used as a reference when the next frame is encoded. Instead of simply copying the current frame into a store, the quantized coefficients are re-scaled, inverse transformed using an Inverse Discrete Cosine Transform and added to the motion-compensated reference block in order to create a reconstructed frame that is placed in a store termed the frame store. This ensures that the contents of the frame store in the encoder are identical to the contents of the frame store in the decoder. When the next frame is encoded, the motion estimator uses the contents of the frame store to determine the best matching area for motion compensation.
2.3 H.263 Decoder
A block diagram of the H.263 decoder is presented in Figure 2.3.
Figure 2.3: H.263 Decoder

PAGE 24

2.3.1 Entropy Decoder
The variable-length codes that make up the H.263 bit stream are decoded in order to extract the coefficient values and motion vector information [3].
2.3.2 Rescale
Rescale is the "reverse" of quantization. During rescale the coefficients are multiplied by the same scaling factor that was used in the quantizer. However, because the quantizer discarded the fractional remainder, the rescaled coefficients are not identical to the original coefficients.
2.3.3 Inverse Discrete Cosine Transform (IDCT)
The IDCT reverses the DCT operation in order to create a block of samples. These samples typically correspond to the difference values that were produced by the motion compensator in the encoder.
2.3.4 Motion Compensation
The difference values are added to a reconstructed area from the previous frame. The motion vector information is used to pick the correct area, which is the same reference area that was used in the encoder. The result is a reconstruction of the original frame. The reconstructed frame will not be identical to the original because of the "lossy" quantization stage, which causes the image quality to be poorer than the original.

PAGE 25

The reconstructed frame is placed in a frame store and it is used to motion-compensate the next received frame.
2.4 Block-Matching Motion Compensation
Predictive coding is widely used in video transmission, especially for low bit-rate coding. Typically, only a fraction of an image changes from frame to frame, which allows for a straightforward prediction from previous frames. Motion compensation is used as part of the predictive process. If an image sequence shows moving objects, then their motion within the scene can be measured and the information used to predict the content of frames later in the sequence [4].
2.4.1 Frame Based Block-Matching Motion Compensation
2.4.1.1 Fixed Size Block-Matching (FSBM)
The Fixed Size Block-Matching, (FSBM), technique was originally described by Jain and Jain [6]. The technique is easy to implement and widely adopted. Each image frame is divided into a fixed number of usually square blocks. For each block in the frame a search is made in the reference frame over an area of the image that allows for the maximum translation that can be used by the coder. The intent of the search is the location of the best matching block that yields the least prediction error. Usually the search is conducted with the goal of minimizing either the mean square difference or the

PAGE 26

mean absolute difference, which is easier to compute. Typical block sizes are of the order of 16x16 pixels and the maximum displacement might be plus or minus 64 pixels from a block's original position. Several search strategies are possible. Some kind of sampling mechanism is usually employed but the most straightforward approach is an exhaustive search. However, an exhaustive search is computationally demanding in terms of data throughput but algorithmically simple and relatively easy to implement in hardware. A good match during the search means that a good prediction can be made but the improvement in prediction must outweigh the cost of transmitting the motion vector. A good match requires that the whole block has undergone the same translation. In addition, the block should not overlap objects in the image, including the background, that have different degrees of motion. The choice of the block size to use for motion compensation is always a compromise. Smaller and more numerous blocks can represent complex motion better than a smaller number of large ones. This reduces the work and transmission costs of subsequent correction stages but increases the cost of the motion information itself. The problem has been investigated by Ribas-Corbera and Neuhoff [7]. They concluded that the choice of block size could be affected not only by motion vector accuracy but also by other scene characteristics such as texture and inter-frame noise.
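The exhaustive FSBM search described above reduces to a double loop over candidate displacements. The Python sketch below is illustrative only; the function and parameter names (fsbm_search, block_size, max_disp) are not from the thesis, and it scores candidates with the mean absolute difference.

```python
import numpy as np

def fsbm_search(current, reference, block_xy, block_size=16, max_disp=15):
    """Exhaustive fixed-size block matching for one macro-block.

    Returns the displacement (dy, dx) that minimizes the mean absolute
    difference between the current block and a shifted reference block.
    """
    y0, x0 = block_xy
    block = current[y0:y0 + block_size, x0:x0 + block_size].astype(float)
    h, w = reference.shape
    best_mv, best_cost = (0, 0), np.inf
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            y, x = y0 + dy, x0 + dx
            if y < 0 or x < 0 or y + block_size > h or x + block_size > w:
                continue                              # candidate leaves the frame
            candidate = reference[y:y + block_size, x:x + block_size].astype(float)
            cost = np.abs(block - candidate).mean()   # mean absolute difference
            if cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost
```

In the encoder, the returned vector selects the reference block that is subtracted from the current block to form the residual; the decoder uses the same vector to add the decoded residual back to that reference area.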

PAGE 27

Figure 2.4: FSBM Example
The motion vectors resulting from FSBM are well correlated. Therefore, vector information can be coded differentially using variable length codes. This is performed in a number of codecs such as the ITU-T H.263 [3]. Variable length codes have also been proposed for the MPEG-4 video standard [8]. An example of the block structure generated is presented in Figure 2.4, which is a frame from the MPEG-4 test sequence known as "Foreman". Most noticeable is the stationary background that is represented by large numbers of blocks with very similar motion vectors. The short lines starting from the center of each block represent the motion vectors. Subsequently the motion vectors are variable length coded through the use of a differential 2D prediction mechanism.
2.4.1.2 Variable Size Block-Matching (VSBM)
Proposals have been presented that specify improvements to FSBM by varying the size of blocks in order to more accurately match moving areas. Such methods are known as variable size block matching, (VSBM), methods. Chan, Yu and Constantinides

PAGE 28

have proposed a scheme that starts with relatively large blocks, which are then repeatedly divided, in a so-called top-down approach [9]. Whenever the best matching error for a block is above a specified threshold, the block is divided into four smaller blocks until the maximum number of blocks or locally minimum errors are obtained. The application of such top-down methods may generate block structures for an image that match real moving objects, but it seems that an approach that more directly seeks out areas of uniform motion might be more effective. The VSBM technique detects areas of common motion and groups them into variable sized blocks for use with a coding strategy based on the use of quad-trees [10]. Use of a quad-tree obviates the need to describe the size and position of each block explicitly. Quad-tree use requires only the tree description. The vectors for each block in the tree are identical in nature to those of the FSBM. Since the process is a grouping together of smaller blocks to form larger ones, it is generally regarded as a bottom-up technique. An example of the block structure generated is presented in Figure 2.5. Comparatively few large blocks represent the stationary background. However, the moving parts of the image are represented by smaller blocks and a larger number of motion vectors.
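The top-down splitting rule described above can be sketched as a short recursion on top of the exhaustive search shown earlier. Again this is an illustrative Python sketch, not the scheme of [9]: the helper name vsbm_split and the threshold and minimum block size are assumptions made for the example.

```python
def vsbm_split(current, reference, y, x, size, threshold, min_size=4, blocks=None):
    """Top-down variable-size block matching using fsbm_search from the FSBM sketch.

    A block is kept as a quad-tree leaf if its best matching error is at or
    below `threshold`; otherwise it is split into four quadrants, down to
    `min_size` pixels.
    """
    if blocks is None:
        blocks = []
    mv, cost = fsbm_search(current, reference, (y, x), block_size=size)
    if cost <= threshold or size <= min_size:
        blocks.append((y, x, size, mv))          # leaf: position, size, motion vector
    else:
        half = size // 2
        for dy, dx in ((0, 0), (0, half), (half, 0), (half, half)):
            vsbm_split(current, reference, y + dy, x + dx, half,
                       threshold, min_size, blocks)
    return blocks
```

The resulting list of (y, x, size, motion vector) leaves plays the role of the quad-tree description discussed above: few large blocks over the stationary background and many small blocks over moving areas.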

PAGE 29

Figure 2.5: VSBM Example
For the same number of blocks per frame as FSBM, the VSBM method results in a smaller mean square error, (MSE), or better prediction. More significantly, for a similar level of MSE as FSBM, the VSBM technique can represent the inherent motion using fewer variable-sized blocks, which translates into the use of a reduced number of motion vectors. Subsequently the motion vectors are variable length coded using a quad-tree based 2D predictor mechanism [11]. Since frame-based VSBM results in a better estimate of "true" motion and more efficient coding of vector information, one would expect that it can be applied to object-based systems with similar effects [12]. The expectation is true but there are two problems to overcome when using a basic block matching approach to find true motion. The first is the majority effect, where any small area of motion inside a block will simply be lost since the matching error for the block is determined by the majority of the block. This is an argument against the use of large block sizes.

PAGE 30

Furthermore, a single block cannot effectively represent more than one motion. Therefore, there is always a trade-off between block size and the quality of the match. The aperture problem is the second difficulty, which is associated with small block sizes. The fewer pixels there are to match, the more spurious matches there will be due to ambiguity. Additionally, there is little point in having the overhead of many small blocks if they all have the same vector, but if they do not, the vectors are unlikely to all be correct. FSBM and VSBM were applied to the same frame of video data. Figure 2.6 presents the result of applying FSBM and Figure 2.7 presents the result of applying VSBM to the same video frame for the same quality prediction. While FSBM required 109 blocks, VSBM required only 44, which represents a saving of approximately 60 percent. Motion vectors were then variable length coded using a differential, object-based 2D prediction strategy [13].
Figure 2.6: FSBM Block Structure

PAGE 31

Figure 2.7: VSBM Block Structure
2.4.2 Object Based Block-Matching Motion Compensation
Evolving object-based video coding standards such as MPEG-4 permit arbitrarily-shaped objects to be encoded and decoded as separate video object planes, (VOPs). There are several motivating scenarios behind the use of objects:
- Where transmission bandwidth or decoder performance is limited, the user may be able to select some subset of all video objects, which are of particular interest.
- The user may wish to manipulate objects at the receiver, such as a change in position, size and depth ordering, which may evolve strictly as a function of interest.
- It may be possible to replace the content of an object with material generated later or local to the receiver/display, which can be used for enhanced visualization and "augmented reality".

PAGE 32

2.5 Implementation
The H.263 algorithm was implemented and the results are tabulated in Table 2.1. A wide variety of video sequences were chosen for the analysis. The common input format chosen for the video is termed the Common Intermediate Format, (CIF), which is a progressively scanned format with 352 x 288 pixels/frame at 30 frames/sec, while the Quarter Common Intermediate Format, (QCIF), is 176 x 144. In addition, CIF compatibility was made optional but QCIF compatibility was mandatory. Therefore, all codecs had to be able to operate with QCIF. CIF is primarily for videoconferencing, while QCIF is suitable for a desktop videophone. QCIF, which was the mandatory requirement for the codec, was used. Table 2.1 summarizes the results of various video sequences compressed in both the QCIF and CIF formats [5].
Table 2.1: Compressed Video Results
Sequence Name | File Name | Source Format | Number of Pictures | Size / KB | H.263 File Size / KB
Container | container.qcif | QCIF | 300 | 11,138 | 87
Foreman | foreman.qcif | QCIF | 400 | 14,850 | 658
News | news.qcif | QCIF | 300 | 11,138 | 144
Silent | silent.qcif | QCIF | 300 | 11,138 | 138
Mobile | mobile.cif | CIF | 300 | 44,550 | 6391
Paris | paris.cif | CIF | 1000 | 158,153 | 2624
Tempete | tempete.cif | CIF | 260 | 38,610 | 2398
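The compression achieved in Table 2.1 can be read as the ratio of the raw sequence size to the H.263 file size. The snippet below simply recomputes those ratios from the table values; it is a convenience added here, not output from the thesis implementation.

```python
# (raw size in KB, H.263 file size in KB), copied from Table 2.1
results = {
    "Container": (11_138, 87),
    "Foreman":   (14_850, 658),
    "News":      (11_138, 144),
    "Silent":    (11_138, 138),
    "Mobile":    (44_550, 6_391),
    "Paris":     (158_153, 2_624),
    "Tempete":   (38_610, 2_398),
}
for name, (raw_kb, coded_kb) in results.items():
    print(f"{name:10s} compression ratio: {raw_kb / coded_kb:6.1f} : 1")
```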

PAGE 33

The H.263 coder, decoder and the QCIF reader are presented in the appendix. Video communication over narrow-band channels such as the Internet, ISDN, modem or mobile communication suffers from a loss of image quality. The most annoying artifacts of block-based coders are the discontinuities at block boundaries. Many authors have reduced this problem by applying smoothing operators to the block edges or by prediction of the image structure. Filtering decreases the blocking discontinuities but destroys image structures at the block edges [14]. When dealing with medical imaging applications, the threshold of allowable image/video loss is extremely critical. Therefore, this research investigated the feasibility of applying deformable motion and shape analysis to the H.263 algorithm for better prediction and as a method of avoiding the artifacts.

PAGE 34

CHAPTER 3 DEFORMABLE MODEL AND SHAPE ANALYSIS
This section of the study analyzes a new external force for active contours, which was used to estimate deformable objects. The reason for this study was to analyze the feasibility of using some form of estimation other than standard motion estimation in order to reduce the block artifacts and increase the usability of the H.263 algorithm for medical applications. This study envisioned two different applications for medical video. The first application involved the ability to incorporate a vector flow model into the motion analysis part of video compression models. Such action involves taking the compressed video validation obtained by applying the vector flow model at both the transmitter and receiver end and comparing the numerical data for image distortion and loss. The second application involved the ability to develop a mathematical model for the deformable motion analysis and the numerical implementation of various forces and factors, which could be applied in medical imaging. The external force, which is called the gradient vector flow (GVF), is computed as a diffusion of the gradient vectors of a gray-level or binary edge map derived from the image [15]. Snakes, or active contours, are curves defined within an image domain that can move under the influence of internal forces coming from within the curve itself and external forces computed from the image

PAGE 35

data. The internal and external forces are defined so that the snake will conform to an object boundary or other desired features within an image. In this research, the focus was on parametric active contours. Parametric active contours synthesize parametric curves within an image domain and allow them to move toward the desired features, which are usually edges. Typically, the curves are drawn toward the edges by potential forces, which are defined to be the negative gradient of a potential function. Additional forces, such as pressures, coupled with the potential forces comprise the external forces. There are also internal forces, such as elasticity forces and bending forces, that are designed to hold the curve together and to keep it from bending too much. Care must be maintained since pressures can push an active contour into boundary concavities but cannot be too strong or weak edges will be overwhelmed. How these forces are used in the mathematical model is presented later in this thesis. Gradient vector flow, (GVF), fields are dense vector fields derived from images by minimizing certain energy functionals in a variational framework. The minimization was achieved by solving a pair of decoupled linear partial differential equations that diffuse the gradient vectors of a gray-level or binary edge map that is computed from the image. The active contour that uses the GVF field as its external force is termed a GVF snake. The GVF snake is distinguished from nearly all previous snake formulations since its external forces cannot be written as the negative gradient of a potential function. As a result it could not be formulated using the standard energy minimization framework. Therefore, it was specified directly from a force balance condition. To be more precise about the model, the research proposed a Bayesian approach incorporating prior knowledge of the anatomical variations and the variation of the

PAGE 36

imaging modalities. Following the deformable templates paradigm, exemplary templates were constructed to incorporate prior information about the geometry and shape of the anatomical objects under study. The infinite anatomical variability was accommodated in the Bayesian framework by defining probabilistic transformations on the templates. The segmentation problem in this paradigm involved finding the transformation, S, of the template that maximized the posterior

p(S \mid data) \propto p(data \mid S)\, p(S)   (3.1)

where p(S) is the prior probability function that captures prior knowledge of the anatomy and its variability and p(data \mid S) is the data likelihood function that captures the image data-to-geometry relationship [16]. For efficiency of implementation, the log-posterior given by

\log p(S \mid data) = \log p(data \mid S) + \log p(S)   (3.2)

was equivalently maximized up to an additive constant. The modeling approach adopted in this research for building exemplary templates and associated transformations was based on a multi-scale medial representation. The transformations defined in this framework are parameterized directly in terms of natural shape operations such as thickening and bending and their locations.

3.1 Parametric Snake Model
A traditional snake is a curve, given by

X(s) = [x(s), y(s)], \quad s \in [0, 1]   (3.3)

PAGE 37

that moves through the spatial domain of the image in order to minimize the energy function

E = \int_0^1 \tfrac{1}{2}\left[\alpha |X'(s)|^2 + \beta |X''(s)|^2\right] + E_{ext}(X(s))\, ds   (3.4)

where α and β are weighting parameters that control the snake's tension and rigidity, respectively, and X'(s) and X''(s) denote the first and second order derivatives of X(s) with respect to the parameter s. The external energy function, E_ext, is derived from the image so that it takes on its smaller values at the features of interest, such as the boundaries. Given a gray-level image I(x,y), which is viewed as a continuous function of the position variables (x,y), the typical external energies designed to lead an active contour toward step edges are given by

E_{ext}^{(1)}(x, y) = -|\nabla I(x, y)|^2
E_{ext}^{(2)}(x, y) = -|\nabla [G_\sigma(x, y) * I(x, y)]|^2   (3.5)

where G_σ(x,y) is a two-dimensional Gaussian function with standard deviation σ and ∇ is the gradient operator. If the image is a line drawing that is black on white, then the appropriate external energies include

E_{ext}^{(3)}(x, y) = I(x, y)
E_{ext}^{(4)}(x, y) = G_\sigma(x, y) * I(x, y)   (3.6)

The definitions in equation (3.5) and equation (3.6) show that larger values of σ will cause the boundaries to become blurry. However, large values of σ are often necessary in order to increase the capture range of the active contour. A snake that minimizes E must satisfy the Euler equation

\alpha X''(s) - \beta X''''(s) - \nabla E_{ext} = 0   (3.7)

PAGE 38

This can be viewed as a force balance equation given by

F_{int} + F_{ext}^{(p)} = 0   (3.8)

where

F_{int} = \alpha X''(s) - \beta X''''(s) \quad \text{and} \quad F_{ext}^{(p)} = -\nabla E_{ext}   (3.9)

The internal force, F_int, discourages stretching and bending while the external potential force, F_ext^(p), pulls the snake toward the edges of the desired image [17].
3.2 Example and Behavior of Traditional Snakes
An example of the behavior of a traditional snake is presented in this section. Figure 3.1 presents a 64 x 64-pixel line drawing of a U-shaped object, shown in gray, that has a boundary concavity at the top.
Figure 3.1: U Shaped Object

PAGE 39

Figure 3.2: Iterations and Convergence
Figure 3.2 presents a sequence of curves, shown in red, that depict the iterative progression of the solution by a traditional snake (α = 0.6, β = 0.0) initialized outside the object but within the capture range of the potential force field. The potential force field, shown in blue at the image pixels in Figure 3.2, is given by

F_{ext}^{(p)} = -\nabla E_{ext}^{(4)}   (3.10)

Note that the final solution in Figure 3.2 solves the Euler equations of the snake formulation but remains split across the concave region [18]. The reason for the poor convergence of this snake is revealed in a close-up of the external force field within the boundary concavity. Although the external forces correctly point toward the object boundary, within the boundary concavity the forces point horizontally in opposite directions. Therefore, the active contour is pulled apart toward each of the fingers of the U-shape but is not made to progress downward into the concavity. There is no choice of the parameters α and β that will correct this problem. Another key problem with traditional snake formulations is the

PAGE 40

problem of limited capture range, which can be understood by examining Figure 3.2. In Figure 3.2 the magnitude of the external forces dies out quite rapidly away from the object boundary. Increasing σ in

E_{ext}^{(4)}(x, y) = G_\sigma(x, y) * I(x, y)   (3.11)

will increase this range, but the boundary localization will become less accurate and distinct, which will ultimately obliterate the concavity itself when σ becomes too large. In Figure 3.2 the snake fails to converge exactly on the object and has a lot of disparity in finding the edges. These are the usual problems found with the traditional snake method. Therefore, the distance potential forces do not solve the problem of convergence to boundary concavities.

PAGE 41

CHAPTER 4 GRADIENT VECTOR FLOW
4.1 Snake: Introduction
The overall approach used the force balance condition

F_{int} + F_{ext}^{(p)} = 0   (4.1)

as a starting point for designing a snake. A new static external force field

F_{ext}^{(g)} = v(x, y)   (4.2)

which was termed the gradient vector flow, (GVF), field was developed. In order to obtain the corresponding dynamic snake equation, the potential force -\nabla E_{ext} in

X_t(s, t) = \alpha X''(s, t) - \beta X''''(s, t) - \nabla E_{ext}   (4.3)

was replaced with v(x,y), which yielded

X_t(s, t) = \alpha X''(s, t) - \beta X''''(s, t) + v   (4.4)

The parametric curve that solves equation (4.4) is termed a GVF snake. Equation (4.4) was solved numerically through a process of discretization and iteration, which is identical to the process utilized to produce a traditional snake. Although the final configuration of a GVF snake satisfied the force-balance equation, equation (4.1), the

PAGE 42

equation does not, in general, represent the Euler equations of the energy minimization problem

E = \int_0^1 \tfrac{1}{2}\left[\alpha |X'(s)|^2 + \beta |X''(s)|^2\right] + E_{ext}(X(s))\, ds   (4.5)

since v(x,y) will not, in general, be a non-rotational field. However, the resulting loss of the optimality property was well compensated by the significantly improved performance of the GVF snake [19].
4.1.1 Edge Map
An edge map, f(x,y), was derived from the image I(x,y) with the property that it is larger near the image edges. Any gray-level or binary edge map could have been used. For example,

f(x, y) = -E_{ext}^{(i)}(x, y)   (4.6)

where i = 1, 2, 3 or 4 could have been used. Three general properties of edge maps are important in the present context. First, the gradient of an edge map, ∇f, has vectors pointing toward the edges, which become normal to the edges at the edges. Second, these vectors generally have large magnitudes only in the immediate vicinity of the edges. Third, in homogeneous regions, where I(x,y) is nearly constant, ∇f is nearly zero. These properties affect the behavior of a traditional snake when the gradient of an edge map is used as an external force. The first property causes a snake initialized close to the edge to converge to a stable configuration near the edge, which is a highly desirable property. However, the second property, in general, causes the capture range to be very small. The third property causes homogeneous regions to have no external forces

PAGE 43

whatsoever. The second and third properties are undesirable. Therefore, the approach adopted in this research was to keep the highly desirable property of the gradients near the edges but to extend the gradient map farther away from the edges and into homogeneous regions using a computational diffusion process. This approach produced an important benefit. The inherent competition of the diffusion process created vectors that pointed into boundary concavities.
4.2 Mathematical Model
A gradient vector flow field was defined as the vector field

V(x, y) = [u(x, y), v(x, y)]   (4.7)

with the purpose of minimizing the energy functional

E = \iint \mu \left(u_x^2 + u_y^2 + v_x^2 + v_y^2\right) + |\nabla f|^2 \, |V - \nabla f|^2 \, dx\, dy   (4.8)

This variational formulation provided the benefit of making the result smooth when there is no data. In particular, when |∇f| is small the energy is dominated by the sum of the squares of the partial derivatives of the vector field, which produces a slowly varying field. However, when |∇f| is large the second term dominates the integrand, which can then easily be minimized by setting V equal to ∇f. Minimization of the integrand in (4.8) produces the desired effect of keeping V approximately equal to the gradient of the edge map where it is large, but forces the field to be slowly varying in homogeneous regions. The parameter μ is a regularization parameter governing the trade-off between the first term and the second term in the integrand. The parameter should be set according to the amount of noise present in the image. As the

PAGE 44

noise increases, μ increases. The smoothing term, which is the first term within the integrand of (4.8) and is repeated as equation (4.9) for convenience,

E = \iint \mu \left(u_x^2 + u_y^2 + v_x^2 + v_y^2\right) + |\nabla f|^2 \, |V - \nabla f|^2 \, dx\, dy   (4.9)

is the same term found in the classical formulation of optical flow. It has recently been shown that this term produces an equal penalty on the divergence and curl of the vector field. Therefore, the vector field resulting from this minimization can be expected to be neither entirely non-rotational nor entirely solenoidal. Using the calculus of variations, it can be shown that the GVF field can be found by solving the following Euler equations. The Euler equations are given by

\mu \nabla^2 u - (u - f_x)\left(f_x^2 + f_y^2\right) = 0
\mu \nabla^2 v - (v - f_y)\left(f_x^2 + f_y^2\right) = 0   (4.10)

where ∇² is the Laplacian operator. The Euler equations provided the motivation for the GVF formulation utilized in this research. In a homogeneous region, where I(x, y) is constant, the second term in each equation is zero since the gradient of f(x, y) is zero. Therefore, within such a region, u and v are each determined by Laplace's equation and the resulting GVF field is interpolated from the region's boundary, which indicates the existence of a kind of competition among the boundary vectors. This explains why GVF yields vectors that point into boundary concavities [20].

PAGE 45

4.3 Numerical Implementation
The Euler equations can be solved if u and v are treated as functions of time. Solving the Euler equations for u and v yields

u_t(x, y, t) = \mu \nabla^2 u(x, y, t) - \left[u(x, y, t) - f_x(x, y)\right] \cdot \left[f_x(x, y)^2 + f_y(x, y)^2\right]
v_t(x, y, t) = \mu \nabla^2 v(x, y, t) - \left[v(x, y, t) - f_y(x, y)\right] \cdot \left[f_x(x, y)^2 + f_y(x, y)^2\right]   (4.11)

The steady-state solutions of these linear parabolic equations are the desired solution of the Euler equations, equation (4.10). The expressions in equation (4.11) are decoupled. Therefore, they can be solved as separate scalar partial differential equations in u and v. The expressions in equation (4.11) are known as generalized diffusion equations. For convenience, they are rewritten as presented in equation (4.12) as

u_t(x, y, t) = \mu \nabla^2 u(x, y, t) - b(x, y)\, u(x, y, t) + c^1(x, y)
v_t(x, y, t) = \mu \nabla^2 v(x, y, t) - b(x, y)\, v(x, y, t) + c^2(x, y)   (4.12)

where

b(x, y) = f_x(x, y)^2 + f_y(x, y)^2
c^1(x, y) = b(x, y)\, f_x(x, y)
c^2(x, y) = b(x, y)\, f_y(x, y)   (4.13)

Any digital image gradient operator can be used to calculate f_x and f_y. In the examples presented in this thesis a simple central difference was used. The coefficients b(x,y), c^1(x,y) and c^2(x,y) can be computed and fixed for the entire iterative process. In order to set up the iterative solution, let the indices i, j and n correspond to x, y and t, respectively, let the spacing between pixels be Δx and Δy, and let the time step for each iteration be Δt. The required partial derivatives can be approximated as

PAGE 46

u_t = \frac{1}{\Delta t}\left(u_{i,j}^{n+1} - u_{i,j}^n\right), \qquad v_t = \frac{1}{\Delta t}\left(v_{i,j}^{n+1} - v_{i,j}^n\right)
\nabla^2 u = \frac{1}{\Delta x \Delta y}\left(u_{i+1,j} + u_{i,j+1} + u_{i-1,j} + u_{i,j-1} - 4u_{i,j}\right)
\nabla^2 v = \frac{1}{\Delta x \Delta y}\left(v_{i+1,j} + v_{i,j+1} + v_{i-1,j} + v_{i,j-1} - 4v_{i,j}\right)   (4.14)

Substituting these approximations into equation (4.12) yields an iterative solution for the GVF that is given by

u_{i,j}^{n+1} = \left(1 - b_{i,j}\Delta t\right) u_{i,j}^n + r\left(u_{i+1,j}^n + u_{i,j+1}^n + u_{i-1,j}^n + u_{i,j-1}^n - 4u_{i,j}^n\right) + c_{i,j}^1 \Delta t
v_{i,j}^{n+1} = \left(1 - b_{i,j}\Delta t\right) v_{i,j}^n + r\left(v_{i+1,j}^n + v_{i,j+1}^n + v_{i-1,j}^n + v_{i,j-1}^n - 4v_{i,j}^n\right) + c_{i,j}^2 \Delta t   (4.15)

where

r = \frac{\mu \Delta t}{\Delta x \Delta y}   (4.16)

Convergence of the above iterative process is guaranteed since it is a standard procedure from the theory of numerical methods. Provided that b(x,y), c^1(x,y) and c^2(x,y) are bounded, the results in equation (4.15) are stable whenever the Courant-Friedrichs-Lewy step-size restriction, r ≤ 1/4, is maintained. Since Δx, Δy and μ are normally fixed, the definition of r in equation (4.16) imposes a restriction on the time step that must be maintained in order to guarantee convergence of the GVF. The restriction on the time step demands that

\Delta t \le \frac{\Delta x \Delta y}{4\mu}   (4.17)
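The iteration in equations (4.14) through (4.17) is straightforward to implement. The thesis work used MATLAB; the sketch below is an illustrative Python/NumPy restatement of the same update, assuming unit pixel spacing and a Gaussian-smoothed squared-gradient edge map. The function name and the values of mu, sigma and the iteration count are example choices, not the parameters used in the thesis.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace

def gradient_vector_flow(image, mu=0.2, sigma=1.0, iterations=200):
    """Iterate equation (4.15) toward steady state and return the GVF field (u, v)."""
    smoothed = gaussian_filter(image.astype(float), sigma)
    gy, gx = np.gradient(smoothed)
    f = gx ** 2 + gy ** 2                    # edge map f = |grad(G_sigma * I)|^2
    fy, fx = np.gradient(f)                  # gradient of the edge map
    b = fx ** 2 + fy ** 2                    # b(x, y), equation (4.13)
    c1, c2 = b * fx, b * fy                  # c1 = b*fx, c2 = b*fy
    dt = 1.0 / (4.0 * mu)                    # time step at the limit of equation (4.17)
    r = mu * dt                              # equation (4.16) with dx = dy = 1
    u, v = fx.copy(), fy.copy()              # initialize the field with grad f
    for _ in range(iterations):
        # laplace() applies the 5-point discrete Laplacian of equation (4.14)
        u = (1.0 - b * dt) * u + r * laplace(u) + c1 * dt
        v = (1.0 - b * dt) * v + r * laplace(v) + c2 * dt
    return u, v
```

The returned field (u, v) then replaces the potential force in the dynamic snake equation (4.4).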

PAGE 47

4.4 MATLAB Results
The GVF field was simulated in MATLAB. Some predetermined shapes were used as test diagrams. Figure 4.1 presents the test images and the traditional forces acting on them.
Figure 4.1: Test Image With its Traditional Forces
Iterations for computing the object shape with the traditional force method were performed on the test image. Figure 4.2 illustrates the process followed in identifying the deformed test image.
Figure 4.2: Iterations

PAGE 48

The experimental results obtained by use of the GVF method on the U-shaped object are presented in Figure 4.3.
Figure 4.3: U Shaped Test Object Undergoing GVF Iterations
The improved GVF iteration method with different parameters was applied to the block image. The results obtained are presented in Figure 4.4.
Figure 4.4: Iterations on the Block Test Image and its Convergence

PAGE 49

CHAPTER 5 MEDICAL APPLICATIONS
5.1 Deformable Super Quadrics
Deformable super quadrics are three-dimensional, (3D), constructs that are used for describing closed-surface shapes. They can be used for both 3D graphics and computer vision. Super quadrics use one consistent representation to describe many different objects such as cubes, spheres and head shapes. The super quadric can be fitted to an existing set of data points, which unifies geometric modeling and physical modeling. They also provide for quick point-to-point distance and surface distance calculations. Super quadrics are represented mathematically by

h(t) = \left[\cos^{e_2}(t),\ \sin^{e_2}(t)\right]; \quad m(s) = \left[\cos^{e_1}(s),\ \sin^{e_1}(s)\right], \quad -\pi/2 \le s \le \pi/2   (5.1)

The functions presented in equation (5.1) are two-dimensional, (2D), parametric curves. The function h is a horizontal curve whose values traverse a full circle. The function m modulates h and yields values that traverse a half circle. The e1 parameter changes the

PAGE 50

scale of h while the parameter e2 raises/lowers h. The parameters e1 and e2 control the squareness of the surface. The basic functions given in equation (5.1) yield a representation for a 3D surface that is given by the cross product of m and h:

r = m \otimes h   (5.2)

The cross product yields

r(s, t) = \left[\cos^{e_1}(s)\cos^{e_2}(t),\ \cos^{e_1}(s)\sin^{e_2}(t),\ \sin^{e_1}(s)\right]   (5.3)

where the s and t parameters can be eliminated by using the relation \sin^2(\cdot) + \cos^2(\cdot) = 1. The super quadric can be transformed through the use of

X = R\, D_m\, r + b   (5.4)

where X is the 3x1 resulting point, R is a 3x3 rotation matrix, D_m is the modal deformation matrix, r is the 3x1 point to be transformed and b is a 3x1 translation vector. The transformation allows placement of the super quadric anywhere in space with many possible shapes. The Modal Deformation Matrix is given by

PAGE 51

D_m = \begin{bmatrix} d_{00} & d_{01} & d_{02} \\ d_{10} & d_{11} & d_{12} \\ d_{20} & d_{21} & d_{22} \end{bmatrix}   (5.5)

where each entry is a polynomial. The surface normal to a super quadric is given by

n_h(t) = \left[\cos^{2-e_2}(t),\ \sin^{2-e_2}(t)\right]; \quad n_m(s) = \left[\cos^{2-e_1}(s),\ \sin^{2-e_1}(s)\right]   (5.6)

The cross product of equation (5.6) is given by

n(s, t) = \left[\cos^{2-e_1}(s)\cos^{2-e_2}(t),\ \cos^{2-e_1}(s)\sin^{2-e_2}(t),\ \sin^{2-e_1}(s)\right]   (5.7)

Using a Finite Element Model, instead of a continuous surface, forces the use of a discrete set of points across the surface. The physical forces acting on the points are modeled by

Global coordinates: U = [U\ V\ W]^T   (5.8)

and

Local coordinates: u = [u\ v\ w]^T   (5.9)

In addition to the super quadric, the differential equations describing the associated physics are given by

Body forces: f_B = \left[f_{Bx}\ f_{By}\ f_{Bz}\right]^T   (5.10)

Surface forces: f_s = \left[f_{sx}\ f_{sy}\ f_{sz}\right]^T   (5.11)

PAGE 52

Attachment forces: f_l = \left[f_{lx}\ f_{ly}\ f_{lz}\right]^T   (5.12)

Strains: \epsilon = \left[\epsilon_{xx}\ \epsilon_{yy}\ \epsilon_{zz}\ \epsilon_{xy}\ \epsilon_{xz}\ \epsilon_{yz}\right]^T   (5.13)

and

Stresses: \sigma = \left[\sigma_{xx}\ \sigma_{yy}\ \sigma_{zz}\ \sigma_{xy}\ \sigma_{yz}\ \sigma_{xz}\right]^T   (5.14)

Stiffness modeling is possible through the use of the functions

F = R_c \left[\int_{V_m} H_m^T f_m^B \, dv + \int_{V_m} H_m^T f_m^s \, dv + \int_{V_m} H_m^T f_m^I \, dv\right]   (5.15)

Inertia and damping modeling was performed so that the final solution for the deformation energy equation would be given by

F_I(t) + F_D(t) + F_E(t) = R(t)   (5.16)

which mathematically expresses the fact that inertia plus damping plus elasticity equals the response. Equation (5.16) is a second order differential equation and is solved by using either of two methods. The solution methods are termed the Direct Integration Method and the Mode Superposition Method.
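To make the surface parameterization of equation (5.3) concrete, the following Python sketch samples a unit superquadric on a parameter grid; signed powers keep the surface well defined for non-integer exponents. It is an illustration only: unit axis lengths are assumed, and the rotation, modal deformation and translation of equation (5.4) would be applied to each sampled point afterwards.

```python
import numpy as np

def signed_pow(x, e):
    """sign(x) * |x|**e, the usual convention for superquadric exponents."""
    return np.sign(x) * np.abs(x) ** e

def superquadric_surface(e1, e2, n=50):
    """Sample r(s, t) of equation (5.3) on an n x n grid of (s, t) values."""
    s = np.linspace(-np.pi / 2, np.pi / 2, n)   # half-circle parameter
    t = np.linspace(-np.pi, np.pi, n)           # full-circle parameter
    S, T = np.meshgrid(s, t, indexing="ij")
    x = signed_pow(np.cos(S), e1) * signed_pow(np.cos(T), e2)
    y = signed_pow(np.cos(S), e1) * signed_pow(np.sin(T), e2)
    z = signed_pow(np.sin(S), e1)
    return np.stack([x, y, z], axis=-1)         # shape (n, n, 3)

sphere = superquadric_surface(1.0, 1.0)         # e1 = e2 = 1 gives a unit sphere
rounded_box = superquadric_surface(0.2, 0.2)    # small exponents give a box-like shape
```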


5.2 Medial Representation of Objects

M-reps, the medial representation used, are based on a hierarchical representation of linked figural models, such as protrusions, indentations, neighboring figures and included figures, which represent solid regions and their boundaries simultaneously. The linked collections of figural components imply a fuzzy or probabilistically described boundary position with a width-proportional tolerance. At small scale the figural boundaries are made precise by displacing a dense sampling of the m-rep implied boundary. A model for a single figure is made from a net, also termed a mesh or a chain, of medial atoms. Each atom describes a position and a width. In addition, each atom has a local figural frame that provides figural directions and an object angle between opposing corresponding positions, which are medial involutes on the implied boundary. A figure can be expressed as a sequence of over-scale medial nets, which implies successively refined or smaller-tolerance versions of the figural boundary. At the final stage a dense displacement field is defined on the boundary of the medially implied object that accommodates the fine-scale perturbations of the object boundary.

5.3 Single-Figure Description via M-Rep

The medial representation used was based on the medial framework. In this framework a geometrical object is represented as a set of connected continuous medial manifolds. These medial manifolds are formed by the centers of all spheres (circles in two dimensions) that are interior to the object and tangent to the object's boundary at two or more points. The medial description is defined by the centers of the inscribed spheres and by the associated scalar field of their radii.


Each continuous segment of the medial manifold represents a medial figure. This research focused on objects that could be represented by a single medial figure. In two dimensions there are two basic types of medial figural segments, with medial manifolds of dimension zero and one respectively. Figural segments consisting of a single point represent the degenerate case of circular objects and are termed zero-dimensional. In three dimensions there are three basic types of medially defined figural segments, with corresponding medial manifolds of dimension zero, one and two. Figural segments with 2D medial manifolds M represent slab-like segments. Tube-like segments, where the medial manifold M is a 1D space curve, and spherical segments, where the medial manifold M consists of a single point, are degenerate cases. Examples of slab-like and tubular shapes are depicted in Figure 5.1 [21].


Figure 5.1: Examples of Slab-Like and Tubular Figures

5.4 Discretizing Figural Segments

The purpose of performing a discrete transformation on the figural segments is to provide a very accurate model, or representation, of the atomic structure of the tissue and cells to which deformable motion segmentation is easily applied. To illustrate, the structures presented in Figure 5.2 and Figure 5.3, which were obtained for mathematical analysis and modeling, need to be discretized.


The structure presented in Figure 5.2 represents a four-tuple defined by the position x, the width r, the vector b tangent to the medial axis and the object angle \theta.

Figure 5.2: 2D Medial Atom Represents a Double Tangency of a Circle To The Boundary

Figure 5.3: An End Atom is a Medial Atom With an Additional Component

A 2D medial atom carries first-order geometric information at a point on the 1D medial manifold. A zeroth-order description consists of the position x and the radius r of the inscribed circle that is centered at x. A first-order description includes the unit spatial tangent b of the medial manifold at x and captures the first-order width information through the object angle \theta, which describes the change in radius along the medial axis by the Blum relationship

dr/ds = -\cos(\theta)    (5.17)

where s is the arc length on the medial manifold.


The places where the inscribed circle centered at x touches the two halves of the boundary, indexed by +1 and -1, are defined as y_{+1}, y_{-1}, with respective normals n_{+1}, n_{-1}, given by

n_{+1} = R(\theta) b,  n_{-1} = R(-\theta) b;  y_{+1} = x + r n_{+1},  y_{-1} = x + r n_{-1}    (5.18)

where R(\theta) is the rotation matrix. These parameters and coordinates are illustrated in Figure 5.3. At the crest a medial atom is introduced at the ends of the medial chains to ensure robust sampling. These medial atoms include an extra elongation parameter \eta that defines the position of the crest point on the object boundary, given by

y_0 = x + \eta r b    (5.19)

where \eta = 1.0 represents a circular end cap and \eta > 1.0 represents increasing elongation. In three dimensions the two opposing boundary points implied by the medial atom are given by y_{+1} and y_{-1}, with the respective normals given by

n_{+1} = R_{(b,n)}(\theta) b,  n_{-1} = R_{(b,n)}(-\theta) b;  y_{+1} = x + r n_{+1},  y_{-1} = x + r n_{-1}    (5.20)

where R_{(b,n)}(\theta) is a rotation by \theta in the (b, n) plane. As in two dimensions, for stability at the crest, medial atoms on the boundary of the medial manifold also include an extra elongation parameter that determines the crest position. A small numerical sketch of the implied boundary points of equations (5.18) and (5.19) follows below.
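The following Python sketch evaluates the boundary points implied by a single 2D medial atom, as in equations (5.18) and (5.19); the sample position, radius, tangent, object angle and elongation values are illustrative assumptions only.

import numpy as np

def rotation(theta):
    # 2D rotation matrix R(theta) used in equation (5.18).
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def implied_boundary_points(x, r, b, theta, eta=None):
    # x: atom position, r: radius (local width), b: unit tangent of the medial
    # manifold at x, theta: object angle, eta: optional end-atom elongation.
    n_plus = rotation(theta) @ b        # n_{+1} = R(theta) b
    n_minus = rotation(-theta) @ b      # n_{-1} = R(-theta) b
    points = {"y+1": x + r * n_plus,    # y_{+1} = x + r n_{+1}
              "y-1": x + r * n_minus}   # y_{-1} = x + r n_{-1}
    if eta is not None:
        points["y0"] = x + eta * r * b  # crest point of an end atom, equation (5.19)
    return points

# Illustrative atom: unit tangent along +x, 60-degree object angle, circular end cap.
atom = implied_boundary_points(x=np.array([0.0, 0.0]), r=1.0,
                               b=np.array([1.0, 0.0]),
                               theta=np.deg2rad(60.0), eta=1.0)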


The position x gives the central location of the solid section of the figure being represented by the atom, in both two and three dimensions. The scalar r gives the local scale and size of the solid section of the figure being represented by the atom. The object angle \theta and the direction b also define the gradient of the scalar field r via the relationship

\nabla r = -\cos(\theta) \, b    (5.21)

The scalar field also provides a local ruler for the precise statistical analysis of the object. As an example, the geometric active contour model algorithm was applied to an echocardiograph image sequence. Two consecutive video frames were taken in which the object under study had undergone deformation. The algorithm developed was then applied to determine whether it was possible to capture the edges of the deformed object. The two frames considered are presented in Figure 5.4.

Figure 5.4: Two Consecutive Frames Where The Image Has Undergone Changes

The GVF algorithm was applied to the two frames presented in Figure 5.4 with predetermined iteration parameters. Figure 5.5 presents the iteration convergence for the first frame.


Figure 5.5: Iterations and Convergence of Frame 1

The same algorithm was applied to the second frame and the iterations are presented in Figure 5.6.

Figure 5.6: Iterations and Convergence of Frame 2

With known parameters, such as the number of iterations and the other constant values determined in the mathematical model, this algorithm can be used to validate the received image/frame against the transmitted image/frame.


CHAPTER 6 COMPRESSED VIDEO OVER NETWORKS

This chapter deals with the study of network behavior when carrying compressed video. After analyzing the various parameters that affect video compression, the parameter chosen for detailed analysis was network congestion, together with the behavior of the network under various congestion control mechanisms [22].

6.1 Effect of Congestion

A vital parameter for effective video streaming over the internet is the bottleneck bandwidth, which gives the upper limit on the speed at which a network can deliver data from one end point to the other. If a codec tries to send data at a rate above this limit, all excess packets will be lost. Conversely, sending at a rate below the limit leaves capacity unused and results in suboptimal video quality. Although accurate estimation of this time-varying limit is difficult, techniques have been reported that adapt their rates to a measured bottleneck bandwidth. The bursty nature of Internet traffic is the main obstacle to estimating the bottleneck bandwidth. Another problem associated with rate adjustment at the transmitter arises in multicast applications, where each receiver may experience a different bottleneck bandwidth. Rate control also poses a problem for stored media.


Changing the storage rate increases the computational load on the server and affects the compressed video stream, which then requires real-time trans-coding.

6.1.1 Congestion Mechanisms: Environment of the Study

A single-bottleneck dumbbell network topology was studied. The network topology is presented in Figure 6.1. All simulations were performed with the ns-2.1 simulator [23].

Figure 6.1: Network Topology

The simulations examined the response of slowcc algorithms to a sudden drop in congestion caused by removing a competing CBR flow. The RED queue had a queue size of 2.5 times the bandwidth-delay product. The min-thresh and max-thresh were set to 0.25 and 1.25 times the bandwidth-delay product, respectively. The round-trip time was set to 50 ms. These parameters were set in the simulation code:

if {$queue == "RED"} {
    $redq set thresh_    [expr 0.25*$rate_w_mb*50/8.0]
    $redq set maxthresh_ [expr 1.25*$rate_w_mb*50/8.0]
}


The DropTail queue had a queue size of one bandwidth-delay product, which is shown in the code [24]:

if {$queue == "DropTail"} {
    set limit [expr $rate_w_mb*50/8.0]
    $redq set thresh_    $limit
    $redq set maxthresh_ $limit
}

The competing CBR flow was an on/off square waveform, created to produce a dynamic scenario. Binomial congestion control (BCC) algorithms are defined using four parameters; altering the parameters yields different BCC versions [25]. The SQRT version of the BCC algorithm was used during simulation. The congestion control mechanism of TFRC used a throughput equation to establish the allowed sending rate as a function of the loss event rate and the round-trip time. In order to compete fairly with TCP, TFRC used the TCP throughput equation, which approximately describes TCP's sending rate as a function of the loss event rate, round-trip time and packet size. This research used TFRC both with self-clocking and without clocking. TFRC with self-clocking imposed a limit on the amount by which the sending rate could exceed the receive rate, even in the absence of loss. TFRC without self-clocking limited the sender's sending rate to at most twice the rate at which the receiver received data in the previous round trip. A sketch of the binomial update rules and the TCP-friendly rate equation is given below.
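As a point of reference, the following Python sketch illustrates the binomial window update rules of [25], parameterized by (k, l, alpha, beta), together with the simplified TCP-friendly throughput relation that TFRC-style protocols build on. The alpha, beta, packet-size, RTT and loss-rate values shown are illustrative assumptions, not the settings used in these simulations.

import math

def binomial_increase(w, k, alpha):
    # Per-RTT window increase of a binomial algorithm: w becomes w + alpha / w^k.
    return w + alpha / (w ** k)

def binomial_decrease(w, l, beta):
    # Window decrease on a loss event: w becomes w - beta * w^l (kept at least one packet).
    return max(w - beta * (w ** l), 1.0)

# TCP(1/2) corresponds to (k, l) = (0, 1); the SQRT variant uses (k, l) = (1/2, 1/2).
SQRT = dict(k=0.5, l=0.5, alpha=1.0, beta=0.5)       # alpha and beta assumed

def tcp_friendly_rate(packet_size, rtt, loss_rate):
    # Simplified TCP-friendly throughput relation of the form rate ~ s / (RTT * sqrt(2p/3)),
    # giving an allowed sending rate in bytes per second.
    return packet_size / (rtt * math.sqrt(2.0 * loss_rate / 3.0))

# Example: 1000-byte packets, 50 ms round-trip time, 1% loss event rate (assumed values).
rate = tcp_friendly_rate(packet_size=1000, rtt=0.050, loss_rate=0.01)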


6.2 Congestion Control Mechanisms

This section presents an analysis of the simulation results and plots of the data obtained from the study of drop rates for the queuing mechanisms considered. In addition, results are presented for the long-term fairness of different TCP versions and the variation in TFRC behavior under TCP(Vegas).

6.2.1 Drop Rate Analysis

The behavior of different slowcc algorithms subjected to a sudden reduction in the available bandwidth was examined. Twenty (20) long-lived slowcc flows were used, with the available bandwidth controlled by an ON/OFF square-wave CBR flow over a 150-second period. The 150-second cycle was restarted after a delay of 30 seconds and then repeated. The results illustrating the drop rates achieved by the slowcc algorithms using the RED queuing scheme with TCP(SACK) are presented in Figure 6.2.

Figure 6.2: Drop Rate For Slowcc Algorithm Using The RED Queuing Scheme, TCP(SACK)


Figure 6.3: Drop Rate For Slowcc Algorithm Using The DropTail Queuing Scheme, TCP(SACK)

The behaviors of the different slowcc versions using RED queuing have been demonstrated. The behaviors obtained using DropTail queuing instead of RED queuing are presented in Figure 6.3. Significant behavioral differences were observed between the two queuing schemes. The reasons for the differing results can be discerned from the basic structure of the schemes. DropTail is a basic first-in, first-out queuing technique in which the first packet in the queue is the first packet processed. When the queue becomes full, congestion occurs and incoming packets are dropped. DropTail relies on the end systems to control congestion via their congestion control mechanisms. RED is an active queue management scheme that provides a mechanism for congestion avoidance. Unlike DropTail, RED uses statistical methods to drop packets in a "probabilistic" way before the queue overflows; a sketch of this drop rule is given below. Dropping packets probabilistically slows a source down enough to keep the queue steady and reduces the number of packets that would be lost if the queue overflowed while a sender was transmitting at a high rate.
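The following Python sketch illustrates RED's probabilistic early-drop rule: an exponentially weighted average queue length is compared against the min-thresh and max-thresh values, and the drop probability grows linearly between them. The averaging weight, maximum drop probability and threshold values are illustrative assumptions, not the parameters used in this study.

import random

class RedQueueSketch:
    # Minimal sketch of the RED early-drop decision.
    def __init__(self, min_th, max_th, w_q=0.002, max_p=0.1):
        self.min_th = min_th      # below this average queue length, never drop early
        self.max_th = max_th      # above this average queue length, always drop
        self.w_q = w_q            # weight of the moving average (assumed value)
        self.max_p = max_p        # maximum early-drop probability (assumed value)
        self.avg = 0.0            # exponentially weighted average queue length

    def should_drop(self, current_queue_len):
        # Update the average queue length, then decide whether to drop the arrival.
        self.avg = (1.0 - self.w_q) * self.avg + self.w_q * current_queue_len
        if self.avg < self.min_th:
            return False                      # enqueue, no early drop
        if self.avg >= self.max_th:
            return True                       # forced drop
        # Drop probability grows linearly between the two thresholds.
        p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
        return random.random() < p

# Example: thresholds of 5 and 25 packets (assumed values).
red = RedQueueSketch(min_th=5, max_th=25)
drop_arrival = red.should_drop(current_queue_len=18)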


6.2.2 Observations

The general trend is that TCP(1/2) has the highest average drop rate. This can be explained by the fact that TCP is the quickest to increase its sending rate in order to utilize the new bandwidth created by the removal of the CBR flow. Once the limit is reached, TCP(1/2) begins losing packets and backs off multiplicatively. For DropTail, even though all the algorithms are TCP-friendly, a large number of packet drops occur when the CBR flow is removed, since DropTail has no way of informing the senders of impending full queues. When RED is used, the drop rates fall by almost half of the average drop rate when the CBR flow is removed. This can be attributed to the way RED manages its queue. TCP(1/256) and SQRT(1/256) produce very low drop rates in both the presence and the absence of the CBR flow, dropping almost no packets. The important characteristic observed in the first 20 seconds after the removal of the CBR flow was that only TCP(1/2) lost packets. During the remaining 10 seconds both TFRC(256) variants lost a few packets. However, when RED was used, the removal of the CBR source caused the loss of a few packets for all the slowcc algorithms with the exception of SQRT(1/256). Once the CBR flow was reintroduced there was a sudden increase in the drop rates for all the slowcc mechanisms; the TFRC versions, both self-clocked and non-self-clocked, had the most prominent peaks. TFRC with self-clocking automatically slows down the source when the network becomes congested and acknowledgments are delayed. The self-clocked version of TFRC(256) converged and stabilized faster than the non-self-clocked version. In any event, the TFRC variants had longer convergence times than the other algorithms. A similar performance analysis was also conducted with all the different TCP versions and the slowcc mechanisms using RED queuing.


The most significant change was observed with TCP(Vegas): SQRT(1/256) displayed a much higher drop rate than it did with TCP(SACK). The drop rate did reach zero, but only for a small interval of time immediately after the CBR flow was removed. These results are presented in Figure 6.4.

Figure 6.4: Drop Rate For Slowcc Algorithm Using The RED Queuing Scheme, TCP(Vegas)

6.3 Behavior Analysis of Slowly Responsive Congestion Control Algorithms

6.3.1 Long Term Fairness

The scenario considered for studying long-term fairness was a network topology with ten long-lived flows, where five were TCP and the other five were either TCP(1/8), TFRC or SQRT(1/2). The bandwidth was divided in a ratio of 3:1 between these flows and the CBR flow. The overall bandwidth utilization was observed to be high when the CBR flow period was short. However, when the period of the CBR flow was increased, TCP performed better than the competing TCP(1/8), TFRC or SQRT(1/2).


6.3.2 TCP(RENO)

When TCP and TFRC compete and the CBR flow period is short, TFRC acquires more bandwidth than TCP. However, TCP eventually takes more bandwidth than TFRC when competing fairly. This result is illustrated in Figure 6.5. When TCP competed with the other two flows, the bandwidth utilization characteristic remained similar to that for TCP(SACK). The results are presented in Figure 6.6 and Figure 6.7.

Figure 6.5: Throughput of TCP and TFRC

Figure 6.6: Throughput of TCP and SQRT

Figure 6.7: Throughput of TCP and TCP(1/8)


6.3.3 TCP(NEW RENO)

With TCP(New Reno), when TCP(1/8) competed with TCP, TCP(1/8) largely lost its bandwidth margin, which was not the case with TCP(SACK). This comparison is presented in Figure 6.8.

Figure 6.8: Throughput of TCP and TCP(1/8)

When TCP competed with the other two flows, the bandwidth utilization characteristic remained similar to that for TCP(SACK). The results are presented in Figure 6.9 and Figure 6.10.

Figure 6.9: Throughput of TCP and SQRT

Figure 6.10: Throughput of TCP and TFRC


TFRC or SQRT competing with TCP demonstrated little change in characteristics from those observed for TCP(SACK).

6.3.4 TCP(VEGAS)

Compared to all the other TCP protocols, a major difference was observed in bandwidth sharing when TCP and TFRC were competing. With TCP(Vegas), TFRC obtained a larger share of the bandwidth than TCP from the very beginning, a condition that continued throughout the simulation. In contrast to the TCP(Reno) algorithm, which induces congestion in order to learn the available network capacity, a TCP(Vegas) source anticipates the onset of congestion by monitoring the difference between the rate it expects to see and the rate it is actually achieving. The TCP(Vegas) strategy adjusts the source's sending rate, or window size, in an attempt to keep a small number of packets buffered in the routers along the path [26]; a sketch of this window adjustment is given after the figures below. A comparison of the TCP and TFRC results is presented in Figure 6.11.

Figure 6.11: Throughput of TCP and TFRC

A bandwidth utilization pattern similar to that of TCP(SACK) was observed when TCP competed with the other two flows. This observation is presented in Figures 6.12 and 6.13.


Figure 6.12: Throughput of TCP and SQRT

Figure 6.13: Throughput of TCP and TCP(1/8)
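As referenced in Section 6.3.4, the following Python sketch illustrates the Vegas-style window adjustment: the source compares the throughput it expects (window over the base round-trip time) with the throughput it actually measures, and nudges the window to keep the estimated amount of buffered data between two thresholds. The alpha and beta defaults and the example values are illustrative assumptions, not parameters taken from this study.

def vegas_window_update(cwnd, base_rtt, current_rtt, alpha=1.0, beta=3.0):
    # One congestion-avoidance adjustment of a Vegas-style source (sketch).
    # cwnd: congestion window in packets; base_rtt: smallest observed round-trip time;
    # current_rtt: latest measured round-trip time; alpha/beta: target range, in packets,
    # for the data buffered in the network (assumed defaults).
    expected = cwnd / base_rtt              # rate the source expects to achieve
    actual = cwnd / current_rtt             # rate the source is actually achieving
    diff = (expected - actual) * base_rtt   # estimated packets queued in the routers
    if diff < alpha:
        return cwnd + 1                     # too little buffered: grow the window
    if diff > beta:
        return cwnd - 1                     # too much buffered: shrink the window
    return cwnd                             # within the target band: hold steady

# Example: 20-packet window, 50 ms base RTT, 60 ms measured RTT (assumed values).
new_cwnd = vegas_window_update(cwnd=20, base_rtt=0.050, current_rtt=0.060)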


CHAPTER 7 RESULTS AND FUTURE WORK

7.1 Conclusions

With the rapid growth of video traffic on the Internet, there is both market interest in and a need for better video coding algorithms for medical imaging applications. This research addressed the problem by introducing a new external force model algorithm for active contours and deformable surfaces. The algorithm was tested and the results plotted. The algorithm was implemented in MATLAB and the action of the gradient vectors was presented. This research analyzed the statistical derivations for deformable model generation associated with the anatomical model and the deformable physical forces model, which incorporates parameters such as elasticity and inertial characteristics. An algorithm implementation that includes all of these techniques would be able to estimate the changes a deformable object undergoes. The performance of different slowly responsive congestion control algorithms was also investigated. These algorithms were subjected to dynamically changing traffic conditions using a DropTail queuing scheme for the bottleneck link and compared against the RED queuing scheme.


It was found that the drop rate characteristic exhibited by TCP was much worse with DropTail than with RED. However, a few slowcc algorithms had lower drop rates for shorter periods under DropTail than under RED.

7.2 Possible Future Work

The possibility and scope for future study is tremendous. This research, although important and successful, largely served to highlight the need for further work. One very important area highlighted by this research is the complete integration of deformable model segmentation into the H.263 algorithm. Additionally, the study of transport-layer modifications that take into consideration the congestion parameter and the mechanisms used for congestion control appears especially promising and important.


REFERENCES

[1] http://www.4i2i.com/h263_video_codec.htm

[2] A. Yu, "Motion Search Performance using the H.263 Encoder", http://ise0.stanford.edu/class/ee392c/demo/yu/paper.html

[3] ITU-T Recommendation H.263, "Video coding for low bit rate communication"

[4] Graham R. Martin and J.P. Muleller, "Block Matching and Compensation", Computer Science Department, University of Warwick, 1998, http://www.dcs.wareick.ac.uk/research/mcg

[5] The TML Project WEB-Page and Archive, http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm

[6] J.R. Jain and A.K. Jain, "Displacement measurement and its application in interframe image coding", IEEE Trans. Commun., Vol. COM-29, No. 12, pp. 1799-1808, Dec. 1981

[7] J. Ribas-Corbera and D.L. Neuhoff, "On the optimal block size for block-based motion compensated video coders", SPIE Proceedings of Visual Communications and Image Processing, Vol. 3024, pp. 1132-1143, February 1997

[8] MPEG-4 Video Group, "MPEG-4 Video Verification Model Version 10.0", ISO/IEC/JTC1/SC19/WG11 Document MPEG98/N1992, San Jose, February 1998

[9] M.H. Chan, Y.B. Yu and A.G. Constantinides, "Variable size block matching motion compensation with applications to video coding", IEE Proceedings, Vol. 137, Pt. 1, No. 4, August 1990

[10] G.R. Martin, R.A. Packwood and I. Rhee, "Variable size block matching estimation with minimal error", SPIE Conference on Digital Video Compression: Algorithms and Technologies 1996, San Jose, USA, Vol. 2668, pp. 324-333, February 1996


[11] G.R. Martin, R.A. Packwood and M.K. Steliaros, "Reduced entropy motion compensation using variable sized blocks", SPIE Proceedings of Visual Communications and Image Processing, Vol. 3024, pp. 293-302, February 1997

[12] R.A. Packwood, M.K. Steliaros and G.R. Martin, "Variable size block matching motion compensation for object-based video coding", IEE 6th International Conference on Image Processing and its Applications, Dublin, Ireland, pp. 56-60, July 1997

[13] M.K. Steliaros, G.R. Martin and R.A. Packwood, "Locally-accurate motion estimation for object-based video coding", SPIE Proceedings of Visual Communications and Image Processing, Vol. 3309, pp. 306-316, January 1998

[14] S. Panis, G. Stamm and R. Kutka, "A Method for Reducing Block Artifacts by Interpolating Block Borders", Siemens AG, Dept. ZT IK 2, Munich, Germany, and Ecole des Mines de Saint Etienne, France

[15] C. Xu and J.L. Prince, "Snakes, Shapes, and Gradient Vector Flow", IEEE Transactions on Image Processing, 1998 (their MATLAB code modules were used)

[16] D. DeCarlo and D. Metaxas, "Deformable Model-Based Shape and Motion Analysis from Images using Motion Residual Error"

[17] P. Zerfass, C.D. Werner, F.B. Sachse and O. Dössel, "Deformation of surface nets for interactive segmentation of tomographic data", Proc. 34. Jahrestagung DGBMT, 2000

[18] C. Nikou, F. Heitz and J.P. Armspach, "Multimodal image registration using statistically constrained deformable multimodels", Proceedings of the IEEE International Conference on Image Processing (ICIP'98)

[19] M. Kass, A. Witkin and D. Terzopoulos, "Snakes: Active contour models", Int. J. Comput. Vis., 1987

[20] D. Terzopoulos and K. Fleischer, "Deformable models", Vis. Comput., 1988

[21] T. McInerney and D. Terzopoulos, "A dynamic finite element surface model for segmentation and tracking in multidimensional medical images with application to cardiac 4D image analysis", Comput. Med. Imag. Graph.

[22] S. Floyd and K. Fall, "Promoting the use of End-to-End Congestion Control in the Internet", IEEE/ACM Transactions on Networking, 7(4), August 1999


[23] V. Jacobson, "Congestion Avoidance and Control", Proceedings of ACM SIGCOMM, August 1988

[24] D. Bansal, H. Balakrishnan, S. Floyd and S. Shenker, "Dynamic Behavior of Slowly Responsive Congestion Control Algorithms", SIGCOMM 2001

[25] D. Bansal and H. Balakrishnan, "Binomial Congestion Control Algorithms", Proceedings of the Conference on Computer Communications (IEEE INFOCOM), April 2001

[26] S.H. Low, L.L. Peterson and L. Wang, "Understanding TCP Vegas: A Duality Model"


BIBLIOGRAPHY

Abdul H. Sadka, Compressed Video Communication, Wiley, 2002

Ming-Ting Sun and Amy R. Reibman (Eds.), Compressed Video over a Network, Signal Processing and Communication Series

A. Murat Tekalp, Digital Video Processing, Prentice Hall Signal Processing Series

