VECTOR FLOW MODEL IN VIDEO ESTIMATION AND EFFECTS OF NETWORK CONGESTION IN LOW BITRATE COMPRESSION STANDARDS

by

BALAJI RAMADOSS

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering
Department of Electrical Engineering
College of Engineering
University of South Florida

Major Professor: Wilfrido A. Moreno, Ph.D.
James T. Leffew, Ph.D.
Wei Qian, Ph.D.

Date of Approval: October 16, 2003

Keywords: h.263, compression, deformable super quadrics, video segmentation, medical imaging, network behavior

Copyright 2003 Balaji Ramadoss

Dedication

This thesis is dedicated to my parents and teachers for their continuous support and motivation.

Acknowledgments

I would like to express my sincere thanks and appreciation to Dr. Wilfrido A. Moreno for his guidance and tutelage throughout this endeavor. I would also like to sincerely thank Dr. James T. Leffew and Dr. Wei Qian for being available with time and useful advice as committee members. My heartfelt personal thanks to Dr. Wilfrido A. Moreno for supporting and sponsoring this research venture.
Table of Contents

List of Tables
List of Figures
Abstract
Chapter 1 Introduction
    1.1 Objectives
Chapter 2 Video Compression
    2.1 H.263: An Example
    2.2 H.263 Encoder
        2.2.1 Motion Estimation and Compensation
        2.2.2 Discrete Cosine Transform (DCT)
        2.2.3 Quantization
        2.2.4 Entropy Encoding
        2.2.5 Frame Store
    2.3 H.263 Decoder
        2.3.1 Entropy Decoder
        2.3.2 Rescale
        2.3.3 Inverse Discrete Cosine Transform
        2.3.4 Motion Compensation
    2.4 Block-Matching Motion Compensation
        2.4.1 Frame Based Block-Matching Motion Compensation
            2.4.1.1 Fixed Size Block Matching (FSBM)
            2.4.1.2 Variable Size Block Matching (VSBM)
        2.4.2 Object Based Block-Matching Motion Compensation
    2.5 Implementation
Chapter 3 Deformable Model and Shape Analysis
    3.1 Parametric Snake Model
    3.2 Example and Behavior of Traditional Snakes
Chapter 4 Gradient Vector Flow
    4.1 Snake: Introduction
        4.1.1 Edge Map
    4.2 Mathematical Model
    4.3 Numerical Implementation
    4.4 MATLAB Results
Chapter 5 Medical Applications
    5.1 Deformable Super Quadrics
    5.2 Medial Representation of Objects
    5.3 Single-Figure Description via M-Rep
    5.4 Discretizing Figural Segments
Chapter 6 Compressed Video Over Networks
    6.1 Effect of Congestion
        6.1.1 Congestion Mechanisms: Environment of the Study
    6.2 Congestion Control Mechanisms
        6.2.1 Drop Rate Analysis
        6.2.2 Observations
    6.3 Behavior Analysis of Slowly Responsive Congestion Control Algorithms
        6.3.1 Long Term Fairness
        6.3.2 TCP (Reno)
        6.3.3 TCP (New Reno)
        6.3.4 TCP (Vegas)
Chapter 7 Results and Further Work
    7.1 Conclusion
    7.2 Possible Future Work
References
Bibliography

List of Tables

Table 1.1: Multimedia Data Types and Uncompressed Storage Space Requirements
Table 2.1: Compressed Video Results

List of Figures

Figure 1.1: Overview of The Coder/Decoder and Network Under Study
Figure 2.1: Video Coder/Decoder Module
Figure 2.2: H.263 Encoder
Figure 2.3: H.263 Decoder
Figure 2.4: FSBM Example
Figure 2.5: VSBM Example
Figure 2.6: FSBM Block Structure
Figure 2.7: VSBM Block Structure
Figure 3.1: U Shaped Object
Figure 3.2: Iterations and Convergence
Figure 4.1: Test Image With its Traditional Forces
Figure 4.2: Iterations
Figure 4.3: U Shaped Test Object Undergoing GVF Iterations
Figure 4.4: Iterations on the Block Test Image and its Convergence
Figure 5.1: Examples of Slab Like and Tubular Figures
Figure 5.2: 2D Medial M Represents a Double Tangency of a Circle to The Boundary
Figure 5.3: An End Atom is a Medial Atom With an Additional Component
Figure 5.4: Two Consecutive Frames Where The Image Has Undergone Changes
Figure 5.5: Iterations and Convergence of Frame 1
Figure 5.6: Iterations and Convergence of Frame 2
Figure 6.1: Network Topology
Figure 6.2: Drop Rate For Slowcc Algorithm Using The RED Queuing Scheme TCP (SACK)
Figure 6.3: Drop Rate For Slowcc Algorithm Using The DropTail Queuing Scheme TCP (SACK)
Figure 6.4: Drop Rate For Slowcc Algorithm Using The RED Queuing Scheme TCP (Vegas)
Figure 6.5: Throughput of TCP and TFRC
Figure 6.6: Throughput of TCP and SQRT
Figure 6.7: Throughput of TCP and TCP (1/8)
Figure 6.8: Throughput of TCP and TCP (1/8)
Figure 6.9: Throughput of TCP and SQRT
Figure 6.10: Throughput of TCP and TFRC
Figure 6.11: Throughput of TCP and TFRC
Figure 6.12: Throughput of TCP and SQRT
Figure 6.13: Throughput of TCP and TCP (1/8)

VECTOR FLOW MODEL FOR VIDEO ESTIMATION AND THE EFFECTS OF NETWORK CONGESTION IN LOW BITRATE COMPRESSION

Balaji Ramadoss

ABSTRACT

The use of digitized information is rapidly gaining acceptance in biomedical applications. Video compression plays an important role in the archiving and transmission of different digital diagnostic modalities.
The present scheme of video compression for low bitrate networks is not suitable for medical video sequences because it suffers from instability caused by block artifacts, which result from block-based DCT coefficient quantization. The possibility of applying deformable motion estimation techniques to make the video compression standard (H.263) more adaptable for biomedical applications was studied in detail. A study of the network characteristics and of the behavior of various congestion control mechanisms was used to analyze the complete characteristics of existing low bitrate video compression algorithms.

The study was conducted in three phases. The first phase involved the implementation and study of the present H.263 compression standard and its associated limitations. The second phase dealt with the analysis of an external force for active contours, which was used to obtain estimates for deformable objects. The external force, which is termed Gradient Vector Flow (GVF), was computed as a diffusion of the gradient vectors associated with a gray-level or binary edge map derived from the image. The mathematical aspect of a multiscale framework based on a medial representation for the segmentation and shape characterization of anatomical objects in medical imagery was derived in detail. The medial representations were based on a hierarchical representation of linked figural models such as protrusions, indentations, neighboring figures and included figures, which represented solid regions and their boundaries.

The third phase dealt with a vital parameter for effective video streaming over the Internet, the bottleneck bandwidth, which gives the upper limit for the speed of data delivery from one end point to the other in a network. If a codec attempts to send data beyond this limit, all packets above the limit will be lost. On the other hand, sending under this limit will clearly result in suboptimal video quality.
During this phase the packet-drop-rate (PDR) performance of TCP(1/2) was investigated in conjunction with a few representative TCP-friendly Congestion Control Protocols (CCPs). The CCPs were TCP(1/256), SQRT(1/256) and TFRC(256), with and without self-clocking. The CCPs were studied when subjected to an abrupt reduction in the available bandwidth. Additionally, the investigation studied the effect on the drop rates of the TCP-compatible algorithms of changing the queuing scheme from Random Early Detection (RED) to DropTail.

CHAPTER 1
INTRODUCTION

"We have to do the best we can. This is our sacred human responsibility."
Albert Einstein

The growing demand for data sharing, offshore development and service calls for better and more accurate multimedia compression and transmission techniques. The development of compression algorithms has mainly focused on the general demand for entertainment multimedia and other applications that require cost reduction. The need for high quality, error free video streaming for biomedical applications has evolved rapidly and is market driven. The use of digitized information is rapidly gaining acceptance in biomedical applications. Video compression plays an important role in the archiving and transmission of different digital diagnostic modalities.

Various compression schemes such as JPEG and MPEG are usually applied to telephone conferencing, cable video transmission and other non-medical applications. The JPEG and MPEG compression schemes are not suitable for medical video sequences such as angiograms. They suffer from instability due to block artifacts resulting from the block-based DCT coefficient quantization. The image quality degrades severely with consecutive frame processing due to the accumulation of errors across frames.
Despite rapid progress in mass-storage density, processor speeds and digital communication system performance, demand for data storage capacity and data transmission bandwidth continues to outstrip the capabilities of available technologies. To appreciate the need for compression and coding of the individual signals that constitute the multimedia experience, Table 1.1 presents data for a few typical multimedia data types and the resulting uncompressed storage space requirements. The numbers indicate the qualitative transition from simple text to full-motion video data and demonstrate the need for compression. Additionally, the data presented in Table 1.1 clearly illustrate the need for large storage spaces for image, audio and video signals.

Table 1.1: Multimedia Data Types and Uncompressed Storage Space Requirements

Multimedia Data             Size/Duration        Bits/Pixel or Bits/Sample    Uncompressed Size
A page of text              11" x 8.5"           Varying resolution           16-32 Kbits
Telephone quality speech    1 sec                8 bps                        64 Kbits
Grayscale Image             512 x 512            8 bpp                        2.1 Mbits
Color Image                 512 x 512            24 bpp                       6.29 Mbits
Medical Image               2048 x 1680          12 bpp                       41.3 Mbits
SHD Image                   2048 x 2048          24 bpp                       100 Mbits
Full-motion Video           640 x 480, 10 sec    24 bpp                       2.21 Gbits

This research presents the development of a compression scheme for biomedical video sequence coding based on deformable motion estimation. The prevalent video compression standard, H.263, which represents the present state of technology, was implemented. In addition, the possibility of applying a deformable motion estimation technique, termed Gradient Vector Flow (GVF), to the video compression standard H.263 was introduced. The introduction of GVF made the H.263 compression standard more adaptable for biomedical applications, which were studied in detail. The study of the network characteristics and the behavior of various congestion control mechanisms was used to analyze the complete characteristics of the existing low bitrate video compression algorithms.
The importance of this study can be better appreciated if the effect of congestion on a network, specifically a low bitrate network, is understood. Compressed video communication for biomedical applications was studied in great detail. The discussion of this research is divided into an analysis of the three elements central to the need for better compression for biomedical video. The three elements are depicted in Figure 1.1.

Figure 1.1: Overview of The Coder/Decoder and Network Under Study

1.1 Objectives

This research work was conducted in three phases:

- Definition and implementation of the compression scheme in H.263
- Implementation of Deformable Motion Estimation
- Behavior analysis of slowly responsive congestion control algorithms

The first phase was the implementation of the existing H.263 algorithm, which included a measure of the compression achieved by the algorithm. The second phase dealt with the analysis of other means of motion estimation. During the second phase a Gradient Vector Flow (GVF) model was developed for deformable motion estimation. The third phase dealt with the network's behavior during congestion and its impact on network traffic.

CHAPTER 2
VIDEO COMPRESSION

MPEG, H.261 and H.263 are three closely related codecs for motion video. They are all international, non-proprietary standards. MPEG is an International Standard of the International Organization for Standardization (ISO), while H.261 and H.263 are Recommendations of the International Telecommunications Union (ITU). MPEG is intended for playback of movies from digital storage media, while the other two codecs are intended for teleconferencing. MPEG is an acronym that stands for Moving Picture Experts Group, which is the name of the ISO committee that developed it. All three codecs are based on the discrete cosine transform (DCT), predicted frames and motion estimation.
A predicted frame is essentially a difference frame, which is the difference between the current input frame and the previously encoded and reconstructed frame. The difference should be small over most of the frame area except around the edges of moving objects and where new objects are introduced into the frame. A predicted frame is termed a P-frame; another name is interframe. A frame that is encoded independently of other frames is termed an intraframe or I-frame.

Motion estimation estimates the translational motion of objects in the current frame relative to the previous frame. This allows the encoder to reduce the energy in the frame difference by moving pixels around in order to simulate object motion. Such action incurs a cost, since the encoder must insert a small amount of motion information in the compressed data so that the decoder can reproduce the pixel shuffling exactly.

Both MPEG and H.263 enhance prediction and motion estimation by using bidirectionally-predicted frames, or B-frames. One can think of a B-frame as the average of two P-frames that use previous and future input frames as predictors for the current input frame. Generally speaking, a B-frame can be about one third the size of a P-frame. The use of B-frames implies out-of-order encoding, since the encoder can only encode a B-frame after encoding the requisite previous and future frames. Note that a B-frame never predicts another B-frame; only I-frames and P-frames are used to predict B-frames.

One feature that H.261 and H.263 share that MPEG does not possess is support for variable frame rate within a video sequence. This is important in teleconferencing for two reasons. First, the bit rate in a teleconference can be very low, so an encoder must be able to lower the frame rate to maintain reasonable visual quality. Second, the encoder must be able to adjust to sudden changes in video content in real time without warning.
For example, at a scene change the first compressed frame tends to be large. However, with variable-frame-rate encoding the encoder can encode the frame and then skip a few input frames before encoding the next frame. In fact, the human eye will not see motion in the video for a while after the scene change.

Historically, MPEG-1 was derived from H.261 and JPEG. H.263 is based on H.261 and MPEG-1 and adds some enhancements of its own. On the other hand, MPEG has some enhancements that are not present in H.261 or H.263. All other things being equal, at bit rates of 1 Mbps or above, MPEG-1 video will look better than the same content encoded with either H.261 or H.263. At the common rate of approximately 1.2 Mbps, with a frame resolution of 352x240 and a frame rate of 30 fps, the visual quality of MPEG-1 video is comparable to or better than that from an analog VCR. When operated at the limit, these codecs can produce visual artifacts similar to those for JPEG. These include blocks and artifacts near the edges of objects. Such artifacts can be common in low bit rate teleconferencing and at unexpected scene changes. H.261 will be the most sensitive to such problems, followed by H.263 and then MPEG.

In low bit rate teleconferencing, at 128 kbps and below, H.261 and H.263 video might look better than MPEG video since the first two codecs can vary the frame rate within a video sequence. H.263 can run at lower bit rates than H.261. H.263 can also run at higher bit rates and support larger frames, which are up to 4 times larger in each dimension. However, if an H.263 encoder does not use its optional modes, its output should be comparable to that from a similar H.261 encoder. The ability to reduce the blocks and artifacts is the most important characteristic of the algorithm developed during this research when it is used in biomedical applications.

2.1 H.263: An Example

The H.263 standard supports video compression for videoconferencing and video telephony applications.
The H.263 standard is published by the International Telecommunications Union (ITU). Figure 2.1 provides a macro-level representation of the H.263 system [1].

Figure 2.1: Video Coder/Decoder Module

Videoconferencing and video telephony have a wide range of applications that include:

- Desktop and room-based conferencing
- Video over the Internet and over telephone lines
- Surveillance and monitoring
- Telemedicine (medical consultation and diagnosis at a distance)
- Computer-based training and education

In each case video, and perhaps audio, information is transmitted over telecommunication links, which include networks, telephone lines, ISDN and radio. Video has a high "bandwidth" that requires many bytes of information per second. Therefore, these applications require video compression or video coding technology in order to reduce the bandwidth before transmission.

2.2 H.263 Encoder

A block diagram of the H.263 encoder is presented in Figure 2.2.

Figure 2.2: H.263 Encoder

2.2.1 Motion Estimation and Compensation

The first step in reducing the bandwidth is to subtract the previously transmitted frame from the current frame. This action leaves only the difference, or residue, for encoding and transmission. Therefore, areas of the frame that do not change, such as the background, are not encoded. Further reduction is achieved by attempting to estimate where areas of the previous frame will occur in the current frame and compensating for the movement, which is termed motion estimation and compensation. The motion estimation module compares each 16x16 pixel macroblock in the current frame with its surrounding area in the previous frame and attempts to find a match. The matching area is moved into the current macroblock position by the motion compensator module. Then the motion compensated macroblock is subtracted from the current macroblock.
If the motion estimation and compensation process is efficient, the remaining or "residual" macroblock should only contain a small amount of information [2].

2.2.2 Discrete Cosine Transform (DCT)

The DCT transforms a block of pixel or residual values into a set of "spatial frequency" coefficients. This is analogous to transforming a time domain signal into a frequency domain signal using a Fast Fourier Transform. The DCT operates on a 2-dimensional block of pixels rather than a 1-dimensional signal and is particularly good at "compacting" the energy of the block of values into a small number of coefficients. This means that only a few DCT coefficients are required to recreate a recognizable copy of the original block of pixels [3].

2.2.3 Quantization

In a typical block of pixels, most of the coefficients produced by the DCT are close to zero. The quantizer module reduces the precision of each coefficient so that the near-zero coefficients are set to zero and only a few significant non-zero coefficients are left. In practice, this is performed by dividing each coefficient by an integer scale factor and truncating the result. It is important to realize that the quantizer "throws away" information [3].

2.2.4 Entropy Encoding

An entropy encoder, such as a Huffman encoder, replaces values that occur frequently with short binary codes and replaces values that occur infrequently with longer binary codes. The entropy encoding in H.263 is based on such a technique and is used to compress the quantized DCT coefficients. The result is a sequence of variable-length binary codes. These codes are combined with synchronization and control information, such as the motion "vectors" required to reconstruct the motion-compensated reference frame, in order to form the encoded H.263 bit stream.

2.2.5 Frame Store

The current frame must be stored so that it can be used as a reference when the next frame is encoded.
Instead of simply copying the current frame into a store, the quantized coefficients are rescaled, inverse transformed using an Inverse Discrete Cosine Transform and added to the motion-compensated reference block in order to create a reconstructed frame that is placed in a store termed the frame store. This ensures that the contents of the frame store in the encoder are identical to the contents of the frame store in the decoder. When the next frame is encoded, the motion estimator uses the contents of the frame store to determine the best matching area for motion compensation.

2.3 H.263 Decoder

A block diagram of the H.263 decoder is presented in Figure 2.3.

Figure 2.3: H.263 Decoder

2.3.1 Entropy Decoder

The variable-length codes that make up the H.263 bit stream are decoded in order to extract the coefficient values and motion vector information [3].

2.3.2 Rescale

Rescale is the "reverse" of quantization. During rescale the coefficients are multiplied by the same scaling factor that was used in the quantizer. However, because the quantizer discarded the fractional remainder, the rescaled coefficients are not identical to the original coefficients.

2.3.3 Inverse Discrete Cosine Transform (IDCT)

The IDCT reverses the DCT operation in order to create a block of samples. These samples typically correspond to the difference values that were produced by the motion compensator in the encoder.

2.3.4 Motion Compensation

The difference values are added to a reconstructed area from the previous frame. The motion vector information is used to pick the correct area, which is the same reference area that was used in the encoder. The result is a reconstruction of the original frame. The reconstructed frame will not be identical to the original because of the "lossy" quantization stage, which causes the image quality to be poorer than the original.
The reconstructed frame is placed in a frame store and is used to motion-compensate the next received frame.

2.4 Block-Matching Motion Compensation

Predictive coding is widely used in video transmission, especially for low bitrate coding. Typically, only a fraction of an image changes from frame to frame, which allows for a straightforward prediction from previous frames. Motion compensation is used as part of the predictive process. If an image sequence shows moving objects, then their motion within the scene can be measured and the information used to predict the content of frames later in the sequence [4].

2.4.1 Frame Based Block-Matching Motion Compensation

2.4.1.1 Fixed Size Block-Matching (FSBM)

The Fixed Size Block-Matching (FSBM) technique was originally described by Jain and Jain [6]. The technique is easy to implement and widely adopted. Each image frame is divided into a fixed number of blocks, which are usually square. For each block in the frame a search is made in the reference frame over an area of the image that allows for the maximum translation that can be used by the coder. The intent of the search is to locate the best matching block, which is the block that yields the least prediction error. Usually the search is conducted with the goal of minimizing either the mean square difference or the mean absolute difference, which is easier to compute. Typical block sizes are of the order of 16x16 pixels and the maximum displacement might be plus or minus 64 pixels from a block's original position. Several search strategies are possible. Some kind of sampling mechanism is usually employed, but the most straightforward approach is an exhaustive search. An exhaustive search is computationally demanding in terms of data throughput, but it is algorithmically simple and relatively easy to implement in hardware.
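The exhaustive search described above can be sketched as follows. This is a minimal illustration rather than the coder implemented for this research: the function names, the list-of-lists frame layout, the small 4x4 block and the toy frames are all assumptions made for the example, and the mean absolute difference is used as the matching criterion.

```python
def mean_abs_diff(cur, ref, cy, cx, ry, rx, n):
    """Mean absolute difference between the n x n block of `cur` at (cy, cx)
    and the n x n block of `ref` at (ry, rx)."""
    total = 0
    for dy in range(n):
        for dx in range(n):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total / (n * n)


def full_search(cur, ref, cy, cx, n=16, max_disp=7):
    """Exhaustive search for one block (hypothetical helper).

    Returns (vector, error), where vector = (vy, vx) points from the block
    position in `cur` to the best-matching block in `ref`."""
    height, width = len(ref), len(ref[0])
    best_vec = (0, 0)
    best_err = mean_abs_diff(cur, ref, cy, cx, cy, cx, n)
    for vy in range(-max_disp, max_disp + 1):
        for vx in range(-max_disp, max_disp + 1):
            ry, rx = cy + vy, cx + vx
            # Skip candidates that fall outside the reference frame.
            if ry < 0 or rx < 0 or ry + n > height or rx + n > width:
                continue
            err = mean_abs_diff(cur, ref, cy, cx, ry, rx, n)
            if err < best_err:
                best_vec, best_err = (vy, vx), err
    return best_vec, best_err


def make_frame(size, top, left, patch=4, value=200):
    """Toy frame: black background with one bright square patch."""
    frame = [[0] * size for _ in range(size)]
    for y in range(top, top + patch):
        for x in range(left, left + patch):
            frame[y][x] = value
    return frame


ref = make_frame(16, 4, 4)   # patch at row 4, column 4 in the reference frame
cur = make_frame(16, 6, 7)   # patch moved down 2 and right 3 in the current frame
vector, error = full_search(cur, ref, 6, 7, n=4, max_disp=4)
print(vector, error)         # prints (-2, -3) 0.0
```

The cost of the exhaustive approach is visible in the nested loops: every block is compared against (2 * max_disp + 1) squared candidate positions, each comparison touching n squared pixels.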
A good match during the search means that a good prediction can be made, but the improvement in prediction must outweigh the cost of transmitting the motion vector. A good match requires that the whole block has undergone the same translation. In addition, the block should not overlap objects in the image, including the background, that have different degrees of motion.

The choice of the block size to use for motion compensation is always a compromise. Smaller and more numerous blocks can better represent complex motion than a smaller number of large ones. This reduces the work and transmission costs of subsequent correction stages but increases the cost of the motion information itself. The problem has been investigated by Ribas-Corbera and Neuhoff [7]. They concluded that the choice of block size could be affected not only by motion vector accuracy but also by other scene characteristics such as texture and interframe noise.

Figure 2.4: FSBM Example

The motion vectors resulting from FSBM are well correlated. Therefore, vector information can be coded differentially using variable length codes. This is performed in a number of codecs such as ITU-T H.263 [3]. Variable length codes have also been proposed for the MPEG-4 video standard [8]. An example of the block structure generated is presented in Figure 2.4, which is a frame from the MPEG-4 test sequence known as "Foreman". Most noticeable is the stationary background, which is represented by a large number of blocks with very similar motion vectors. The short lines starting from the center of each block represent the motion vectors. Subsequently the motion vectors are variable length coded through the use of a differential 2D prediction mechanism.

2.4.1.2 Variable Size Block-Matching (VSBM)

Proposals have been presented that specify improvements to FSBM by varying the size of blocks in order to more accurately match moving areas. Such methods are known as variable size block matching (VSBM) methods.
Chan, Yu and Constantinides have proposed a scheme that starts with relatively large blocks, which are then repeatedly divided. This is a so-called top-down approach [9]. Whenever the best matching error for a block is above a specified threshold, the block is divided into four smaller blocks until the maximum number of blocks or locally minimum errors are obtained. The application of such top-down methods may generate block structures for an image that match real moving objects, but it seems that an approach that more directly seeks out areas of uniform motion might be more effective.

The VSBM technique detects areas of common motion and groups them into variable sized blocks for use with a coding strategy based on the use of quadtrees [10]. Use of a quadtree obviates the need to describe the size and position of each block explicitly; only the tree description is required. The vectors for each block in the tree are identical in nature to those of FSBM. Since the process is a grouping together of smaller blocks to form larger ones, it is generally regarded as a bottom-up technique. An example of the block structure generated is presented in Figure 2.5. Comparatively few large blocks represent the stationary background. However, the moving parts of the image are represented by smaller blocks and a larger number of motion vectors.

Figure 2.5: VSBM Example

For the same number of blocks per frame as FSBM, the VSBM method results in a smaller mean square error (MSE), or better prediction. More significantly, for a similar level of MSE as FSBM, the VSBM technique can represent the inherent motion using fewer variable-sized blocks, which translates into the use of a reduced number of motion vectors. Subsequently the motion vectors are variable length coded using a quadtree-based 2D predictor mechanism [11].
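The threshold-driven splitting rule of the top-down approach can be sketched in a few lines. The sketch below illustrates only that rule (split a block into four whenever its best matching error exceeds a threshold); it is not the bottom-up grouping technique of [10], and the function names, the tie-breaking toward short vectors, the threshold and the toy frames are assumptions chosen for the example.

```python
def best_match(cur, ref, y, x, n, max_disp):
    """Exhaustive search for one n x n block; returns (vector, error) using
    the mean absolute difference. Shorter vectors are tried first so that
    ties are resolved in favor of the smallest displacement."""
    h, w = len(ref), len(ref[0])
    candidates = [(vy, vx)
                  for vy in range(-max_disp, max_disp + 1)
                  for vx in range(-max_disp, max_disp + 1)]
    candidates.sort(key=lambda v: abs(v[0]) + abs(v[1]))
    best = None
    for vy, vx in candidates:
        ry, rx = y + vy, x + vx
        if ry < 0 or rx < 0 or ry + n > h or rx + n > w:
            continue  # candidate lies outside the reference frame
        err = sum(abs(cur[y + i][x + j] - ref[ry + i][rx + j])
                  for i in range(n) for j in range(n)) / (n * n)
        if best is None or err < best[1]:
            best = ((vy, vx), err)
    return best


def split_blocks(cur, ref, y, x, n, threshold, min_size=4, max_disp=2):
    """Top-down quadtree split: keep the block if it matches well enough
    (or cannot be split further), otherwise recurse into four quadrants.
    Returns a list of leaves (y, x, size, vector)."""
    vec, err = best_match(cur, ref, y, x, n, max_disp)
    if err <= threshold or n <= min_size:
        return [(y, x, n, vec)]
    half = n // 2
    leaves = []
    for oy in (0, half):
        for ox in (0, half):
            leaves += split_blocks(cur, ref, y + oy, x + ox, half,
                                   threshold, min_size, max_disp)
    return leaves


def make_frame(size, top, left, patch=4, value=100):
    """Toy frame: black background with one bright square patch."""
    frame = [[0] * size for _ in range(size)]
    for r in range(top, top + patch):
        for c in range(left, left + patch):
            frame[r][c] = value
    return frame


ref = make_frame(16, 8, 8)
cur = make_frame(16, 8, 9)   # patch moved one pixel to the right
leaves = split_blocks(cur, ref, 0, 0, 16, threshold=0.5)
for leaf in leaves:
    print(leaf)
```

In this toy case the 16x16 frame is represented by four 8x8 leaves, three with a zero vector for the static background and one with the vector (0, -1) for the quadrant containing the moving patch. A fixed 4x4 partition of the same frame would need sixteen blocks.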
Since frame-based VSBM results in a better estimate of "true" motion and more efficient coding of vector information, one would expect that it can be applied to object-based systems with similar effects [12]. The expectation is true, but there are two problems to overcome when using a basic block matching approach to find true motion. The first is the majority effect, where any small area of motion inside a block will simply be lost, since the matching error for the block is determined by the majority of the block. This is an argument against the use of large block sizes. Furthermore, a single block cannot effectively represent more than one motion. Therefore, there is always a tradeoff between block size and the quality of the match.

The aperture problem is the second difficulty, which is associated with small block sizes. The fewer pixels there are to match, the more spurious matches there will be due to ambiguity. Additionally, there is little point in having the overhead of many small blocks if they all have the same vector, and if they do not, the vectors are unlikely to all be correct.

FSBM and VSBM were applied to the same frame of video data. Figure 2.6 presents the result of applying FSBM and Figure 2.7 presents the result of applying VSBM to the same video frame for the same quality of prediction. While FSBM required 109 blocks, VSBM required only 44, which represents a saving of approximately 60 percent. Motion vectors were then variable length coded using a differential, object-based 2D prediction strategy [13].

Figure 2.6: FSBM Block Structure

Figure 2.7: VSBM Block Structure

2.4.2 Object Based Block-Matching Motion Compensation

Evolving object-based video coding standards such as MPEG-4 permit arbitrarily-shaped objects to be encoded and decoded as separate video object planes (VOPs).
There are several motivating scenarios behind the use of objects:

- Where transmission bandwidth or decoder performance is limited, the user may be able to select some subset of all video objects, which are of particular interest.
- The user may wish to manipulate objects at the receiver, such as a change in position, size and depth ordering, which may evolve strictly as a function of interest.
- It may be possible to replace the content of an object with material generated later or local to the receiver/display that can be used for enhanced visualization and "augmented reality".

2.5 Implementation

The H.263 algorithm was implemented and the results are tabulated in Table 2.1. A wide variety of video sequences were chosen for the analysis. The common input format chosen for the video is termed Common Intermediate Format (CIF), and is based on a progressively scanned format with 352 x 288 pixels/frame at 30 frames/sec, while the Quarter Common Intermediate Format (QCIF) is 176 x 144 pixels/frame. CIF compatibility is optional in the standard but QCIF compatibility is mandatory. Therefore, all codecs have to be able to operate with QCIF. CIF is primarily for videoconferencing, while QCIF is suitable for a desktop videophone. QCIF, the mandatory format for the codec, was used for most sequences. Table 2.1 summarizes the results for various video sequences compressed from both QCIF and CIF source formats [5].

Table 2.1: Compressed Video Results

Sequence Name    File Name         Source Format    Number of Pictures    Size / KB    H.263 File Size / KB
Container        container.qcif    QCIF             300                   11,138       87
Foreman          foreman.qcif      QCIF             400                   14,850       658
News             news.qcif         QCIF             300                   11,138       144
Silent           silent.qcif       QCIF             300                   11,138       138
Mobile           mobile.cif        CIF              300                   44,550       6391
Paris            paris.cif         CIF              1000                  158,153      2624
Tempete          tempete.cif       CIF              260                   38,610       2398

The H.263 coder, decoder and the QCIF reader are presented in the appendix.
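The uncompressed sizes in Table 2.1 can be cross-checked with a short calculation. The sketch below assumes the standard luma resolutions of 352 x 288 for CIF and 176 x 144 for QCIF, 8-bit YUV 4:2:0 sampling (1.5 bytes per pixel) and 1 KB = 1024 bytes. None of these assumptions is stated in the text; they are adopted here because they reproduce the tabulated sizes for the Container and Mobile sequences. The function name is hypothetical.

```python
def raw_yuv420_kb(width, height, frames):
    """Uncompressed size in KB (1 KB = 1024 bytes) of 8-bit YUV 4:2:0 video:
    one luma byte per pixel plus two chroma planes at quarter resolution."""
    bytes_per_frame = width * height * 3 // 2
    return bytes_per_frame * frames / 1024


QCIF = (176, 144)   # assumed standard QCIF luma resolution
CIF = (352, 288)    # assumed standard CIF luma resolution

container_raw = raw_yuv420_kb(*QCIF, 300)   # 11137.5 KB, tabulated as 11,138
mobile_raw = raw_yuv420_kb(*CIF, 300)       # 44550.0 KB, tabulated as 44,550

# Compression ratio = uncompressed size / H.263 file size from Table 2.1.
container_ratio = container_raw / 87
mobile_ratio = mobile_raw / 6391

print(f"Container: {container_raw:.1f} KB raw, ratio {container_ratio:.1f}:1")
print(f"Mobile:    {mobile_raw:.1f} KB raw, ratio {mobile_ratio:.1f}:1")
```

Under these assumptions the low-motion Container sequence compresses by roughly 128:1, while the highly detailed Mobile sequence compresses by roughly 7:1, which illustrates how strongly the achieved ratio depends on scene content.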
Video communication over narrowband channels such as the Internet, ISDN, modem or mobile links suffers from a loss of image quality. The most annoying artifacts of block-based coders are the discontinuities at block boundaries. Many authors have reduced this problem by applying smoothing operators to the block edges or by predicting the image structure. Filtering decreases the blocking discontinuities but destroys image structures at the block edges [14]. In medical imaging applications the threshold of allowed image/video loss and the resulting quality are extremely critical. Therefore, this research investigated the feasibility of applying deformable motion and shape analysis to the H.263 algorithm for better prediction and as a method of avoiding the artifacts.

CHAPTER 3 DEFORMABLE MODEL AND SHAPE ANALYSIS

This section of the study analyzes a new external force for active contours, which was used to estimate deformable objects. The reason for this study was to analyze the feasibility of using a form of estimation other than block-based motion estimation in order to reduce the block artifacts and increase the usability of the H.263 algorithm for medical applications. This study envisioned two different applications for medical video. The first application involved incorporating a vector flow model into the motion analysis part of video compression models. This involves validating the compressed video by applying the vector flow model at both the transmitter and the receiver and comparing the numerical data for image distortion and loss. The second application involved developing a mathematical model for deformable motion analysis, together with the numerical implementation of the various forces and factors, which could be applied in medical imaging.
The external force, which is called the gradient vector flow (GVF), is computed as a diffusion of the gradient vectors of a gray-level or binary edge map derived from the image [15]. Snakes, or active contours, are curves defined within an image domain that can move under the influence of internal forces coming from within the curve itself and external forces computed from the image data. The internal and external forces are defined so that the snake will conform to an object boundary or other desired features within an image. In this research, the focus was on parametric active contours. Parametric active contours synthesize parametric curves within an image domain and allow them to move toward the desired features, which are usually edges. Typically, the curves are drawn toward the edges by potential forces, which are defined to be the negative gradient of a potential function. Additional forces such as pressures, coupled with the potential forces, comprise the external forces. There are also internal forces, such as elasticity forces and bending forces, that are designed to hold the curve together and to keep it from bending too much. Care must be taken with pressure forces: they can push an active contour into boundary concavities, but they cannot be too strong or weak edges will be overwhelmed. How these forces are used in the mathematical model is presented later in this thesis.

Gradient vector flow (GVF) fields are dense vector fields derived from images by minimizing a certain energy functional in a variational framework. The minimization was achieved by solving a pair of decoupled linear partial differential equations that diffuse the gradient vectors of a gray-level or binary edge map computed from the image. The active contour that uses the GVF field as its external force is termed a GVF snake.
The GVF snake is distinguished from nearly all previous snake formulations since its external forces cannot be written as the negative gradient of a potential function. As a result, it could not be formulated using the standard energy minimization framework. Therefore, it was specified directly from a force balance condition.

To be more precise about the model, the research proposed a Bayesian approach that incorporated prior knowledge of the anatomical variations and of the variation of the imaging modalities. Following the deformable templates paradigm, exemplary templates were constructed to incorporate prior information about the geometry and shape of the anatomical objects under study. The infinite anatomical variability was accommodated in the Bayesian framework by defining probabilistic transformations on the templates. The segmentation problem in this paradigm involved finding the transformation $S$ of the template that maximized the posterior

$$p(S \mid \mathrm{data}) \propto p(\mathrm{data} \mid S)\, p(S) \tag{3.1}$$

where $p(S)$ is the prior probability function that captures prior knowledge of the anatomy and its variability and $p(\mathrm{data} \mid S)$ is the data likelihood function that captures the image data-to-geometry relationship [16]. For efficiency of implementation, the log-posterior given by

$$\log p(S \mid \mathrm{data}) = \log p(\mathrm{data} \mid S) + \log p(S) \tag{3.2}$$

was equivalently maximized up to an additive constant. The modeling approach adopted in this research for building exemplary templates and associated transformations was based on a multiscale medial representation. The transformations defined in this framework are parameterized directly in terms of natural shape operations, such as thickening and bending, and their locations.
3.1 Parametric Snake Model

A traditional snake is a curve, given by

$$X(s) = [x(s),\, y(s)], \quad s \in [0, 1] \tag{3.3}$$

that moves through the spatial domain of the image in order to minimize the energy function

$$E = \int_0^1 \tfrac{1}{2}\left[\alpha\,|X'(s)|^2 + \beta\,|X''(s)|^2\right] + E_{ext}(X(s))\; ds \tag{3.4}$$

where $\alpha$ and $\beta$ are weighting parameters that control the snake's tension and rigidity, respectively, and $X'(s)$ and $X''(s)$ denote the first and second derivatives of $X(s)$ with respect to the parameter $s$. The external energy function, $E_{ext}$, is derived from the image so that it takes on its smaller values at the features of interest, such as the boundaries. Given a gray-level image $I(x,y)$, which is viewed as a continuous function of the position variables $(x,y)$, typical external energies designed to lead an active contour toward step edges are given by

$$E_{ext}^{(1)}(x,y) = -|\nabla I(x,y)|^2, \qquad E_{ext}^{(2)}(x,y) = -|\nabla[G_\sigma(x,y) * I(x,y)]|^2 \tag{3.5}$$

where $G_\sigma(x,y)$ is a two-dimensional Gaussian function with standard deviation $\sigma$ and $\nabla$ is the gradient operator. If the image is a line drawing that is black on white, then the appropriate external energies include

$$E_{ext}^{(3)}(x,y) = I(x,y), \qquad E_{ext}^{(4)}(x,y) = G_\sigma(x,y) * I(x,y) \tag{3.6}$$

The definitions in equation (3.5) and equation (3.6) show that larger values of $\sigma$ will cause the boundaries to become blurry. However, large values of $\sigma$ are often necessary in order to increase the capture range of the active contour. A snake that minimizes $E$ must satisfy the Euler equation

$$\alpha X''(s) - \beta X''''(s) - \nabla E_{ext} = 0 \tag{3.7}$$

This can be viewed as a force balance equation given by

$$F_{int} + F_{ext}^{(p)} = 0 \tag{3.8}$$

where

$$F_{int} = \alpha X''(s) - \beta X''''(s) \quad \text{and} \quad F_{ext}^{(p)} = -\nabla E_{ext} \tag{3.9}$$

The internal force, $F_{int}$, discourages stretching and bending while the external potential force, $F_{ext}^{(p)}$, pulls the snake toward the edges of the desired image [17].

3.2 Example and Behavior of Traditional Snakes

An example of the behavior of a traditional snake is presented in this section. Figure 3.1 presents a 64 x 64 pixel line drawing of a U-shaped object, shown in gray, that has a boundary concavity at the top.
Figure 3.1: U Shaped Object

Figure 3.2: Iterations and Convergence

Figure 3.2 presents a sequence of curves, shown in red, that depict the iterative progression of the solution by a traditional snake ($\alpha = 0.6$, $\beta = 0.0$) initialized outside the object but within the capture range of the potential force field. The potential force field, shown in blue in Figure 3.2, is given by

$$F_{ext}^{(p)} = -\nabla E_{ext}^{(4)} \tag{3.10}$$

Note that the final solution in Figure 3.2 solves the Euler equations of the snake formulation but remains split across the concave region [18]. The reason for the poor convergence of this snake is revealed in a close-up of the external force field within the boundary concavity. Although the external forces correctly point toward the object boundary, within the boundary concavity the forces point horizontally in opposite directions. Therefore, the active contour is pulled apart toward each of the fingers of the U-shape but is not made to progress downward into the concavity. There is no choice of the parameters $\alpha$ and $\beta$ that will correct this problem.

Another key problem with traditional snake formulations is the problem of limited capture range, which can be understood by examining Figure 3.2. In Figure 3.2 the magnitude of the external forces dies out quite rapidly away from the object boundary. Increasing $\sigma$ in

$$E_{ext}^{(4)}(x,y) = G_\sigma(x,y) * I(x,y) \tag{3.11}$$

will increase this range, but the boundary localization will become less accurate and distinct, which will ultimately obliterate the concavity itself when $\sigma$ becomes too large. In Figure 3.2 the snake fails to converge exactly on the object and shows considerable disparity in finding the edges. These are the usual problems found with the traditional snake method. Therefore, the distance potential forces do not solve the problem of convergence to boundary concavities.
CHAPTER 4 GRADIENT VECTOR FLOW

4.1 Snake: Introduction

The overall approach used the force balance condition

$$F_{int} + F_{ext}^{(p)} = 0 \tag{4.1}$$

as a starting point for designing a snake. A new static external force field

$$F_{ext}^{(g)} = \mathbf{V}(x,y) \tag{4.2}$$

which was termed the gradient vector flow (GVF) field, was developed. In order to obtain the corresponding dynamic snake equation, the potential force $-\nabla E_{ext}$ in

$$X_t(s,t) = \alpha X''(s,t) - \beta X''''(s,t) - \nabla E_{ext} \tag{4.3}$$

was replaced with $\mathbf{V}(x,y)$, which yielded

$$X_t(s,t) = \alpha X''(s,t) - \beta X''''(s,t) + \mathbf{V} \tag{4.4}$$

The parametric curve that solves equation (4.4) is termed a GVF snake. Equation (4.4) was solved numerically through a process of discretization and iteration, which is identical to the process used to produce a traditional snake. Although the final configuration of a GVF snake satisfied the force-balance equation, equation (4.1), this equation does not, in general, represent the Euler equations of the energy minimization problem where

$$E = \int_0^1 \tfrac{1}{2}\left[\alpha\,|X'(s)|^2 + \beta\,|X''(s)|^2\right] + E_{ext}(X(s))\; ds \tag{4.5}$$

since $\mathbf{V}(x,y)$ will not, in general, be an irrotational field. However, the resulting loss of the optimality property was well compensated by the significantly improved performance of the GVF snake [19].

4.1.1 Edge Map

An edge map $f(x,y)$ was derived from the image $I(x,y)$, with the property that it was larger near the image edges. Any gray-level or binary edge map could have been used. For example,

$$f(x,y) = -E_{ext}^{(i)}(x,y) \tag{4.6}$$

where $i = 1, 2, 3$ or $4$ could have been used. Three general properties of edge maps are important in the present context. First, the gradient of an edge map, $\nabla f$, has vectors that point toward the edges and that are normal to the edges at the edges. Second, these vectors generally have large magnitudes only in the immediate vicinity of the edges. Third, in homogeneous regions, where $I(x,y)$ is nearly constant, $\nabla f$ is nearly zero. These properties affect the behavior of a traditional snake when the gradient of an edge map is used as an external force.
The first property causes a snake initialized close to the edge to converge to a stable configuration near the edge, which is a highly desirable property. However, the second property, in general, causes the capture range to be very small. The third property causes homogeneous regions to have no external forces whatsoever. The second and third properties are undesirable. Therefore, the approach adopted in this research was to keep the highly desirable property of the gradients near the edges but to extend the gradient map farther away from the edges and into homogeneous regions using a computational diffusion process. This approach produced an important benefit. The inherent competition of the diffusion process created vectors that pointed into boundary concavities.

4.2 Mathematical Model

A gradient vector flow field was defined as the vector field

$$\mathbf{V}(x,y) = [u(x,y),\, v(x,y)] \tag{4.7}$$

that minimizes the energy functional

$$\varepsilon = \iint \mu\,(u_x^2 + u_y^2 + v_x^2 + v_y^2) + |\nabla f|^2\,|\mathbf{V} - \nabla f|^2 \; dx\,dy \tag{4.8}$$

This variational formulation provided the benefit of making the result smooth when there was no data. In particular, when $|\nabla f|$ is small the energy is dominated by the sum of the squares of the partial derivatives of the vector field, which produces a slowly varying field. However, when $|\nabla f|$ is large the second term dominates the integrand, which can then easily be minimized by setting $\mathbf{V}$ equal to $\nabla f$. Minimization of the integrand in equation (4.8) produces the desired effect of keeping $\mathbf{V}$ approximately equal to the gradient of the edge map when that gradient is large, while forcing the field to be slowly varying in homogeneous regions. The parameter $\mu$ is a regularization parameter governing the trade-off between the first term and the second term in the integrand. The parameter $\mu$ should be set according to the amount of noise present in the image. As the noise increases, $\mu$ should be increased.
The smoothing term, which is the first term within the integrand of $\varepsilon$, where $\varepsilon$ is repeated as equation (4.9) for convenience,

$$\varepsilon = \iint \mu\,(u_x^2 + u_y^2 + v_x^2 + v_y^2) + |\nabla f|^2\,|\mathbf{V} - \nabla f|^2 \; dx\,dy \tag{4.9}$$

is the same term as found in the classical formulation of optical flow. It has recently been shown that this term produces an equal penalty on the divergence and curl of the vector field. Therefore, the vector field resulting from this minimization can be expected to be neither entirely irrotational nor entirely solenoidal. Using the calculus of variations, it can be shown that the GVF field can be found by solving the following Euler equations:

$$\mu \nabla^2 u - (u - f_x)(f_x^2 + f_y^2) = 0$$
$$\mu \nabla^2 v - (v - f_y)(f_x^2 + f_y^2) = 0 \tag{4.10}$$

where $\nabla^2$ is the Laplacian operator. The Euler equations provided the motivation for the GVF formulation utilized in this research. In a homogeneous region, where $I(x,y)$ is constant, the second term in each equation is zero since the gradient of $f(x,y)$ is zero. Therefore, within such a region, $u$ and $v$ are each determined by Laplace's equation and the resulting GVF field is interpolated from the region's boundary, which indicates the existence of a kind of competition among the boundary vectors. This explains why GVF yields vectors that point into boundary concavities [20].

4.3 Numerical Implementation

The Euler equations can be solved by treating $u$ and $v$ as functions of time and solving

$$u_t(x,y,t) = \mu \nabla^2 u(x,y,t) - [u(x,y,t) - f_x(x,y)]\,[f_x(x,y)^2 + f_y(x,y)^2]$$
$$v_t(x,y,t) = \mu \nabla^2 v(x,y,t) - [v(x,y,t) - f_y(x,y)]\,[f_x(x,y)^2 + f_y(x,y)^2] \tag{4.11}$$

The steady-state solutions of these linear parabolic equations are the desired solutions of the Euler equations, equation (4.10). The expressions in equation (4.11) are decoupled. Therefore, they can be solved as separate scalar partial differential equations in $u$ and $v$. The expressions in equation (4.11) are known as generalized diffusion equations.
For convenience, they are rewritten as

$$u_t(x,y,t) = \mu \nabla^2 u(x,y,t) - b(x,y)\,u(x,y,t) + c^1(x,y)$$
$$v_t(x,y,t) = \mu \nabla^2 v(x,y,t) - b(x,y)\,v(x,y,t) + c^2(x,y) \tag{4.12}$$

where

$$b(x,y) = f_x(x,y)^2 + f_y(x,y)^2, \quad c^1(x,y) = b(x,y)\,f_x(x,y), \quad c^2(x,y) = b(x,y)\,f_y(x,y) \tag{4.13}$$

Any digital image gradient operator can be used to calculate $f_x$ and $f_y$. In the examples presented in this thesis a simple central difference was used. The coefficients $b(x,y)$, $c^1(x,y)$ and $c^2(x,y)$ can be computed once and fixed for the entire iterative process. In order to set up the iterative solution, let the indices $i$, $j$ and $n$ correspond to $x$, $y$ and $t$, respectively, let the spacing between pixels be $\Delta x$ and $\Delta y$, and let the time step for each iteration be $\Delta t$. The required partial derivatives can be approximated as

$$u_t = \frac{1}{\Delta t}\left(u_{i,j}^{n+1} - u_{i,j}^{n}\right), \qquad v_t = \frac{1}{\Delta t}\left(v_{i,j}^{n+1} - v_{i,j}^{n}\right)$$
$$\nabla^2 u = \frac{1}{\Delta x \Delta y}\left(u_{i+1,j} + u_{i,j+1} + u_{i-1,j} + u_{i,j-1} - 4u_{i,j}\right)$$
$$\nabla^2 v = \frac{1}{\Delta x \Delta y}\left(v_{i+1,j} + v_{i,j+1} + v_{i-1,j} + v_{i,j-1} - 4v_{i,j}\right) \tag{4.14}$$

Substituting these approximations into equation (4.12) yields an iterative solution for the GVF that is given by

$$u_{i,j}^{n+1} = (1 - b_{i,j}\Delta t)\,u_{i,j}^{n} + r\left(u_{i+1,j}^{n} + u_{i,j+1}^{n} + u_{i-1,j}^{n} + u_{i,j-1}^{n} - 4u_{i,j}^{n}\right) + c_{i,j}^{1}\Delta t$$
$$v_{i,j}^{n+1} = (1 - b_{i,j}\Delta t)\,v_{i,j}^{n} + r\left(v_{i+1,j}^{n} + v_{i,j+1}^{n} + v_{i-1,j}^{n} + v_{i,j-1}^{n} - 4v_{i,j}^{n}\right) + c_{i,j}^{2}\Delta t \tag{4.15}$$

where

$$r = \frac{\mu\,\Delta t}{\Delta x\,\Delta y} \tag{4.16}$$

Convergence of the above iterative process is guaranteed by a standard result from the theory of numerical methods. Provided that $b(x,y)$, $c^1(x,y)$ and $c^2(x,y)$ are bounded, the iteration in equation (4.15) is stable whenever the Courant-Friedrichs-Lewy step-size restriction $r \le 1/4$ is maintained. Since $\Delta x$, $\Delta y$ and $\mu$ are normally fixed, the definition of $r$ in equation (4.16) imposes a restriction on the time step that must be maintained in order to guarantee convergence of the GVF. The restriction on the time step demands that

$$\Delta t \le \frac{\Delta x\,\Delta y}{4\mu} \tag{4.17}$$

4.4 MATLAB Results

The GVF field was simulated in MATLAB. Some predetermined shapes were used as test diagrams. Figure 4.1 presents the test images and the traditional forces acting on them.
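The update of equation (4.15) can be sketched in a few lines. The following is a minimal illustration in Python rather than the MATLAB code used to produce the results. It assumes unit pixel spacing, replicated image borders, x running along image columns, and illustrative values for mu, the time step and the iteration count; all function names are hypothetical.

```python
def gradient(f):
    """Central-difference gradient (f_x, f_y) of a 2D list; one-sided at borders."""
    rows, cols = len(f), len(f[0])
    fx = [[0.0] * cols for _ in range(rows)]
    fy = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            j0, j1 = max(j - 1, 0), min(j + 1, cols - 1)
            i0, i1 = max(i - 1, 0), min(i + 1, rows - 1)
            fx[i][j] = (f[i][j1] - f[i][j0]) / (j1 - j0)
            fy[i][j] = (f[i1][j] - f[i0][j]) / (i1 - i0)
    return fx, fy


def diffuse_step(a, b, c, r, dt):
    """One explicit update of equation (4.15) for a single field component."""
    rows, cols = len(a), len(a[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Replicate border pixels so the Laplacian is defined everywhere.
            up = a[max(i - 1, 0)][j]
            down = a[min(i + 1, rows - 1)][j]
            left = a[i][max(j - 1, 0)]
            right = a[i][min(j + 1, cols - 1)]
            laplacian = up + down + left + right - 4.0 * a[i][j]
            out[i][j] = (1.0 - b[i][j] * dt) * a[i][j] + r * laplacian + c[i][j] * dt
    return out


def gvf(f, mu=0.2, dt=0.5, iterations=200):
    """Return the GVF components (u, v) of edge map f, assuming dx = dy = 1."""
    r = mu * dt  # equation (4.16) with unit pixel spacing
    if r > 0.25:
        raise ValueError("time step violates the restriction r <= 1/4")
    fx, fy = gradient(f)
    rows, cols = len(f), len(f[0])
    b = [[fx[i][j] ** 2 + fy[i][j] ** 2 for j in range(cols)] for i in range(rows)]
    c1 = [[b[i][j] * fx[i][j] for j in range(cols)] for i in range(rows)]
    c2 = [[b[i][j] * fy[i][j] for j in range(cols)] for i in range(rows)]
    u = [row[:] for row in fx]  # initialise with the edge-map gradient
    v = [row[:] for row in fy]
    for _ in range(iterations):
        u = diffuse_step(u, b, c1, r, dt)
        v = diffuse_step(v, b, c2, r, dt)
    return u, v


# Edge map with a single vertical edge in column 8 of a 17 x 17 image.
edge_map = [[1.0 if j == 8 else 0.0 for j in range(17)] for _ in range(17)]
fx, _ = gradient(edge_map)
u, v = gvf(edge_map)
print("raw gradient far from the edge:", fx[8][3])
print("GVF component far from the edge:", u[8][3])
```

In this toy example the raw gradient is zero a few pixels away from the edge, whereas the diffused field still points toward the edge from both sides, which is the extended capture range that motivates the method.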
Figure 4.1: Test Image With its Traditional Forces

Iterations for computing the object shape with the traditional force method were performed on the test image. Figure 4.2 illustrates the process followed in identifying the deformed test image.

Figure 4.2: Iterations

The experimental results obtained by use of the GVF method on the U-shaped object are presented in Figure 4.3.

Figure 4.3: U Shaped Test Object Undergoing GVF Iterations

The improved GVF iteration method with different parameters was applied to the block image. The results obtained are presented in Figure 4.4.

Figure 4.4: Iterations on the Block Test Image and its Convergence

CHAPTER 5 MEDICAL APPLICATIONS

5.1 Deformable Super Quadrics

Deformable super quadrics are three-dimensional (3D) constructs that are used for describing closed-surface shapes. They can be used for both 3D graphics and computer vision. Super quadrics use one consistent representation to describe many different objects such as cubes, spheres and head shapes. A super quadric can be fitted to an existing set of data points, which unifies geometric modeling and physical modeling. Super quadrics also provide for quick point-to-point distance and surface distance calculations. Super quadrics are represented mathematically by

$$h(\omega) = \begin{bmatrix} \cos^{e_2}(\omega) \\ \sin^{e_2}(\omega) \end{bmatrix},\; -\pi \le \omega < \pi; \qquad m(\eta) = \begin{bmatrix} \cos^{e_1}(\eta) \\ \sin^{e_1}(\eta) \end{bmatrix},\; -\pi/2 \le \eta \le \pi/2 \tag{5.1}$$

The functions presented in equation (5.1) are two-dimensional (2D) parametric curves. The function $h$ is a horizontal curve whose values traverse a full circle. The function $m$ modulates $h$ and yields values that traverse a half circle. The cosine component of $m$ changes the scale of $h$ while the sine component of $m$ raises or lowers $h$. The parameters $e_1$ and $e_2$ control the squareness of the surface.
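The role of the two generating curves can be illustrated with a short sketch. It assumes the standard super quadric form $h(\omega) = [\cos^{e_2}\omega, \sin^{e_2}\omega]$ and $m(\eta) = [\cos^{e_1}\eta, \sin^{e_1}\eta]$, combines them so that the cosine component of m scales h and the sine component of m supplies the height, and checks each point against the usual implicit (inside-outside) equation. The signed power and the implicit form are conventions from the super quadric literature rather than statements taken from this thesis, and the function names are hypothetical.

```python
import math


def spow(base, exponent):
    """Signed power: |base|**exponent carrying the sign of base."""
    return math.copysign(abs(base) ** exponent, base)


def h(omega, e2):
    """Horizontal curve; omega traverses a full circle."""
    return (spow(math.cos(omega), e2), spow(math.sin(omega), e2))


def m(eta, e1):
    """Modulating curve; eta traverses a half circle."""
    return (spow(math.cos(eta), e1), spow(math.sin(eta), e1))


def surface_point(eta, omega, e1, e2):
    """Combine m and h into a 3D point on the unit super quadric."""
    m1, m2 = m(eta, e1)
    h1, h2 = h(omega, e2)
    # m1 scales the horizontal curve, m2 raises or lowers it.
    return (m1 * h1, m1 * h2, m2)


def implicit(x, y, z, e1, e2):
    """Inside-outside function; equals 1 on the surface (assumed standard form)."""
    horizontal = abs(x) ** (2.0 / e2) + abs(y) ** (2.0 / e2)
    return horizontal ** (e2 / e1) + abs(z) ** (2.0 / e1)


# e1 = e2 = 1 gives a sphere; small exponents give box-like shapes.
for e1, e2 in [(1.0, 1.0), (0.3, 0.3), (0.5, 2.0)]:
    point = surface_point(0.4, -2.5, e1, e2)
    print(e1, e2, point, implicit(*point, e1, e2))
```

Varying e1 and e2 while keeping the angles fixed moves the point between rounded and squared-off surfaces, which is the squareness control described above.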
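The basic functions given in equation (5.1) yield a representation for a 3D surface that is given by the spherical product of $m$ and $h$,

$$\mathbf{r} = m \otimes h \tag{5.2}$$

The product yields

$$\mathbf{r}(\eta, \omega) = \begin{bmatrix} r \\ s \\ t \end{bmatrix} = \begin{bmatrix} \cos^{e_1}(\eta)\cos^{e_2}(\omega) \\ \cos^{e_1}(\eta)\sin^{e_2}(\omega) \\ \sin^{e_1}(\eta) \end{bmatrix} \tag{5.3}$$

where the $\eta$ and $\omega$ parameters can be eliminated by using the relation

$$\sin^2\theta + \cos^2\theta = 1 \tag{5.3}$$

The super quadric can be transformed through the use of

$$X = R\,D_m\,\mathbf{r} + b \tag{5.4}$$

where $X$ is the 3x1 resulting point, $R$ is a 3x3 rotation matrix, $D_m$ is a 3x3 modal deformation matrix, $\mathbf{r}$ is the 3x1 point to be transformed and $b$ is a 3x1 translation vector. The transformation allows placement of the super quadric anywhere in space with many possible shapes. The modal deformation matrix is given by

$$D_m = \begin{bmatrix} d_{00} & d_{01} & d_{02} \\ d_{10} & d_{11} & d_{12} \\ d_{20} & d_{21} & d_{22} \end{bmatrix} \tag{5.5}$$

where each entry is a polynomial. The surface normal to a super quadric is obtained from

$$n_h(\omega) = \begin{bmatrix} \cos^{2-e_2}(\omega) \\ \sin^{2-e_2}(\omega) \end{bmatrix}, \qquad n_m(\eta) = \begin{bmatrix} \cos^{2-e_1}(\eta) \\ \sin^{2-e_1}(\eta) \end{bmatrix} \tag{5.6}$$

The product of the functions in equation (5.6) is given by

$$\mathbf{n}(\eta, \omega) = \begin{bmatrix} \frac{1}{r}\cos^2(\eta)\cos^2(\omega) \\ \frac{1}{s}\cos^2(\eta)\sin^2(\omega) \\ \frac{1}{t}\sin^2(\eta) \end{bmatrix} \tag{5.7}$$

Using a finite element model instead of a continuous surface forces the use of a discrete set of points across the surface. The displacements of the points are modeled in

Global coordinates: $U = [\,U \;\; V \;\; W\,]^T$ (5.8)

and

Local coordinates: $u = [\,u \;\; v \;\; w\,]^T$ (5.9)

In addition to the super quadric, the quantities describing the associated physics are given by

Body forces: $f^B = [\,f_x^B \;\; f_y^B \;\; f_z^B\,]^T$ (5.10)

Surface forces: $f^S = [\,f_x^S \;\; f_y^S \;\; f_z^S\,]^T$ (5.11)

Attachment forces: $f^I = [\,f_x^I \;\; f_y^I \;\; f_z^I\,]^T$ (5.12)

Strains: $\epsilon = [\,\epsilon_{xx} \;\; \epsilon_{yy} \;\; \epsilon_{zz} \;\; \epsilon_{xy} \;\; \epsilon_{xz} \;\; \epsilon_{yz}\,]^T$ (5.13)

and

Stresses: $\sigma = [\,\sigma_{xx} \;\; \sigma_{yy} \;\; \sigma_{zz} \;\; \sigma_{xy} \;\; \sigma_{yz} \;\; \sigma_{xz}\,]^T$ (5.14)

Stiffness modeling is possible through the use of the functions

$$R_B = \sum_{m=1}^{M} \int_{v^{(m)}} H^{(m)T} f^{B(m)}\, dv^{(m)}, \quad R_S = \sum_{m=1}^{M} \int_{v^{(m)}} H^{(m)T} f^{S(m)}\, dv^{(m)}, \quad R_I = \sum_{m=1}^{M} \int_{v^{(m)}} H^{(m)T} f^{I(m)}\, dv^{(m)}, \quad R_c = F \tag{5.15}$$

Inertia and damping modeling was performed so that the final solution for the deformation energy equation would be given by

$$F_I(t) + F_D(t) + F_E(t) = R(t) \tag{5.16}$$

which mathematically expresses the fact that inertia plus damping plus elasticity equals the response.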
Equation (5.16) is a second-order differential equation and is solved by using either of two methods. The solution methods are termed direct integration methods and the mode superposition method.

5.2 Medial Representation of Objects

M-reps, the medial representation used, are based on a hierarchical representation of linked figural models such as protrusions, indentations, neighboring figures and included figures, which represent solid regions and their boundaries simultaneously. The linked collections of figural components imply a fuzzy, or probabilistically described, boundary position with a width-proportional tolerance. At small scale the figural boundaries are made precise by displacing a dense sampling of the m-rep implied boundary. A model for a single figure is made from a net, also termed a mesh or a chain, of medial atoms. Each atom describes a position and a width. In addition, each atom carries a local figural frame that provides figural directions and an object angle between opposing corresponding positions, which are medial involutes on the implied boundary. A figure can be expressed as a sequence, over scale, of medial nets, which implies successively refined, or smaller tolerance, versions of the figural boundary. At the final stage a dense displacement field is defined on the boundary of the medially implied object that accommodates the fine-scale perturbations of the object boundary.

5.3 Single-Figure Description via M-Rep

The medial representation used was based on the medial framework. In this framework a geometrical object is represented as a set of connected continuous medial manifolds. These medial manifolds are formed by the centers of all spheres (circles in two dimensions) that are interior to the object and tangent to the object's boundary at two or more points. The medial description is defined by the centers of the inscribed spheres and by the associated scalar field of their radii.
Each continuous segment of the medial manifold represents a medial figure. This research focused on objects that could be represented by a single medial figure. In two dimensions, there are two basic types of medial figural segments, with medial manifolds of dimension zero and one. Figural segments with a single point represent the degenerate case of circular objects and are termed zero-dimensional. In three dimensions there are three basic types of medially defined figural segments, with corresponding medial manifolds of dimension zero, one and two. Figural segments with 2D medial manifolds M represent slab-like segments. Tube-like segments, where the medial manifold M is a 1D space curve, and spherical segments, where the medial manifold M consists of a single point, are degenerate cases. Examples of slab-like and tubular shapes are depicted in Figure 5.1 [21].

Figure 5.1: Examples of Slab Like and Tubular Figures

5.4 Discretizing Figural Segments

The purpose of performing a discrete transformation on the figural segments is to provide a very accurate model or representation of the atomic structure of the tissue and cells, to which deformable motion segmentation is easily applied. To illustrate, the structures presented in Figure 5.2 and Figure 5.3, which were obtained for mathematical analysis and modeling, need to be discretized. The structure presented in Figure 5.2 represents a four-tuple defined by the position x, the width r, the vector b tangent to the medial axis and the object angle.

Figure 5.2: 2D Medial M Represents a Double Tangency of a Circle To The Boundary

Figure 5.3: An End Atom is a Medial Atom With an Additional Component

A 2D medial atom carries first-order geometric information at a point on the 1D medial manifold. A zeroth-order description consists of the position x and the radius r of the inscribed circle that is centered at x.
A first-order description includes the unit spatial tangent $b$ of the medial manifold at $x$ and captures the first-order width information by the object angle $\theta$, which describes the change in radius along the medial axis by the Blum relationship

$$\cos\theta = -\frac{dr}{ds} \tag{5.17}$$

for the arc length $s$ on the medial manifold. The places where the inscribed circle centered at $x$ touches the two halves of the boundary, indexed by $-1$ and $1$, are defined as $y_{-1}$ and $y_{1}$, with respective normals $n_{-1}$ and $n_{1}$ given by

$$n_{-1} = R(-\theta)\,b, \quad n_{1} = R(\theta)\,b, \quad y_{-1} = x + r\,n_{-1}, \quad y_{1} = x + r\,n_{1} \tag{5.18}$$

where $R(\theta)$ is the rotation matrix. These parameters and coordinates are illustrated in Figure 5.3. At the crest, a medial atom is introduced at the ends of the medial chains to ensure robust sampling. These medial atoms include an extra parameter $\eta$ that defines the position of the crest point on the object boundary, which is given by

$$y_0 = x + \eta\,r\,b, \quad \eta \ge 1.0 \tag{5.19}$$

where $\eta = 1.0$ represents a circular end cap and $\eta > 1.0$ represents increasing elongation. In three dimensions, the two opposing boundary points implied by the medial atom are $y_{-1}$ and $y_{1}$, with respective normals given by

$$n_{-1} = R_{b,n}(-\theta)\,b, \quad n_{1} = R_{b,n}(\theta)\,b, \quad y_{-1} = x + r\,n_{-1}, \quad y_{1} = x + r\,n_{1} \tag{5.20}$$

where $R_{b,n}(\theta)$ is a rotation by $\theta$ degrees in the $(b,n)$ plane. As in two dimensions, for stability at the crest, medial atoms on the boundary of the medial manifold also include an extra elongation parameter that determines the crest position. In this representation, $x$ gives the central location of the solid section of the figure that is being represented by the atom $m$ in both two dimensions and three dimensions. The scalar $r$ gives the local scale and size of the solid section of the figure that is being represented by the atom. The object angle $\theta$ and the direction $b$ also define the gradient of the scalar field $r$ via the relationship

$$\nabla r = -b\cos\theta \tag{5.21}$$

The scalar field also provides a local ruler for the precise statistical analysis of the object. As an example, the geometric active contour model algorithm was applied to an echocardiograph image sequence.
Two consecutive video frames were taken in which the object under study had undergone deformation. The algorithm developed was then applied in order to determine whether it was possible to capture the edges of the deformed object. The two frames that were considered are presented in Figure 5.4.

Figure 5.4: Two Consecutive Frames Where The Image Has Undergone Changes

The GVF algorithm was applied to the two frames presented in Figure 5.4 with predetermined iteration parameters. Figure 5.5 presents the iteration convergence for the first frame.

Figure 5.5: Iterations and Convergence of Frame 1

The same algorithm was applied to the second frame and the iterations are presented in Figure 5.6.

Figure 5.6: Iterations and Convergence of Frame 2

With known parameters, such as the number of iterations and the other constant values determined in the mathematical model, this algorithm can be used to validate the received image/frame against the transmitted image/frame.

CHAPTER 6 COMPRESSED VIDEO OVER NETWORKS

This section deals with the study of the behavior of the network when carrying compressed video. After analyzing the various parameters that affect video compression, the parameter that was chosen for detailed analysis was network congestion and the behavior of the network with various congestion control mechanisms [22].

6.1 Effect of Congestion

A vital parameter for effective video streaming over the Internet is the bottleneck bandwidth, which gives the upper limit on the speed at which a network can deliver data from one end point to the other. If a codec tries to send data at a rate above this upper limit, all of the extra packets will be lost. However, sending data at a rate under the upper limit will clearly result in suboptimal video quality. Although accurate estimation of this time-varying limit is difficult, techniques have been reported that adapt their rates to a measured bottleneck bandwidth.
The bursty nature of Internet traffic is the main obstacle in estimating the bottleneck bandwidth. Another problem associated with rate adjustment at the transmitter relates to multicast applications, where each receiver may experience a different bottleneck bandwidth. Rate control also poses a problem with stored media. Changing the rate of stored media increases the computational load for the server and affects the compressed video stream, which requires real-time transcoding.

6.1.1 Congestion Mechanisms: Environment of the Study

A single-bottleneck dumbbell network topology was studied. The network topology, in which sources 1 through n are connected to sinks 1 through n via the bottleneck link between nodes N1 and N2, is presented in Figure 6.1. All simulations were performed with the ns2.1 simulator [23].

Figure 6.1: Network Topology

The simulations examined the response of slowcc algorithms to a sudden drop in congestion, which was produced by removing a competing CBR flow. The RED queue had a queue size of 2.5 times the bandwidth-delay product. The minthresh and maxthresh were set to 0.25 and 1.25 times the bandwidth-delay product, respectively. The round-trip time was set to 50 ms. These parameters were set in the simulation code, which is given by

    if {$queue == "RED"} {
        $redq set thresh_ [expr 0.25*$rate_w_mb*50/8.0]
        $redq set maxthresh_ [expr 1.25*$rate_w_mb*50/8.0]
    }

The DropTail queue had a queue size of one bandwidth-delay product, which is shown in the code [24] by

    if {$queue == "DropTail"} {
        set limit [expr $rate_w_mb*50/8.0]
        $redq set thresh_ $limit
        $redq set maxthresh_ $limit
    }

The competing CBR flow was a square on/off waveform that was used to create a dynamic scenario. Binomial congestion control (BCC) algorithms are defined using four parameters. Alteration of the parameters provides different BCC versions [25]. The SQRT version of the BCC algorithm was used during simulation.
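The four-parameter family mentioned above can be illustrated with a short sketch. It follows the usual binomial formulation from the congestion control literature, which this thesis does not spell out: the window grows by alpha / w^k per round trip without loss and shrinks by beta * w^l on a loss, with k = l = 0.5 for SQRT and k = 0, l = 1 for TCP-style AIMD. The parameter names, the alpha and beta values and the minimum window of one packet are assumptions made for illustration.

```python
def binomial_step(window, loss, k, l, alpha=1.0, beta=0.5):
    """Return the next congestion window (in packets) for one round trip."""
    if loss:
        # Decrease rule: w <- w - beta * w**l, never below one packet.
        return max(1.0, window - beta * window ** l)
    # Increase rule: w <- w + alpha / w**k.
    return window + alpha / window ** k


def run(k, l, beta, loss_rounds, rounds=12, start=16.0):
    """Trace the window over a number of round trips with losses in loss_rounds."""
    window = start
    trace = [window]
    for rtt in range(rounds):
        window = binomial_step(window, rtt in loss_rounds, k, l, beta=beta)
        trace.append(window)
    return trace


aimd = run(k=0.0, l=1.0, beta=0.5, loss_rounds={4})   # TCP(1/2)-like behaviour
sqrt = run(k=0.5, l=0.5, beta=0.5, loss_rounds={4})   # SQRT(1/2)-like behaviour

print("AIMD window trace:", [round(w, 2) for w in aimd])
print("SQRT window trace:", [round(w, 2) for w in sqrt])
```

Comparing the two traces shows why SQRT counts as slowly responsive: a single loss removes only a fraction of the square root of the window instead of half of it, and the subsequent increase is correspondingly gentler.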
The congestion control mechanism of TFRC used a throughput equation to establish the allowed sending rate as a function of the loss event rate and the round-trip time. In order to compete fairly with TCP, TFRC used the TCP throughput equation, which approximately describes TCP's sending rate as a function of the loss event rate, round-trip time and packet size. This research used TFRC both with self-clocking and without self-clocking. TFRC with self-clocking imposed a limit on the amount by which the sending rate could exceed the receive rate, even in the absence of loss. TFRC without self-clocking limited the sender's sending rate to at most twice the rate at which the receiver received data in the previous round trip.

6.2 Congestion Control Mechanisms

This section presents an analysis of the simulation results and plots of the data obtained from the study of drop rates for the queuing mechanisms considered. In addition, results are presented for the long-term fairness of different TCP versions and for the variation in TFRC behavior with TCP(Vegas).

6.2.1 Drop Rate Analysis

The behavior of different slowcc algorithms that were subjected to a sudden reduction in the available bandwidth was examined. Twenty (20) long-lived slowcc flows were used, where the bandwidth was controlled by an ON/OFF square wave CBR flow for 150 seconds. The 150-second cycle was restarted after a delay of 30 seconds and the cycle repeated. The results illustrating the drop rates achieved for the slowcc algorithms using the RED queuing scheme for TCP(SACK) are presented in Figure 6.2.

Figure 6.2: Drop Rate For Slowcc Algorithm Using The RED Queuing Scheme TCP(SACK)

Figure 6.3: Drop Rate For Slowcc Algorithm Using The DropTail Queuing Scheme TCP(SACK)

The behaviors of the different slowcc versions using RED queuing have been demonstrated. The behaviors obtained using DropTail queuing instead of RED queuing are presented in Figure 6.3.
Significant behavioral differences were obtained for the two queuing schemes. The reasons for the differing results can be discerned from the basic structure of the schemes. DropTail is a basic first-in, first-out queuing technique in which the first packet in the queue is the first packet processed. When the queue becomes full, congestion occurs and incoming packets are dropped. DropTail relies on end systems to control congestion via congestion control mechanisms. RED is an active queue management scheme that provides a mechanism for congestion avoidance. Unlike DropTail, RED uses statistical methods to drop packets in a probabilistic way before queues overflow. Dropping packets probabilistically slows a source down enough to keep the queue steady and reduces the number of packets that would otherwise be lost when a queue overflows while a sender is transmitting at a high rate.

6.2.2 Observations

The general trend is that TCP(1/2) has the highest average drop rate. This can be explained by the fact that TCP(1/2) is the quickest to increase its sending rate in order to utilize the new bandwidth created by the removal of the CBR flow. Once the limit is reached, TCP(1/2) starts dropping packets multiplicatively. For DropTail, even though all the algorithms are TCP-friendly, many packet drops occur when the CBR flow is removed, since DropTail has no way of informing the senders of impending full queues. When RED is used, drop rates fall by almost half of the average drop rate when the CBR flow is removed. This can be attributed to the way RED handles queues. TCP(1/256) and SQRT(1/256) produce very low drop rates both in the presence and in the absence of the CBR flow and drop almost no packets. The important characteristic observed in the first 20 seconds after the removal of the CBR flow was that only TCP(1/2) lost packets. During the remaining 10 seconds both TFRC(256) versions lost a few packets.
However, when RED was used, the removal of the CBR source caused the loss of a few packets for all the slowcc algorithms with the exception of SQRT(1/256). Once the CBR flow was reintroduced there was a sudden increase in the drop rates for all the slowcc mechanisms. However, the TFRC versions, both self-clocked and non-self-clocked, had more prominent peaks. TFRC with self-clocking automatically slows down the source when the network becomes congested and acknowledgments are delayed. The self-clocked version of TFRC(256) converged and stabilized faster than the non-self-clocked version. In any event, the TFRC versions had longer convergence times than the other algorithms.

A similar performance analysis was also conducted for all the different TCP versions with the slowcc mechanisms using RED queuing. The most significant change was observed with TCP(Vegas). SQRT(1/256) displayed a much higher drop rate than it did with TCP(SACK). The drop rate actually reached zero, but only for a small interval of time immediately after the CBR flow was removed. These results are presented in Figure 6.4.

Figure 6.4: Drop Rate For Slowcc Algorithm Using The RED Queuing Scheme TCP(Vegas)

6.3 Behavior Analysis of Slowly Responsive Congestion Control Algorithms

6.3.1 Long Term Fairness

The scenario considered for studying long-term fairness was a network topology with ten long-lived flows, of which five were TCP and the other five were either TCP(1/8), TFRC or SQRT(1/2). The bandwidth was divided in a ratio of 3:1 between the two sets of flows and the CBR flow. The overall bandwidth utilization was observed to be high when the CBR flow period was short. However, when the period of the CBR flow was increased, TCP performed better than the competing TCP(1/8), TFRC or SQRT(1/2).

6.3.2 TCP(Reno)

When TCP and TFRC are competing with shorter CBR flows, TFRC acquires more bandwidth than TCP. However, TCP eventually takes more bandwidth than TFRC when competing fairly. This result is illustrated in Figure 6.5.
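Because this section compares the throughput of TCP and TFRC, it may help to make concrete the TCP throughput equation mentioned in Section 6.1.1. The sketch below uses the form given in the TFRC specification (RFC 3448). Using that particular form is an assumption, since the text does not state which version of the equation the simulator applied. The function name, the defaults b = 1 and t_RTO = 4 * RTT, and the sample values are likewise illustrative.

```python
import math


def tcp_throughput(packet_size, rtt, loss_event_rate, b=1, t_rto=None):
    """Approximate TCP sending rate in bytes per second (RFC 3448 form).

    packet_size: bytes; rtt: seconds; loss_event_rate: between 0 and 1, exclusive of 0.
    b: packets acknowledged per ACK; t_rto: retransmit timeout (assumed 4 * rtt).
    """
    if t_rto is None:
        t_rto = 4.0 * rtt
    p = loss_event_rate
    denominator = (rtt * math.sqrt(2.0 * b * p / 3.0)
                   + t_rto * (3.0 * math.sqrt(3.0 * b * p / 8.0))
                   * p * (1.0 + 32.0 * p * p))
    return packet_size / denominator


# Hypothetical operating point: 1000-byte packets, 50 ms round-trip time.
rate_at_1_percent = tcp_throughput(1000, 0.05, 0.01)
rate_at_2_percent = tcp_throughput(1000, 0.05, 0.02)
```

The allowed rate falls as the loss event rate or the round-trip time rises, which is how a TFRC sender following this equation gives up bandwidth when the network becomes congested.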
When TCP competed with the other two flows, the bandwidth utilization characteristic remained similar to that for TCP(SACK). The results are presented in Figure 6.6 and Figure 6.7.

Figure 6.5: Throughput of TCP and TFRC
Figure 6.6: Throughput of TCP and SQRT
Figure 6.7: Throughput of TCP and TCP(1/8)

6.3.3 TCP(New Reno)

With TCP(New Reno), TCP(1/8) lost its bandwidth margin to a great extent when it competed with TCP, which was not the case with TCP(SACK). This comparison is presented in Figure 6.8.

Figure 6.8: Throughput of TCP and TCP(1/8)

When TCP competed with the other two flows, the bandwidth utilization characteristic remained similar to that for TCP(SACK). The results are presented in Figure 6.9 and Figure 6.10.

Figure 6.9: Throughput of TCP and SQRT
Figure 6.10: Throughput of TCP and TFRC

TFRC or SQRT competing with TCP demonstrated little change in characteristics from those observed for TCP(SACK).

6.3.4 TCP(Vegas)

Compared to all the other TCP versions, a major difference was observed in bandwidth sharing when TCP and TFRC were competing. The comparison is presented in Figure 6.11. With TCP(Vegas), TFRC obtained a larger share of the bandwidth than TCP from the very beginning, a condition that continued throughout the simulation. In contrast to the TCP(Reno) algorithm, which induces congestion in order to learn the available network capacity, a TCP(Vegas) source anticipates the onset of congestion by monitoring the difference between the rate it expects to see and the rate it is actually realizing. The TCP(Vegas) strategy adjusts the source's sending rate or window size in an attempt to keep a small number of packets buffered in the routers along the path [26].

Figure 6.11: Throughput of TCP and TFRC

A bandwidth utilization pattern similar to that for TCP(SACK) was observed when TCP competed with the other two flows.
This observation is presented in Figures 6.12 and 6.13.

Figure 6.12: Throughput of TCP and SQRT
Figure 6.13: Throughput of TCP and TCP(1/8)

CHAPTER 7 RESULTS AND FUTURE WORK

7.1 Conclusions

With the rapid growth of video traffic on the Internet, the market needs better video coding algorithms for medical imaging applications. This research addressed the problem by introducing a new external force model for active contours and deformable surfaces. The algorithm was implemented in MATLAB and tested, the results were plotted, and the action of the gradient vectors was presented. This research analyzed the statistical derivations for deformable model generation associated with the anatomical model and the deformable physical forces model, which incorporates parameters such as elasticity and inertial characteristics. An algorithm implementation that includes all these techniques would be able to estimate the changes a deformable object undergoes.

The performance of different slowly responsive congestion control algorithms was investigated. These algorithms were subjected to dynamically changing traffic conditions using a DropTail queuing scheme for the bottleneck link, and the results were compared to those for the RED queuing scheme. It was found that the drop rate characteristic exhibited by TCP was much worse with DropTail than with RED. However, a few slowcc algorithms had lower drop rates for shorter periods with DropTail than with RED.

7.2 Possible Future Work

The scope for future study is considerable. This research, although successful, mainly highlighted the need for further research. One very important area highlighted by this research is the need to fully integrate deformable model segmentation into the H.263 algorithm.
Additionally, the study of transport modifications that take into account the congestion parameters and the mechanisms used for congestion control appears especially promising and important.

REFERENCES

[1] http://www.4i2i.com/h263_video_codec.htm
[2] Alice Yu, "Motion Search Performance using the H.263 Encoder", http://ise0.stanford.edu/class/ee392c/demo/yu/paper.html
[3] [H.263] ITU-T Recommendation H.263, "Video coding for low bit rate communication"
[4] Graham R. Martin, "Block Matching and Compensation", Computer Science Department, University of Warwick, J.P. Muleller (1998), http://www.dcs.wareick.ac.uk/research/mcg
[5] The TML Project Web Page and Archive, http://kbs.cs.tu-berlin.de/~stewe/vceg/sequences.htm
[6] J.R. Jain and A.K. Jain, "Displacement measurement and its application in interframe image coding", IEEE Trans. Commun., Vol. COM-29, No. 12, pp. 1799-1808, Dec. 1981
[7] J. Ribas-Corbera and D.L. Neuhoff, "On the optimal block size for block-based motion compensated video coders", SPIE Proceedings of Visual Communications and Image Processing, Vol. 3024, pp. 1132-1143, February 1997
[8] MPEG-4 Video Group, "MPEG-4 Video Verification Model Version 10.0", ISO/IEC/JTC1/SC19/WG11 Document MPEG98/N1992, San Jose, February 1998
[9] M.H. Chan, Y.B. Yu and A.G. Constantinides, "Variable size block matching motion compensation with applications to video coding", IEE Proceedings, Vol. 137, Pt. 1, No. 4, August 1990
[10] G.R. Martin, R.A. Packwood and I. Rhee, "Variable size block matching estimation with minimal error", SPIE Conference on Digital Video Compression: Algorithms and Technologies 1996, San Jose, USA, Vol. 2668, pp. 324-333, February 1996
[11] G.R. Martin, R.A. Packwood and M.K. Steliaros, "Reduced entropy motion compensation using variable sized blocks", SPIE Proceedings of Visual Communications and Image Processing, Vol. 3024, pp. 293-302, February 1997
[12] R.A. Packwood, M.K. Steliaros and G.R. Martin, "Variable size block matching motion compensation for object-based video coding", IEE 6th International Conference on Image Processing & its Applications, Dublin, Ireland, pp. 56-60, July 1997
[13] M.K. Steliaros, G.R. Martin and R.A. Packwood, "Locally-accurate motion estimation for object-based video coding", SPIE Proceedings of Visual Communications and Image Processing, Vol. 3309, pp. 306-316, January 1998
[14] Stathis Panis, Guillaume Stamm and Robert Kutka, "A Method for Reducing Block Artifacts by Interpolating Block Borders", Siemens AG, Dept. ZT IK 2, 81730 Munich, Germany; Ecole des Mines de Saint Etienne, France
[15] Chenyang Xu and Jerry L. Prince, "Snakes, Shapes, and Gradient Vector Flow" (their MATLAB code modules were used)
[16] Douglas DeCarlo and Dimitris Metaxas, "Deformable Model-Based Shape and Motion Analysis from Images using Motion Residual Error"
[17] P. Zerfass, C.D. Werner, F.B. Sachse and O. Dössel, "Deformation of surface nets for interactive segmentation of tomographic data", to appear in Proc. 34. Jahrestagung DGBMT, 2000
[18] C. Nikou, F. Heitz and J.P. Armspach, "Multimodal image registration using statistically constrained deformable multimodels", in Proceedings of the IEEE International Conference on Image Processing (ICIP'98)
[19] M. Kass, A. Witkin and D. Terzopoulos, "Snakes: Active contour models", Int. J. Comput. Vis., 1987
[20] D. Terzopoulos and K. Fleischer, "Deformable models", Vis. Comput., 1988
[21] T. McInerney and D. Terzopoulos, "A dynamic finite element surface model for segmentation and tracking in multidimensional medical images with application to cardiac 4D image analysis", Comput. Med. Imag. Graph.
[22] S. Floyd and K. Fall, "Promoting the use of End-to-End Congestion Control in the Internet", IEEE/ACM Transactions on Networking, 7(4), August 1999
[23] Jacobson, "Congestion Avoidance and Control", Proceedings of ACM/SIGCOMM, August 1988
[24] D. Bansal, H. Balakrishnan, S. Floyd and S. Shenker, "Dynamic Behavior of Slowly Responsive Congestion Control Algorithms", SIGCOMM 2001
[25] D. Bansal and H. Balakrishnan, "Binomial Congestion Control Algorithms", Proceedings of the Conference on Computer Communications, IEEE Infocom, April 2001
[26] S.H. Low, L.L. Peterson and L. Wang, "Understanding TCP Vegas: A Duality Model"

BIBLIOGRAPHY

Compressed Video Communication by Abdul H. Sadka, Wiley Publications, 2002
Compressed Video over a Network, edited by Ming-Ting Sun and Amy R. Reibman, Signal Processing and Communication Series
Digital Video Processing by A. Murat Tekalp, Prentice Hall Signal Processing Series

Record Information

Control number: 001441466
Identifier: E14SFE0000139
OCLC: (OCoLC)54018070
Local number: AJM5906
Call number: TK145
Author: Ramadoss, Balaji.
Title: Vector flow model in video estimation and effects of network congestion in low bitrate compression standards [electronic resource] / by Balaji Ramadoss.
Publication: [Tampa, Fla.] : University of South Florida, 2003.
Thesis: Thesis (M.S.E.E.), University of South Florida, 2003.
Notes: Includes bibliographical references. Text (Electronic thesis) in PDF format. System requirements: World Wide Web browser and PDF reader. Mode of access: World Wide Web. Title from PDF of title page. Document formatted into pages; contains 76 pages.
ABSTRACT: The use of digitized information is rapidly gaining acceptance in biomedical applications. Video compression plays an important role in the archiving and transmission of different digital diagnostic modalities. The present scheme of video compression for low bitrate networks is not suitable for medical video sequences.
The instability is the result of block artifacts arising from the block-based DCT coefficient quantization. The possibility of applying deformable motion estimation techniques to make the video compression standard (H.263) more adaptable for biomedical applications was studied in detail. The study of the network characteristics and the behavior of various congestion control mechanisms was used to analyze the complete characteristics of existing low bit rate video compression algorithms. The study was conducted in three phases. The first phase involved the implementation and study of the present H.263 compression standard and its limitations. The second phase dealt with the analysis of an external force for active contours, which was used to obtain estimates for deformable objects. The external force, which is termed Gradient Vector Flow (GVF), was computed as a diffusion of the gradient vectors associated with a gray-level or binary edge map derived from the image. The mathematical aspect of a multiscale framework based on a medial representation for the segmentation and shape characterization of anatomical objects in medical imagery was derived in detail. The medial representations were based on a hierarchical representation of linked figural models such as protrusions, indentations, neighboring figures and included figures, which represented solid regions and their boundaries. The third phase dealt with a vital parameter for effective video streaming over the Internet, the bottleneck bandwidth, which gives the upper limit for the speed of data delivery from one end point to the other in a network. If a codec attempts to send data beyond this limit, all packets above the limit will be lost. On the other hand, sending under this limit will clearly result in suboptimal video quality. During this phase the packet-drop-rate (PDR) performance of TCP(1/2) was investigated in conjunction with a few representative TCP-friendly congestion control protocols (CCP).
The CCPs were TCP(1/256), SQRT(1/256) and TFRC(256), with and without self-clocking. The CCPs were studied when subjected to an abrupt reduction in the available bandwidth. Additionally, the investigation studied the effect on the drop rates of TCP-compatible algorithms of changing the queuing scheme from Random Early Detection (RED) to DropTail.
Adviser: Wilfrido A. Moreno, Ph.D.
Keywords: h.263; gradient vector flow; video compression; deformable super quadrics; congestion control; video segmentation; medical imaging; network behavior.
Subject: Dissertations, Academic -- USF -- Electrical Engineering -- Masters.
Host item: USF Electronic Theses and Dissertations.
URL: http://digital.lib.usf.edu/?e14.139