Record Information
Title: Core issues in graph based perceptual organization [electronic resource]: spectral cut measures, learning / by Padmanabhan Soundararajan.
Author: Soundararajan, Padmanabhan.
Published: [Tampa, Fla.]: University of South Florida, 2004.
Thesis (Ph.D.), University of South Florida, 2004. Adviser: Sarkar, Sudeep.
Notes: Includes bibliographical references. Includes vita. Text (electronic thesis) in PDF format; document formatted into pages, contains 124 pages. System requirements: World Wide Web browser and PDF reader. Mode of access: World Wide Web. Title from PDF of title page.
Subjects: grouping; spectral methods; graph cut measures; learning automata. Dissertations, Academic, USF, Computer Science and Engineering, Doctoral.
Call number: TK7885
Series: USF Electronic Theses and Dissertations.
Identifiers: E14SFE0000248; (OCoLC)55520091; 001469346
Online: http://digital.lib.usf.edu/?e14.248

Core Issues in Graph Based Perceptual Organization: Spectral Cut Measures, Learning

by

Padmanabhan Soundararajan

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Department of Computer Science and Engineering
College of Engineering
University of South Florida

Major Professor: Sudeep Sarkar, Ph.D.
Dmitry Goldgof, Ph.D.
Kevin Bowyer, Ph.D.
Tapas Das, Ph.D.
Thomas Sanocki, Ph.D.

Date of Approval: Mar 29, 2004

Keywords: Grouping, Spectral methods, Graph cut measures, Learning automata

© Copyright 2004, Padmanabhan Soundararajan

DEDICATION

To my parents!

ACKNOWLEDGEMENTS

I would like to take this opportunity to thank Dr. Sarkar for his guidance, his immeasurable patience, and for being a great mentor. I am also grateful to my committee members, Dr. Goldgof, Dr. Bowyer, Dr. Das and Dr. Sanocki, for their advice and feedback. Thanks also to the other faculty members in the department with whom I have interacted. Thanks also to my labmates, particularly Earnie, Zongyi, Yan, Tong, Laura, Isidro, Ayush, Lakshmi and the rest of the Vision lab gang. Outside my lab, I'd particularly like to thank Ashok, Mouli, Sanjukta, Vijaya and Ameeta. Many thanks to my friends Himanshu, Vivek, Hemal, Nilanjan, Rupali, Shikha, Rajeev, Amar & Aparna for keeping my weekends interesting.
Thanks to the coffee gang at IISc, especially Venki, Reena, Pacchi, Niraj, Shridhar Raj, Koshy Bhu and Ganesh, for encouraging me to pursue my Ph.D. This undertaking would not have been possible without my parents and immediate family; I am grateful for their support and patience. Thanks also to the wonderful folks at Tech Support who took care of problems; special thanks to Daniel. I also thank the innumerable friends who have helped me on my journey; you know who you are.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1 INTRODUCTION
  1.1 What is Perceptual Organization?
  1.2 Contributions of this Work
  1.3 Layout of the Dissertation
CHAPTER 2 RELATED WORK
  2.1 Works on Learning in Perceptual Organization
  2.2 Graph Based Grouping Engine
  2.3 Probabilistic Modeling of the Grouping Process
CHAPTER 3 SUPERVISED LEARNING OF LARGE GROUPING STRUCTURES
  3.1 Scene Structure Graph Specification
    3.1.1 Bayesian Networks
    3.1.2 Quantification of the Scene Structure Graph
  3.2 Graph Spectral Partitioning
  3.3 Learning Grouping Parameters
    3.3.1 What is a Learning Automaton?
    3.3.2 How Does a Team of Automata Operate?
    3.3.3 Choice of the Learn Rate
    3.3.4 Stopping Conditions
    3.3.5 The Learnt Parameter Combinations
    3.3.6 How is the Performance Feedback, $\beta$, Computed?
  3.4 Results and Analyses
    3.4.1 General Performance of Spectral Grouping
    3.4.2 Adaptation of the Grouping Algorithm to Object Types
    3.4.3 Adaptation of the Grouping Algorithm to a Domain
  3.5 Summary
CHAPTER 4 THEORETICAL STUDIES ON GRAPH CUT MEASURES
  4.1 Comparing Random Variables: Stochastic Orders
  4.2 Minimum Cut
  4.3 Average Cut
  4.4 Normalized Cut
  4.5 Summary
CHAPTER 5 EMPIRICAL EVALUATION OF THE CUT MEASURES
  5.1 Image Set
  5.2 Ground Truth Creation
  5.3 Parameter Selection
  5.4 Analyses and Discussions
    5.4.1 Summary Statistics
    5.4.2 Overall Train and Test Performance
    5.4.3 Classwise Performances
    5.4.4 Per Image Performance
    5.4.5 Cross Compatibility of the Parameter Sets
  5.5 Time Issues
  5.6 Summary
CHAPTER 6 DISCUSSIONS AND CONCLUSIONS
REFERENCES
ABOUT THE AUTHOR

LIST OF TABLES

Table 2.1 Classificatory structure for perceptual organization.
Table 3.1 Conditional probabilities used in the Bayesian networks. The functions $T$, $T_n$, $T_p$ and $U$ are as defined in Fig. 3.4.
Table 3.2 ANOVA of the learning performance on the aerial images of planes.
Table 3.3 ANOVA of the learning performance on aerial images.
Table 5.1 The organization of the raw performance indices, their summary statistics studied for each image class, and the notations used to refer to them.
Table 5.2 Mean values with 95% confidence level of the class-mean performance indices, $\beta^{Mean}_{alt}(c)$, for each class of images considered, for each cut type.
Table 5.3 ANOVA of the class-average performances ($\beta^{Mean}_{alt}(c_i)$).
Table 5.4 (a) ANOVA of the best per-image performance ($\beta^{max_1}_{alt}(I_c^k)$) on all 50 test images, for the cut types.
Table 5.5 (a) ANOVA of the range of performances, $\beta^{max_1}_{alt}(I_c^k) - \beta^{max_{100}}_{alt}(I_c^k)$, for the three cut types.

LIST OF FIGURES

Figure 1.1 Gestalt principles of grouping.
Figure 1.2 Computational theory for graph based grouping.
Figure 3.1 Schematic of the grouping strategy.
Figure 3.2 The geometric attributes used to classify pairwise edge segment relationships.
Figure 3.3 Bayesian networks used to classify pairs of edge segments into the Gestalt-inspired salient relationships.
Figure 3.4 Basic forms of the conditional probabilities of the Bayesian networks.
Figure 3.5 Team of learning automata for learning parameter combinations.
Figure 3.6 Example used to illustrate performance measure computation. (a) Ground truth models, (b) Groups formed.
Figure 3.7 Line fit of the data points of $\beta$ (x-axis) and $\beta_{alt}$ (y-axis).
Figure 3.8 The performance of the spectral grouping algorithm.
Figure 3.9 Normalized histogram of the parameter values that result in good performance over images like that shown in Fig. 3.8.
Figure 3.10 Typical iteration traces of the learning automata team.
Figure 3.11 Representative images from the set of 40 airplane images.
Figure 3.12 (a) Variation of train and test performance for different airplane images.
Figure 3.13 Normalized histogram of the parameter values that result in good performance for segmenting planes from aerial views.
Figure 3.14 Representative images from the set of 20 aerial images.
Figure 3.15 (a) Variation of train and test performance on the aerial images.
Figure 3.16 Normalized histogram of the parameter values that result in good grouping performance for aerial images.
Figure 4.1 Partitioning of a scene structure graph over features from multiple objects. (See text for explanation of notations.)
Figure 4.2 Empirical fit of the gamma probability density function to the link weight distribution: left, for links between features of the same object; right, for links between features from different objects.
Figure 4.3 The expected values of the three measures, (a) minimum cut, (b) average cut, and (c) normalized cut, plotted as a function of $f_1$ and $f_2$ for a scene with similar sized objects and with the strength of connection within objects being just twice the strength between objects, i.e., $N_1 = N_2$, $W = 2$ and $w = 1$.
Figure 4.4 The expected values of the three measures, (a) minimum cut, (b) average cut, and (c) normalized cut, plotted as a function of $f_1$ and $f_2$ for a scene with similar sized objects and with the strength of connection within objects being 20 times the strength between objects, i.e., $N_1 = N_2$, $W = 20$ and $w = 1$.
Figure 5.1 Sample ground-truthed images from the Natural indoor image set.
Figure 5.2 Sample ground-truthed images from the Natural outdoor image set.
Figure 5.3 Sample ground-truthed images from the Manmade indoor image set.
Figure 5.4 Sample ground-truthed images from the Manmade outdoor image set.
Figure 5.5 Sample ground-truthed images from the Aerial image set.
Figure 5.6 Box plot of per-class average performances ($\beta^{Mean}_{alt}(c_i)$) for each of the trained parameter combinations on the set of 50 images, in the training and testing sets, for each of the three cuts.
Figure 5.7 Images on which all three cuts perform the best.
Figure 5.8 Image on which the performance of average cut is the most different from the others.
Figure 5.9 Image on which the performance of normalized cut is the most different from the others.
Figure 5.10 Image on which the performance of minimum cut is the most different from the others.
Figure 5.11 Box plot of best performances ($\beta^{max_1}_{alt}(I_c^k)$) on 50 images in the test set for the three cut measures.
Figure 5.12 Box plot of the difference between the best and the worst performances ($\beta^{max_1}_{alt}(I_c^k) - \beta^{max_{100}}_{alt}(I_c^k)$) on a per-image basis for the 50 images in the test set for the three cut measures.
Figure 5.13 Box plots of 100 mean performances on the three cut measures with average cut trained parameters (500 samples/cut).
Figure 5.14 Box plots of 100 mean performances on the three cut measures with normalized cut trained parameters (500 samples/cut).
Figure 5.15 Box plots of 100 mean performances on the three cut measures with minimum cut trained parameters (500 samples/cut).
Figure 5.16 Box plots of the best performance on the three cut measures with average cut trained parameters (50 samples/cut).
Figure 5.17 Box plots of the best performance on the three cut measures with normalized cut trained parameters (50 samples/cut).
Figure 5.18 Box plots of performance on the three cut measures with minimum cut trained parameters (50 samples/cut).
Figure 5.19 Box plot of the time (in seconds) taken by the average, normalized and minimum cuts over a set of 20 images.

CORE ISSUES IN GRAPH BASED PERCEPTUAL ORGANIZATION: SPECTRAL CUT MEASURES, LEARNING

Padmanabhan Soundararajan

ABSTRACT

Grouping is a vital precursor to object recognition. The complexity of the object recognition process can be reduced to a large extent by using a front-end grouping process. In this dissertation, a grouping framework based on spectral methods for graphs is used. The objects are segmented from the background by means of an associated learning process that decides on the relative importance of the basic salient relationships such as proximity, parallelism, continuity, junctions and common region. While much of the previous research has focused on simple relationships like similarity, proximity, continuity and junctions, this work differentiates itself by using all the relationships listed above. The parameters of the grouping process are cast as probabilistic specifications of Bayesian networks that need to be learned; the learning is accomplished by a team of stochastic learning automata.

One of the stages in the grouping process is graph partitioning. There are a variety of cut measures on which partitioning can be based, and different measures give different partitioning results. This work looks at three popular cut measures, namely the minimum, average and normalized cuts. Theoretical and empirical insight into the nature of these partitioning measures, in terms of the underlying image statistics, is provided. In particular, the questions addressed are as follows: For what kinds of image statistics would optimizing a measure, irrespective of the particular algorithm used, result in correct partitioning? Is the quality of the groups significantly different for each cut measure? Are there classes of images for which grouping by partitioning is not suitable? Does a recursive bipartitioning strategy separate the groups corresponding to K objects from each other?

The major conclusion is that optimizing none of the above three measures is guaranteed to result in the correct partitioning of K objects, in the strict stochastic order sense, for all image statistics. Qualitatively speaking, the minimum cut measure is optimal only under very restrictive conditions, when the average inter-object feature affinity is very weak compared to the average intra-object feature affinity. The average cut measure is optimal for graphs whose partition width is less than the mode of the distribution of all possible partition widths. The normalized cut measure is optimal for a more restrictive subclass of graphs, whose partition width is less than the mode of the partition width distribution and whose inter-object links are six times weaker than the intra-object links. The learning framework described in the first part of the work is used to empirically evaluate the cut measures.
Rigorous empirical evaluation on 100 real images indicates that, in practice, the quality of the groups generated using minimum, average or normalized cuts is statistically equivalent for object recognition, i.e., the best, the mean, and the variation of the qualities are statistically equivalent. Another conclusion is that for certain image classes, such as aerial images and scenes with man-made objects in man-made surroundings, the performance of grouping by partitioning is the worst, irrespective of the cut measure.

CHAPTER 1 INTRODUCTION

1.1 What is Perceptual Organization?

Perceptual organization is the ability to perceive structures in a complex visual environment. The process involves structuring small pieces of information into larger units, and it is so ingrained in the human visual system that it is often taken for granted and grossly underappreciated. The human visual system relies on this process to such an extent that it sometimes imposes structure on the sensory data even if none exists [9].

Perceptual organization has a long history that traces back to the early 20th century. The Gestalt psychologists recognized that organizing the sensory data is very important and, further, that it is a hard problem. The Gestalt school of psychology, jointly founded by Wertheimer [19], Koffka [39] and Köhler [40], demonstrated the role of organization through numerous self-evident examples. They offered factors such as proximity, similarity, parallelism, symmetry, continuity and closure [59] to describe the grouping process. These factors were initially arrived at through a series of experimental observations made by Wertheimer [19]. In Fig 1.1, the dots in the first set appear independent of each other; in other words, no two dots are associated with each other. In the second instance, the dots that are closer to each other appear to belong to a single group.
In the third instance, the role of similarity is shown. Instances of parallelism, symmetry, continuity and closure are seen, respectively, in the next sets of arrangements in Fig 1.1. In practice, these principles act in conjunction to form the overall perceived structure.

[Figure 1.1: Gestalt principles of grouping. Panels: No Grouping, Proximity, Similarity, Parallelism, Symmetry, Continuity, Closure.]

The use of these principles in computer vision is not new [43]. However, their usage has been limited in two respects. First, the ability to tune the relative importance of these relationships has not been exploited. For example, in some domains the parallelism relation might be a better discriminant between object and background than continuity; in such cases we would like to weight parallelism more than continuity. Second, the definitions of the salient relationships themselves entail uncertainty. In this dissertation, we offer a framework that casts the grouping parameters as probabilities, which are learned from a set of training images of objects in their natural contexts.

While the Gestalt psychologists demonstrated the importance of organization and identified what is being computed, which made their work popular, they were not able to explain why or how it is computed. Some of the core ideas that turn up in later work are that Gestalt psychology is not merely a psychology of perception but also addresses productive thinking, memory, learning, and more. The importance of attention, attitude, interest and organizational factors in perceptual experience is not denied. One of the basic principles of Gestalt psychology is that organization tends toward Prägnanz, the tendency of a process to realize the most regular, ordered, balanced and stable state in a given situation.
The influence of past experience is not denied, but the empiricist view that past experience is a universal explanatory principle is rejected. When Wertheimer, Köhler and Koffka independently proposed the Gestalt laws, it was not clear which rule should be applied under which conditions. Koffka attempted to integrate these laws, and the integrated law came to be known as Prägnanz. The driving force behind this work was the attempt to understand the human visual system. It was not until Marr [44] framed these issues in terms of his three levels of information processing, and Witkin and Tenenbaum [95] and Lowe [43] emphasized their importance, that perceptual organization was adopted in mainstream computer vision research.

In his seminal work, Marr [44] proposed three levels at which an information-processing system can be described. The first is the computational theory, which outlines the goal of the computation and why it is appropriate. The second (middle) level is the representation and algorithm, which explains how the computational theory is accomplished; in particular, representations for the input and output, and an algorithm, are required. The third level is the hardware implementation, where the representation and the algorithm are physically realized. Most research in computer vision and artificial intelligence can be described in terms of these levels. Marr [44] proposed a computational theory of vision based on processing information about local intensity changes, their geometrical distribution, and their organization. The tokens at this stage are then grouped together to form higher-level tokens. Ultimately, Marr's paradigm combines the full primal sketch and various shape properties to arrive at a 2.5D sketch (a combination of intrinsic scene representations).
The 3D model representation describes shapes and their spatial organization in an object-centered coordinate frame. Within this paradigm, grouping can be characterized as a process that groups low-level image features based on the emergent organization exhibited among them, without the use of specific object model knowledge.

Because this process is generic, it imparts flexibility to the vision systems built upon it. While grouping can be thought of as a process in which coherent or similar features are grouped together, figure-ground separation is the process in which the set of features is separated into a foreground (the object of interest) and a background; in a way, it distinguishes the object from its surroundings. Grouping is the stage at which we move from the level of pixels to higher levels of representation of the image data. An image contains far too many pixels to process individually, and grouping can either decide whether pixel features belong to a particular object or simply separate foreground (object) features from background features. Besides pixels, commonly used features include edge chains and regions, to name a few among the plethora of features that can be computed from a raw image. The importance of the grouping process has been established theoretically by Clemens & Jacobs [13] in the context of indexing-based recognition and by Grimson [27] in the context of constrained-search-based recognition: using an intermediate grouping process reduces the combinatorics from exponential to low-order polynomial. Fig 1.2 shows both the computational level and the representation and algorithm level in Marr's sense. The input and output definitions form the computational level, while the remaining blocks in Fig 1.2 form the representation and algorithm (middle) level.
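The partition block of Fig 1.2 can optimize any of the three cut measures studied here. As a minimal sketch, not the dissertation's implementation, the measures can be written in their commonly cited forms for a bipartition (A, B) of a graph with affinity matrix W: the minimum cut is cut(A, B), the total weight of links between A and B; the average cut is cut(A, B)/|A| + cut(A, B)/|B|; and the normalized cut is cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), where assoc(A, V) is the total weight of links from A to all nodes. The six-token graph, its weights, and the helper names (`affinity_matrix`, `cut_measures`, `best_bipartition`) are hypothetical, and the exhaustive search is feasible only for toy graphs; spectral methods exist precisely to avoid it.

```python
from itertools import combinations


def affinity_matrix(n, links):
    """Symmetric n x n affinity matrix built from (i, j, weight) links."""
    W = [[0.0] * n for _ in range(n)]
    for i, j, w in links:
        W[i][j] = W[j][i] = w
    return W


def cut_measures(W, A):
    """Minimum, average and normalized cut values for the bipartition (A, V - A)."""
    V = range(len(W))
    B = [v for v in V if v not in A]
    cut = sum(W[i][j] for i in A for j in B)      # weight crossing the partition
    assoc_A = sum(W[i][j] for i in A for j in V)  # total link weight attached to A
    assoc_B = sum(W[i][j] for i in B for j in V)  # total link weight attached to B
    return {
        "minimum": cut,
        "average": cut / len(A) + cut / len(B),
        "normalized": cut / assoc_A + cut / assoc_B,
    }


def best_bipartition(W, measure):
    """Exhaustively score every bipartition (toy graphs only); keep the lowest."""
    n = len(W)
    best = None
    for size in range(1, n // 2 + 1):
        for A in combinations(range(n), size):
            score = cut_measures(W, A)[measure]
            if best is None or score < best[0]:
                best = (score, A)
    return best


# Hypothetical scene: tokens 0-2 form one object, tokens 3-5 another,
# and token 5 is only weakly attached to its own object.
links = [
    (0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),   # object 1
    (3, 4, 1.0), (3, 5, 0.3), (4, 5, 0.3),   # object 2
    (2, 3, 0.4), (1, 4, 0.4),                # inter-object links
]
W = affinity_matrix(6, links)

for measure in ("minimum", "average", "normalized"):
    score, part = best_bipartition(W, measure)
    print(f"{measure:>10}: A = {part}, score = {score:.3f}")
```

On this toy graph the weakly attached token 5 has a total link weight (0.6) below the inter-object cut (0.8), so the minimum cut isolates it, while the size and volume terms of the average and normalized cuts favor the object-level split {0, 1, 2} | {3, 4, 5}. This only illustrates the familiar bias of the raw minimum cut toward small partitions; it does not reproduce the stochastic-order analysis of the later chapters.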
At this middle level, the strategy for accomplishing the computational goal is described. Low-level features are extracted from the raw image pixels; the features considered here are straight-line and arc edges. The strategy followed in this dissertation is to construct a graph from these features, with links derived from the Gestalt relations between them, such as parallelism, continuity, junctions and symmetry. This graph is then partitioned using one of the cut measures (average, normalized or minimum), which results in clusters. These clusters are large individual groups, essentially formed by the corresponding objects or object parts exhibiting organization and structure. Remarkably, this grouping process need not be perfect: even groups with missing object features and some background features drastically reduce the combinatorics.

[Figure 1.2: Computational theory for graph based grouping. Image pixels → Extract low-level features → Edges (straight lines, arcs) → Construct graph (Gestalt relations) → Graphs → Partition (minimum, average, normalized) → Groups.]

1.2 Contributions of this Work

The novel contributions of this work are:

1. An analytical model of the graph-spectra-based grouping framework, which forms large perceptual groups from relations defined over small numbers of image primitives. Our major conclusion is that optimizing none of the three measures is guaranteed to result in the correct partitioning into K objects, in the strict stochastic order sense, for all image statistics. Qualitatively speaking, the minimum cut measure is optimal only under very restrictive conditions, when the average inter-object feature affinity is very weak compared to the average intra-object feature affinity. The average cut measure is optimal for graphs whose partition width is less than the mode of the distribution of all possible partition widths.
The normalized cut measure is optimal for a more restrictive subclass of graphs, whose partition width is less than the mode of the partition width distribution and whose inter-object links are six times weaker than the intra-object links.

2. A learning framework, based on game theory and learning automata, to adapt the perceptual grouping process to an image domain. We present a flexible, learnable perceptual organization framework based on graph partitioning. The graph spectral techniques make it easy to consider global context in the grouping process, and an N-player automata framework learns the grouping algorithm's parameters. The performance of the grouping algorithm is demonstrated on a variety of images, and we show that the relative importance of the salient relations depends on the object or domain of interest.

3. An empirical evaluation of three popular graph cut measures, namely the minimum, average and normalized cuts, on overall group quality. We evaluate the three measures on the task of grouping extended edge segments. Our findings suggest that the quality of the groups obtained with each of these measures is statistically equivalent as far as object recognition is concerned. We also find that the performance of the grouping-by-partitioning strategy depends on the image class.

1.3 Layout of the Dissertation

In Chapter 2, we review the state of research in perceptual organization, along with prior work on learning and on graph modeling as applied to perceptual organization. In Chapter 3, we introduce the learning framework, in which we use Gestalt-based relations for graph construction, spectral partitioning for graph clustering, and the performance measures. The core of the learning algorithm, based on learning automata, is also described there. In Chapter 4, we present a theoretical analysis of the cut measures.
In Chapter 5, we subject the cut measures to rigorous empirical evaluation; we describe the ground truth protocol and present extensive statistical analyses. In Chapter 6, we discuss the implications of the work for research in perceptual organization and look at some potential directions that deserve more attention.

CHAPTER 2 RELATED WORK

In computer vision, the term perceptual organization has been used by various researchers in various contexts, at different levels of vision processing, and with respect to different feature types. This practice has blurred the meaning of the term. To restore focus to this domain, Sarkar and Boyer [67, 8] proposed a classificatory structure and a nomenclature based on sensor signal dimensionality, level of abstraction, and module inputs and outputs. That is, perceptual groupings differ from one another with respect to the types of constituent features being organized and the dimensions over which the organizations are sought [95, page 521]. We used these two factors as the two axes of our classificatory structure, as depicted in Table 2.1. One axis represents the dimensions over which organization is sought: 2D, 3D (or 2½D), 2D plus motion, and 3D (2½D) plus motion. The other axis denotes the feature types to be organized, stratified by layers of abstraction: signal level, primitive level, structural level, and assembly level. The signal level pertains to organizing the raw signal, for example, gray level images in 2D, range images in 2½D, motion sequences in 2D plus motion, and range sequences in 3D plus motion. The next two levels (primitive and structural) are based on the dimensionality of the feature with respect to the domain of organization. The criterion of dimensionality, although not strictly defined here mathematically, refers to the number of parameters needed to define a feature.
For example, in a 2D static image a contour segment is a one-dimensional manifold, while a ribbon is two-dimensional. The primitive level deals with organizing features extracted from the signal level into lower-dimensional manifestations in the organizing field. For example, constant-curvature segments and region boundaries built from edge maps are 1D manifolds embedded in 2D; surfaces are 2D manifolds in 3D. Hence, constant-curvature segments and surfaces constitute primitive-level organizations in their respective domains. At the structural level, the organized features have the same dimensionality as the space in which they are being organized. Ribbons and closed regions are 2D manifestations in 2D and therefore represent structural-level features for 2D organization. The assembly level is concerned with further organizing the structural-level features. Organizations such as parallel sets of ribbons or boxes constitute the assembly level for 2D grouping.

In Table 2.1 we summarize the current state of the art. The information in each box of the matrix shown in Table 2.1 is arranged as follows. The first row lists some of the typical features to be organized at this level and sensor signal dimensionality. The second row lists some typical output organizations from modules at this level and dimension. The third row lists some of the representative work in this area. None of these lists is exhaustive; they are only a sampling meant to convey a statistical impression.

Table 2.1. Classificatory structure for perceptual organization.
| Level | Row | 2D | 3D (2½D) | 2D + time | 3D + time |
|---|---|---|---|---|---|
| Assembly | Features | Structures found below | Structures found below | Structures found below | Structures found below |
| | Organizations | Large, regular arrangements | Large, regular arrangements | Coherent motion grouping, articulated motion grouping | Coherent motion grouping, nonrigid motion grouping |
| | Work | [58, 49, 61] | | [32] | |
| Structural | Features | Edge & region primitives | Coparametric surfaces, boundaries | Flow types & boundaries | Flow streams & boundaries |
| | Organizations | Ribbons, corners, merges, polygons, closed regions | Parallel, continuous patches, tetrahedral vertex combinations | Flow type grouping, correlated motion grouping | Groups of flow streams, correlated motion grouping |
| | Work | [43, 47, 46, 20, 68, 69, 42, 62, 12, 92, 37, 25, 3, 30, 84, 49, 50, 74, 18, 83, 66] | [23] | [77, 26, 72] | |
| Primitive | Features | Regions, edge chains | Surface patches & clusters | Optic flow patches | 3D flow patches |
| | Organizations | Surface faces, contour segments | Coparametric surfaces, occlusion detection | Swirls, vortices, sinks, sources | Vortices, swirls, sinks, sources |
| | Work | [43, 2, 60, 85, 33] | [7, 23] | [98, 88, 81, 51] | |
| Signal | Features | Dots, pixels | 3D points, range | Moving points/pixels | Moving 3D points |
| | Organizations | Dot clusters, edge chains, regions, texture patches | Surface patches, discontinuities, point clusters | Coherent pixel motion groups | Coherent motion groups |
| | Work | [101, 100, 102, 76, 1, 31, 17, 28, 79, 36, 2, 11, 93] | Range segmentation work | [79, 80, 85] | |

Based on the review of the recent work listed in Table 2.1 and the pre-1993 work reviewed in [67], we offer the following observations.

1. The work in the last ten years (1993-2003) continues to follow the pattern established prior to 1993. Most of the work in perceptual organization has been in 2D at the signal, primitive, and structural levels, with some increase in emphasis at the structural level. Perceptual organization work in 2D and motion has not yet seriously ventured past the signal and primitive levels.

2. Most work in perceptual organization for 2D images has concentrated on extracting continuous contours by grouping image pixels, using primarily the properties of proximity and continuity [28, 2, 76, 17, 92, 11]. Of the work in perceptual organization with extended primitives, such as lines or arcs, the effort has been mostly to form simple, small groups of primitives such as parallels [20], convex outlines [37], ellipses [62, 68], and rectangles [47, 68, 69]. We believe this is partly due to the rarity of fast computational frameworks in which to form large assembly level groups.

3. The work in this dissertation is for 2D images and uses extended primitives to form large groupings. A graph spectra based framework, which has polynomial complexity, is the heart of the grouping engine. The advantages of working at the primitive level are:
   (a) There are fewer features to work with than at the signal level.
   (b) Edges are better suited for model representations.
   (c) Higher level relations like parallelism and symmetry can be utilized.
   (d) Primitives are more tolerant to illumination changes and noise.

4. Prior to 1993, perceptual organization work in 2D and motion had not seriously ventured past the signal and primitive levels. Works have now started appearing at all levels, mainly performing spatiotemporal grouping.

5. Work on 3D + time images has not been ventured at all; at present there are no known works at any level.

2.1 Works on Learning in Perceptual Organization

We present a computational model that integrates a variety of salient relationships such as parallelism, continuity, common region, and perpendicularity among extended tokens to form large groups.
Herault and Horaud [31] group edgels based on a quadratic cost function which is derived specifically from the relations of co-circularity, smoothness, and proximity. Further, they use simulated annealing to solve the figure-ground problem. They treat shape/noise discrimination as a combinatorial optimization problem. McCafferty [46] also uses simulated annealing to optimize an energy function formed from different Gestalt principles. His energy formulation handles lines and regions within the same framework. The energy function involves the Gestalt relations, particularly continuity, similarity, proximity, and closure. Their relative contributions can be adjusted, allowing for higher level interactions. Zhu [99] uses Markov Random Fields to learn shape models from images. The shape models are learned from observed natural shapes based on a minimax entropy learning theory. The learned shape models themselves are Gibbs distributions defined on MRFs. The neighborhood structures of these MRFs naturally correspond to the Gestalt laws, namely collinearity, proximity, co-circularity, parallelism, and symmetry, and as a result both contour and region based features are inherently encoded. Peng and Bhanu [54] worked on a closed-loop recognition system that uses a Bernoulli quasilinear unit for each parameter value and combines these units to form a team. Instead of relying only on a bottom-up process, they also use a reinforcement learning environment to tune the parameters of the algorithm. Our work differs in this regard in that the proposed learning automata framework can take care of parameter dependencies in the search process. Hoogs et al. [33] provide a set of perceptual observables that yield a single image description for grouping, figure-ground, and texture analysis.
The image is modeled as a partition of surfaces bounded by intensity discontinuities, and perceptual measures are derived as relations between neighboring surfaces. The figure-ground segmentation is based on an image graph whose nodes are image regions; as a result, the graph size is drastically reduced.

2.2 Graph Based Grouping Engine

One of the most common approaches to grouping is based on graph representations that capture the structure amongst low-level primitives such as image pixels, edge pixels, straight lines, arcs, and region patches. The links of the graph, which are typically weighted, capture the association or affinity between the primitives. There are two different classes of approaches for forming groups from this graph representation. The first is the class of techniques that search for special graph structures such as cycles [57, 35, 37, 70, 38], cliques [70], spanning trees [97], or shortest paths [70, 11]. The second class of techniques seeks to find clusters of graph nodes based on some global coherence criterion [71, 93, 55, 73, 79, 96, 24, 90, 15]. In particular, we look at techniques that seek to form these node clusters by partitioning the graph. Wu and Leahy [96] proposed the concept of using minimum cuts for image segmentation. They constructed a graph whose nodes represent pixels and whose links indicate affinities derived from proximal relations. A sparse graph was created by using a suitable threshold for the link weights. Clusters were formed by recursively finding the minimum cuts of this graph using an algorithm based on the Ford-Fulkerson theorem. Gdalyahu et al. [24] approach the graph partitioning problem by stochastic clustering. They partition the graph into k parts by inducing a probability distribution over each cut that decreases monotonically with its capacity. Shi and Malik [79] suggested the novel normalized cut measure for grouping edge pixels.
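The minimum cut step that Wu and Leahy rely on can be illustrated on a toy graph. The sketch below is only an illustration under stated assumptions: it computes a single minimum s-t cut with the Edmonds-Karp variant of Ford-Fulkerson (breadth-first augmenting paths) on a small, hypothetical affinity graph. The function name `min_st_cut` and the weights are ours, and the sketch does not reproduce Wu and Leahy's recursive clustering over many node pairs.

```python
from collections import deque


def min_st_cut(weights, s, t):
    """Minimum s-t cut of an undirected affinity graph (Edmonds-Karp).

    weights maps a node pair (u, v) to a nonnegative affinity; list each
    link once. Returns (cut_weight, nodes_on_the_source_side).
    """
    # Residual capacities. An undirected link of weight w behaves like two
    # opposing arcs of capacity w, so both directions start at w.
    cap = {}
    for (u, v), w in weights.items():
        cap.setdefault(u, {}).setdefault(v, 0.0)
        cap.setdefault(v, {}).setdefault(u, 0.0)
        cap[u][v] += w
        cap[v][u] += w

    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path from s to t.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break  # no augmenting path left: the flow is maximal
        # Bottleneck residual capacity along the path.
        bottleneck = float("inf")
        v = t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        # Augment: shrink forward residuals, grow backward ones.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

    # By max-flow/min-cut, the nodes still reachable from s in the residual
    # graph form one side of a minimum cut whose weight equals the flow.
    return flow, set(parent)


if __name__ == "__main__":
    # Hypothetical affinities: two tight triangles joined by weak links.
    affinities = {
        ("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.7,
        ("d", "e"): 0.9, ("e", "f"): 0.8, ("d", "f"): 0.7,
        ("c", "d"): 0.1, ("a", "f"): 0.05,
    }
    weight, side = min_st_cut(affinities, "a", "f")
    print(round(weight, 3), sorted(side))
```

On this toy graph the cut separates {a, b, c} from {d, e, f} by severing the two weak links c-d and a-f, for a total weight of 0.15.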
The normalized cut measure is the ratio of the edges cut to the product of the connectivity (valency) of the nodes in each partition. Perona and Freeman [55] considered an asymmetric version of the Shi and Malik normalized cut measure. The Perona and Freeman cut is the ratio of the edges cut to the total edges in the foreground object. Their method inherently approximates the pairwise relationships between all elements by a pointwise property of each element; this property can be interpreted as a saliency measure which, when thresholded, provides the figure in the figure-ground segmentation. In this dissertation we use a graph partition based framework, but for grouping constant curvature edge segment primitives. We use a partition metric that can be shown to be equivalent to minimizing the cut weight normalized by the product of the sizes of each partition, and hence can be termed the average cut. Both the normalized and the average cuts can be well approximated by a solution constructed out of the graph spectra. A graph spectrum is the set of eigenvalues and eigenvectors of the matrix representation (e.g., adjacency, Laplacian, normalized Laplacian) of the graph.

2.3 Probabilistic Modeling of the Grouping Process

The works relevant to the probabilistic analysis of grouping algorithms are the analyses performed by Amir and Lindenbaum [2] and Berengolts and Lindenbaum [4]. Their analysis is based on a binomially distributed cue. The number of background points falsely added to the group is used to quantify the grouping quality. They provide an upper bound on the number of false additions to the foreground. Their analysis is done on complete graphs as well as on locally dense (k-connected) graphs.
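The partition measures discussed in Section 2.2 (the minimum cut of Wu and Leahy, the Shi and Malik normalized cut, and the average cut used in this dissertation) can be made concrete with a small computation. The sketch below evaluates all three for a given bipartition of a small, hypothetical symmetric affinity matrix; the function names are ours. The average cut is written in the product-of-sizes form described above. Note that cut/|A| + cut/|B| equals |V|·cut/(|A||B|), and the Shi and Malik form cut/vol(A) + cut/vol(B) equals vol(V)·cut/(vol(A)vol(B)), where vol sums node degrees. For a fixed graph these constant factors do not change which partition minimizes a measure.

```python
def cut_value(W, A):
    """Total weight of links with exactly one end point in node set A."""
    B = [j for j in range(len(W)) if j not in A]
    return sum(W[i][j] for i in A for j in B)


def average_cut(W, A):
    """Cut weight normalized by the product of the two partition sizes."""
    size_a = len(A)
    size_b = len(W) - size_a
    return cut_value(W, A) / (size_a * size_b)


def normalized_cut(W, A):
    """Shi-Malik form: cut/vol(A) + cut/vol(B), vol = sum of node degrees."""
    degree = [sum(row) for row in W]
    vol_a = sum(degree[i] for i in A)
    vol_b = sum(degree) - vol_a
    c = cut_value(W, A)
    return c / vol_a + c / vol_b


if __name__ == "__main__":
    # Hypothetical symmetric affinity matrix: triangles {0,1,2} and {3,4,5}
    # joined by a single weak link between nodes 2 and 3.
    W = [
        [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
        [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
        [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
        [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
        [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
        [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
    ]
    for A in ({0, 1, 2}, {0}):
        print(sorted(A), cut_value(W, A), average_cut(W, A), normalized_cut(W, A))
```

Here all three measures prefer the partition {0, 1, 2} over isolating node 0. On other graphs they can disagree; for example, the raw cut weight alone can favour slicing off a single weakly attached node.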
Recently, Berengolts and Lindenbaum [4] analyzed the connected components algorithm and used a probabilistic model to derive expressions for addition errors and the group fragmentation rate, taking into account interfering or nonindependent cues. Three studies that compared different graph clustering methods are those of Weiss [91], who studied similarities of graph spectral methods for segmentation; Williams and Thornber [94], who considered clustering methods based on the affinity matrix; and Matula [45], who considered clustering methods based on the proximity matrix. Weiss compared four different spectral algorithms, namely Perona and Freeman [55], Shi and Malik [79], Scott and Longuet-Higgins [75], and Costeira and Kanade [14], and proposed a combination of his own for segmentation. In his analysis, Weiss considered an image with two clusters with different but constant within-cluster and between-cluster affinities. He extended his analysis to the case when the variation of the within- and between-cluster dissimilarities is small, and to the case when the between-cluster affinities are zero. He found that Scott and Longuet-Higgins performs well for constant blocks (clusters of the same size) and Shi-Malik performs well for nonconstant blocks. Williams and Thornber consider the association based saliency measures of Shashua and Ullman [76], Herault and Horaud [31], Sarkar and Boyer [71], Guy and Medioni [28], and Williams and Jacobs [93]. They also propose a new saliency measure that defines the salience of an edge to be the relative number of closed random walks that visit that edge. They first compare these saliency measures on simple test patterns consisting of 30, 20, and 10 uniformly spaced edges from a circle in a background of 100 edges of random position and orientation. Performance was quantified based on the computed saliencies of the object (circle) edges.
In the second part of the study they used edge-detected 32×32 images of 9 different fruits and vegetables in front of a uniform background. To simulate realistic noisy backgrounds, they used 9 Canny edge-detected textured images as mask patterns. The test patterns were then constructed by ANDing the vegetable silhouettes into the central 32×32 regions of the 64×64 edge-detected textures. For their test setup, they use a total of 405 patterns with different signal-to-noise ratios and discuss only the false positive rate compared to the noisy edges. However, the strategy for choosing the parameters for each of the saliency measures is not clear. Matula [45] used the connectivity of the graph to induce subgraphs of the proximity graph. He derives three measures for clustering methods, namely k-bond, k-component, and k-block. These metrics are based on a cohesiveness function, defined for all nodes and edges of the graph as the maximal edge-connectivity of any subgraph containing k elements. He also briefly mentions applying these measures to random proximity graphs. The contributions of the evaluation of the cut measures in this dissertation are twofold. First, we analytically relate the nature of each partitioning measure to the underlying image statistics. This lets us quantify under what conditions minimizing each measure would give us the correct partitions. Our major conclusion is that optimization of none of the three measures is guaranteed to result in the correct partitioning into K objects, in the strict stochastic order sense, for all image statistics. Qualitatively speaking, under very restrictive conditions, when the average inter-object feature affinity is very weak compared to the average intra-object feature affinity, the minimum cut measure is optimal.
The average cut measure is optimal for graphs whose partition width is less than the mode of the distribution of all possible partition widths. The normalized cut measure is optimal for a more restrictive subclass of graphs whose partition width is less than the mode of the partition width distributions and whose inter-object links are six times weaker than the intra-object links. Second, we empirically evaluate the groups produced by graph partitioning based on the three measures, viz. minimum cut, average cut, and normalized cut, given the task of grouping extended edge segments. Our findings in this regard suggest that the quality of the groups with each of these measures is statistically equivalent, as far as object recognition is concerned. We also examine whether the performance of the grouping-by-partitioning strategy depends on the image class. Further, we use a set of 100 real images, which makes this evaluation more thorough than is common in computer vision. Borra and Sarkar [6] evaluated grouping modules for constrained search and indexing based object recognition, using 3 edge based modules on a dataset of 50 real images. Williams and Thornber [94], as pointed out earlier, use a total of 405 semi-synthetic image patterns.

CHAPTER 3 SUPERVISED LEARNING OF LARGE GROUPING STRUCTURES

In this chapter we introduce the proposed learning framework. Most previous work uses a single salient Gestalt relation rather than combinations of relations. We consider a linear weighted combination of the saliencies of the relations, expressed as posterior probabilities. The parameters of the grouping process are cast as probabilistic specifications of Bayesian networks that need to be learned. The learning is accomplished by a team of stochastic learning automata.
The grouping process is able to form large groups from relationships defined over a small set of primitives, and it is also fast. We first explain each module in the system and then describe the learning strategy along with the ground truth methodology and performance measures. We also demonstrate the ability to learn to group features of a single object type (airplanes, in this case) in the presence of different types of background clutter, and the ability to learn to form groups that correspond to several object types in a particular domain, e.g., aerial images.

Gestalt psychologists have offered a set of laws that are important in figure-ground segmentation, as we have already seen in Chapter 1, such as the laws of parallelism, continuity, similarity, symmetry, common region, and closure [59]. The use of these principles in computer vision is not new [43]. However, their usage has been limited in two aspects. First, the ability to tune the relative importance of these relationships has not been exploited. For example, in some domains the parallelism relation might be a better discriminant between object and background than continuity. In such cases we would like to weigh parallelism more than continuity. In addition, the definitions of the salient relationships themselves entail uncertainty. We offer a framework that casts the grouping parameters as probabilities, which are learnt from a set of training images of objects in their natural contexts. Second, most past efforts have been to form simple, small groups of features such as parallels [20], convex outlines [37], ellipses [62], and rectangles [47]. This is partly because of the rarity of fast frameworks to form large feature groups. The computational difficulty arises from the fact that the search space for large groups grows exponentially with the number of features in a group. But large feature groups are important.
It is highly unlikely for large organized groups to arise by chance. Hence, according to the law of accidentalness [43], the significance of a large organization is higher than that of a small organized form. We present a computational model that integrates a variety of salient relationships such as parallelism, continuity, common region, and perpendicularity among extended tokens to form large groups.

Approach to the Solution: One possible strategy for deciding on the relative importance of salient geometric relationships is to consider 2D or 3D object models. Statistical analysis of these models could provide estimates of the various grouping parameters. For example, one could look at the distribution of the angles between pairs of straight lines in the model and decide on the grouping angle tolerance. However, isolated object models do not constitute a sufficient basis for the grouping parameter decisions. It is the job of the grouping algorithm to segregate an object not only from the background clutter but also from other objects in the scene. Based on isolated object models, we might be able to decide on the associative parameter values between features within a model, but these values will not guarantee segregation either from the background clutter or from other objects. We also have to consider the statistics of the background clutter and the scene context of the objects. However, modeling both these factors is an open and difficult problem. So we suggest using a training set of images of objects in context, with the objects of interest manually outlined. Based on this training set of images, the importance of each relationship is learned using an N-player stochastic automata game framework. Unlike the usual gradient descent algorithms, which can guarantee only a local minimum, the learning automata based N-player game framework converges to the global optimum with proper choice of its learning rate [86].
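The text does not spell out the automaton update rule at this point, so the following is a minimal sketch under an explicit assumption: each automaton in the team owns one grouping parameter, keeps a probability vector over a few candidate values, and is updated with the standard linear reward-inaction (L_R-I) scheme using a reinforcement beta in [0, 1] that is shared by the whole team. The class and function names, the candidate values, and `toy_reward` (a hypothetical stand-in for the performance evaluator that compares grouping output with manually outlined ground truth) are all ours.

```python
import math
import random


class LearningAutomaton:
    """Linear reward-inaction (L_R-I) automaton choosing among candidate
    values of one grouping parameter."""

    def __init__(self, values, rate):
        self.values = list(values)
        self.rate = rate  # learning rate, 0 < rate < 1
        self.p = [1.0 / len(self.values)] * len(self.values)

    def choose(self, rng):
        # Sample an action index with the current action probabilities.
        return rng.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, action, beta):
        # beta in [0, 1] is the shared reinforcement; beta = 0 changes nothing.
        step = self.rate * beta
        for k in range(len(self.p)):
            if k == action:
                self.p[k] += step * (1.0 - self.p[k])
            else:
                self.p[k] -= step * self.p[k]


def play_team(team, reward_fn, iterations, seed=0):
    """Common-payoff game: every automaton receives the same reinforcement."""
    rng = random.Random(seed)
    for _ in range(iterations):
        actions = [a.choose(rng) for a in team]
        params = [a.values[k] for a, k in zip(team, actions)]
        beta = reward_fn(params)
        for a, k in zip(team, actions):
            a.update(k, beta)
    return team


def toy_reward(params):
    # Hypothetical stand-in for the performance evaluator: peaks when
    # (d_tol, theta_tol) = (0.2, 0.1). Values lie in (0, 1].
    d_tol, theta_tol = params
    return math.exp(-((d_tol - 0.2) / 0.1) ** 2 - ((theta_tol - 0.1) / 0.05) ** 2)


if __name__ == "__main__":
    team = [
        LearningAutomaton([0.0, 0.1, 0.2, 0.3, 0.4], rate=0.05),    # d_tol
        LearningAutomaton([0.0, 0.05, 0.1, 0.15, 0.2], rate=0.05),  # theta_tol
    ]
    play_team(team, toy_reward, iterations=5000)
    for name, a in zip(("d_tol", "theta_tol"), team):
        best = max(range(len(a.p)), key=a.p.__getitem__)
        print(name, a.values[best], round(a.p[best], 3))
```

Because the rule is reward-inaction, a zero reinforcement leaves the probabilities untouched, and every update keeps them summing to one. The toy run only illustrates the mechanics; whether a team reaches the best joint setting depends on the learning rate and the payoff structure, and the sketch is not evidence of the convergence property cited above.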
Observe that in this framework the influence of the object models on the grouping process is only statistical in nature and is implicit through the use of training images. We do not require explicit, detailed object models.

Figure 3.1. Schematic of the grouping strategy. [Blocks: Image, Edge Detector, Feature Grouping (Scene Structure Graph, Spectral Partitioning), Output, Performance Evaluator, Ground Truth, Feedback, Parameter Learning (Team of Learning Automata).]

We assemble the contributions of the individual salient relationships over a small number of primitives using a graph, the scene structure graph. This graph is partitioned to form large groups of features using the graph spectrum. The graph spectrum refers to the ordered set of eigenvalues (along with their eigenvectors) of the matrix representation of a graph. The overall grouping process is fast: for a 512 by 512 image, it takes on average 5 seconds (on a Sparc Ultra) to compute the salient groups. As we shall see, the algorithm can cope with significant image clutter. Fig. 3.1 depicts an overview of the strategy. The input to the grouping algorithm consists of low-level image features such as constant curvature edge segments (arcs and straight lines). The output consists of salient groups of low-level features. The feature grouping algorithm consists of two parts: scene structure graph construction and spectral partitioning. A weighted relational graph captures the salient relationships among the edge tokens. Probabilistic Bayesian networks quantify these salient relationships. The uncertainty in the definition of the relationships and their relative importance is captured using probability measures. Section 3.1 discusses this relational graph construction in detail. Section 3.2 outlines the graph spectral algorithm used to partition the relational graph into feature clusters.
Because of the use of graph representations, the output groups do not have a single global functional description such as elliptical or parallelogram, but are described by the strong pairwise interactions between the features. This definition of groups tends to encompass a larger class of feature distributions than functional descriptions do.

Figure 3.2. The geometric attributes used to classify pairwise edge segment relationships are shown in (a), (b) and (c). The length of the segment e1 is greater than the length of e2. The photometric attributes of an edge pixel are shown in (d). [Panels (a) to (c): segment pairs e1, e2 annotated with distances D1 to D9. Panel (d): image profile at an edge (gray levels vs. image location) and the response of the 2nd derivative, annotated with the extremum magnitudes r and distances w.]

The probabilities that underlie the relational graph, along with the other algorithm parameters, are learned using an N-player automata game framework. As we shall see, in addition to the 6 prior probabilities that capture the relative importance of the salient relations, we have 9 grouping tolerance parameters and 6 feature detection parameters that need to be chosen. The learning framework, which decides on all these parameters, is able to account for dependence amongst parameters in the search process. We believe that the learning framework can also be used to learn parameters of other vision algorithms. These learning automata need supervisory feedback. This feedback is automatically generated by comparing the output of the grouping algorithm with manually outlined training images. The learning algorithm is discussed in detail in Section 3.3.

3.1 Scene Structure Graph Specification

The structure among the constant curvature edge segments is captured using a graph representation whose nodes represent the segments and whose links quantify the saliencies between the segments.
The Gestalt inspired relationships of parallelism, perpendicularity (T- and L-junctions), proximity, continuity, and common region form the basis for the formulation of the link weights between any two nodes representing the constant curvature segments. The links are quantified based on the following attributes computed between any two edge segments, as shown in Fig. 3.2.

1. The maximum and the minimum distances of the end points of the smaller segment to the larger segment, normalized by the length of the larger segment.
2. The overlap between the two segments.
3. The minimum distance between the end points of the two segments, normalized by the length of the larger segment.
4. The difference in slope.
5. In addition to the above geometric attributes, we compute two photometric attributes, $r_{mag}$ and $r_{width}$, that are based on the response of the second derivative of the smoothed image function near the edge. At the edge this response is, of course, zero; however, away from the edge the response peaks to a maximum on one side of the edge and to a minimum on the other side. These peaks capture the behavior of the image function on either side of the edge. For each edge segment we compute the averages along the segment of the magnitudes ($r$, $\bar{r}$) and the distances ($w$, $\bar{w}$) of these extremum points from the edge location. We quantify the photometric attributes between two edge segments $e_i$ and $e_j$ by

$$r_{mag} = \max\left( \frac{|r_i - r_j|}{0.5\,(r_i + r_j)},\ \frac{|\bar{r}_i - \bar{r}_j|}{0.5\,(\bar{r}_i + \bar{r}_j)} \right)$$

and

$$r_{width} = \max\left( \frac{|w_i - w_j|}{0.5\,(w_i + w_j)},\ \frac{|\bar{w}_i - \bar{w}_j|}{0.5\,(\bar{w}_i + \bar{w}_j)} \right).$$

Based on these photometric and geometric attributes, we classify and quantify the relationship between each pair of edge segments as being parallel, T-junction, L-junction, continuous, or none-of-the-above. This is achieved using the maximum a posteriori probability (MAP) strategy based on the conditional probabilities of inferring the relationships from the computed attributes.
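The attribute list above can be sketched in code, but the text leaves several definitions open, so every choice below is an assumption of ours rather than the dissertation's exact definition. Distances of the smaller segment's end points are measured to the larger segment's supporting line, so that collinear segments give d_max = 0. The overlap is the signed overlap of the projected intervals, normalized by the smaller segment's length, so a gap gives a negative value. The slope difference is expressed as an angle folded into [0, pi/2] and scaled to [0, 1], so perpendicular segments give 1. The photometric part follows the relative-difference reading of r_mag and r_width. All function names are hypothetical.

```python
import math


def _length(p, q):
    return math.hypot(q[0] - p[0], q[1] - p[1])


def geometric_attributes(seg1, seg2):
    """Pairwise attributes of two straight segments, each ((x0, y0), (x1, y1))."""
    # Attributes are measured relative to the longer segment.
    if _length(*seg1) < _length(*seg2):
        seg1, seg2 = seg2, seg1
    (a, b), (p, q) = seg1, seg2
    big_len = _length(a, b)
    small_len = _length(p, q)
    ux, uy = (b[0] - a[0]) / big_len, (b[1] - a[1]) / big_len

    def along(pt):  # coordinate of pt along the longer segment's line
        return (pt[0] - a[0]) * ux + (pt[1] - a[1]) * uy

    def across(pt):  # perpendicular distance of pt from that line
        return abs((pt[0] - a[0]) * uy - (pt[1] - a[1]) * ux)

    d_ends = [across(p) / big_len, across(q) / big_len]
    lo, hi = sorted((along(p), along(q)))
    # Signed overlap of the projected intervals; negative values are gaps.
    overlap = (min(hi, big_len) - max(lo, 0.0)) / small_len
    e_dist = min(_length(s, t) for s in (a, b) for t in (p, q)) / big_len
    # Orientation difference folded into [0, pi/2], then scaled to [0, 1].
    ang1 = math.atan2(b[1] - a[1], b[0] - a[0])
    ang2 = math.atan2(q[1] - p[1], q[0] - p[0])
    diff = abs(ang1 - ang2) % math.pi
    theta = min(diff, math.pi - diff) / (math.pi / 2)
    return {"d_min": min(d_ends), "d_max": max(d_ends), "o_lap": overlap,
            "e_dist": e_dist, "theta": theta}


def relative_difference(x, y):
    return abs(x - y) / (0.5 * (x + y))


def photometric_attributes(ext_i, ext_j):
    """ext = (r, r_bar, w, w_bar): averaged extremum magnitudes and distances."""
    r_i, rb_i, w_i, wb_i = ext_i
    r_j, rb_j, w_j, wb_j = ext_j
    r_mag = max(relative_difference(r_i, r_j), relative_difference(rb_i, rb_j))
    r_width = max(relative_difference(w_i, w_j), relative_difference(wb_i, wb_j))
    return r_mag, r_width


if __name__ == "__main__":
    print(geometric_attributes(((0, 0), (10, 0)), ((2, 1), (8, 1))))    # parallel
    print(geometric_attributes(((0, 0), (10, 0)), ((12, 0), (15, 0))))  # gap
    print(photometric_attributes((2.0, 1.0, 3.0, 3.0), (1.0, 1.0, 3.0, 3.0)))
```

Zero-length segments or zero extremum magnitudes would cause a division by zero; the sketch does not guard against them.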
The a posteriori probabilities are computed using Bayesian networks, which efficiently and succinctly encode the relational priors and the a priori conditional probabilities. These priors and conditional probabilities are expressed in parametric forms whose parameters, in turn, form the parameters of the grouping system. We also use Bayesian networks to quantify the proximity and the common region factors between two edge segments in a MAP fashion. The sum of these three maximum a posteriori probabilities forms the weights of the scene structure graph.

3.1.1 Bayesian Networks

Based on the values of the photometric and geometric attributes, we classify each pair of edge segments as straight parallel, T-junction, L-junction, continuous, ribbon, proximal, sharing a common region, or none-of-the-above. This classification requires non-crisp representations of the relationships, which are typically uncertain. We use probabilistic Bayesian network representations to model these uncertainties. Bayesian networks are graphical representations of joint probability specifications [52]. The nodes of a Bayesian network represent the individual random variables, and its directed links denote the direct dependencies between two variables. The links are quantified by the respective conditional probabilities. Not only does the network representation explicitly encode the dependencies between variables, but it also facilitates efficient probabilistic updating upon the arrival of new information. Four Bayesian networks (see Fig. 3.3) classify the 2-ary relationships. One network classifies pairs of straight lines into parallels (P), T-junctions (T), L-junctions (L), or continuous lines (C). The second network classifies pairs of arcs into cocircular (C) and parallel (ribbons, R). The third network computes the significance of the region similarity (Reg) between the edge segments. And the fourth network classifies proximity relations (Pr).
The random variables of the Bayesian networks are the relational attributes and the relations themselves, denoted by P, T, L, C, R, Reg, and Pr. Note that there is a node in each network denoted by N. This node represents the none-of-the-above choice and captures the probability that the line arrangement could have arisen just by chance. Bayesian networks allow the consideration of the dependence of different relational types upon each other. As a consequence, the quantification of a relationship, such as parallelism, between a line pair takes into account not only the extent to which the line pair participates in the parallelism relation but also the extent of its participation in other possible relationships such as continuity, perpendicularity, etc.

Figure 3.3. Bayesian networks used to classify pairs of edge segments into the Gestalt inspired salient relationships. The networks in (a) and (b) classify pairs of straight lines and arcs, respectively. The network in (c) computes the significance of the photometric similarity, and the proximity significance is computed using (d). [Node labels: (a) d_min, d_max, o_lap, θ, e_dist with relations P, L, T, C, N; (b) d_min, d_max, o_lap, c_dist with relations C, R, N; (c) r_mag, r_width with Reg, N; (d) e_dist with Pr, N.]

Figure 3.4. Basic forms of the conditional probabilities of the Bayesian networks. [Panels: T(a,b), U(a,b), Tn(a,b), and Tp(a,b), each plotted over the interval from a to b.]

The probabilities that need to be specified in the Bayesian network are the prior probabilities of the various relations (P, L, T, R, C, Pr, N, Reg) and the conditional probabilities of the relational attributes given the relations. The prior probabilities constitute an efficient mechanism for incorporating the relative importance of the various relationships. A low prior value for a relation would result in low final probabilities, thus weighing down the effect of that relation. A high prior would indicate high importance of the relation.
In the absence of evidence to the contrary, we assume an equal prior for the none-of-the-above (N) relation. Since the ribbon (R) and parallel (P) relations denote essentially the same relation, namely parallelism, we use the same prior for both of them. Thus, we have six priors that can be chosen (or learnt), namely the priors for P, L, T, C, Pr, and Reg. For the conditional probabilities, we need to specify the probability of an attribute given the state of its parents in the Bayesian network. For example, the relational attribute $d_{max}$ has P, C, and N as its parents. So, we need to specify $P(d_{max} = d \mid P = p, C = c, N = n)$, where $p$, $c$, and $n$ denote the binary states of the parents. In the general case, this would require specifying 8 conditional probabilities for $d_{max} = d$, corresponding to the various combinations of the states of the parents. However, in our case we know that a pair of straight lines can exhibit only one of the three relations. Thus, we need to specify only $P(d_{max} = d \mid P = 1, C = 0, N = 0)$, $P(d_{max} = d \mid P = 0, C = 1, N = 0)$, and $P(d_{max} = d \mid P = 0, C = 0, N = 1)$; the probabilities for the other combinations are zero. These three conditional probabilities represent the distributions of $d_{max}$ for parallel, continuous, and none-of-the-above relationships, respectively. For a parallel relation, $d_{max}$ should be neither zero nor very large. Recall that $d_{max}$ is a measure of the distance between the two lines. Thus, parallel lines should not be collinear, which is the case accounted for by the continuity relation, nor should they be very far apart. So, we represent the density using the triangular function $T(0, b)$ shown in Fig. 3.4. However, $d_{max}$ should ideally be zero for a continuity relationship, so we choose $P(d_{max} = d \mid P = 0, C = 1, N = 0)$ to be of the form $T_n(0, b)$, as shown in Fig. 3.4. The node N represents the completely random scenario; thus $P(d_{max} = d \mid P = 0, C = 0, N = 1)$ is a uniform density function over (0, 1). Table 3.1 lists the forms of the conditional probability densities of the Bayesian networks.
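The $d_{max}$ example above can be worked through numerically. The sketch below is a simplification under stated assumptions: it uses only the single attribute $d_{max}$ with the three parents P, C, and N, and it reads the shapes in Fig. 3.4 as follows: T(a, b) is a symmetric triangle peaked at the midpoint, Tn(a, b) peaks at a and falls to zero at b, Tp(a, b) rises from zero at a to a peak at b, and U(a, b) is uniform. The parameter choices $T(0, 2d_{tol})$ and $T_n(0, d_{tol})$ follow Table 3.1. The posterior is prior times likelihood, normalized, and the MAP label is its largest entry. The function names are hypothetical.

```python
def T(x, a, b):
    """Symmetric triangular density on [a, b], peaked at the midpoint."""
    if not a <= x <= b:
        return 0.0
    mid, half = 0.5 * (a + b), 0.5 * (b - a)
    return (2.0 / (b - a)) * (1.0 - abs(x - mid) / half)


def Tn(x, a, b):
    """Triangular density on [a, b] that peaks at a and falls to zero at b."""
    if not a <= x <= b:
        return 0.0
    return (2.0 / (b - a)) * (b - x) / (b - a)


def Tp(x, a, b):
    """Triangular density on [a, b] that rises from zero at a to a peak at b."""
    if not a <= x <= b:
        return 0.0
    return (2.0 / (b - a)) * (x - a) / (b - a)


def U(x, a, b):
    """Uniform density on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def classify_by_dmax(d_max, d_tol, priors):
    """MAP label and posterior over {P, C, N} from d_max alone."""
    likelihood = {
        "P": T(d_max, 0.0, 2.0 * d_tol),  # parallel: neither zero nor far
        "C": Tn(d_max, 0.0, d_tol),       # continuous: ideally zero
        "N": U(d_max, 0.0, 1.0),          # none-of-the-above: random
    }
    joint = {k: priors[k] * likelihood[k] for k in likelihood}
    total = sum(joint.values())
    if total == 0.0:
        raise ValueError("d_max lies outside the support of every density")
    posterior = {k: v / total for k, v in joint.items()}
    return max(posterior, key=posterior.get), posterior


if __name__ == "__main__":
    priors = {"P": 1 / 3, "C": 1 / 3, "N": 1 / 3}
    for d in (0.0, 0.1, 0.2, 0.8):
        print(d, classify_by_dmax(d, d_tol=0.2, priors=priors))
```

With equal priors and $d_{tol} = 0.2$, a value of $d_{max}$ near zero favours continuity, a value near $d_{tol}$ favours parallelism, and a value outside both triangles leaves only the uniform none-of-the-above density. Raising one prior shifts these decision regions, which is how the priors encode the relative importance of a relation.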
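We construct all the conditional probabilities out of the density functions shown in Fig. 3.4; the differences are in the parameters of the functions. All the conditional density functions are characterized by 7 parameters: $d_{tol}$, $o_{tol}$, $\theta_{tol}$, $c_{tol}$, $d_{cont}$, $p_{tol}$, and $r_{tol}$. These parameters represent the effective tolerances used in the grouping process. Thus, $d_{tol}$ is the distance tolerance, $o_{tol}$ is the overlap tolerance, $\theta_{tol}$ is the orientation tolerance, $c_{tol}$ is the tolerance between two arc centers, $d_{cont}$ is the distance tolerance for continuity, $p_{tol}$ is the proximity tolerance, and $r_{tol}$ represents the region tolerance.

Table 3.1. Conditional probabilities used in the Bayesian networks. The functions $T$, $T_n$, $T_p$, and $U$ are as defined in Fig. 3.4.

| Network | Conditional probability | Form |
|---|---|---|
| Straight lines | $P(d_{min} \mid P=1, L=0, T=0, N=0)$ | $T(0, 2d_{tol})$ |
| | $P(d_{min} \mid P=0, L=1, T=0, N=0)$ | $T_n(0, d_{tol})$ |
| | $P(d_{min} \mid P=0, L=0, T=1, N=0)$ | $T(0, d_{tol})$ |
| | $P(d_{min} \mid P=0, L=0, T=0, N=1)$ | $U(0, 1)$ |
| | $P(d_{max} \mid P=1, C=0, N=0)$ | $T(0, 2d_{tol})$ |
| | $P(d_{max} \mid P=0, C=1, N=0)$ | $T_n(0, d_{tol})$ |
| | $P(d_{max} \mid P=0, C=0, N=1)$ | $U(0, 1)$ |
| | $P(o_{lap} \mid P=1, L=0, T=0, C=0, N=0)$ | $T_p(1 - o_{tol}, 1)$ |
| | $P(o_{lap} \mid P=0, L=1, T=0, C=0, N=0)$ | $T(-o_{tol}/2, o_{tol}/2)$ |
| | $P(o_{lap} \mid P=0, L=0, T=1, C=0, N=0)$ | $T_n(0, o_{tol}/2)$ |
| | $P(o_{lap} \mid P=0, L=0, T=0, C=1, N=0)$ | $T_p(-d_{cont}, 0)$ |
| | $P(o_{lap} \mid P=0, L=0, T=0, C=0, N=1)$ | $U(0, 1)$ |
| | $P(\theta \mid P=1, L=0, T=0, C=0, N=0)$ | $T_n(0, \theta_{tol})$ |
| | $P(\theta \mid P=0, L=1, T=0, C=0, N=0)$ | $T_p(1 - \theta_{tol}, 1)$ |
| | $P(\theta \mid P=0, L=0, T=1, C=0, N=0)$ | $T_p(1 - \theta_{tol}, 1)$ |
| | $P(\theta \mid P=0, L=0, T=0, C=1, N=0)$ | $T_n(0, \theta_{tol})$ |
| | $P(\theta \mid P=0, L=0, T=0, C=0, N=1)$ | $U(0, 1)$ |
| | $P(e_{dist} \mid L=1, T=0, N=0)$ | $T_n(0, d_{tol})$ |
| | $P(e_{dist} \mid L=0, T=1, N=0)$ | $T_n(d_{tol}/2, 1)$ |
| | $P(e_{dist} \mid L=0, T=0, N=1)$ | $U(0, 1)$ |
| Arcs | $P(d_{min} \mid R=1, C=0, N=0)$ | $T(0, 2d_{tol})$ |
| | $P(d_{min} \mid R=0, C=1, N=0)$ | $T_n(0, d_{tol})$ |
| | $P(d_{min} \mid R=0, C=0, N=1)$ | $U(0, 1)$ |
| | $P(d_{max} \mid R=1, C=0, N=0)$ | $T(0, 2d_{tol})$ |
| | $P(d_{max} \mid R=0, C=1, N=0)$ | $T_n(0, d_{tol})$ |
| | $P(d_{max} \mid R=0, C=0, N=1)$ | $U(0, 1)$ |
| | $P(o_{lap} \mid R=1, C=0, N=0)$ | $T_p(1 - o_{tol}, 1)$ |
| | $P(o_{lap} \mid R=0, C=1, N=0)$ | $T_p(-d_{cont}, 0)$ |
| | $P(o_{lap} \mid R=0, C=0, N=1)$ | $U(0, 1)$ |
| | $P(c_{dist} \mid R=1, C=0, N=0)$ | $T_n(0, c_{tol})$ |
| | $P(c_{dist} \mid R=0, C=1, N=0)$ | $T_n(0, c_{tol})$ |
| | $P(c_{dist} \mid R=0, C=0, N=1)$ | $U(0, 1)$ |
| Proximity | $P(e_{dist} \mid Pr=1, N=0)$ | $T_n(0, p_{tol})$ |
| | $P(e_{dist} \mid Pr=0, N=1)$ | $U(0, 1)$ |
| Region | $P(r_{mag} \mid Reg=1, N=0)$ | $T_n(0, r_{tol})$ |
| | $P(r_{mag} \mid Reg=0, N=1)$ | $U(0, 1)$ |
| | $P(r_{width} \mid Reg=1, N=0)$ | $T_n(0, r_{tol})$ |
| | $P(r_{width} \mid Reg=0, N=1)$ | $U(0, 1)$ |

3.1.2 Quantification of the Scene Structure Graph

The described Bayesian networks classify each edge segment pair into different salient Gestalt inspired relations. Each pair of edge segments instantiates the respective attribute nodes in the Bayesian networks.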
Messages propagate in the network according to the method of conditioning [52], updating the probabilities. The parent node with the highest probability determines the type of relation between the pair of segments, and the value of that probability quantifies the quality of the relation. Thus, $Prob(P_{ij})$ denotes the confidence that the relationship between the $i$th and $j$th features is parallelism. We combine the quantified relations to generate the link weight $w_{ij}$ between two nodes of the scene structure graph (SSG), as shown below:

$$w_{ij} = \begin{cases} \max\big(Prob(P_{ij}), Prob(R_{ij}), Prob(L_{ij}), Prob(C_{ij}), Prob(T_{ij}), Prob(Pr_{ij}), Prob(Reg_{ij})\big) & \text{otherwise} \\ 0 & \text{if } Prob(N_{ij}) > \max\big(Prob(P_{ij}), Prob(R_{ij}), Prob(L_{ij}), Prob(C_{ij}), Prob(T_{ij})\big) \end{cases} \qquad (3.1)$$

This results in a single value for each edge, as opposed to a vector weight.

3.2 Graph Spectral Partitioning

We form large organized groups of primitives by searching for clusters of nodes that are loosely connected to the rest of the nodes in the scene structure graph. To find these node clusters, we cast the problem of grouping image primitives as a partitioning problem of the scene structure graph, $SSG = (N, E)$, where $N$ is the set of nodes and $E$ is the set of weighted edges. We compute this partitioning recursively by first cutting the graph into two parts, $N_1$ and $N_2$, which are further bisected. The process continues until we have partitions that are small enough. We represent a graph bisection by a vector $\mathbf{v}$ whose $i$th component $v_i$ represents, through its sign, the membership of node $i$ in one or the other set: positive components indicate the nodes of one set and negative components indicate membership in the other. Denoting the weight of an edge between nodes $i$ and $j$ as $w_{ij}$, we cast our graph bisection problem as

$$\min \sum_{i,j} w_{ij} (v_i - v_j)^2 \qquad (3.2)$$

subject to the constraints that (1) $\sum_i v_i = 0$ and (2) $\sum_i v_i^2 = 1$.
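A direct transcription of Eq. 3.1 as reconstructed above, assuming (hypothetically) that the network outputs for one segment pair arrive as a dictionary keyed by relation name, with "N" for none-of-the-above:

```python
PAIRWISE = ("P", "R", "L", "C", "T")        # relations compared against N
ALL_RELATIONS = PAIRWISE + ("Pr", "Reg")    # relations that can set w_ij

def link_weight(prob):
    """Eq. (3.1): SSG link weight from one segment pair's relation probabilities."""
    if prob["N"] > max(prob[rel] for rel in PAIRWISE):
        return 0.0                           # the pair looks unrelated
    return max(prob[rel] for rel in ALL_RELATIONS)

pair = {"P": 0.7, "R": 0.1, "L": 0.05, "C": 0.05, "T": 0.05,
        "Pr": 0.3, "Reg": 0.2, "N": 0.05}
```

Here `link_weight(pair)` returns the parallelism confidence 0.7, while raising `N` above every pairwise relation drives the weight to zero.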
The minimization of the above term will tend to assign similar values ($v_i \approx v_j$) to nodes $i$ and $j$ that are joined by a large link weight $w_{ij}$, while the difference between $v_i$ and $v_j$ will tend to be large for nodes that are weakly connected. The two constraints prevent the trivial solutions $\mathbf{v} = \mathbf{1}$ and $\mathbf{v} = \mathbf{0}$, respectively. The first constraint forces both negative and positive values of $v_i$, with the convenient consequence that the negative entries correspond to one partition and the positive entries constitute the other. We can merge the second constraint with the minimized term in Eq. 3.2 to recast the problem as

$$\min \frac{\sum_{i,j} w_{ij}(v_i - v_j)^2}{\sum_i v_i^2} \quad \text{such that} \quad \sum_i v_i = 0 \qquad (3.3)$$

The numerator of the minimized term can be rearranged as follows:

$$\sum_{i,j} w_{ij}(v_i - v_j)^2 = 2\sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij} v_i^2 - 2\sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij} v_i v_j = 2\,\mathbf{v}^T L_G \mathbf{v} \qquad (3.4)$$

where $L_G$ (known as the Laplacian matrix) is an $N \times N$ array with the following entries

$$[L_G]_{ij} = \begin{cases} \sum_{j' \ne i} w_{ij'} & \text{if } i = j \\ -w_{ij} & \text{if } i \ne j \end{cases} \qquad (3.5)$$

for every $i, j \in \{1, \ldots, N\}$, where $w_{ij}$ is the weight of the edge between nodes $i$ and $j$. Thus the minimization can be compactly expressed in vector notation as

$$\min \frac{2\,\mathbf{v}^T L_G \mathbf{v}}{\mathbf{v}^T \mathbf{v}} \quad \text{such that} \quad \mathbf{v}^T \mathbf{1} = 0 \qquad (3.6)$$

where $\mathbf{1}$ is the vector with all entries equal to one. The solution of this minimization can be easily constructed from the Courant-Fischer Minimax Theorem [34, page 179]. The following corollary of the Courant-Fischer theorem gives a variational characterization of the eigenvalues of a matrix.

Corollary 1 (from Courant-Fischer). Let $A$ be a real symmetric matrix with eigenvalues $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$, and let the corresponding eigenvectors be $\mathbf{v}_1, \ldots, \mathbf{v}_n$. Then

$$\lambda_k = \min_{\mathbf{v} \ne \mathbf{0},\; \mathbf{v} \perp \mathbf{v}_1, \ldots, \mathbf{v}_{k-1}} \frac{\mathbf{v}^T A \mathbf{v}}{\mathbf{v}^T \mathbf{v}} \qquad (3.7)$$

In particular, the first eigenvector $\mathbf{v}_1$ minimizes the quadratic expression in Eq. 3.7, with the minimum value being $\lambda_1$. The second eigenvector provides a minimizing solution orthogonal to the first eigenvector, the third eigenvector provides a solution that is orthogonal to both the first and the second eigenvectors, and so on. Eq. 3.7 with $k = 2$ determines the solution of Eq. 3.6. This follows from the fact that the first eigenvalue $\lambda_1$ of $L_G$ is zero and $\mathbf{v}_1 \propto (1, 1, \ldots, 1)^T$. Thus, the condition in Eq. 3.7 that the second eigenvector be orthogonal to $\mathbf{v}_1$ reduces to $\sum_i v_{2i} = 0$, which is the constraint of the minimization. Thus, the minimum value of the expression in Eq. 3.6 is $2\lambda_2$, and the solution is the second eigenvector of the Laplacian matrix. Given the second eigenvector of the Laplacian, the partition is obtained by assigning the positive entries to one set and the negative ones to the other.

This spectral partitioning technique was first introduced by Fiedler [21], was later reused by Pothen et al. [56], and is presently used to determine load assignment in parallel and distributed computing. The second eigenvalue ($\lambda_2$) of the Laplacian matrix of a graph is also commonly used as a measure of the connectivity of the graph, its algebraic connectivity. It can also be shown [22] that if $G = (N, E)$ is a graph and $G_1 = (N, E_1)$ is a subgraph, i.e., with the same nodes and a subset of the edges, so that $G_1$ is less connected than $G$, then $\lambda_2(L_{G_1}) \le \lambda_2(L_G)$, i.e., the algebraic connectivity of $G_1$ is also less than or equal to the algebraic connectivity of $G$. It is interesting to note that, using the derivation in [29], one can show that the above partitioning technique offers an approximate solution to the problem of minimizing the total link weight between the two partitions, $N_1$ and $N_2$, normalized by the sizes of the two sets,

$$Cut(N_1, N_2)\left(\frac{1}{|N_1|} + \frac{1}{|N_2|}\right),$$

and hence it can be referred to as the average cut (also called ratio cut) solution.
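The NumPy sketch below builds $L_G$ as in Eq. 3.5, checks the identity of Eq. 3.4 on an arbitrary vector, and bisects a toy graph by the sign of the second eigenvector. The toy graph (two unit-weight triangles joined by a weak 0.1 edge) and the function names are illustrative assumptions. `numpy.linalg.eigh` returns eigenvalues in ascending order, so column 1 holds the eigenvector for $\lambda_2$; sending zero entries to $N_1$ is an arbitrary tie-break.

```python
import numpy as np

def laplacian(w):
    """Eq. (3.5): node degrees on the diagonal, -w_ij off the diagonal."""
    w = np.asarray(w, dtype=float)
    return np.diag(w.sum(axis=1)) - w       # assumes w has a zero diagonal

def spectral_bisection(w):
    """Split the nodes by the sign of the second eigenvector of L_G."""
    eigvals, eigvecs = np.linalg.eigh(laplacian(w))   # ascending eigenvalues
    fiedler = eigvecs[:, 1]                           # minimizer of Eq. (3.6)
    n1 = [i for i, x in enumerate(fiedler) if x >= 0]
    n2 = [i for i, x in enumerate(fiedler) if x < 0]
    return n1, n2, eigvals[1]

def average_cut(w, n1, n2):
    """Cut(N1, N2) * (1/|N1| + 1/|N2|)."""
    w = np.asarray(w, dtype=float)
    cut = w[np.ix_(n1, n2)].sum()
    return cut * (1.0 / len(n1) + 1.0 / len(n2))

# Two unit-weight triangles {0,1,2} and {3,4,5} joined by a weak 0.1 edge.
W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 0.1

# Identity of Eq. (3.4) on an arbitrary vector v.
v = np.array([0.5, -0.2, 0.3, -0.6, 0.1, 0.4])
lhs = sum(W[i, j] * (v[i] - v[j]) ** 2 for i in range(6) for j in range(6))
rhs = 2 * v @ laplacian(W) @ v

n1, n2, lam2 = spectral_bisection(W)
```

The weak bridge gives a small $\lambda_2$ (about 0.064 here), and the sign pattern of the second eigenvector separates the two triangles.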
In [79], Shi and Malik suggest a spectral partitioning that approximates another cut measure, namely the normalized cut, for image region segmentation. The normalized cut minimizes the total link weight between the two partitions, $N_1$ and $N_2$, normalized by the association of the nodes within the two sets,

$$Cut(N_1, N_2)\left(\frac{1}{Assoc(N_1)} + \frac{1}{Assoc(N_2)}\right),$$

and had been proposed earlier in the VLSI community [29]. In summary, the graph spectral partitioning solution operates by recursively partitioning each part. The stopping condition of the recursion involves a threshold on the maximum partition strength, which measures how strong a cluster one wants to break. The other stopping condition is the minimum cluster size beyond which we do not partition. We learn these two parameters, along with the 6 priors for the relations (see page 24) and the 7 tolerances that specify the conditional probabilities of the Bayesian network (see page 24), using the automata-based learning algorithm discussed in the next section.

Complexity of the grouping process: The spectral partitioning technique involves the computation of eigenvectors at each stage. Standard routines for eigenvector computation are $O(N^3)$. At each stage of the recursion, the problem size is halved, so the cost satisfies $T(N) = 2T(N/2) + O(N^3)$. Using the master theorem, we can show that the complexity of a size-$N$ partitioning problem is $O(N^3)$. However, in practice the scene structure graph is sparse, and we can significantly improve the execution speed by using sparse-matrix eigenvalue routines. Besides, we do not need to compute all the eigenvectors of a matrix; we need only the second eigenvector at each iteration. The second eigenvector can be computed in $O(N)$ using the Lanczos algorithm, thus resulting in an overall partitioning complexity of $O(N \log N)$. The scene structure graph construction is $O(N^2)$, so the overall complexity of grouping is $O(N^2)$.
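A minimal sketch of the recursion with both stopping conditions. The text does not define partition strength precisely at this point, so the code assumes, hypothetically, that the strength of a part is its $\lambda_2$ (the algebraic connectivity discussed above); `min_size` and `max_strength` stand in for the two learnt parameters, and the toy graph is the same illustrative two-triangle example.

```python
import numpy as np

def recursive_partition(w, nodes=None, min_size=2, max_strength=1.0):
    """Recursive spectral bisection with the two stopping conditions.

    Hypothetical choice: the "partition strength" of a part is lambda_2 of
    its Laplacian.
    """
    w = np.asarray(w, dtype=float)
    if nodes is None:
        nodes = list(range(len(w)))
    if len(nodes) <= min_size:                # minimum cluster size reached
        return [nodes]
    sub = w[np.ix_(nodes, nodes)]
    lap = np.diag(sub.sum(axis=1)) - sub      # Laplacian of this part
    eigvals, eigvecs = np.linalg.eigh(lap)
    if eigvals[1] > max_strength:             # too strongly connected to break
        return [nodes]
    fiedler = eigvecs[:, 1]
    left = [n for n, x in zip(nodes, fiedler) if x >= 0]
    right = [n for n, x in zip(nodes, fiedler) if x < 0]
    if not left or not right:                 # degenerate split: stop here
        return [nodes]
    return (recursive_partition(w, left, min_size, max_strength)
            + recursive_partition(w, right, min_size, max_strength))

W = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[a, b] = W[b, a] = 1.0
W[2, 3] = W[3, 2] = 0.1
parts = recursive_partition(W)
```

Each triangle has $\lambda_2 = 3$, above the assumed threshold, so the recursion stops after one cut. For large sparse scene structure graphs, `scipy.sparse.linalg.eigsh` (a wrapper around ARPACK's implicitly restarted Lanczos method) can compute only a few eigenpairs instead of the full dense decomposition used here.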
At this point, it must be noted that recursive graph bipartitioning is not the only way to achieve a graph partitioning; it can also be achieved with a one-shot $k$-way partitioning. One such technique is Multidimensional Scaling (MDS). The similarity matrix is converted to a dissimilarity matrix and then subjected to an eigendecomposition. Based on this eigendecomposition, the dissimilarity matrix is recentered with the new coordinates, essentially making the matrix positive semidefinite. An eigendecomposition is performed again on this new matrix and, due to the nature of the matrix, at least one of the eigenvalues turns out to be equal to zero. The number of nonzero positive eigenvalues, say $k$, is the new dimensionality of the representation in the eigenspace. The eigenvectors corresponding to the nonzero eigenvalues, with proper normalization, give us the $k$ clusters. Note that although there is no wastage per se in terms of the eigencomputation, this method actually requires performing the eigencomputation twice to arrive at the correct number of clusters, $k$. More details of this approach can be found in the paper by Roth et al. [63] and the MDS book by Cox and Cox [16].

3.3 Learning Grouping Parameters

As with any perceptual organization strategy, the spectral grouping algorithm also has parameters that need to be chosen. Specifically, we have 15 parameters: 7 tolerance parameters used in the Bayesian network to construct the scene structure graph (see page 24), 6 prior probabilities for the relations (see page 24), and 2 parameters (see page 28) used in spectral partitioning, namely the minimum cluster size and the maximum partition strength.
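The eigen-step of MDS can be illustrated with classical (Torgerson) MDS, used here as a standard stand-in rather than the exact procedure of [63]. Double centering with $J = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T$ makes $\mathbf{1}$ an eigenvector with eigenvalue zero, which is why at least one eigenvalue vanishes, and the count of clearly positive eigenvalues gives $k$. The constant-shift step used in [63] for non-metric data, the tolerance `tol`, and the final assignment of points to clusters are outside this sketch.

```python
import numpy as np

def classical_mds(dissim, tol=1e-9):
    """Classical (Torgerson) MDS of a symmetric dissimilarity matrix.

    Returns the embedding on the positive eigenvalues (k = number of
    columns) and the full eigenvalue spectrum, largest first.
    """
    d2 = np.asarray(dissim, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centering matrix J
    b = -0.5 * j @ d2 @ j                     # double-centered matrix; B @ 1 = 0
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1]         # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > tol * max(eigvals[0], 1.0)   # the "nonzero positive" ones
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return coords, eigvals

# Corners of a 3 x 4 rectangle: a 2-D configuration, so k should be 2.
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
coords, eigvals = classical_mds(dist)
k = coords.shape[1]
```

For Euclidean input such as this, the embedding reproduces the original pairwise distances exactly; negative eigenvalues appear only for non-Euclidean dissimilarities.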
These parameters are in addition to the three edge detection parameters (edge scale $\sigma$, edge strength threshold, and edge length threshold) and the three parameters of the constant curvature contour segmentation algorithm. Thus, there are 21 parameters that have to be chosen. The advantage of this large number of parameters is the flexibility of the grouping algorithm; the downside is that we need an effective strategy to choose them. In this section we present a strategy to learn these parameters given a training set of images. This problem of automated parameter selection is also present in other computer vision contexts. The usual practice is to choose such parameters by trial and error or by heuristics. However, when we network a number of vision modules, the number of parameters grows and manual choice becomes difficult. The parameter choice problem has three characteristics that make it computationally expensive in practice. First, the search space is extremely large. Let $N_p$ be the number of parameters to be chosen and $r$ be the number of possible values for each parameter. Then the total number of possible parameter combinations is $r^{N_p}$. Second, for a network of vision modules, tuning the parameters on a per-module basis does not guarantee an overall optimal choice. In practice, the choices of the parameters depend upon each other. This is true not only between parameters of a particular module but also between parameters from different modules. Thus, not only can there be dependence between the choice of the scale $\sigma$ and the thresholds of the edge detector, but there can also be dependence between the scale $\sigma$ and, say, the distance tolerance factor ($d_{tol}$) used in the grouping algorithm: with increasing edge scale, the contours move away from each other and edges become sparse, which can affect the distance tolerance. Third, the optimal parameter choices are typically dependent on the image domain.
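The size of the search space is easy to make concrete. The sketch assumes, hypothetically, that each parameter range is split into $r$ equally spaced levels (the text states only that $r = 10$ levels are used); with $N_p = 21$ parameters this gives $10^{21}$ combinations, and the 15 grouping parameters alone already give $10^{15}$.

```python
def quantize(lo, hi, r=10):
    """Return r equally spaced candidate values covering [lo, hi] (r >= 2)."""
    step = (hi - lo) / (r - 1)
    return [lo + i * step for i in range(r)]

def search_space_size(num_params, r=10):
    """Total number of parameter combinations, r ** N_p."""
    return r ** num_params

# Hypothetical range for a distance tolerance, quantized into r = 10 levels.
levels = quantize(1.0, 10.0)
```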
The goal of the learning algorithm is to learn a set of parameter combinations that result in good performance on a class of images characterized by the training set. The learnt set is composed of combinations that result in good performance on each of the training images. For the experiments presented in Section V, we chose the size of the learnt set of parameter combinations to be 100. Given a new image, all 100 parameter combinations would be tried, which of course is far better than trying $10^{21}$ combinations. Note that we implicitly encode the dependence of the parameters on the images: there are good parameter combinations for each training image in the chosen set. Thus, if the new image has characteristics similar to one of the training images, a subset of the learnt parameter combinations will result in good performance. There are various other possible strategies for selecting the set of parameter combinations that result in good performance. Peng and Bhanu [54] employ a team of connectionist Bernoulli quasilinear units, with one unit associated with each value that each parameter can take; there are no interactions between the units. Sometimes, the optimal parameter selection process is cast as the optimization of an energy function [41, 53], and traditional optimization techniques for parameter search, such as hill climbing or gradient descent, are employed. We attack the parameter estimation problem using a suite of learning automata (LA) in an N-player stochastic game framework. We use learning automata primarily for three reasons:

1. It has been proven that a team of learning automata will converge to the global optimum [86] with the right learning rate.
2. The N-player game model can easily accommodate, in the search process, interactions between different parameter choices. It accounts for the dependence of one parameter choice on other parameters to guide the search process.
3. Although in this work we use the automata team as an offline learning module, the team can also be used online. The automata team can incorporate new training data as they arrive; that is, it is capable of incremental learning. It is possible to use such a team of automata to continuously enhance the performance of a vision algorithm with each run. However, this aspect is still part of future work.

3.3.1 What is a Learning Automaton?

A learning automaton (LA) is an algorithm that adaptively chooses from a set of possible actions on a random environment so as to maximize the expected feedback. (The reader is referred to [48] for an excellent introduction to learning automata.) A learning automaton is coupled to the environment, which in our case is the feature grouping algorithm along with the image set. In response to the chosen action, the environment generates a stochastic output $\beta$, which is used by the learning automaton to decide on the next action. The goal is to ultimately choose the action that results in the maximum expected $\beta$. A learning automaton decides on the next action by random sampling based on an action probability vector $\mathbf{p}^k = (p^k_1, \ldots, p^k_r)$, defined over the set of actions $\{\alpha^k_1, \ldots, \alpha^k_r\}$. In the beginning, $p^k_1 = \cdots = p^k_r = 1/r$, signifying that each action is equally likely. On receiving feedback from the environment, this probability vector is updated using a learning algorithm. The exact nature of the updating algorithm varies. However, the common strategy is to increase the probability of an action that generates a favorable $\beta$ and decrease the probability of an action that generates unfavorable feedback. The changes in the probabilities are such that $\sum_{i=1}^{r} p^k_i = 1$. With each iteration, the entropy of the action probability vector decreases, until the probability of the optimal action converges to one.
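To make the generic description concrete, the sketch below runs a single automaton with the classical linear reward-inaction ($L_{R-I}$) scheme from the learning automata literature. This is a swapped-in textbook update, not the estimator algorithm used in this work (Eqs. 3.12 and 3.13): it moves $\mathbf{p}$ toward the chosen action only on favorable feedback and leaves it unchanged otherwise. The Bernoulli environment, its reward probabilities, and the step size are hypothetical.

```python
import random

def run_lri(reward_probs, rate=0.01, iterations=20000, seed=0):
    """One learning automaton with the linear reward-inaction (L_R-I) update.

    reward_probs[i] is the (hypothetical) probability that action i earns
    feedback beta = 1 from the environment; otherwise beta = 0.
    """
    rng = random.Random(seed)
    r = len(reward_probs)
    p = [1.0 / r] * r                                  # all actions equally likely
    for _ in range(iterations):
        i = rng.choices(range(r), weights=p)[0]        # sample an action from p
        beta = 1 if rng.random() < reward_probs[i] else 0
        if beta == 1:                                  # favorable: move p toward action i
            p = [(1.0 - rate) * pj for pj in p]
            p[i] += rate
        # unfavorable (beta == 0): "inaction", p is left unchanged
    return p

p_final = run_lri([0.2, 0.8, 0.4])
```

The update keeps $\sum_i p_i = 1$ by construction, and the probability mass concentrates on the action with the highest expected feedback.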
It can be shown that the LA will converge if the statistics of the environment are stationary and the updating functions satisfy some minimal conditions. For the grouping problem, this environmental stationarity assumption implies that the statistics of the image set are stationary or that the images are from one class. In the present scenario, we associate one learning automaton with each algorithm parameter, and the actions correspond to the various values of the parameter. However, the learning automata do not operate independently of each other; they work as a team to capture the dependence between the parameters. The probabilistic updating of each automaton takes into account the actions of the other automata.

3.3.2 How Does a Team of Automata Operate?

We map the parameter estimation problem into an N-player game by associating with each parameter a player who has to choose from a range of parameter values (Fig. 3.5). We quantize each parameter into $r$ levels ($r = 10$ in our experiments) so that each player has a finite set of moves to make, or actions to choose from. Let us denote this choice set for the $k$th player by $\alpha^k = \{\alpha^k_1, \ldots, \alpha^k_r\}$. Each player randomly makes a move, which forms part of the chosen parameter combination. This parameter combination extracts a reward from the environment, viz., from the grouping algorithm and the image set. To generate a reward, the environment applies the grouping algorithm with the parameter combination to the training image set and computes the average performance based on the measures discussed in Sec. 3.3.6. This reward is returned to each player as feedback. Based on this common feedback, $\beta$, each player chooses its next move. The objective of each player is to choose an action so as to maximize this feedback over time.

Figure 3.5. Team of learning automata for learning parameter combinations.
[Figure: automata 1, ..., k, ..., N, one per parameter, each with choices $\{v_1, \ldots, v_r\}$, an action probability vector $P(v_1), \ldots, P(v_r)$, an updating algorithm, and a learn rate $\mu$; the environment evaluates the performance of the output of the grouping algorithm with the parameter set chosen by the team of automata at each iteration; the past game history is kept in the estimated game matrix.]

The updating strategy maintains estimates of the expected feedback for every combination of moves as a multidimensional matrix called the (estimated) game matrix, $\hat{D} = [\hat{D}_{i_1 \cdots i_N}]$, whose dimension is $r \times r \times \cdots \times r$ ($N$ times). Let the $i_k$th possible action of the $k$th automaton be denoted by $\alpha^k_{i_k}$. Then $\hat{D}_{i_1 \cdots i_N}$ stores the average reward for the play $(\alpha^1_{i_1}, \ldots, \alpha^N_{i_N})$. It might appear that we would require a significant amount of memory to store this game matrix estimate. However, in practice this estimated game matrix is sparse and can be stored efficiently: the number of nonzero entries is at most equal to the number of iterations, which is typically far less than the maximum possible size of the matrix. Each player updates its action probability vector based on this estimated game matrix. The estimated game matrix $\hat{D}$ is an approximation of the underlying game matrix governing the game, $D = [D_{i_1 \cdots i_N}]$, which is composed of the expected feedbacks for every combination of moves $(i_1, \ldots, i_N)$:

$$D_{i_1 \cdots i_N} = E\left[\beta \mid \alpha^1 = \alpha^1_{i_1}, \ldots, \alpha^N = \alpha^N_{i_N}\right] \qquad (3.8)$$

where $\alpha^k_{i_k}$ is the $i_k$th action of the $k$th automaton.¹ We, of course, do not know the game matrix a priori.
However, if the statistics of this game matrix are stationary, then we can design algorithms based on its estimates so that the game converges to the global optimum. In our case, this environmental stationarity assumption implies that the statistics of the image set are stationary or that the images are from one class. Let $E^k_j$ denote the maximum expected reward to the $k$th player when it plays $\alpha^k_j$. We denote the vector composed of these rewards by $E^k = [E^k_j]$ and term it the individual game vector. This individual game vector can be looked upon as a projection of the game matrix:

$$E^k_j = \max_{\{i_s\}_{s \ne k}} D_{i_1 \cdots i_{k-1}\, j\, i_{k+1} \cdots i_N} \qquad (3.9)$$

The term $\{i_s\}_{s \ne k}$ denotes the set of possible combinations of moves of all the players except the $k$th player. Let the globally optimal play for the $k$th player be $\alpha^k_{m_k}$; then

$$\max_{\{i_s\}} D_{i_1 \cdots i_N} = \max_j E^k_j = E^k_{m_k} \quad \text{for all } k = 1, \ldots, N \qquad (3.10)$$

The term $\{i_s\}$ denotes the set of possible combinations of moves of all the players. So, each player can reach the globally optimal point by choosing its plays according to the individual game vector $E^k = [E^k_j]$. Of course, in practice we only have an estimate of this vector, $\hat{E}^k$, which is computed from the estimated game matrix $\hat{D}_{i_1 \cdots i_N}$:

$$\hat{E}^k_j = \max_{\{i_s\}_{s \ne k}} \hat{D}_{i_1 \cdots i_{k-1}\, j\, i_{k+1} \cdots i_N} \qquad (3.11)$$

Based on the environment feedback, $\beta$, and this individual game vector estimate, $\hat{E}^k_j$, each automaton chooses its actions. Let us denote the iteration number by $n$. We will denote the value of a variable at the $n$th iteration by appending $n$ as an argument to its symbol. Thus $\beta(n)$ denotes the environmental feedback at the $n$th iteration, and $\alpha^k(n)$ denotes the play of the $k$th automaton at the $n$th iteration. Let $(\alpha^1_{i_1}, \ldots, \alpha^k_{i_k}, \ldots, \alpha^N_{i_N})$ be the play of the $N$ automata at the $n$th iteration. Following Thathachar and Sastry [86], we update the action probability vectors and the other estimates in two steps.

¹ We use $\hat{X}$ to denote an estimate of the random variable $X$.
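Because $\hat{D}$ is sparse, it can be held as a dictionary keyed by play tuples, and Eq. 3.11 becomes a single pass over the visited plays. A sketch with hypothetical data for two players with $r = 3$ levels; unvisited entries simply stay at 0, which matches an all-zero initialization and rewards in $[0, 1]$:

```python
def individual_game_vector(d_hat, k, r):
    """Eq. (3.11): E_hat^k_j = max of D_hat over the plays whose k-th move is j.

    d_hat is a sparse game-matrix estimate: a dict mapping visited play
    tuples (i_1, ..., i_N) to average rewards.
    """
    e = [0.0] * r
    for play, avg_reward in d_hat.items():
        j = play[k]                    # move of player k in this play
        e[j] = max(e[j], avg_reward)
    return e

# Hypothetical estimates for two players with r = 3 levels each.
d_hat = {(0, 1): 0.2, (1, 1): 0.9, (2, 0): 0.5, (1, 2): 0.4}
e0 = individual_game_vector(d_hat, k=0, r=3)
e1 = individual_game_vector(d_hat, k=1, r=3)
```

Both vectors attain the same maximum as $\hat{D}$ itself, the estimated counterpart of Eq. 3.10.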
First is the update of the probability vector $\mathbf{p}^k(n+1)$. We increase the probabilities of the plays whose estimates of the individual maximum feedback, $\hat{E}^k_j(n)$, are larger than that of the play chosen at the $n$th iteration, $\hat{E}^k_{i_k}(n)$. We decrease the probabilities of the other plays so that the total sum of probabilities remains one. Mathematically,

$$p^k_j(n+1) = \begin{cases} p^k_j(n) - \mu\left(\hat{E}^k_{i_k}(n) - \hat{E}^k_j(n)\right) p^k_j(n) & \text{if } j \ne i_k \text{ and } \hat{E}^k_{i_k} > \hat{E}^k_j \\ p^k_j(n) + \mu\left(\hat{E}^k_j(n) - \hat{E}^k_{i_k}(n)\right) \dfrac{\left(1 - p^k_j(n)\right) p^k_{i_k}(n)}{r - 1} & \text{if } j \ne i_k \text{ and } \hat{E}^k_{i_k} \le \hat{E}^k_j \\ 1 - \sum_{l \ne i_k} p^k_l(n+1) & \text{if } j = i_k \end{cases} \qquad (3.12)$$

where $\mu$ is the learning rate (Sec. 3.3.3). Recall that, at the start, $p^k_j(0) = 1/r$, with all actions being equally likely. The extent of change in the action probability vector at each iteration depends on (i) the learning rate, (ii) the difference between the maximum feedback for an action and that of the action chosen at the $n$th iteration, and (iii) the probability of each action. For cases where different actions result in drastically different feedbacks, the learning is faster than for a case where different actions result in almost similar feedbacks. Also, in the beginning, when all the action probabilities are small, learning is slow, which allows the process to explore new actions.

The second step consists of updating the individual game vector estimates, $\hat{E}^k_j(n+1)$. We use two intermediate variables, $R$ and $Z$, to first compute the game matrix estimate $\hat{D}_{j_1 \cdots j_N}(n+1)$, which, in turn, determines $\hat{E}^k_j(n+1)$. At each step we need to update only one entry of the game matrix, namely the entry corresponding to the play at the $n$th iteration, $\hat{D}_{i_1 \cdots i_N}(n+1)$. Mathematically,

$$\begin{aligned} R_{i_1 \cdots i_N}(n+1) &= R_{i_1 \cdots i_N}(n) + \beta(n) \\ Z_{i_1 \cdots i_N}(n+1) &= Z_{i_1 \cdots i_N}(n) + 1 \\ \hat{D}_{j_1 \cdots j_N}(n+1) &= \frac{R_{j_1 \cdots j_N}(n+1)}{Z_{j_1 \cdots j_N}(n+1)}, \quad j_k \in \{1, \ldots, r\},\ 1 \le k \le N \\ \hat{E}^k_{i_k}(n+1) &= \max\left\{\hat{E}^k_{i_k}(n),\ \hat{D}_{i_1 \cdots i_N}(n+1)\right\} \end{aligned} \qquad (3.13)$$

where the ratio is evaluated only for the plays visited so far ($Z_{j_1 \cdots j_N}(n+1) > 0$). At the start, $R_{i_1 \cdots i_N}(0) = 0$, $Z_{i_1 \cdots i_N}(0) = 0$, and $\hat{E}^k_{i_k}(0) = 0$ for all $(i_1, \ldots, i_N)$ and $k$.

3.3.3 Choice of the Learn Rate

The outlined learning algorithm is optimal and can be theoretically shown to converge to the global optimum point with the right choice of the learning parameter $\mu$ (see [48, 86] for details).
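The two update steps can be transcribed directly. In this sketch, `update_probabilities` implements Eq. 3.12 for one automaton and `update_estimates` implements Eq. 3.13, with $R$ and $Z$ stored sparsely as dictionaries keyed by play tuples; the function names and data layout are illustrative assumptions.

```python
def update_probabilities(p, e_hat, chosen, mu):
    """Eq. (3.12) for one automaton.

    p: action-probability vector p^k(n); e_hat: individual game vector
    estimate E_hat^k(n); chosen: index i_k of the action just played;
    mu: learning rate. Returns p^k(n+1).
    """
    r = len(p)
    new = list(p)
    for j in range(r):
        if j == chosen:
            continue
        diff = e_hat[j] - e_hat[chosen]
        if diff < 0:    # E_hat[i_k] > E_hat[j]: shrink p_j
            new[j] = p[j] + mu * diff * p[j]
        else:           # E_hat[i_k] <= E_hat[j]: grow p_j
            new[j] = p[j] + mu * diff * (1 - p[j]) * p[chosen] / (r - 1)
    new[chosen] = 1.0 - sum(new[j] for j in range(r) if j != chosen)
    return new

def update_estimates(R, Z, e_hats, play, beta):
    """Eq. (3.13): update the running average D_hat for the chosen play and
    each automaton's E_hat entry for its own move. Returns the new D_hat."""
    R[play] = R.get(play, 0.0) + beta
    Z[play] = Z.get(play, 0) + 1
    d_new = R[play] / Z[play]
    for k, i_k in enumerate(play):
        e_hats[k][i_k] = max(e_hats[k][i_k], d_new)
    return d_new

p_next = update_probabilities([1 / 3, 1 / 3, 1 / 3], [0.2, 0.8, 0.5], chosen=0, mu=0.1)
```

After playing the worst-estimated action, the probabilities shift toward the actions with larger $\hat{E}^k_j$, in proportion to how much larger they are.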
The time to convergence is inversely related to $\mu$. If one chooses a very small $\mu$, the learning algorithm is very slow, but the probability of finding the global optimum is high. A large $\mu$ implies faster convergence but does not guarantee a globally optimal point. In our experiments, we start with $\mu = 0$ for the first 100 iterations, to let the algorithm form a starting estimate of the game matrix, and then set $\mu = 0.1$ for later iterations. Observe that the effect of this learning rate on the learning process is somewhat different from that in other types of learning strategies. Fixing $\mu$ to a constant does not imply a fixed effective learning rate. Recall from Eq. 3.12 that the amount of change at each iteration depends on two other factors: the differences in feedback for different actions and the action probabilities at that iteration. In fact, the amount of updating is small in the beginning iterations and gradually increases, even for constant $\mu$.

3.3.4 Stopping Conditions

Traditionally, the stopping condition is cast in terms of the maximum action probability over all the players. Ideally, the final action probability vector of each player should have a probability of one for the optimal action and zero for the others. For the parameter selection case, we are not interested in the optimal parameter combination but in a set of good parameter combinations. So, we stop when the automata team does not find any new parameter combination that is better than the ones already found for a number (typically 200) of consecutive iterations. This condition is easy to detect: if no $\hat{E}^k_{i_k}(n+1)$, for $1 \le k \le N$, is updated at an iteration (Eq. 3.13), then no new parameter combination better than the previous ones has been found at that iteration. We keep track of the number of consecutive iterations for which this is true.
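Putting Eqs. 3.12 and 3.13 together with the learn-rate schedule of Sec. 3.3.3 and the stopping rule above gives the loop below. The reward function is a hypothetical deterministic stand-in for running the grouping algorithm on the training images; $\mu = 0.1$ after 100 warm-up iterations with $\mu = 0$, and a patience of 200 iterations, follow the values quoted in the text, while the player count, level count, and iteration cap are assumptions.

```python
import random

def learn_team(reward, n_players, r, mu=0.1, warmup=100, patience=200,
               max_iter=20000, seed=0):
    """Team of automata, one per parameter, updated with Eqs. (3.12)-(3.13).

    reward(play) -> float in [0, 1]. Returns the trace of (play, beta)
    pairs and the final action-probability vectors.
    """
    rng = random.Random(seed)
    p = [[1.0 / r] * r for _ in range(n_players)]     # p^k_j(0) = 1/r
    e_hat = [[0.0] * r for _ in range(n_players)]     # E_hat^k_j(0) = 0
    R, Z, trace = {}, {}, []
    stale = 0                                         # iterations without improvement
    for n in range(max_iter):
        play = tuple(rng.choices(range(r), weights=p[k])[0]
                     for k in range(n_players))
        beta = reward(play)
        trace.append((play, beta))
        rate = 0.0 if n < warmup else mu              # learn-rate schedule
        # Step 1, Eq. (3.12): update each player's probabilities with E_hat(n).
        for k, i_k in enumerate(play):
            old, e = p[k], e_hat[k]
            new = list(old)
            for j in range(r):
                if j == i_k:
                    continue
                diff = e[j] - e[i_k]
                if diff < 0:
                    new[j] = old[j] + rate * diff * old[j]
                else:
                    new[j] = old[j] + rate * diff * (1 - old[j]) * old[i_k] / (r - 1)
            new[i_k] = max(0.0, 1.0 - sum(new[j] for j in range(r) if j != i_k))
            p[k] = new
        # Step 2, Eq. (3.13): running average for this play, then E_hat.
        R[play] = R.get(play, 0.0) + beta
        Z[play] = Z.get(play, 0) + 1
        d_new = R[play] / Z[play]
        improved = False
        for k, i_k in enumerate(play):
            if d_new > e_hat[k][i_k]:
                e_hat[k][i_k] = d_new
                improved = True
        # Stopping condition (Sec. 3.3.4): no E_hat entry improved for a while.
        stale = 0 if improved else stale + 1
        if stale >= patience:
            break
    return trace, p

def toy_reward(play):
    """Hypothetical deterministic reward with its optimum at play (2, 0)."""
    return 1.0 - 0.2 * abs(play[0] - 2) - 0.2 * abs(play[1] - 0)

trace, probs = learn_team(toy_reward, n_players=2, r=3)
best_play, best_beta = max(trace, key=lambda t: t[1])
```

The returned trace is exactly the "run" from which the learnt parameter combinations are selected.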
3.3.5 The Learnt Parameter Combinations

The set of good parameter combinations is selected from the sequence of actions, which we term the trace or the run, chosen by the team of learning automata. From this trace (or run), we choose the $k$ best parameter combinations for each image. Remember that, at each iteration, we have the individual performance of the grouping algorithm with the chosen parameter combination on all training images. The $k$ best parameter combinations for each image together constitute the learnt set of parameters. Note that this strategy for learning good parameter sets is faster than training on each image separately and then choosing the $k$ best parameter combinations: we explore the parameter space guided by the average performance of the grouping algorithm over the input images, but the final selection of parameters is done based on individual images.

3.3.6 How is the Performance Feedback, $\beta$, Computed?

The learning automata update their action probability vectors based on feedback from the environment. The environment in the present case consists of an edge detector, a contour segmentor, and a grouping algorithm, along with the training image set. At each iteration, the environment applies the grouping algorithm to the training image set, with parameters determined by the LA actions. The average performance forms the feedback to the LA team. The feedback measure, which captures the performance of the grouping algorithm on an image, is a combination of three terms. The first term represents the expected speed of object recognition from the groups generated. The second term represents the confidence in the recognition results. The third term depends on the false alarm rate. These measures have been proposed in [6] as part of a set of five performance measures for grouping modules. The rationale behind these measures is reproduced here for completeness.
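The selection from the trace can be sketched as follows. It assumes, hypothetically, that each trace entry records the parameter combination together with its per-image scores; combinations that recur in the run are counted once.

```python
def learnt_set(trace, k):
    """Union over training images of the k best parameter combinations.

    trace: list of (params, scores) pairs from the automata run, where
    params is a hashable parameter combination and scores[m] is the
    grouping performance on training image m for that combination.
    """
    unique = {}
    for params, scores in trace:          # a combination may recur in the run
        unique.setdefault(params, scores)
    chosen = []
    n_images = len(next(iter(unique.values())))
    for m in range(n_images):
        ranked = sorted(unique, key=lambda prm: unique[prm][m], reverse=True)
        for params in ranked[:k]:         # k best for image m
            if params not in chosen:
                chosen.append(params)
    return chosen

trace = [((1, 2), [0.9, 0.1]), ((3, 4), [0.2, 0.8]),
         ((1, 2), [0.9, 0.1]), ((5, 6), [0.5, 0.7])]
```

With $k = 1$ the set holds the best combination for each of the two images, `[(1, 2), (3, 4)]`.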
We motivate the estimation of the speed of recognition from a constrained-search point of view, based on Grimson's [27] complexity analysis of object recognition using imperfect groups in the presence of clutter. Let $N_G$ denote the number of features in a detected group, $N_O$ the number of model features, and $N_{GO}$ the number of group features that lie on the model. Assuming that all features are equally important, Grimson showed that the expected search, $W_{term}$, is essentially polynomial if we terminate when the number of matched features equals some predetermined threshold, $t$. The exact expression is given by:

$$N_O N_G \frac{N_G}{N_{GO}} \;\le\; W_{term} \;\le\; t\, N_O N_G \frac{N_G}{N_{GO}} \left[1 + \cdots\right] \qquad (3.14)$$

The constant $k$ is small and is typically equal to $0.2P/D$, where $P$ is the total perimeter of the object and $D$ is the image dimension. If $N_G/N_O \le 50 D^2/P^2$, then the search is essentially quartic. In the worst case, $P = D$, and the requirement is $N_G \le 50 N_O$, a very liberal requirement. The term in Eq. 3.14 that depends on the quality of the group is the ratio $N_G/N_{GO}$. Its inverse constitutes the first part of the performance measure:

$$P_{time}(G, O) = \frac{N_{GO}}{N_G} \qquad (3.15)$$

This measure ranges from zero to one and should be as large as possible to minimize the amount of search. The quality of the terminated constrained search will be proportional to the threshold $t$, which is the number of model features explained by the group. Thus, $t/N_O$ captures the model-to-group match quality. Using this expression, coupled with the fact that the termination threshold $t$ is less than or equal to the number of common features, $N_{GO}$, between the model and the group, we suggest the second part of the performance measure to be:

$$P_{qual}(G, O) = \frac{N_{GO}}{N_O} \qquad (3.16)$$

This measure ranges from zero to one and should be large to ensure high-confidence recognition. Large values of this measure help discriminate between models and thus boost the accuracy of recognition. The performance measures in Eqs. 3.15 and 3.16 require the availability of object models or, at least, estimates of the numbers of features ($N_O$) in the models. Since we are concerned with an edge-based recognition strategy, the features of interest are edge pixels. Manual construction of 3D models is cumbersome and renders the performance analysis almost intractable for real domains. We circumvent this problem by using manual outlines of object boundaries in each image. Given an edge image, the collection of edge pixels close to the manual outline represents the perfect grouping of features in the image. For 2D model-based recognition scenarios, such as those that are view-based, the number of edge points will provide a good estimate of the number of model features. For 3D model-based recognition scenarios, we expect the number of edge features in an image to be proportional (on average) to the actual number of 3D model features. Let the grouping algorithm generate $N$ groups, $G_1, \ldots, G_N$, for an image with $M$ objects, $O_1, \ldots, O_M$. False alarm groups are defined to be groups that do not overlap with any object of interest. For each pair of group $G_i$ and image object $O_j$ that overlap, we compute $P_{time}(G_i, O_j)$ and $P_{qual}(G_i, O_j)$. Let the total number of overlaps be $N_{overlaps}$, which can be anywhere between 0 and $NM$. We then combine these measures as follows to generate the performance measure $\beta$:

$$\beta = \left(\frac{\sum_{i,j} P_{time}(G_i, O_j)}{N_{overlaps}}\right)\left(\frac{\sum_{i,j} P_{qual}(G_i, O_j)}{N_{overlaps}}\right)\left(1 - \frac{N_{false}}{N}\right) \qquad (3.17)$$

where $N_{false}$ is the number of groups that do not overlap with any object. Notice that the measure $\beta$ is inversely related to the number of false alarms and that it ranges from zero to one, with one being the desired value. This product form of combination tends to assign equal importance to the time and quality of recognition. In addition, the normalized summation form used for each of the measures tends to (1) penalize a group that is spread across two objects more than a group that overlaps with one object and the background, (2) prefer large groups over small groups, and (3) penalize groups of features that do not belong to any object.

As an illustration of this measure, consider Fig. 3.6(a), where we have two objects, shown in different colors. In Fig. 3.6(b) there are two possible groups of straight-line features, denoted, again, by different colors. Observe that one of the features, $f_4$, is grouped along with features that predominantly belong to object 2. There are also false alarms, $f_7$ and $f_8$, which do not appear in the ground truth but are grouped along with object 2. The numbers of common features between the objects and the groups are $N_{G_1 O_1} = 2$, $N_{G_1 O_2} = 0$, $N_{G_2 O_1} = 1$, and $N_{G_2 O_2} = 3$. Let the number of features in each model object be 4. The numbers of features in the two groups are 2 and 4, respectively. Using these values, the overall performance is $\beta = 0.5$. On the other hand, if the feature $f_4$ had been correctly grouped with object 1, then we would have $N_{G_1 O_1} = 3$, $N_{G_1 O_2} = 0$, $N_{G_2 O_1} = 0$, and $N_{G_2 O_2} = 3$. In this case we get $\beta = 0.539$.

Figure 3.6. Example used to illustrate performance measure computation. (a) Ground truth models, (b) Groups formed.
[Panel labels: objects 1 and 2; features $f_1, \ldots, f_8$.]

We use the individual edge pixels, instead of the line segments as in Fig. 3.6, as our primitive features. In the case of model-based recognition, particularly view-based recognition, the number of edge points will provide a good estimate of the number of model features and will implicitly attach more significance to longer segments than to shorter ones. The manual construction of models is cumbersome and, to say the least, time consuming, which renders the performance analysis almost intractable for real domains. We circumvent this problem by using manual outlines of object features in each image.
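A sketch of Eqs. 3.15 to 3.17, and of the variant without the false alarm factor via `alt=True`, with groups and ground-truth objects represented as sets of feature identifiers such as edge-pixel indices. Taking $N_O$ as the size of an object's ground-truth set follows the outline-based approximation described above; the set representation and the function name are assumptions.

```python
def performance(groups, objects, alt=False):
    """beta of Eq. (3.17); with alt=True, the same product without the
    false alarm factor.

    groups, objects: lists of sets of feature ids (e.g. edge pixels);
    len(object) plays the role of N_O and len(group) of N_G.
    """
    t_sum = q_sum = 0.0
    n_overlaps = 0
    n_false = 0
    for g in groups:
        hit = False
        for o in objects:
            n_go = len(g & o)             # N_GO: shared features
            if n_go == 0:
                continue
            hit = True
            n_overlaps += 1
            t_sum += n_go / len(g)        # P_time, Eq. (3.15)
            q_sum += n_go / len(o)        # P_qual, Eq. (3.16)
        if not hit:
            n_false += 1                  # group overlaps no object
    if n_overlaps == 0:
        return 0.0
    beta = (t_sum / n_overlaps) * (q_sum / n_overlaps)
    return beta if alt else beta * (1 - n_false / len(groups))

objects = [{1, 2, 3, 4}, {5, 6, 7, 8}]
```

A perfect grouping scores 1.0; adding one group that overlaps no object keeps the time and quality terms at 1 but scales $\beta$ by $1 - 1/3$.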
Other combinations of the measures might be desirable, depending on the task at hand; however, this combination suffices to illustrate the essential ideas. In Chapter 5, we experiment with a form without the false alarm term, as shown in Eq. 3.18; let us call it $b_{\text{alt}}$. The false alarm term is not important if all objects in a scene are of interest, i.e., if there is no organized false-alarm form. Fig. 3.7 shows the correlation between $b$ and $b_{\text{alt}}$ for such a case. Optimizing $b_{\text{alt}}$ will tend to give the same answer as optimizing $b$ in such cases. If, however, only a few objects in an image are of interest and the background includes significant organized form, then the form of $b$ used in this chapter would be appropriate.

$$ b_{\text{alt}} = \left( \sum_{i,j} \frac{P_{\text{time}}(G_i, O_j)}{N_{\text{overlaps}}} \right) \left( \sum_{i,j} \frac{P_{\text{qual}}(G_i, O_j)}{N_{\text{overlaps}}} \right) \qquad (3.18) $$

Figure 3.7. Line fit of the data points of $b$ (x-axis) and $b_{\text{alt}}$ (y-axis).

To give an idea of the effect of the false alarm term: in the example of Fig. 3.6 considered above, the values of $b_{\text{alt}}$ would be 0.707 and 0.763, respectively, since the false alarms $f_7$ and $f_8$ have no effect on this expression.

3.4 Results and Analyses

We present thorough analyses and evaluations of the performance of both the spectral grouping and the learning strategies. First, we investigate the performance of the spectral grouping algorithm on a variety of real images. Second, we compare the performance of the particular spectral partitioning technique that we use for grouping with the normalized cut based spectral graph partitioning suggested elsewhere [79]. Third, we demonstrate the ability to learn to group features of a single object type, e.g.,
airplanes, in the presence of different types of background clutter. Fourth, we demonstrate the ability to learn to form groups that correspond to several object types in a particular domain, e.g., aerial images.

3.4.1 General Performance of Spectral Grouping

Fig. 3.8 shows some sample results of the spectral grouping algorithm on a variety of real images, namely oblique aerial views, top aerial views, and outdoor images of man-made and natural objects. The left column shows the gray-level images with the ground truth objects (manually) outlined in different colors. The middle column shows the input edge features that are to be grouped, and the rightmost column shows the different detected groups. Each group is colored differently. Note how the algorithm is able to pick out salient features in the scene even in the presence of significant image clutter. In the multi-story building of Fig. 3.8(a), the grouping algorithm picked out the parallel structures corresponding to the windows. The image in Fig. 3.8(d) has significant clutter, but the algorithm is able to pick out the salient groups corresponding to the major objects in the scene. The algorithm is also able to resolve the buildings (circular and rectangular) shown in Fig. 3.8(g). Figs. 3.8(j)-(o) demonstrate the applicability of the grouping algorithm to different domains and to images that are close views of man-made and natural objects. In spite of the large image clutter in the mailbox image of Fig. 3.8(j), the groups corresponding to the major structures in the scene are found. In Fig. 3.8(m) the tiger is segmented out from the scene (red group). The algorithm is also able to handle curved edge segments.

Figure 3.8. The performance of the spectral grouping algorithm. The images in the middle column show the image edge features that are grouped. The right column shows the feature groupings found using the spectral method.
Each cluster is shown in a single color.

Figure 3.9. Normalized histogram of the parameter values that result in good performance over images like those shown in Fig. 3.8. Each bar plot corresponds to a parameter as labeled. The horizontal axis of each plot corresponds to the 10 different values for each parameter. The highest bar in each plot is shown darker than the rest. [Panels, vertical axis Count: Distance tol, Overlap tol, Orient tol, Center Dist tol, Proximity tol, Cont Dist, Region tol, Parallel, Proximity, L-Junction, Continuity, T-Junction, Region.]

Parameter values for good grouping: The results attest to the flexible nature of the grouping algorithm. It is capable of producing good results with complex images, even in the presence of significant image clutter. The various input parameters make the grouping algorithm very flexible: we can tune the relative importance of different relations to suit a particular domain or object. To study the effect of the choice of the grouping parameters on performance, we employed the team of learning automata to learn a set of 100 parameter sets with the best performance, as measured by $b$ (Eq. 3.17), on images such as those shown in Fig. 3.8. We make the following observations based on the distribution of these good parameter sets, which are plotted as normalized histograms in Fig. 3.9.
1. The priors that result in good performance for each of the salient 2-ary relationships differ from image to image. Both high and low prior choices result in good performance, depending on the image.
2. For most of the images, a prior of 0.5 or more for region similarity results in superior performance. This suggests that photometric attributes play a significant role, in agreement with the Gestalt psychologists' recent suggestion of the importance of common region as a grouping factor [59]. However, so far photometric attributes have not played a significant part in extended feature grouping algorithms in computer vision.
3. Continuity between edge segments is not always important for figure-ground segmentation. For some of the images (e.g., Fig. 3.8(b)), low priors for this relation result in good performance. We should point out that we are referring to continuity at an extended edge segment level and not at a pixel level. The learning process might choose a low continuity prior even for images with seemingly long continuous edge features. This can happen if long continuous chains of edge pixels do not get fragmented by the edge detection and contour segmentation processes; the extracted edge segments are then long and exhibit low continuity between them.
4. For most of the images, low priors for proximity and T-junctions result in good performance, which suggests that these relations might not be important for every image.

3.4.2 Adaptation of the Grouping Algorithm to Object Types

Can the team of learning automata adapt the grouping process to segregate a particular object type from its natural background contexts? To study this, we selected the class of airplanes as the object type. The different types of airplanes could be in different background contexts, such as a tarmac or grass fields, and also at different orientations. We selected 40 such aerial images, with different lighting conditions, viewpoints, and scales. The left column in Fig. 3.11 shows samples of these images. The images, which are of different sizes, are printed to occupy the same size on paper.

Figure 3.10. Typical iteration traces of the learning automata team. The plot in (a) corresponds to learning on set $S_1$ of airplane images, and that in (b) corresponds to the set $S_2$. The vertical axis corresponds to the running average feedback ($b$) over the last 10 iterations. [Panels: (a) On set $S_1$, (b) On set $S_2$; horizontal axis: Iteration; vertical axis: Running Average Performance.]

We separated the 40 images into two sets, $S_1$ and $S_2$, so that we could train on one set and test with the other. In the training phase, the team of learning automata sampled the parameter space using first the image set $S_1$ and then $S_2$, such that the average performance was maximized. Fig. 3.10 shows two typical traces, one for image set $S_1$ and the other for set $S_2$. Note how the average feedback quickly converges, in about 3000 iterations. Compare this with the size of the search space, which is $10^{21}$: there are 21 parameters, including the edge detector and contour segmentation parameters, and each parameter can take 10 possible values. From the sampling trace, we composed a set of 100 good parameter sets using the 5 best parameter combinations for each image in $S_1$ (or $S_2$). The 100 parameter combinations learnt on $S_1$ were applied to $S_2$, and vice versa, to obtain the test performances. Thus, for each image, we have the best performance that can be achieved by training on the set containing it (the train performance) and the best performance using the parameters learnt on the set not including the image (the test performance). The images in the third column of Fig. 3.11 show the best train performance on the images in the first column, and the images in the fourth column show the best test performance. The second column of images shows the input edge features. Note the similarity of the train and test groups.
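The training loop described above can be sketched as follows. This is a hedged illustration under explicit assumptions, not the exact scheme used in this work: each parameter is controlled by its own automaton over 10 discrete values, all automata receive the same feedback $b \in [0, 1]$, and the update rule is the standard linear reward-inaction ($L_{R-I}$) scheme with an arbitrarily chosen learning rate. `evaluate` is a hypothetical callback that would run edge detection, contour segmentation and grouping on the training set and return the average $b$. The sketch also keeps the best distinct parameter tuples, records the running average over the last 10 iterations plotted in Fig. 3.10, and computes a normalized histogram of one parameter's values of the kind shown in Fig. 3.9.

```python
import heapq
import random
from collections import deque
from typing import Callable, List, Sequence, Tuple

Params = Tuple[int, ...]


class LearningAutomaton:
    """One player: picks one of n_actions discrete values for one parameter."""

    def __init__(self, n_actions: int, rate: float, rng: random.Random):
        self.p = [1.0 / n_actions] * n_actions  # action probabilities
        self.rate = rate
        self.rng = rng

    def choose(self) -> int:
        return self.rng.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, action: int, reward: float) -> None:
        """Linear reward-inaction: shift mass toward `action` by rate * reward."""
        step = self.rate * reward  # reward == 0 leaves p unchanged
        for k in range(len(self.p)):
            if k == action:
                self.p[k] += step * (1.0 - self.p[k])
            else:
                self.p[k] -= step * self.p[k]


def learn_parameters(evaluate: Callable[[Params], float], n_params: int,
                     n_values: int = 10, iterations: int = 3000,
                     rate: float = 0.05, keep_best: int = 5, seed: int = 0):
    """Team of automata sampling the joint parameter space.

    Returns (best, trace): up to keep_best distinct (score, params) pairs,
    best first, and the running average of b over the last 10 iterations.
    """
    rng = random.Random(seed)
    team = [LearningAutomaton(n_values, rate, rng) for _ in range(n_params)]
    best: List[Tuple[float, Params]] = []  # min-heap keyed on score
    window: deque = deque(maxlen=10)
    trace: List[float] = []
    for _ in range(iterations):
        params = tuple(a.choose() for a in team)
        b = evaluate(params)
        for automaton, action in zip(team, params):
            automaton.update(action, b)  # every player gets the same feedback
        window.append(b)
        trace.append(sum(window) / len(window))
        if any(p == params for _, p in best):
            continue  # keep parameter combinations distinct
        if len(best) < keep_best:
            heapq.heappush(best, (b, params))
        elif b > best[0][0]:
            heapq.heapreplace(best, (b, params))
    return sorted(best, reverse=True), trace


def normalized_histogram(param_sets: Sequence[Params], index: int,
                         n_values: int = 10) -> List[float]:
    """Fraction of parameter sets choosing each value of parameter `index`."""
    counts = [0] * n_values
    for params in param_sets:
        counts[params[index]] += 1
    return [c / len(param_sets) for c in counts]
```

For instance, with a toy `evaluate` that returns the fraction of parameters matching a hidden target tuple, `learn_parameters` returns the best tuples it sampled, and `normalized_histogram([p for _, p in best], 0)` summarizes how often each value of the first parameter appears among them. Reward-inaction is used here only because it is a simple, well-known automaton update; the per-image selection of good parameter sets described in the text is not modeled.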
This attests to the good performance of the learning algorithm. It is also interesting to note how the group corresponding to the airplane is separated from other features in an image. The background feature statistics vary from image to image.

Figure 3.11. Representative images from the set of 40 airplane images. The images in the second column show the edge features that are input to the grouping algorithm. Each image in the third column shows the output groups with the best parameter combination obtained by training on the set of 20 images that includes the corresponding image (on the left). Each image in the fourth column shows the groups with the best parameter combination obtained by training on the set that does not include the corresponding gray-level image.

Figure 3.12. (a) Variation of train and test performance for different airplane images. The solid bars correspond to the best performance achieved on an image when it was included in the training set. The shaded bars represent the best performance achieved on an image when it was not included in the training set. (b) Variation of average training and testing performance with different runs. The solid bars correspond to the best train performance, as averaged over the set of images for one learning run. The shaded bars correspond to the best test performance, as averaged over the set of images for one learning run.

Table 3.2. ANOVA of the learning performance on the aerial images of planes.
Analysis of learning performance on the airplane images
Source                DF      SS      F-value   P-value   Result
Run                    4    0.1798     14.43    <0.0001   Significant
Train/Test             1    0.0226      7.26     0.0078   Significant
Image                 39    2.8792     23.70    <0.0001   Significant
Run × Train/Test       4    0.0078      0.63     0.6430   Not significant
Run × Image          156    0.4780      0.98     0.5404   Not significant
Train/Test × Image    39    0.1314      1.08     0.3586   Not significant
In some images, the background is more organized than in others. There are also strong and long edge features in the background that cannot be eliminated by simple edge thresholding. In fact, the edge images shown in the second column are the best possible edges that can be obtained by changing the edge scale and thresholds, and even the contour segmentation parameters. Recall that the parameters of the edge detection and contour segmentation are also part of the learning process.

Statistical Analysis: With regard to the performance of the learning algorithm, we consider the following questions. (i) Does the observed train and test performance depend on the particular image? (ii) Does the observed train and test performance depend on different runs of the learning automata team? Fig. 3.12(a) plots the best train and best test performance for each of the 40 images. The black bars correspond to the best train performances, and the best test performances are denoted by shaded bars. For each image, the train and test performances are close to each other, with a mean difference of 4%; however, the maximum achievable performance does vary from image to image. Fig. 3.12(b) plots the train and test performance, averaged over the 40 images, for 5 different runs of the learning automata team. The solid bars correspond to average train performance and the shaded bars to average test performance over the image set. As expected, due to the stochastic nature of the learning algorithm, there is variation between different runs; the mean difference between average training performances is 8%. However, the relative difference between overall test and train performance from run to run is small, at around 2%. Although Fig. 3.12 gives us a visual feel for the robustness of the learning algorithm, it does not quantify the statistical significance of the observations.
For statistical analysis we employed the Analysis of Variance (ANOVA) technique, which can assess the statistical significance of the effect of different factors, and of their interactions, on the overall performance variation. In our case, three main factors can affect the grouping performance: (i) the different learning runs, (ii) whether it is train or test performance, and (iii) the images. ANOVA can compute the significance of the performance variations due not only to individual factors but also to their interactions. Thus, we can answer questions such as: does train and test performance interact with the images, i.e., is the variation of train and test performance dependent on the images (Train/Test × Image)? Is the interaction of train and test performance with different learning runs significant (Run × Train/Test)? Is performance on an image dependent on the particular learning run (Run × Image)? Table 3.2 lists the ANOVA results. From the results, we can see that the variations due to the three main factors are significant; however, their interactions are not significant. Thus, the train and test performance differences that we see in Fig. 3.12 are statistically significant. This is not unusual: it is rare for the train and test performance of a learning algorithm to be the same, and the test performance is typically expected to be lower than the train performance. What is of interest is the extent of the difference, which in the present case is small, about 4%. Similarly, although the variation in performance with respect to the particular stochastic run is significant, it is small: the mean difference is about 8%. By contrast, the performance difference between images is not only significant but also not small: the mean difference is about 30%. This attests to the variety of the image set; it is not homogeneous.
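The decomposition behind Tables 3.2 and 3.3 can be illustrated with a small pure-Python sketch. It assumes one performance score per (run, train/test, image) cell, so the three-way interaction has to serve as the error term; with 5 runs, 2 train/test levels and 40 images, this reproduces the degrees of freedom listed in Table 3.2 and leaves 156 error degrees of freedom. That setup is our reading of the table rather than something stated in the text, although dividing each row's mean square (SS/DF) in Table 3.2 by its F-value gives roughly the same error mean square, about 0.0031, in every row. The function name `three_way_anova` is hypothetical, and p-values, which require the F distribution, are left out.

```python
import random
from itertools import product
from typing import Dict, Sequence, Tuple

Cube = Sequence[Sequence[Sequence[float]]]


def three_way_anova(y: Cube) -> Tuple[Dict[str, Tuple[int, float, float]], float, int]:
    """Balanced three-factor ANOVA with one observation per cell.

    y[a][b][c] is the score for run a, train/test level b and image c.
    Returns ({source: (DF, SS, F)}, error mean square, error DF), where the
    run x train/test x image interaction is used as the error term.
    """
    A, B, C = len(y), len(y[0]), len(y[0][0])
    cells = list(product(range(A), range(B), range(C)))
    grand = sum(y[a][b][c] for a, b, c in cells) / (A * B * C)
    # Marginal means for the main effects and the two-way tables.
    ma = [sum(y[a][b][c] for b in range(B) for c in range(C)) / (B * C) for a in range(A)]
    mb = [sum(y[a][b][c] for a in range(A) for c in range(C)) / (A * C) for b in range(B)]
    mc = [sum(y[a][b][c] for a in range(A) for b in range(B)) / (A * B) for c in range(C)]
    mab = [[sum(y[a][b][c] for c in range(C)) / C for b in range(B)] for a in range(A)]
    mac = [[sum(y[a][b][c] for b in range(B)) / B for c in range(C)] for a in range(A)]
    mbc = [[sum(y[a][b][c] for a in range(A)) / A for c in range(C)] for b in range(B)]

    ss = {
        "Run": B * C * sum((m - grand) ** 2 for m in ma),
        "Train/Test": A * C * sum((m - grand) ** 2 for m in mb),
        "Image": A * B * sum((m - grand) ** 2 for m in mc),
        "Run x Train/Test": C * sum((mab[a][b] - ma[a] - mb[b] + grand) ** 2
                                    for a in range(A) for b in range(B)),
        "Run x Image": B * sum((mac[a][c] - ma[a] - mc[c] + grand) ** 2
                               for a in range(A) for c in range(C)),
        "Train/Test x Image": A * sum((mbc[b][c] - mb[b] - mc[c] + grand) ** 2
                                      for b in range(B) for c in range(C)),
    }
    df = {"Run": A - 1, "Train/Test": B - 1, "Image": C - 1,
          "Run x Train/Test": (A - 1) * (B - 1), "Run x Image": (A - 1) * (C - 1),
          "Train/Test x Image": (B - 1) * (C - 1)}
    ss_total = sum((y[a][b][c] - grand) ** 2 for a, b, c in cells)
    ss_error = ss_total - sum(ss.values())  # three-way interaction
    df_error = (A - 1) * (B - 1) * (C - 1)
    ms_error = ss_error / df_error
    table = {k: (df[k], ss[k], (ss[k] / df[k]) / ms_error) for k in ss}
    return table, ms_error, df_error


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic scores shaped like the airplane study: 5 runs x 2 x 40 images.
    scores = [[[rng.random() for _ in range(40)] for _ in range(2)] for _ in range(5)]
    table, ms_error, df_error = three_way_anova(scores)
    for source, (dof, ssq, f) in table.items():
        print(f"{source:20s} {dof:4d} {ssq:8.4f} {f:7.2f}")
    print("error DF:", df_error)
```

In a balanced design the seven sums of squares add up exactly to the total, which is why the error term can be obtained by subtraction.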
From Table 3.2, we also see that the interactions are not significant, which implies that (i) the observed train and test performance is not dependent on the images (Train/Test × Image), (ii) the observed train and test performance is not dependent on the stochastic sampling runs (Run × Train/Test), and (iii) the observed performance on an image is not dependent on the particular stochastic run (Run × Image).

Figure 3.13. Normalized histogram of the parameter values that result in good performance for segmenting planes from aerial views. Each bar plot corresponds to a parameter as labeled. The horizontal axis of each plot corresponds to the 10 different values for each parameter. The highest bar in each plot is shown darker than the rest. [Panels, vertical axis Count: Distance tol, Overlap tol, Orient tol, Center Dist tol, Proximity tol, Cont Dist, Region tol, Parallel, Proximity, L-Junction, Continuity, T-Junction, Region.]

The parameter choices: Fig. 3.13 shows the normalized histograms of the parameter choices that compose the set of 100 learnt parameter sets. The plots with subscripted titles correspond to the grouping parameter tolerances, and the other six plots correspond to the priors for the salient relationships, i.e., parallel, proximity, L-junction, continuity, T-junction, and region. The modes of the distributions are marked with dark bars. These distributions are less dispersed than the ones for the set of general images in Fig. 3.9, which can be attributed to the similar nature of the airplane images.
We can also see that L-junction, T-junction, and region similarity play a greater role in segmenting the plane from the background than do parallelism, proximity or even continuity. This is because the latter three relationships are sometimes present in the background to a larger extent than in the object itself and hence are poor indicators for figure-ground segmentation in this context.

3.4.3 Adaptation of the Grouping Algorithm to a Domain

Is it possible to adapt the grouping algorithm to a domain, and not just to a particular object type as we have seen in the last subsection? Specifically, we investigate whether it is possible to achieve good learning performance on a general class of images such as aerial images. We concentrate on a set of 20 images, some samples of which are shown in the left column of Fig. 3.14. The ground truth object (manual) outlines are shown overlaid on the gray-level image. As before, we separated these 20 images into two sets, $S_1$ and $S_2$; we trained on one set and tested with the other. In the training phase, the team of learning automata sampled the parameter space using $S_1$ (and $S_2$) such that the average performance was maximized. From the learning trace, we composed a set of 100 good parameter sets using the 10 best parameter combinations for each of the 10 images in $S_1$ (or $S_2$). The 100 parameter combinations learnt on $S_1$ were applied to $S_2$, and vice versa, to obtain the best test performance on each image. Thus, for each image we have the best performance that can be achieved by training on the set containing it (the train performance) and the best performance using the parameters learnt on the set not including the image (the test performance). The images in the second column of Fig. 3.14 show the train performance, and the images in the third column show the test performance. Note the reasonable similarity of the train and test groups. This attests to the good performance of the learning algorithm.
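The two-fold train/test protocol used in this subsection and the previous one can be summarized by the sketch below. It is a hedged illustration: `evaluate(params, image)` and `candidates_for(images)` are hypothetical callbacks (the latter standing in for the parameter tuples visited by the learning trace on a training set), and `k` is the number of best combinations kept per image (5 for the airplane study, 10 for the aerial study). For each image, the train performance is the best score over the pool learnt on its own set, and the test performance is the best score over the pool learnt on the other set.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Params = Tuple[int, ...]
Image = Hashable


def top_k_per_image(candidates: Sequence[Params], images: Sequence[Image],
                    evaluate: Callable[[Params, Image], float], k: int) -> List[Params]:
    """Pool of good parameter sets: the k best candidates for each image."""
    pool: List[Params] = []
    for img in images:
        ranked = sorted(candidates, key=lambda p: evaluate(p, img), reverse=True)
        pool.extend(ranked[:k])
    return pool


def two_fold_train_test(s1: Sequence[Image], s2: Sequence[Image],
                        candidates_for: Callable[[Sequence[Image]], Sequence[Params]],
                        evaluate: Callable[[Params, Image], float],
                        k: int) -> Dict[Image, Tuple[float, float]]:
    """Return {image: (train performance, test performance)}."""
    folds = (s1, s2)
    # Pool learnt on each fold, composed from its k best combinations per image.
    pools = [top_k_per_image(candidates_for(s), s, evaluate, k) for s in folds]
    result: Dict[Image, Tuple[float, float]] = {}
    for own in (0, 1):
        other = 1 - own
        for img in folds[own]:
            train = max(evaluate(p, img) for p in pools[own])
            test = max(evaluate(p, img) for p in pools[other])
            result[img] = (train, test)
    return result
```

With a toy `evaluate(p, img) = 1 - abs(p[0] - img) / 10` over integer "images" 1 and 2 (set $S_1$) and 7 and 8 (set $S_2$), and candidates `(0,)` through `(9,)`, every image reaches a train performance of 1.0 while its test performance is lower, mirroring the train/test gap discussed in the text.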
Figure 3.14. Representative images from the set of 20 aerial images. Each image in the second column shows the output groups with the parameters obtained by training on the set of 10 images that includes the corresponding image (on the left). Each image in the third column shows the groups with the parameters obtained by training on the set that does not include the corresponding gray-level image.

Figure 3.15. (a) Variation of train and test performance on the aerial images. The solid bars correspond to the best performance achieved on an image when it was included in the training set, and the shaded bars to the best performance achieved on an image when it was not included in the training set. (b) Variation of training and testing performance with different runs. The solid bars correspond to the best train performance, as averaged over the set of images for one learning run. The shaded bars correspond to the best test performance, as averaged over the set of images for one learning run.

Statistical Analysis: Fig. 3.15(a) plots the best train and best test performance for each of the 20 images. The black bars correspond to train performances, and test performances are shown using shaded bars. For each image, the test performance is lower than the train performance. The mean difference between train and test performance is 14%. This is larger than the differences observed for learning a single object type, which is to be expected since we are trying to generalize across a domain rather than across just a single object type. As before, the overall group quality differs from image to image; there is 44% variation across images. Fig. 3.15(b) plots the train and test performance, averaged over the 20 images, for 5 different runs of the learning automata team sampling.
As before, the solid bars correspond to train performance and the shaded bars to test performance. The mean difference in train performance from run to run is about 3%. However, the difference between train and test performance, which is about 14%, does not seem to vary with runs. To quantify the statistical significance of the observed differences, we employed the Analysis of Variance (ANOVA) technique. The factors that can give rise to overall variations are the same as before, namely (i) learning runs, (ii) train or test case, (iii) images, and their interactions (Train/Test × Image), (Run × Train/Test), and (Run × Image). Table 3.3 lists the ANOVA results.

Table 3.3. ANOVA of the learning performance on aerial images.
Analysis of learning performance on the aerial images
Source                DF      SS      F-value   P-value   Result
Run                    4    0.0024      1.19     0.3220   Not significant
Train/Test             1    0.2395    479.10    <0.0001   Significant
Image                 19    1.2790    134.66    <0.0001   Significant
Run × Train/Test       4    0.0015      0.77     0.5492   Not significant
Run × Image           76    0.0396      1.04     0.4299   Not significant
Train/Test × Image    19    0.0490      5.16    <0.0001   Significant

From the results, we can see that:
(i) The train and test performance difference (about 14% on average) that we see in Fig. 3.15 is statistically significant.
(ii) Images are a significant source of variation, which attests to the variety of the image set.
(iii) The performance does not vary significantly between different learning runs. This is desirable, but definitely not typical for learning based on stochastic sampling. We believe this might be due to the underlying nature of the parameter space, which might have low variation.
(iv) The observed train and test performance is dependent on the images (Train/Test × Image). Thus the relative train and test performance varies from image to image; for some images, the difference between train and test is lower than for others. This is due to the larger variety of the images being considered.
(v) The observed relative train and test performance is not dependent on the learning runs (Run × Train/Test).
(vi) The observed performance on an image is not dependent on the particular learning run (Run × Image).

The parameter choices: Fig. 3.16 shows the normalized histograms of the parameter choices that compose the set of 100 learnt parameter sets. The plots with subscripted labels correspond to the grouping parameter tolerances, and the other six correspond to the priors for the salient relationships, i.e., parallel, proximity, L-junction, continuity, T-junction, and region. The modes of the distributions are marked with dark bars. For aerial images, we see that all relationships except proximity play an important role in segmenting objects from the background. Unlike for the plane images, where parallelism and continuity did not consistently play an important role, here they do play a significant role. This dependence of grouping performance on the relative importance of the relationships is precisely the reason why we need a framework that can adapt the grouping process.

Figure 3.16. Normalized histogram of the parameter values that result in good grouping performance for aerial images. Each bar plot corresponds to a parameter as labeled. The horizontal axis of each plot corresponds to the 10 different values for each parameter. The highest bar in each plot is shown darker than the rest. [Panels, vertical axis Count: Distance tol, Overlap tol, Orient tol, Center Dist tol, Proximity tol, Cont Dist, Region tol, Parallel, Proximity, L-Junction, Continuity, T-Junction, Region.]

3.5 Summary

We presented a flexible, learnable, perceptual organization framework based on graph partitioning.
The graph spectral techniques facilitate the easy consideration of global context in the grouping process, and an N-player automata framework learns the grouping algorithm parameters. We demonstrated the performance of the grouping algorithm on a variety of images. Among the interesting conclusions are:
1. It is possible to perform figure-ground segmentation from a set of local salient relations such as parallelism, continuity, perpendicularity, proximity and region similarity, each defined over a small number of primitives.
2. The relative importance of the salient relations depends on the object or domain of interest.
3. Geometric relationships alone are not sufficient for grouping. Photometric attributes such as region similarity play a significant role in grouping extended low-level features (see the discussion associated with Figs. 3.9, 3.13, and 3.16).

Extensive statistical analysis of the learning algorithm shows that it is possible to adapt the grouping process to single object types (e.g., airplanes) with performances within 4% of the best possible performance. We found that the observed learning performance on an image is not dependent on the learning run (or trace). Also, the observed train and test performance differences are independent of the particular image. Furthermore, we demonstrated that it is also possible to learn grouping parameters for a specific image domain (e.g., aerial), with a mean train and test difference of 14%. In this case too, we found that the performance of the learning algorithm is independent of the learning run. Although we motivated the grouping problem from an object recognition point of view, the grouping output can also be used for other vision tasks, such as focusing attention in a scene. Similarly, the learning algorithm can be used for other vision tasks such as performance characterization.
To compare two vision algorithms, we first need to decide on the best parameters on a per-image basis or for a group of images. As the number of parameters increases, exhaustive search becomes computationally very expensive. The learning framework presented here offers an efficient alternative strategy: it can be used to find the best parameters on a per-image basis or for a group of images, just by controlling the images that are considered to be part of the environment. The parameter learning framework can also be used to tune parameters of a network of vision modules, where the number of parameters is usually large and there are strong interactions between different parameters.

CHAPTER 4
THEORETICAL STUDIES ON GRAPH CUT MEASURES

Graph-based algorithms are popular in the vision community, chiefly because of their simplicity. In particular, partitioning schemes that involve cut measures are becoming extremely popular. Many cut measures have been proposed and used, and it is not clear which of these measures are good, under what conditions their performance can be considered optimal, and how much time the operations take. In this chapter we theoretically analyze the minimum, average and normalized cut measures, discussed in more detail in Sections 4.2, 4.3 and 4.4, respectively. In Chapter 5, we subject these measures to rigorous empirical and statistical analysis. The core conclusion of this study is that there is at best a minimal difference between the cut measures when applied to the graph partitioning problem from an object recognition perspective. The finer details, such as optimality conditions and drawbacks, are woven into the discussion throughout the chapter.

There is a rich body of prior work on the analysis of graph cuts. Most works consider the classic problem of minimum cut or the problem of graph bisection, which has origins in VLSI.
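To make the three measures concrete, the sketch below scores every bipartition of a small weighted graph by brute force. It assumes the standard forms of the measures, which are restated here as assumptions rather than quoted from Sections 4.2 to 4.4: $\mathrm{cut}(A,B)$ is the total weight of edges between $A$ and $B$; the average cut is $\mathrm{cut}(A,B)/|A| + \mathrm{cut}(A,B)/|B|$; and the normalized cut is $\mathrm{cut}(A,B)/\mathrm{assoc}(A,V) + \mathrm{cut}(A,B)/\mathrm{assoc}(B,V)$, where $\mathrm{assoc}(A,V)$ is the sum of the degrees (total incident edge weight) of the nodes in $A$. It also reports the minimum bisection, i.e., the minimum cut restricted to equal-sized halves. Exhaustive enumeration is exponential in the number of nodes and is meant only for toy graphs; the graph and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, Tuple

Edges = Dict[Tuple[int, int], float]  # undirected edge (u, v) -> positive weight


def all_bipartitions(n: int) -> Iterator[FrozenSet[int]]:
    """Yield side A of each unordered bipartition once (node 0 is always in A)."""
    for r in range(n - 1):  # |A| = r + 1 ranges over 1 .. n-1
        for extra in combinations(range(1, n), r):
            yield frozenset((0,) + extra)


def score_bipartitions(edges: Edges, n: int) -> Dict[str, Tuple[float, FrozenSet[int]]]:
    """Return {measure: (best value, side A)} for each cut measure.

    Assumes a connected graph with positive weights, so no denominator is 0.
    """
    degree = [0.0] * n
    for (u, v), wt in edges.items():
        degree[u] += wt
        degree[v] += wt
    total = sum(degree)
    best: Dict[str, Tuple[float, FrozenSet[int]]] = {}
    for side_a in all_bipartitions(n):
        cut = sum(wt for (u, v), wt in edges.items() if (u in side_a) != (v in side_a))
        size_a, size_b = len(side_a), n - len(side_a)
        assoc_a = sum(degree[u] for u in side_a)
        measures = {
            "min cut": cut,
            "average cut": cut / size_a + cut / size_b,
            "normalized cut": cut / assoc_a + cut / (total - assoc_a),
        }
        if size_a == size_b:
            measures["min bisection"] = cut
        for name, value in measures.items():
            if name not in best or value < best[name][0]:
                best[name] = (value, side_a)
    return best


# Toy graph: object 1 = {0, 1, 2}, object 2 = {3, 4, 5}; node 5 is weakly attached.
toy = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
       (3, 4): 1.0, (3, 5): 0.3, (4, 5): 0.3,
       (2, 3): 0.5, (1, 4): 0.5}
for name, (value, side) in score_bipartitions(toy, 6).items():
    print(f"{name:15s} {value:.3f} {sorted(side)}")
```

On this toy graph the minimum cut isolates the weakly attached node 5 (cut value 0.6), whereas the average cut, the normalized cut and the minimum bisection all separate {0, 1, 2} from {3, 4, 5}. It is a single hand-built instance, not evidence for or against the general conclusions of this chapter.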
The graph bisection problem involves computing the minimum cut with constraints on the sizes of the partitions, such as the oft-used requirement of equal-sized partitions. The minimum cut problem is well studied and has also been extended to partitioning into k parts [65]. While the minimum cut problem without any constraint on partition size has polynomial complexity, the problem of graph bisection with equal-sized partitions is NP-complete. In fact, Wagner and Wagner [89] showed that the problem of graph bisection with unequal partition-size constraints, with a minimum partition size of $O(\alpha N^{\epsilon})$, is also NP-hard. Because of this computational challenge, there is interest in the design of approximate algorithms, or of optimal algorithms for restricted graph classes. One of the earlier works toward this end is by Bui et al. [10], who described a polynomial time algorithm that computes the minimum bisection optimally for d-regular random graphs with bisection width b. Later, Boppana [5] showed that graph bisection could be computed, for almost all graphs, by maximizing the largest eigenvalue of a transformation of the adjacency matrix of the graph. The analysis is based on a random graph model that involves n vertices with m edges and a bisection width of b. More recently, Yu and Cheng [87] showed that the Boppana bisection could also be computed efficiently using semidefinite programming. For k-regular graphs, Saab and Rao [64] showed that the greedy strategy for finding the graph bisection could find approximate solutions that are close to the optimal one.

Our analysis differs in many respects from traditional analyses of graph cuts. One fundamental difference is that we are concerned with analyzing the cut measures themselves and not with the optimality of particular algorithmic strategies used to solve the problem.
For example, we would like to know whether recursively minimizing the average cut would result in correct groups. Would it result in correct groups for all image statistics? We are not interested, at least in this section, in designing algorithms that find the optimal average cut of a given graph instance. Finding an optimal solution to, say, the average cut problem is not useful if minimizing it does not result in groups that segregate objects from each other. The second difference is that, in addition to the minimum cut measure, we consider the average and normalized cut measures, which are relatively new. Graph bisection, and even the generalized k-section version, which have been studied extensively, are not appropriate in the vision context, since we do not know a priori the number of features from each object. The third new aspect is the modeling of the partition space (f space) that we use; it is appropriate only in the context of the object recognition problem, and this restricted context helps in managing the exponential size of the partition space. Another direction in which we push the state of the art, at least in the context of graph-based grouping methods, is that we consider weighted graphs. We provide theoretical insight into the nature of the three partitioning measures in terms of the underlying image statistics. In particular, we ask: for what kinds of image statistics would optimizing a measure, irrespective of the particular algorithm used, result in correct partitioning? Another question of interest is whether the recursive bipartitioning strategy can separate out groups corresponding to K objects from each other.

Figure 4.1. Partitioning of a scene structure graph over features from multiple objects. (See text for explanation of notations.) [Diagram: Objects 1 to 4, with $N_1, \ldots, N_4$ features; a partitioning line places $f_i N_i$ features of object i on one side and $(1 - f_i) N_i$ on the other.]

For the analysis, we draw from probability theory and the rich body of work on stochastic ordering of random variables [78]. Our major conclusion is that none of the three measures is guaranteed to result in the correct partitioning of K objects, in the strict stochastic order sense, for all image statistics. Qualitatively speaking, the minimum cut measure is optimal only under very restrictive conditions, when the average inter-object feature affinity is very weak compared to the average intra-object feature affinity. The normalized cut measure is partially optimal, i.e., optimal over a restricted range of cut values, when the mean inter-object feature affinity is somewhat weaker than the mean intra-object affinity. The average cut measure is also partially optimal, but with the least restrictive requirement: the mean inter-object affinity need only be less than the mean intra-object affinity.

In our analysis, we assume that we have a weighted scene structure graph with positive-valued weights, which is true for most grouping strategies. Fig. 4.1 depicts the notations that we use in this section, which we formally define below.

Definition 1. Let
1. the number of objects be denoted by K, and the objects themselves by $O_1, \ldots, O_K$;
2. the number of features of the i-th object be denoted by $N_i$;
3. the weight of a link, $X^i_{lm}$, between the l-th and m-th features (or nodes) of the same i-th object be a Gamma random variable with $P(X^i_{lm} = x) = \mathrm{Gamma}(W) = \frac{1}{\Gamma(W)} x^{W-1} e^{-x}$, where $\Gamma(\cdot)$ is the standard gamma function. Recall that Gamma random variables take on values between 0 and $\infty$. The mean and the variance are both W, and the mode is $W - 1$. The parameter W is also known as the shape parameter;
4. the weight of a link, $Y^{ij}_{lm}$, between the l-th feature of the i-th object and the m-th feature of the j-th object be a Gamma random variable with $P(Y^{ij}_{lm} = y) = \mathrm{Gamma}(w) = \frac{1}{\Gamma(w)} y^{w-1} e^{-y}$.

We can assume that the strength of the association between inter-object features will be lower than that between intra-object features.

Assumption 1. Both the mean and the mode of the weight distribution for links between features from different objects are lower than those for links between features from the same object, i.e., $w < W$.

This, based on the theory presented in the next section, would imply that $Y^{ij}_{lm} \leq_{st} X^i_{lm}$, i.e., $Y^{ij}_{lm}$ is stochastically less than $X^i_{lm}$. In the 50 real images that we have experimented with, the estimated $W/w$ ratio is around 7. Fig. 4.2 shows the fit of the gamma model to the distribution of link weights, both between inter-object and between intra-object features. We also use the oft-assumed property that

Assumption 2. The link weights are independent random variables.

We need notation to characterize a bipartition of a multi-object association graph. Instead of representing each possible partition individually, which is combinatorially explosive, we represent the possible bipartition classes as follows.

Figure 4.2. Empirical fit of the gamma probability density function to the link weight distribution: left, for links between same-object features; right, for links between features from different objects.

Definition 2. A bipartition results in two partitions ($S_1$ and $S_2$) such that $f_i N_i$ features from the i-th object are in one partition ($S_1$) and the remaining $(1 - f_i) N_i$ features are in the other partition ($S_2$). A class of equivalent bipartitions is characterized by the column vector $f = (f_1, \ldots, f_K)^T$. Note that the $f_i$'s are discrete numbers that range from 0 to 1 in increments of $1/N_i$. For recursive bipartitioning to eventually result in the correct K-way cut, each $f_i$ should be 0 or 1, excluding the case when all $f_i$'s are 0 or all $f_i$'s are 1.
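To make Definitions 1 and 2 concrete, the following Python sketch samples a scene structure graph under the gamma link-weight model (intra-object weights Gamma(W), inter-object weights Gamma(w), unit scale, drawn independently as in Assumption 2) and computes the bipartition class f of a given split. The function names and example sizes are illustrative assumptions, not part of the original analysis; only the standard library's `random.gammavariate(alpha, beta)` is used, with `beta = 1.0` as the unit scale.

```python
import random
from fractions import Fraction

def sample_scene_graph(sizes, W, w, seed=0):
    """Sample link weights for a complete graph over the features of K objects.

    sizes[i] is N_i. Intra-object links get Gamma(W) weights and inter-object
    links get Gamma(w) weights, all with unit scale and drawn independently.
    """
    rng = random.Random(seed)
    labels = [i for i, n in enumerate(sizes) for _ in range(n)]  # object index of each feature
    weights = {}
    for l in range(len(labels)):
        for m in range(l + 1, len(labels)):
            shape = W if labels[l] == labels[m] else w
            weights[(l, m)] = rng.gammavariate(shape, 1.0)  # alpha = shape, beta = scale
    return labels, weights

def partition_class(labels, in_s1, sizes):
    """Definition 2: f_i is the fraction of object i's features placed in S1."""
    counts = [0] * len(sizes)
    for obj, side in zip(labels, in_s1):
        if side:
            counts[obj] += 1
    return tuple(Fraction(c, n) for c, n in zip(counts, sizes))

labels, weights = sample_scene_graph([2, 3], W=7.0, w=1.0)
f = partition_class(labels, [True, False, True, True, True], [2, 3])
print(f)  # (Fraction(1, 2), Fraction(1, 1))
```

The W/w ratio of 7 in the example only echoes the estimate quoted above; any positive shapes with w < W fit Assumption 1.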
In the context of object recognition, all partitions in a bipartition class are considered equivalent in terms of their impact on object recognition. The underlying assumption is that all features from the same object are equally important. This assumption is not new; it has been made by others when analyzing object recognition systems, such as by Clemens & Jacobs [13] in the context of indexing-based recognition and by Grimson [27] for constrained-search-based recognition. It is trivial to show that

Lemma 1. The partition class $(f_1, \ldots, f_K)$ is equivalent to the partition class $(1 - f_1, \ldots, 1 - f_K)$.

Definition 3. Let $\mathbf{0}$ and $\mathbf{1}$ denote vectors whose components are all 0 and all 1, respectively.

Definition 4. Let D denote the set of vectors d each of whose components, $d_i$, is either 0 or 1, excluding the vectors $\mathbf{0}$ and $\mathbf{1}$. The dimension of d is the same as that of f.

Definition 5. Let F denote the set of vectors f whose i-th component is either 0 or $1/N_i$, excluding the vector $\mathbf{0}$.

Definition 6. Let Y denote the set of vectors y whose i-th component is either 1 or $1 - 1/N_i$, excluding the vector $\mathbf{1}$.

The corner points on the boundary of the domain of possible partition classes are given by the set $D \cup F \cup Y$. That the elements of D are boundary corner points is obvious. The elements of the sets Y and F arise because we exclude $f = \mathbf{0}$ and $f = \mathbf{1}$, which do not represent a partition. We also make use of the fact that the possible values for $f_i$ are $k/N_i$ for $k = 0, 1, \ldots, N_i$. Partitions represented by the elements of Y and F are undesirable partitions that separate just one feature of some object(s) from the others. The elements of D represent the set of desired partition classes, where none of the individual objects is split. Using the above notation and assumptions, we next establish the probability models for the cut weight, the association within each partition, and the number of features in each partition.
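For small K, the corner sets of Definitions 4 to 6 can be enumerated directly. The sketch below is a minimal illustration with a hypothetical helper name; it represents each class as a tuple of exact fractions and confirms that Y is the elementwise complement $\mathbf{1} - f$ of F, consistent with Lemma 1.

```python
from fractions import Fraction
from itertools import product

def corner_sets(sizes):
    """Enumerate the corner sets D, F and Y (Definitions 4 to 6) for object sizes N_i."""
    K = len(sizes)
    zero = tuple(Fraction(0) for _ in range(K))
    one = tuple(Fraction(1) for _ in range(K))
    patterns = list(product((0, 1), repeat=K))  # which components are "switched on"
    D = [tuple(Fraction(b) for b in bits) for bits in patterns]
    F = [tuple(Fraction(b, n) for b, n in zip(bits, sizes)) for bits in patterns]
    Y = [tuple(1 - Fraction(b, n) for b, n in zip(bits, sizes)) for bits in patterns]
    D = [d for d in D if d not in (zero, one)]  # exclude 0 and 1
    F = [f for f in F if f != zero]             # exclude 0
    Y = [y for y in Y if y != one]              # exclude 1
    return D, F, Y

D, F, Y = corner_sets([3, 4])
print(len(D), len(F), len(Y))  # 2 3 3
print(sorted(tuple(1 - x for x in f) for f in F) == sorted(Y))  # True
```

In general there are $2^K - 2$ elements in D and $2^K - 1$ in each of F and Y, so this enumeration is only practical for a handful of objects.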
Note that the size of each partition class f (the number of distinct partitions it contains) will not appear in the analysis, since we are interested in algorithm-independent characteristics. Partition class sizes would be relevant when analyzing a particular partitioning strategy that chooses among different partition classes in a selective manner.

Lemma 2. The total weight of the cut links, $S_c(f)$, for partitions in the class f is a Gamma random variable, $\mathrm{Gamma}(k(f))$, where $k(f) = f^T P (\mathbf{1} - f)$ and P is a $K \times K$ matrix with

$$P_{ij} = \begin{cases} W N_i^2 & \text{for } i = j \\ w N_i N_j & \text{for } i \neq j \end{cases} \tag{4.1}$$

Proof: The cut links are of two kinds, inter-object and intra-object links. The total weight of the cut inter-object links is a sum of individual Gamma variables, $\sum_{i \neq j} \sum_{l,m} Y^{ij}_{lm}$. The total weight of the cut intra-object links is likewise a random variable, $\sum_i \sum_{l,m} X^i_{lm}$. These sums are also Gamma distributed, which follows from the property that if $X_1$ is $\mathrm{Gamma}(k_1)$ and $X_2$ is $\mathrm{Gamma}(k_2)$, independent, then $X_1 + X_2$ is $\mathrm{Gamma}(k_1 + k_2)$. Thus, $\sum_i \sum_{l,m} X^i_{lm}$ is $\mathrm{Gamma}\left(W \sum_i f_i (1 - f_i) N_i^2\right)$ and $\sum_{i \neq j} \sum_{l,m} Y^{ij}_{lm}$ is $\mathrm{Gamma}\left(w \sum_{i \neq j} f_i (1 - f_j) N_i N_j\right)$. The sum of these two sums is again a Gamma random variable, whose shape parameter can be compactly expressed in the matrix notation above using the $K \times K$ matrix P specified in the lemma.
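Lemma 2 can be sanity-checked by computing $k(f) = f^T P (\mathbf{1} - f)$ from Eq. (4.1) and comparing it with the shape obtained by adding up the shape of every individual cut link of a concrete partition (valid because sums of independent unit-scale gammas add their shapes). This is a minimal sketch in plain Python; the helper names and the two-object example are assumptions for illustration.

```python
from fractions import Fraction

def shape_matrix(sizes, W, w):
    """Eq. (4.1): P_ij = W N_i^2 if i == j, else w N_i N_j."""
    K = len(sizes)
    return [[(W if i == j else w) * sizes[i] * sizes[j] for j in range(K)]
            for i in range(K)]

def cut_shape(f, sizes, W, w):
    """Lemma 2: gamma shape k(f) = f^T P (1 - f) of the total cut weight S_c(f)."""
    P = shape_matrix(sizes, W, w)
    K = len(sizes)
    return sum(f[i] * P[i][j] * (1 - f[j]) for i in range(K) for j in range(K))

def cut_shape_by_counting(labels, in_s1, W, w):
    """The same shape, obtained by adding the shape of every individual cut link."""
    total = 0
    for l in range(len(labels)):
        for m in range(l + 1, len(labels)):
            if in_s1[l] != in_s1[m]:  # link crosses the cut
                total += W if labels[l] == labels[m] else w
    return total

labels = [0, 0, 1, 1, 1]                 # N_1 = 2, N_2 = 3
in_s1 = [True, False, True, True, True]  # partition class f = (1/2, 1)
f = (Fraction(1, 2), Fraction(1))
print(cut_shape(f, [2, 3], 7, 1), cut_shape_by_counting(labels, in_s1, 7, 1))  # 10 10
```

Exact fractions are used so that the two counts can be compared for equality rather than within a tolerance.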
Lemma 3. The number of features in each partition can be expressed as

$$\mathrm{Size}(S_1(f)) = f^T \mathbf{N}, \qquad \mathrm{Size}(S_2(f)) = (\mathbf{1} - f)^T \mathbf{N} \tag{4.2}$$

where $\mathbf{N}$ is a column vector whose i-th entry is $N_i$.

Proof: This follows trivially from $\mathrm{Size}(S_1(f)) = \sum_i f_i N_i$ and $\mathrm{Size}(S_2(f)) = \sum_i (1 - f_i) N_i$.

Lemma 4. The sums of the link weights, $S_1$ and $S_2$, within the two partitions $S_1$ and $S_2$, respectively, for the partition class f are Gamma distributed random variables given by

$$S_1(f) \sim \mathrm{Gamma}\left(f^T P' f - f^T \mathbf{q}\right), \qquad S_2(f) \sim \mathrm{Gamma}\left((\mathbf{1} - f)^T P' (\mathbf{1} - f) - (\mathbf{1} - f)^T \mathbf{q}\right) \tag{4.3}$$

where P' is a $K \times K$ matrix defined as

$$P'_{ij} = \begin{cases} 0.5\, W N_i^2 & \text{for } i = j \\ 0.5\, w N_i N_j & \text{for } i \neq j \end{cases} \tag{4.4}$$

and $\mathbf{q}$ is a column vector of size K with entries $q_i = 0.5\, W N_i$, $i = 1, \ldots, K$.

Proof: We use the fact that the sum of independent gamma random variables is also a gamma random variable whose shape parameter is the sum of the shape parameters of the constituent random variables. Within each part, separately counting the inter- and intra-object links, we have

$$S_1(f) \sim \mathrm{Gamma}\left(\sum_{i=1}^{K} 0.5\, W \left(f_i^2 N_i^2 - f_i N_i\right) + \sum_{i<j} w f_i f_j N_i N_j\right), \qquad S_2(f) \sim \mathrm{Gamma}\left(\sum_{i=1}^{K} 0.5\, W \left((1 - f_i)^2 N_i^2 - (1 - f_i) N_i\right) + \sum_{i<j} w (1 - f_i)(1 - f_j) N_i N_j\right) \tag{4.5}$$

Using the matrix P', we can compactly express the shape parameters as specified in this lemma.

4.1 Comparing Random Variables: Stochastic Orders

In our analysis, we will derive expressions for the probability density functions describing the distributions of the cut measures as a function of the partition class, denoted by $S(f)$. We will have to compare these random variables to establish optimality. Specifically, we will compare the random variables representing cut values for partition classes in D with those for partition classes not in D. We would like to know whether $S(f \in D) \leq \min_{f \notin D} S(f)$. The simplest way is to compare the mean values of the two random variables, which has been our earlier strategy [83]. As an example of this type of comparison, Figs.
4.3 and 4.4 show the nature of the variation of the expected values of the three cut measures as a function of all possible partitions, $(f_1, f_2)$, of an image with two objects, for two different image statistics. In each figure, the desired partition, namely $(f_1 = 1, f_2 = 0)$, is represented by the corner of the space indicated by an arrow. (In the following sections we derive exact expressions for these mean values; we preview them here only to illustrate the nature of the variation.) Notice that the expected values of both the average cut and the normalized cut measures seem to be well formed, with a minimum at the right partition, while the expected minimum cut does not always have a minimum at the correct partition. Thus, the expected values of the average and normalized cut measures for partitions that do not split object features are lower than those for partitions that do split features from the same object into two partitions. These visual observations regarding the mean values of the cuts have been analytically proven in [83].

This comparison of means is somewhat informative, but does not establish strong results. Hence, we turn to the body of work on stochastic orders, which establishes definitions for comparing random variables. In this section we present some of the key concepts and results that we will use in our subsequent analyses. For an extensive exposition of stochastic orders, we refer the reader to the excellent book by Shaked and Shanthikumar [78]. An extremely strong way to define the ordering of random variables X and Y is to insist that every realization of X be less than Y, i.e., $\Pr(X \leq Y) = 1$. However, this is an overly restrictive definition, applicable in very few real-world situations. There are other ways of defining stochastic ordering that are weaker but widely applicable. Interestingly, some of these orderings can be related to the strong ordering sense through proper transformations.

Figure 4.3. The expected values of the three measures, (a) minimum cut, (b) average cut, and (c) normalized cut, plotted as a function of $f_1$ and $f_2$ for a scene with similar-sized objects and with the strength of connection within objects being just twice the strength between objects, i.e., $N_1 = N_2$, $W = 2$ and $w = 1$.

Among the many ways of defining stochastic ordering between random variables, we pick two that are of particular interest to us.

Definition 7. A random variable X is stochastically less than Y, written $X \leq_{st} Y$, if $P(X > t) \leq P(Y > t)$ for every t.

In other words, one random variable is less than another in the stochastic sense when it is more likely for X than for Y to take values below any given number. Intuitively, this sense of stochastic order is appealing, but it is sometimes mathematically hard to establish, so we usually consider the following sense.

Definition 8. A random variable X is less than Y in the likelihood ratio order sense, written $X \leq_{lr} Y$, if $p_X(t)/p_Y(t)$ is decreasing in t over the union of the supports of X and Y.

In other words, lower values are relatively more likely for X than for Y. Although this sense of stochastic order seems unintuitive, it turns out to be easier to establish mathematically than the stochastic sense, and it is the stronger of the two senses. The following properties are of particular interest to us.

Figure 4.4. The expected values of the three measures, (a) minimum cut, (b) average cut, and (c) normalized cut, plotted as a function of $f_1$ and $f_2$ for a scene with similar-sized objects and with the strength of connection within objects being 20 times the strength between objects, i.e., $N_1 = N_2$, $W = 20$ and $w = 1$.

1. If $X \leq_{lr} Y$ then $X \leq_{st} Y$. In other words, the likelihood ratio sense is a stronger sense of stochastic order.
2. $X \leq_{st} Y$ if and only if there exist two random variables $\hat{X}$ and $\hat{Y}$, defined on the same probability space, such that $\hat{X} =_{st} X$, $\hat{Y} =_{st} Y$ and $P(\hat{X} \leq \hat{Y}) = 1$. Notice the connection to the strongest sense of stochastic order.
3. If $X \leq_{lr} Y$ then $f(X) \leq_{lr} f(Y)$, where f is an increasing function.
4. If $X \leq_{lr} Y$ then $E[f(X)^r] \leq E[f(Y)^r]$, where f is an increasing function. Thus, statistical properties such as the mean and moments of X would be less than those of Y.
5. Let $(X_i, Y_i)$, $i = 1, 2$, be independent pairs of random variables such that $X_i \leq_{lr} Y_i$ for $i = 1, 2$. If $X_i$ and $Y_i$ have log-concave densities, then $X_1 + X_2 \leq_{lr} Y_1 + Y_2$.
6. Let X be a random variable independent of $Y_i$, $i = 1, \ldots, N$. If $X \leq_{st} Y_i$ for $i = 1, \ldots, N$, then $X \leq_{st} \min_i Y_i$. This property is important for establishing optimality in the presence of the minimization involved in the graph partitioning operation.
7. Let X and Y be gamma random variables with parameters $(k_1, b_1)$ and $(k_2, b_2)$, respectively. If for every t, $L(k_1, b_1) \leq L(k_2, b_2)$, where $L(k, b) = k - t/b$, then $X \leq_{lr} Y$. This is easily established by taking the derivative of the logarithm of the ratio of the two pdfs and requiring that it be less than zero. In the analysis, the normalizing constants drop out and we are left with just the terms involving the random variable values. We will refer to L as the likelihood ratio ordering characteristic function. If the inequality between the L's is true only for a range of t, then the corresponding random variables are ordered only over that range. Note that for the ordering to be valid over all t, $k_1$ and $b_1$ should be less than $k_2$ and $b_2$, respectively.
8. Let X and Y be beta random variables¹ with parameters $(a_1, b_1)$ and $(a_2, b_2)$, respectively. The likelihood ratio ordering characteristic function for two beta variables is given by $L(a, b) = a - t(a + b)$. If for every t, $L(a_1, b_1) \leq L(a_2, b_2)$, then $X \leq_{lr} Y$. Note that for the ordering to be valid over all $t \in (0, 1)$, $a_1$ should be less than $a_2$ and $b_1$ should be greater than $b_2$.

In Sections 4.2, 4.3 and 4.4, we will discuss the individual cut measures, namely minimum cut, average cut and normalized cut.
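Property 7 can be checked numerically: the log of the ratio of two gamma densities is non-increasing at t exactly when $L(k_1, b_1) \leq L(k_2, b_2)$ there. The sketch below is an illustration under stated assumptions (a finite grid on (0, 20), hypothetical helper names, and the log-density written out with `math.lgamma`); it is not a proof. The second pair of parameters has $k_1 < k_2$ but $b_1 > b_2$, so the ordering holds only up to $t = (k_2 - k_1)/(1/b_2 - 1/b_1) = 2$.

```python
import math

def gamma_logpdf(x, k, b):
    """Log-density of the general gamma distribution with shape k and scale b (x > 0)."""
    return (k - 1) * math.log(x) - x / b - math.lgamma(k) - k * math.log(b)

def lr_characteristic(k, b, t):
    """Property 7: L(k, b) = k - t / b."""
    return k - t / b

def ratio_nonincreasing(k1, b1, k2, b2, ts):
    """Check numerically that p_X(t) / p_Y(t) does not increase along the sorted grid ts."""
    logr = [gamma_logpdf(t, k1, b1) - gamma_logpdf(t, k2, b2) for t in ts]
    return all(later <= earlier + 1e-12 for earlier, later in zip(logr, logr[1:]))

ts = [0.05 * i for i in range(1, 400)]  # grid on (0, 20)
# k1 <= k2 and b1 <= b2: ordered over the whole grid
print(ratio_nonincreasing(2, 1, 3, 1.5, ts))  # True
# k1 < k2 but b1 > b2: L(2, 2) <= L(3, 1) only for t <= 2
print(ratio_nonincreasing(2, 2, 3, 1, ts))  # False
print(ratio_nonincreasing(2, 2, 3, 1, [t for t in ts if t <= 2]))  # True
```

The same pattern applies to property 8 by swapping in the beta log-density, where the characteristic function becomes $a - t(a + b)$ on (0, 1).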
4.2 Minimum Cut

The minimum cut measure has proven useful in fields such as communications and computer architecture, to name a few. In perceptual organization, however, its use was not discovered until Wu and Leahy [96] came up with a strategy for image segmentation. There have been many variations of this cut measure, mainly because it tends to split the graph such that one of the partitions is very small, which results in unbalanced graph partitions. We discuss the other cut measures, which try to correct this unbalanced partitioning, in Sections 4.3 and 4.4.

A minimum cut based graph bipartition will try to minimize the total weight of the cut links ($S_c$), whose distribution as a function of the partition class is specified in Lemma 2. Ideally, we would like each recursive bipartitioning not to split features from a single object into two groups. In other words, the minimum value of the cut should occur for partitions of the type $f \in D$, or $f_i = d_i$. In particular, we are interested in the event $S_c(d) \leq \min_{f \notin D} S_c(f)$. Using property 6 from the previous section, it would suffice to establish the conditions under which $S_c(d) \leq_{st} S_c(f \notin D)$. To answer this question, we first establish a few lemmas about the behavior of the shape parameter governing the distribution of $S_c$, i.e., $k(f) = f^T P (\mathbf{1} - f)$.

¹ If X is a beta random variable, then its probability density function is given by $p(x) = \frac{1}{B(a, b)} x^{a-1} (1 - x)^{b-1}$ for $0 \leq x \leq 1$, where $B(a, b)$ is the beta function.

Lemma 5. The function $k(f) = f^T P (\mathbf{1} - f)$ is a concave function in the space of partition classes f.

Proof: Let $f_1$ and $f_2$ be two partition classes. A convex combination of these two vectors is given by $a f_1 + (1 - a) f_2$, where $a \in [0, 1]$.
For a concave function, the following relation must be true:

$$\left(a f_1 + (1 - a) f_2\right)^T P \left(\mathbf{1} - a f_1 - (1 - a) f_2\right) - a f_1^T P (\mathbf{1} - f_1) - (1 - a) f_2^T P (\mathbf{1} - f_2) = a (1 - a) (f_2 - f_1)^T P (f_2 - f_1) \geq 0 \tag{4.6}$$

In deriving the above we have used the facts that (i) $f_1^T P f_2 = f_2^T P f_1$, since P is a symmetric matrix, and (ii) $a(1 - a) \geq 0$. The above required condition can be rearranged into $F^T p F \geq 0$, where $F_i = N_i (f_{1i} - f_{2i})$ and

$$p_{ij} = \begin{cases} W & \text{for } i = j \\ w & \text{for } i \neq j \end{cases} \tag{4.7}$$

This transformed condition would be true if p is a positive definite matrix, which it is. The matrix p can be expressed as the sum of a diagonal matrix, with positive diagonal entries $W - w$ (by Assumption 1), and a constant matrix, all of whose entries are w. Since the diagonal matrix is positive definite and the constant matrix is positive semidefinite, their sum is positive definite.

Lemma 6. The partition classes f that minimize the function $k(f) = f^T P (\mathbf{1} - f)$ would also maximize $|\mathrm{Size}(S_1(f)) - \mathrm{Size}(S_2(f))|$.

Proof: We prove this by contradiction. Let us assume that f minimizes $k(f)$, but the corresponding $|\mathrm{Size}(S_1(f)) - \mathrm{Size}(S_2(f))|$ is not the maximum possible. We show that we can derive an $f'$ with a lower value of $k(f')$ but with a larger difference in sizes than this assumed minimum partition. Let us consider the k-th component of f, representing the partition of the features from the k-th object. The function $k(f)$ can be expressed as the sum of two types of terms: the terms that include $f_k$ and those that do not.
We denote the aggregate of the terms that do not involve the k-th object by $\mathcal{K}$. We can then express $k(f)$ as

$$\begin{aligned} k(f) &= \mathcal{K} + W f_k (1 - f_k) N_k^2 + \sum_{j \neq k} w (1 - f_j) N_j f_k N_k + \sum_{j \neq k} w f_j N_j (1 - f_k) N_k \\ &= \mathcal{K} + W f_k (1 - f_k) N_k^2 + N_k \sum_{j \neq k} w f_j N_j + \sum_{j \neq k} w \left((1 - f_j) N_j - f_j N_j\right) f_k N_k \\ &= \mathcal{K} + W f_k (1 - f_k) N_k^2 + w N_k N'' + w (N' - N'') f_k N_k \end{aligned} \tag{4.8}$$

where we used $N'$ and $N''$ to denote $\sum_{j \neq k} (1 - f_j) N_j$ and $\sum_{j \neq k} f_j N_j$, respectively. Note that $N'$ and $N''$ represent the sizes of the two partitions excluding object k. If $N' > N''$, then choosing $f_k = 0$ will give us a lower value for $k(f)$, and it also results in a partitioning vector whose $|\mathrm{Size}(S_2(f)) - \mathrm{Size}(S_1(f))|$ is larger than that of our starting vector. If $N' < N''$, then choosing $f_k = 1$ will give us a lower value for $k(f)$. In this case too, the resulting partitioning vector would have a $|\mathrm{Size}(S_2(f)) - \mathrm{Size}(S_1(f))|$ that is larger than that of our starting vector.

As a consequence of the above two lemmas, we have

Corollary 2. The possible candidates for the minimum value of $k(f)$ are those partitions in D, F and Y that have only one component that differs from all the others (the remaining components being 0 for f and d, and 1 for y), i.e., the set $\{f : f_i = 1/N_i\} \cup \{y : y_i = 1 - 1/N_i\} \cup \{d : d_i = 1\}$.

This follows from the facts that (i) $k(f)$ is a concave function, hence its minimum will be achieved by partitions represented by the boundary of the domain, and (ii) these candidates represent the most disparate-sized partitions on the boundary.

Theorem 1. The recursive partitioning strategy based on the minimization of the cut values will result in correct groups, in the likelihood ratio based stochastic order sense, if $\frac{w}{W} \leq \frac{1}{N - N_1}$, where $N_1 = \min(N_1, \ldots, N_K)$ denotes the size of the smallest object.

Proof: The assertion would be true if $S_c(f \in D)$, the cut value for correct partition classes, is less than $S_c(f \notin D)$. Since these random variables are gamma distributed (Lemma 2), $S_c(f \in D) \leq_{lr} S_c(f \notin D)$ if $k(f \in D) \leq k(f \notin D)$, where the k's are the shape parameters of the corresponding densities.
From Corollary 2 we know that the lowest value of k will be for partition classes in Y, F or D:

$$k(f \in F) = (W - w) N_i + w N - W \tag{4.9}$$

and

$$k(f \in D) = w N_i (N - N_i) \tag{4.10}$$

The minimum value of the above two k's will be for the case when $N_i$ is the minimum possible. The required condition, i.e., $k(f \in F) \geq k(f \in D)$, can be transformed as follows:

$$\left(W - w (N - N_i)\right) (N_i - 1) \geq 0 \tag{4.11}$$

The above will always be true if $\frac{w}{W} \leq \frac{1}{N - N_1}$, where $N_1 = \min(N_1, \ldots, N_K)$. Since the vectors in Y represent the same partitions as the ones in F (Lemma 1), we do not need any other condition.

4.3 Average Cut

We first establish that the average cut values are general gamma distributed variables, whose parameters are functions of the partition classes. We follow this with the conditions under which minimizing this measure will lead to correct partitioning in the stochastic sense. We find that for graphs whose partition width, i.e., the minimum average cut value, is less than $wN$, minimizing the average cut makes sense.

Lemma 7. The average cut value $S_{avg}(f)$ for partitions in the partition class f is a general gamma random variable distributed according to $\mathrm{Gamma}\left(f^T P (\mathbf{1} - f),\ \frac{N}{(f^T \mathbf{N})(\mathbf{N}^T (\mathbf{1} - f))}\right)$, where $N = \sum_i N_i$ is the total number of features.

Proof: The average cut measure of a partition is the total cut link weight normalized by the product of the sizes of the two partitions:

$$S_{avg}(f) = \frac{N\, S_c(f)}{\mathrm{Size}(S_1(f))\, \mathrm{Size}(S_2(f))} \tag{4.12}$$

which is equivalent to $S_c(f)/\mathrm{Size}(S_1(f)) + S_c(f)/\mathrm{Size}(S_2(f))$. Thus, the average cut measure for partitions in f is a gamma random variable that is scaled by the product of the partition sizes. Recall that if X is $\mathrm{Gamma}(k)$, then $bX$ has the general $\mathrm{Gamma}(k, b)$ distribution². The final expression follows from this observation and the expressions in Lemmas 2 and 3.
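Since the mean of a general gamma variable is its shape times its scale, Lemma 7 gives the expected average cut of every partition class directly. The sketch below evaluates $k(f)\, b(f)$ exhaustively for a small two-object example (the sizes $N = (4, 6)$ and the weights $W = 2$, $w = 1$ are arbitrary assumptions, and the helper names are hypothetical); with exact fractions it shows the smallest expected average cut occurring at the classes in D, with value $wN$.

```python
from fractions import Fraction
from itertools import product

def avg_cut_params(f, sizes, W, w):
    """Lemma 7: shape k(f) = f^T P (1 - f) and scale b(f) = N / (Size(S1) Size(S2))."""
    K, N = len(sizes), sum(sizes)
    k = sum(f[i] * (W if i == j else w) * sizes[i] * sizes[j] * (1 - f[j])
            for i in range(K) for j in range(K))
    s1 = sum(fi * n for fi, n in zip(f, sizes))  # f^T N
    s2 = N - s1                                  # (1 - f)^T N
    return k, Fraction(N) / (s1 * s2)

def partition_classes(sizes):
    """All classes f with f_i in {0, 1/N_i, ..., 1}, excluding f = 0 and f = 1."""
    grids = [[Fraction(c, n) for c in range(n + 1)] for n in sizes]
    for f in product(*grids):
        if any(x != 0 for x in f) and any(x != 1 for x in f):
            yield f

sizes, W, w = [4, 6], 2, 1
means = {}
for f in partition_classes(sizes):
    k, b = avg_cut_params(f, sizes, W, w)
    means[f] = k * b  # mean of a general gamma variable is shape * scale
best = min(means.values())
print(best, w * sum(sizes))  # 10 10
print(sorted(f for f, m in means.items() if m == best))
```

The last line lists the minimizing classes, which here are exactly the two vectors of D, (0, 1) and (1, 0).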
Lemma 8. The mean value of the average cut cost, $\mu_{avg}(f)$, attains its minimum at $f \in D$, and this minimum is the constant $wN$.

Proof: The matrix P can be expressed as the sum of a diagonal and a nondiagonal matrix, $P = P_1 + P_2$, where

$$(P_1)_{ij} = \begin{cases} (W - w) N_i^2 & \text{for } i = j \\ 0 & \text{for } i \neq j \end{cases} \tag{4.13}$$

and

$$(P_2)_{ij} = w N_i N_j \tag{4.14}$$

It is easy to see that $P_2 = w \mathbf{N} \mathbf{N}^T$. Thus, the expected average cut is given by

$$\mu_{avg}(f) = \frac{N \sum_i (W - w) f_i (1 - f_i) N_i^2}{\sum_{i,j} f_i (1 - f_j) N_i N_j} + wN \tag{4.15}$$

Using Assumption 1, we can easily see that the first term on the right-hand side is always nonnegative and attains its minimum value of 0 whenever all $f_i$'s are 0 or 1, in other words, when $f \in D$. The denominator of the first term is never 0 for valid partitions ($f \neq \mathbf{0}, \mathbf{1}$). Thus, the minimum value of $wN$ is attained by partition vectors in D.

² The general Gamma distribution is given by $\frac{1}{\Gamma(k)\, b^k} x^{k-1} e^{-x/b}$ for $x \geq 0$, where b is the scaling factor.

Theorem 2. The average cut measure $S_{avg}(f)$ will result in a minimum, in the stochastic order sense over a restricted range $[0, wN]$, for partitions in D. This restricted range always includes the mode and the mean of the optimal cut values.

Proof: We would like to establish that $S_{avg}(f \in D) \leq_{st} \min_{f \notin D} S_{avg}(f)$. Based on properties 1 and 6 from Section 4.1, we just need to show that $S_{avg}(f \in D) \leq_{lr} S_{avg}(f \notin D)$. For this to be true, the likelihood ratio ordering characteristic function, L, should attain a minimum for $f \in D$. Using property 7 from Section 4.1 and the expression derived earlier in this section, the function L for average cut values can be expressed as

$$L(k(f), b(f)) = k(f) - \frac{t}{b(f)} = f^T P (\mathbf{1} - f) - \frac{t}{N} \left(f^T \mathbf{N}\right) \left(\mathbf{N}^T (\mathbf{1} - f)\right) = f^T \left(P - \frac{t}{N} \mathbf{N} \mathbf{N}^T\right) (\mathbf{1} - f) \tag{4.16}$$

Using a derivation similar to that in the proof of Lemma 5, we can show that the function $L(k(f), b(f))$ is concave if $p - \frac{t}{N} \mathbf{1} \mathbf{1}^T$ is positive definite, where p is as defined in Lemma 5.
This condition results in the requirement that $t \leq wN$. Using this concavity property, we can say that $L(f \in D \cup Y \cup F) \leq L(f \notin D \cup Y \cup F)$ for $t \leq wN$. We can also infer that $S_{avg}(f \in D) \leq_{lr} S_{avg}(f \in Y \cup F)$, using the fact that the minimum of the mean of $S_{avg}$ is achieved for $f \in D$ (Lemma 8). Thus, we have $S_{avg}(f \in D) \leq_{lr} S_{avg}(f \notin D)$, but only for $t \leq wN$. Note that $wN$ is also the mean cut value for optimal partitions, $S_{avg}(f \in D)$ (from Lemma 8). And, since for gamma random variables the mode is less than or equal to the mean, the assertion of the theorem follows.

This theorem suggests that minimizing the average cut is appropriate for graphs whose partition width, i.e., the minimum average cut value for the graph, is less than $wN$. It can be easily established that $\Pr(t \leq wN) \geq 0.5$, using the fact that the median of a gamma random variable is always less than its mean value, which in this case is $wN$.

4.4 Normalized Cut

We show that the normalized cut is a sum of two beta distributed random variables, whose parameters are functions of the partition classes. Using this, we derive the condition that if $W \geq 3w$ and the partition width, i.e., the minimum value of the normalized cut, is less than 0.5, minimizing the normalized cut measure makes sense.

Lemma 9. The normalized cut value $S_{nrm}(f)$ for partitions in the partition class f is a sum of two beta random variables, $S^1_{nrm}(f) + S^2_{nrm}(f)$. The random variable $S^1_{nrm}(f)$ is $\mathrm{Beta}(k(f), a(f))$ and $S^2_{nrm}(f)$ is $\mathrm{Beta}(k(f), b(f))$, where the parameters are $k(f) = f^T P (\mathbf{1} - f)$, $a(f) = f^T P' f - f^T \mathbf{q}$ and $b(f) = (\mathbf{1} - f)^T P' (\mathbf{1} - f) - (\mathbf{1} - f)^T \mathbf{q}$.

Proof: The normalized cut measure of a partition is the total cut link weight multiplied by the sum of the inverses of the connectivities of the two partitions. The connectivity of each partition is the sum of the valencies of the nodes in that partition, which can be expressed as the sum of the valencies within each partition and the cut value.
Using the notation from Lemmas 2 and 4, the normalized cut random variable can be expressed as

$$S_{nrm}(f) = \frac{S_c(f)}{S_c(f) + S_1(f)} + \frac{S_c(f)}{S_c(f) + S_2(f)} \tag{4.17}$$

We have also established in Lemmas 2 and 4 that $S_c$, $S_1$ and $S_2$ are all gamma random variables. The claims of this lemma follow from this and the basic property that if $X_1$ and $X_2$ are independent gamma distributed random variables with a common scale parameter b and shape parameters $k_1$ and $k_2$, respectively, then $\frac{X_1}{X_1 + X_2}$ is a beta distributed random variable with parameters $k_1$ and $k_2$.

Corollary 3. The mean and mode of the normalized cut value $S_{nrm}(f)$ for partitions in the partition class f are given by

$$\mu_{nrm}(f) = \frac{k(f)}{k(f) + a(f)} + \frac{k(f)}{k(f) + b(f)}, \qquad \mathrm{Mode}_{nrm}(f) = \frac{k(f) - 1}{k(f) + a(f) - 2} + \frac{k(f) - 1}{k(f) + b(f) - 2}$$

Note that for $f \in D$, when we expect $k < a$ and $k < b$, the mode will be less than the mean value.

Theorem 3. The normalized cut measure $S_{nrm}(f)$ will result in a minimum, in the stochastic order sense over a restricted range $[0, 0.5]$, for partitions in D. This restricted range will include the mode and the mean of the optimal cut values if $\frac{w}{W} \leq \frac{N_1 - 1}{6 (N - N_1)}$, where $N_1 = \min(N_1, \ldots, N_K)$.

Proof: Property 5 from Section 4.1 establishes the conditions for the sum of two beta random variables to be stochastically less than the sum of two other beta variables. Thus, we need to show that $S^i_{nrm}(f \in D) \leq_{st} \min_{f \notin D} S^i_{nrm}(f)$, for $i = 1, 2$. We sketch the proof for $S^1$; the proof for $S^2$ is along similar lines. The required condition is that the likelihood ratio ordering characteristic function, L, should attain a minimum for $f \in D$. Using property 8 from Section 4.1 and the expression derived earlier in this section, the function L for normalized cut values can be expressed as

$$L(k(f), a(f)) = k(f) - t\left(a(f) + k(f)\right) = f^T P (\mathbf{1} - f) - t\left(f^T P' f - f^T \mathbf{q} + f^T P (\mathbf{1} - f)\right) \tag{4.18}$$

For the required condition, this function $L(k(f), a(f))$ should be concave which, expressed through the matrix $\tilde{P}$ with entries $\tilde{P}_{ij} = (1 - t) W N_i^2$ for $i = j$ and $\tilde{P}_{ij} = (1 - 2t) w N_i N_j$ for $i \neq j$, is true only for $t \leq 0.5$. Using this concavity property, we can say that $L(f \in D \cup Y \cup F) \leq L(f \notin D \cup Y \cup F)$ for $t \leq 0.5$.
We can also infer that $S^1_{nrm}(f \in D) \leq_{lr} S^1_{nrm}(f \in Y \cup F)$ by comparing the corresponding means. Thus, we have $S^1_{nrm}(f \in D) \leq_{lr} S^1_{nrm}(f \notin D)$, but only for $t \leq 0.5$. We can derive the same condition for $S^2_{nrm}$. For this range, $t \leq 0.5$, to include the mean and the mode of the normalized cut values of correct partitions, $f \in D$, we can show using the expression for the mean that $3 k(f) \leq \min(a(f), b(f))$. If we assume that the smallest object size is $N_1$, then for $f \in D$, $\min(a(f), b(f)) = 0.5\, W N_1 (N_1 - 1)$ and the corresponding $k(f) = w N_1 (N - N_1)$. From these, we can derive the required condition to be $\frac{w}{W} \leq \frac{N_1 - 1}{6 (N - N_1)}$; that is, the strength (as measured by the mean values) of the connections inside the smallest object should be more than 6 times larger than that between the objects. Thus, we have results similar to, but with somewhat more restrictive conditions than, those for the average cut measure. Minimizing the normalized cut is appropriate for graphs whose partition width, i.e., the minimum normalized cut value for the graph, is less than 0.5 and whose within-object connections are at least six times stronger than the between-object connections. Since the mean and the mode of the cut values for $f \in D$ are included in the range $[0, 0.5]$, we conjecture that the median is also included in this range, at least for $f \in D$, and so $\Pr(t \leq 0.5) \geq 0.5$.

4.5 Summary

The theoretical analysis leads to the following conclusions:
1. The average cut measure $S_{avg}(f)$ will result in a minimum, in the stochastic order sense over a restricted range $[0, wN]$, for partitions in D. This restricted range always includes the mode and the mean of the optimal cut values.
2. The normalized cut measure $S_{nrm}(f)$ will result in a minimum, in the stochastic order sense over a restricted range $[0, 0.5]$, for partitions in D. This restricted range will include the mode and the mean of the optimal cut values if $\frac{w}{W} \leq \frac{N_1 - 1}{6 (N - N_1)}$, where $N_1 = \min(N_1, \ldots, N_K)$.
3. The recursive partitioning strategy based on the minimization of the cut values will result in correct groups, in the likelihood ratio based stochastic order sense, if $\frac{w}{W} \leq \frac{1}{N - N_1}$, where $N_1 = \min(N_1, \ldots, N_K)$.

Our major conclusion is that optimization of none of the three measures is guaranteed to result in the correct partitioning of K objects, in the strict stochastic order sense, for all image statistics. Qualitatively speaking, the minimum cut measure is optimal only under very restrictive conditions, when the average inter-object feature affinity is very weak compared to the average intra-object feature affinity. The average cut measure is optimal for graphs whose partition width is less than the mode of the distribution of all possible partition widths. The normalized cut measure is optimal for a more restrictive subclass of graphs, whose partition width is less than the mode of the partition width distributions and whose inter-object links are six times weaker than the intra-object links. The prediction is that optimizing each of the partition measures should result in groups of varying quality. While the minimization of the average and normalized cut measures should be able to correctly group features from multiple objects at least 50% of the time, the optimization of the minimum cut measure will not be able to do so under all image statistics.

CHAPTER 5
EMPIRICAL EVALUATION OF THE CUT MEASURES

As we have seen, partitioning of a graph representation, defined over low-level image features based on Gestalt-inspired relations, is an effective strategy for forming coherent large perceptual groups in an image. The usual practice, mainly motivated by efficiency considerations, is to approximate the general K-way partitioning solution by recursive bipartitioning, where at each step the graph is broken into two parts based on a partitioning measure.
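Recursive bipartitioning under the three measures can be sketched by exhaustive search on a tiny graph. This is an illustration only, under several assumptions: the search is exponential (not the spectral approximations used in practice), the normalized cut uses the Shi-Malik form with node valencies, the stopping rule (a fixed number of levels) is hypothetical, and the six-node graph with deterministic weights 2 and 1 is made up. In this example $w/W = 1/2$ exceeds the $1/(N - N_1) = 1/3$ bound of Theorem 1, and the minimum cut indeed isolates a single node while the other two measures separate the two objects.

```python
from itertools import combinations

def cut_value(A, B, wt):
    """Total weight of links crossing between node lists A and B."""
    return sum(wt(u, v) for u in A for v in B)

def min_cut(A, B, wt):
    return cut_value(A, B, wt)

def average_cut(A, B, wt):
    # cut/|A| + cut/|B|, equivalently N * cut / (|A| |B|) as in Eq. (4.12)
    c = cut_value(A, B, wt)
    return c / len(A) + c / len(B)

def normalized_cut(A, B, wt):
    # Assumed Shi-Malik form: cut/assoc(A, V) + cut/assoc(B, V), assoc = sum of node valencies
    c = cut_value(A, B, wt)
    V = A + B
    assoc_a = sum(wt(u, v) for u in A for v in V if u != v)
    assoc_b = sum(wt(u, v) for u in B for v in V if u != v)
    return c / assoc_a + c / assoc_b

def best_bipartition(nodes, wt, measure):
    """Exhaustive search; nodes[0] is pinned to A so each split is scored once."""
    first, rest = nodes[0], nodes[1:]
    best = None
    for r in range(len(rest)):  # r < len(rest) keeps B non-empty
        for extra in combinations(rest, r):
            A = [first, *extra]
            B = [v for v in nodes if v not in A]
            score = measure(A, B, wt)
            if best is None or score < best[0]:
                best = (score, A, B)
    return best

def recursive_bipartition(nodes, wt, measure, levels):
    """Split recursively for a fixed number of levels (a hypothetical stopping rule)."""
    if levels == 0 or len(nodes) < 2:
        return [nodes]
    _, A, B = best_bipartition(nodes, wt, measure)
    return (recursive_bipartition(A, wt, measure, levels - 1)
            + recursive_bipartition(B, wt, measure, levels - 1))

# Two hypothetical objects {0, 1, 2} and {3, 4, 5}: intra-object weight 2, inter-object weight 1.
def wt(u, v):
    return 2.0 if (u < 3) == (v < 3) else 1.0

nodes = list(range(6))
for measure in (min_cut, average_cut, normalized_cut):
    score, A, B = best_bipartition(nodes, wt, measure)
    print(measure.__name__, A, B, round(score, 3))
# min_cut [0] [1, 2, 3, 4, 5] 7.0
# average_cut [0, 1, 2] [3, 4, 5] 6.0
# normalized_cut [0, 1, 2] [3, 4, 5] 0.857
```

Because the weights here are fixed rather than sampled, this only mirrors the expected-value behavior discussed in Chapter 4; it is not a test of the stochastic-order results.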
We concentrate on three such measures, namely the minimum cut [96], average cut [73] and normalized cut [79]. The minimum cut partition seeks to minimize the total link weight cut. The average cut measure is proportional to the total link weight cut, normalized by the sizes of the partitions. The normalizing factor for the normalized cut measure is the product of the total connectivity (valency) of the nodes in each partition. The questions we ask in this section are: Does the nature of the cut measure really matter if it is part of a grouping strategy whose performance can be optimized? Is the quality of the groups significantly different for each cut measure? Are there image classes on which one cut measure is better than the others? How do the cut measures perform if constrained to operate on the same graph? We empirically evaluate the groups produced by graph partitioning, based on the three measures, given the task of grouping extended edge segments. We also examine whether the performance of the grouping-by-partitioning strategy depends on the image class. Our conclusions are:
1. When considering overall performance, the differences in performance among the three cut measures are minimal at best.
2. When considering average performance within an image class, statistically significant differences show up, but they are very small.
3. Natural objects in indoor settings seem to be the hardest domain for grouping.
4. The average cut measure seems to offer the best compromise among performance, stability of performance, and speed, compared with the other two measures.

The exposition here is organized as follows. In the next section, we review the relevant work, mainly graph-based approaches to perceptual organization and other related performance comparison work. The following sections consider each important aspect of the empirical evaluation, one at a time.
1. The performance measure used for evaluation is sketched out in Section 3.2.
2. The image set, along with the creation of the ground truth data used to compute performance, is discussed in Section 5.1.
3. In Section 5.4, we present an analysis of the actual performance of the cut measures on real images.
4. We touch on timing issues in Section 5.5.
5. We discuss the implications of our findings in Section 5.6.

5.1 Image Set

We use a database of 100 images divided equally into 5 image classes: natural objects in indoor backgrounds, natural objects in outdoor backgrounds, man-made objects in indoor backgrounds, man-made objects in outdoor backgrounds, and aerial images¹. Each of these images is associated with manual ground truth outlines of the objects of interest. There are 20 images in each class, of which 10 are used for training and the other 10 for testing. Ground truthing takes approximately 30 to 60 minutes per image, depending on the image size and the complexity of the objects in the image. The ground truth protocol discussed next was followed to the maximum extent possible.

¹ The University of California, Berkeley has a dataset of 12,000 manual segmentations of a set of 1,000 Corel images. These are mostly suited to signal-level grouping algorithms and are available at http://www.cs.berkeley.edu/projects/vision/grouping/segbench/

5.2 Ground Truth Creation

For each object in the image, we create ground truth outlines based on the edges that are visible as well as perceptual edges that should be there (small gaps, etc.). The criterion for selecting the edges is that one must be able to recognize the object from the ground truth outlines. In our previous work [82], we only marked the edges that were visible in the image.
Here, in addition to visible edges, we also mark edges that are weak or not visible but whose existence can be inferred from prior knowledge of the object shape. From an object recognition viewpoint, this is essential so that the features in the ground truth form a good model of the object. In real images, textures occur both within the boundaries of an object and in the background. Texture within an object is marked if it forms an important part of the object model. For example, if the zebra stripes were not marked, a zebra could have a model similar to that of a horse. Mostly structural texture edges are marked; statistical textures are ignored. Figs 5.1(a), (b), (c) and (d) show samples of natural objects in indoor backgrounds. Figs 5.2(a), (b), (c) and (d) show samples of natural objects in outdoor backgrounds. Figs 5.3(a), (b), (c) and (d) show samples of manmade objects in indoor surroundings. Figs 5.4(a), (b), (c) and (d) show samples of manmade objects in outdoor surroundings. Samples of aerial images are shown in Figs 5.5(a), (b), (c) and (d).

Figure 5.1. Sample ground-truthed images from the natural indoor image set, panels (a) to (d).
Figure 5.2. Sample ground-truthed images from the natural outdoor image set, panels (a) to (d).
Figure 5.3. Sample ground-truthed images from the manmade indoor image set, panels (a) to (d).
Figure 5.4. Sample ground-truthed images from the manmade outdoor image set, panels (a) to (d).
Figure 5.5. Sample ground-truthed images from the aerial image set, panels (a) to (d).

5.3 Parameter Selection
We use the team of learning automata framework to statistically search for good parameters. For each cut measure, we run the learning automata team on each class of images as a whole. For each parameter combination chosen by the LA team at each iteration, the average performance over the whole image set is the feedback.
The parameters are changed in such a way that this average performance over the image class is maximized. In other words, we have 10 images in each image class; the performance on each image is computed and averaged, and it is this average value that the team of learning automata maximizes. From these learning traces, we select the 100 best parameter combinations.

5.4 Analyses and Discussions
The analyzed data consist of the quantified performances ($\beta_{alt}$) of each of the three cut-based grouping strategies on 50 test images, from 5 classes, using 100 parameter sets found by training on a different set of 50 images. We can organize the performances in the matrix form shown in Table 5.1. For each image k in image class c there are 100 computed performances, corresponding to the 100 trained parameter sets (PS). We use $\beta_{alt}(I^c_k, i)$ to denote the performance on image $I^c_k$ using the i-th parameter combination $PS_i$. For our image data set, k is an integer between 1 and 10, c is one of the 5 image classes (NOI, NOO, MOI, MOO, and Aerial), and i is an integer between 1 and 100. Thus for each image class we have 1000 performance numbers, giving a total of 5000 performance numbers to be analyzed for each cut measure.

5.4.1 Summary Statistics
We compute two kinds of summary statistics from the raw performance numbers, namely per-image statistics and per-class statistics. Referring to Table 5.1, the per-image statistics are the row-wise summaries of the performances, in terms of the maximum, second maximum, third maximum, etc., values. The L-th best (maximum) performance on image $I^c_k$, denoted by $\beta^{\max_L}_{alt}(I^c_k)$, is computed as

$$\beta^{\max_L}_{alt}(I^c_k) = \mathop{L\text{th}\max}_{i=1,\ldots,100} \beta_{alt}(I^c_k, i) \tag{5.1}$$

Table 5.1. The organization of the raw performance indices, the summary statistics studied for each image class, and the notations used to refer to them. Rows are the (train or test) images of class c; columns are the trained parameter sets $PS_1, \ldots, PS_{100}$, followed by the per-image (row-wise) statistics.

Image | $PS_1$ ... $PS_i$ ... $PS_{100}$ | Per-image statistics (row-wise): Maximum ... L-th max
Image 1 | $\beta_{alt}(I^c_1, 1)$ ... | ...
Image k | $\beta_{alt}(I^c_k, 1)$ ... $\beta_{alt}(I^c_k, i)$ ... $\beta_{alt}(I^c_k, 100)$ | $\beta^{\max_1}_{alt}(I^c_k)$ ... $\beta^{\max_L}_{alt}(I^c_k)$
Image 10 | $\beta_{alt}(I^c_{10}, 1)$ ... | ...
Per-class statistics (column-wise) | $\beta^{Mean}_{alt}(c, 1)$ ... $\beta^{Mean}_{alt}(c, i)$ ... $\beta^{Mean}_{alt}(c, 100)$ | $\beta^{\max_1}_{alt}(c)$ ... $\beta^{\max_L}_{alt}(c)$

The per-class statistics are the column-wise summaries of the performances in Table 5.1. Thus $\beta^{Mean}_{alt}(c, i)$ denotes the average performance over the images in class c with the i-th parameter set:

$$\beta^{Mean}_{alt}(c, i) = \frac{1}{10} \sum_{k=1}^{10} \beta_{alt}(I^c_k, i) \tag{5.2}$$

We capture the overall class performances by the average, maximum, second maximum, third maximum, etc., of these per-class statistics. Thus,

$$\beta^{Mean}_{alt}(c) = \frac{1}{100} \sum_{i=1}^{100} \beta^{Mean}_{alt}(c, i) \tag{5.3}$$

and

$$\beta^{\max_L}_{alt}(c) = \mathop{L\text{th}\max}_{i=1,\ldots,100} \beta^{Mean}_{alt}(c, i) \tag{5.4}$$

The questions below are addressed in the following subsections.
1. How do the train and test performances differ? How large is the expected drop?
2. What are the per-class performance differences of the three cut measures? Are there image classes on which one cut measure is better than the others?
3. How do the per-image performances of the three cut measures compare? Do the cut measures differ in terms of the per-image variation in performance?
4. What happens if we use the trained parameters for one cut measure with another one? Note that in this case the cut measures are forced to operate on the same underlying graph.

5.4.2 Overall Train and Test Performance
Fig 5.6 shows the spread of the per-class average performances with the different parameter combinations (the $\beta^{Mean}_{alt}(c, i)$'s of Table 5.1) over all 5 image classes in the training and testing sets. Each box plot encapsulates 500 summary statistics: 100 summary performances for each of the 5 classes.
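The summary statistics of Eqs. (5.1) to (5.4) reduce to sorting and averaging over one 10 × 100 block of Table 5.1. The sketch below assumes, as in that table, that rows index the images of class c and columns index the trained parameter sets; the function names are hypothetical and the usage data are random placeholders, not results from this study.

```python
import numpy as np

def lth_max(values, L, axis=-1):
    """L-th largest value along `axis` (L = 1 gives the maximum)."""
    return -np.take(np.sort(-values, axis=axis), L - 1, axis=axis)

def per_image_lth_best(perf, L):
    """Eq. (5.1): L-th best performance of each image over all parameter sets."""
    return lth_max(perf, L, axis=1)

def per_class_stats(perf, L):
    """Eqs. (5.2)-(5.4) for one image class."""
    mean_ci = perf.mean(axis=0)            # Eq. (5.2): one mean per parameter set i
    mean_c = mean_ci.mean()                # Eq. (5.3): overall class mean
    lth_c = lth_max(mean_ci, L, axis=0)    # Eq. (5.4): L-th best parameter-set mean
    return mean_ci, mean_c, lth_c

# Usage with placeholder numbers (not data from this study):
rng = np.random.default_rng(0)
perf = rng.uniform(0.0, 1.0, size=(10, 100))    # 10 images x 100 parameter sets
best_per_image = per_image_lth_best(perf, L=1)  # one value per image
mean_ci, mean_c, best_c = per_class_stats(perf, L=1)
```

The 100 values in `mean_ci` are what a box plot such as Fig 5.6 summarizes for one class.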
In this chapter we use box plots such as these to visualize the distribution of random quantities. The box stretches from the lower quartile to the upper quartile, with a line inside it indicating the median value. The whiskers extending from the box show the extent of the rest of the data, and any outliers are shown with plus marks. As expected, there is some drop in testing performance for all three cut measures. However, the training and testing performances of the three cut measures are qualitatively similar: the differences, if any, are small. Although these plots capture only gross class-wise performances, the overall character of the observation regarding the three cut measures holds up under the more detailed analyses in the subsequent sections: the differences in performance of the three cut measures are very minimal, at best.

Figure 5.6. Box plot of per-class average performances ($\beta^{Mean}_{alt}(c, i)$) for each of the trained parameter combinations on the set of 50 images, in the training and testing sets, for each of the three cuts. (Plot titled "Over all images"; vertical axis β from 0 to 1; Avg, Nrm, and Min boxes for Train and for Test.)

5.4.3 Classwise Performances
How do the test performances of the three cut measures compare for each image class? Are there image classes for which one cut measure performs better than another? Table 5.2 lists the mean of the per-class performances, i.e., the $\beta^{Mean}_{alt}(c)$'s in the notation of Table 5.1, along with their 95% confidence intervals, for each cut type. We observe that for the natural objects indoors and manmade objects outdoors classes, the mean performance of the normalized cut is marginally better than that of the other two cuts. For aerial images, the average cut seems to be better, while for natural objects outdoors the mincut seems to be better. For images of manmade objects indoors, all three cuts seem to have similar performances.
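Table 5.2 reports each class mean with a 95% confidence interval. The section does not state how the intervals were computed; one common choice, assumed here, is a Student-t interval over the 100 per-parameter-set means $\beta^{Mean}_{alt}(c, i)$ of a class. A minimal sketch using SciPy (the function name is hypothetical):

```python
import numpy as np
from scipy import stats

def mean_with_ci(samples, level=0.95):
    """Return (lower, upper, mean) of a two-sided Student-t confidence interval."""
    x = np.asarray(samples, dtype=float)
    n = x.size
    mean = x.mean()
    sem = x.std(ddof=1) / np.sqrt(n)                  # standard error of the mean
    half = stats.t.ppf(0.5 + level / 2.0, df=n - 1) * sem
    return mean - half, mean + half, mean
```

Comparing such intervals by eye (for example, 0.310 to 0.316 for the average cut against 0.337 to 0.340 for the normalized cut on NOI) is only a rough guide; the formal comparisons rest on the ANOVA of Table 5.3.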
That these observations are statistically significant is supported by an Analysis of Variance (ANOVA) of the $\beta^{Mean}_{alt}(c, i)$'s for each class, using the cut type and the parameter set as the sources of variation. Table 5.3 shows the ANOVA results for each of the five image classes, along with P-values (from ANOVA of data from pairs of cut types) to be used for inferring the significance of pairwise comparisons of the cuts. Lower P-values denote more significance. Generally, P-values less than 0.05 are considered significant at a 95% confidence level. For pairwise comparisons, of which there are three in each case, we have to use a P-value threshold of 0.02 (approximately 0.05/3) to establish an overall confidence of 95%. We note that, except for manmade objects in indoor settings, all the differences in the mean overall performance values for each class noted in Table 5.2, however small, are statistically significant. We also observe from Table 5.2 that overall performance, irrespective of the cut type, is the best for natural objects in outdoor settings, followed by manmade outdoors, manmade indoors, aerial images, and natural objects indoors, in that order.

Table 5.2. Mean values, with 95% confidence limits (CL), of the class-mean performance indices, $\beta^{Mean}_{alt}(c)$, for each class of images and each cut type.

Image class | Cut | Lower CL | Upper CL | Mean
1 Natural Objects Indoors (NOI) | Average | 0.310 | 0.316 | 0.313
 | Normalized | 0.337 | 0.340 | 0.339
 | Minimum | 0.331 | 0.337 | 0.334
2 Natural Objects Outdoors (NOO) | Average | 0.449 | 0.460 | 0.455
 | Normalized | 0.445 | 0.455 | 0.450
 | Minimum | 0.454 | 0.465 | 0.459
3 Manmade Objects Indoors (MOI) | Average | 0.370 | 0.381 | 0.375
 | Normalized | 0.373 | 0.380 | 0.376
 | Minimum | 0.373 | 0.380 | 0.376
4 Manmade Objects Outdoors (MOO) | Average | 0.388 | 0.395 | 0.391
 | Normalized | 0.395 | 0.403 | 0.399
 | Minimum | 0.390 | 0.396 | 0.393
5 Aerial | Average | 0.345 | 0.357 | 0.351
 | Normalized | 0.329 | 0.336 | 0.332
 | Minimum | 0.318 | 0.329 | 0.324

5.4.4 Per-Image Performance
In this section, we look at performance on a per-image basis, instead of average performance over the whole image class. We first make visual assessments of the results and then proceed to statistical analyses.

Visual assessment: For each image, we pick the best performance out of the 100 trained parameter combinations, i.e., $\beta^{\max_1}_{alt}(I^c_k)$ in the notation of Table 5.1. We caution the reader that visual assessment of the groups might not agree with the computed performance measure, $\beta_{alt}$. We have to keep in mind that our performance measure (i) penalizes groups that straddle two objects more than groups that include part of an object and the background, and (ii) penalizes small groups. Fig 5.7 shows an image on which all cut measures perform equally well: all the cut algorithms are able to separate out the zebra outline along with the stripes. Figs 5.8 to 5.10 show the performances on some images on which one cut measure differs the most from the other two. In Fig 5.8, the average cut performs the best. Observe that most of the background clutter is removed and more feature groups are detected with the average measure than with the other two. The normalized cut measure performs the best in Fig 5.9, and the minimum cut measure performs the best in Fig 5.10. However, as we have seen earlier, the dependence of the performance of each cut type on the image is not as extreme as these sample images suggest. The differences are marginal at best. This reinforces the argument against relying on visual assessment of results on just a few images.

Table 5.3. ANOVA of the class-average performances ($\beta^{Mean}_{alt}(c, i)$) for (a) natural objects indoors, (c) natural objects outdoors, (e) manmade objects indoors, (g) manmade objects outdoors, and (i) aerial images. The P-values for ANOVA of data from just pairs of cuts are shown for the corresponding classes in (b), (d), (f), (h), and (j).

ANOVA | Source | DF | SS | F-value | P-value
(a) NOI | Cut | 2 | 0.0379 | 317.55 | < 0.0001
(a) NOI | Parameters | 99 | 0.0432 | 7.32 | < 0.0001
(c) NOO | Cut | 2 | 0.0043 | 105.98 | < 0.0001
(c) NOO | Parameters | 99 | 0.2220 | 108.51 | < 0.0001
(e) MOI | Cut | 2 | 5.78 × 10^-5 | 0.37 | 0.6903
(e) MOI | Parameters | 99 | 0.1184 | 15.38 | < 0.0001
(g) MOO | Cut | 2 | 0.0032 | 96.55 | < 0.0001
(g) MOO | Parameters | 99 | 0.0803 | 48.02 | < 0.0001
(i) Aerial | Cut | 2 | 0.0385 | 399.57 | < 0.0001
(i) Aerial | Parameters | 99 | 0.1847 | 38.72 | < 0.0001

Pairwise P-values | Avg, Nrm | Nrm, Min | Avg, Min
(b) NOI | < 0.0001 | < 0.0001 | < 0.0001
(d) NOO | < 0.0001 | < 0.0001 | < 0.0001
(f) MOI | 0.4859 | 0.5791 | 0.6061
(h) MOO | < 0.0001 | < 0.0001 | < 0.0001
(j) Aerial | < 0.0001 | < 0.0001 | < 0.0001

Are the best performances of the cut measures on a per-image basis statistically different? Fig 5.11 shows the spread of the best performance, $\beta^{\max_1}_{alt}(I^c_k)$, on the test set. We observe that the minimum cut performances are marginally higher than those of the other two cut measures. Table 5.4(a) shows the ANOVA on the best performance on all 50 test images for all cut measures, the $\beta^{\max_1}_{alt}(I^c_k)$'s. It shows that the cut type is a significant source of variation. The P-values from ANOVA for pairwise comparison of the cut types are shown in Table 5.4(b). We see that the average and normalized cut performances are not statistically different, while the difference of the mincut performance from the other two, albeit small, is statistically significant. We also observed the same pattern for the second-, third-, and fourth-best performances. From the fifth-best performance onwards, the differences between the cuts stopped being statistically significant.

Figure 5.7. Image on which all three cuts perform the best. (a) Ground truth, (b) Average ($\beta_{alt} = 0.87$), (c) Normalized ($\beta_{alt} = 0.87$), (d) Minimum ($\beta_{alt} = 0.88$).
Figure 5.8. Image on which the performance of the average cut is the most different from the others. (a) Ground truth, (b) Average ($\beta_{alt} = 0.58$), (c) Normalized ($\beta_{alt} = 0.50$), (d) Minimum ($\beta_{alt} = 0.52$).

Table 5.4. (a) ANOVA of the best per-image performance ($\beta^{\max_1}_{alt}(I^c_k)$) on all 50 test images, for the cut types. (b) P-values for the ANOVA of the best per-image performances, considering two cuts at a time, with images as the second factor.

(a) Source | DF | SS | F-value | P-value
Cut | 2 | 0.0188 | 5.15 | 0.0074
Image | 49 | 3.6305 | 40.43 | < 0.0001

(b) Cuts compared | P-value
Average, Normalized | 0.5640
Normalized, Minimum | 0.0091
Average, Minimum | 0.0308

Is the variation of performance for each cut type significantly different from the others? How sensitive is the performance of the cut measures to the parameter choice? Is this sensitivity different for the three cut types? Fig. 5.12 shows the spread of the difference between the best and the worst performances, $\beta^{\max_1}_{alt}(I^c_k) - \beta^{\max_{100}}_{alt}(I^c_k)$, for the three cut measures. These differences should be low for a cut measure to be considered good.
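The ANOVAs in Tables 5.3 to 5.5 have one factor for the cut type and a second factor for either the parameter set or the image, with one observation per cell; the reported degrees of freedom (2 and 99, or 2 and 49) match a two-way layout without an interaction term. The following is a minimal sketch of that analysis together with the 0.05/3 pairwise threshold discussed with Table 5.3. It assumes that this additive model is the one used; the function names are hypothetical, and in practice a statistics package would normally be used.

```python
import numpy as np
from scipy import stats

def two_way_anova(Y):
    """Two-way ANOVA without replication (additive model, no interaction).

    Y[r, j]: performance of cut type r under level j of the second factor
    (a parameter set or an image), one observation per cell.
    Returns {"cut": (df, SS, F, p), "second": (df, SS, F, p)}.
    """
    Y = np.asarray(Y, dtype=float)
    a, b = Y.shape
    grand = Y.mean()
    ss_cut = b * ((Y.mean(axis=1) - grand) ** 2).sum()
    ss_second = a * ((Y.mean(axis=0) - grand) ** 2).sum()
    ss_total = ((Y - grand) ** 2).sum()
    ss_error = ss_total - ss_cut - ss_second      # residual sum of squares
    df_cut, df_second = a - 1, b - 1
    df_error = df_cut * df_second
    ms_error = ss_error / df_error
    f_cut = (ss_cut / df_cut) / ms_error
    f_second = (ss_second / df_second) / ms_error
    return {
        "cut": (df_cut, ss_cut, f_cut, stats.f.sf(f_cut, df_cut, df_error)),
        "second": (df_second, ss_second, f_second,
                   stats.f.sf(f_second, df_second, df_error)),
    }

def pairwise_cut_pvalues(Y, names, alpha=0.05):
    """Repeat the ANOVA on each pair of cut types; Bonferroni threshold alpha / #pairs."""
    Y = np.asarray(Y, dtype=float)
    pairs = [(i, j) for i in range(len(names)) for j in range(i + 1, len(names))]
    threshold = alpha / len(pairs)                 # 0.05 / 3 for three cut types
    results = {}
    for i, j in pairs:
        p = two_way_anova(Y[[i, j], :])["cut"][3]
        results[(names[i], names[j])] = (p, p < threshold)
    return threshold, results
```

For the range analysis of Table 5.5, for example, `Y` would be a 3 × 50 array holding, for each cut type, the per-image difference between the best and worst of the 100 performances. With only two rows, this test is equivalent to a paired t-test.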
We can clearly see that the mincut has the largest difference between best and worst performance. The differences for the average cut and the normalized cut are similarly spread. Table 5.5(a) shows the ANOVA results on the difference between the best and worst performances. We see that the variation of this difference among the cuts is statistically significant. Pairwise ANOVA, whose P-values are listed in Table 5.5(b), shows that the differences between the average and normalized cuts are not significant, while the differences involving the mincut are significant. Thus, we can infer that the variation in performance of the mincut is greater than that of the average and normalized cuts, both of which have a similar range of performance variation.

Table 5.5. (a) ANOVA of the range of performances, $\beta^{\max_1}_{alt}(I^c_k) - \beta^{\max_{100}}_{alt}(I^c_k)$, for the three cut types. (b) P-values for the ANOVA of the range of performances, considering two cuts at a time, with images as the second factor.

(a) Source | DF | SS | F-value | P-value
Cut | 2 | 0.118 | 13.68 | < 0.0001
Image | 49 | 2.431 | 11.5 | < 0.0001

(b) Cuts compared | P-value
Average, Normalized | 0.4220
Normalized, Minimum | < 0.0001
Average, Minimum | 0.0011

5.4.5 Cross Compatibility of the Parameter Sets
What happens if we use the trained parameters from one cut for the other two? As one will recall, the structure of the grouping algorithm is exactly the same for all three cuts: graphs are constructed over the same kinds of primitives, and the links are quantified in the same manner. The only difference is in the manner in which the constructed graph is partitioned. Thus, it is possible to interchange parameters between the cut types. This exchange offers a more direct test of the cut measures by isolating the cut type as the only source of variation: with all other program parameters the same, the cuts operate on the same graph.
Per-class performances: For each image class, we take the 100 trained parameter sets of one cut type, use them with the other two, and compute the class-average performances, the $\beta^{Mean}_{alt}(c, i)$'s. Fig 5.13 shows the spread of the performances of the cut measures using the parameters trained on the average cut measure. Fig 5.14 shows the performance spreads of the cut measures using the parameters obtained by training with the normalized cut measure, and Fig 5.15 shows the performances of the cut measures using the mincut-trained parameters. Observe that the minimum cut performance drops when it is applied with parameters learned from either the average or the normalized cut measure. On the other hand, the average and normalized cut measures perform almost identically irrespective of the parameters. This further attests to the similarity of the average and normalized cut measures. The mincut seems to be the most sensitive to training. This is not surprising in some ways, since the mincut is well known for splintering off small groups, thus degrading partitioning performance. However, our analysis also suggests that this flaw can be remedied by training the algorithm parameters to build appropriate graphs on which this drawback has less of a deleterious effect.

Figure 5.9. Image on which the performance of the normalized cut is the most different from the others. (a) Ground truth, (b) Average ($\beta_{alt} = 0.25$), (c) Normalized ($\beta_{alt} = 0.30$), (d) Minimum ($\beta_{alt} = 0.23$).
Figure 5.10. Image on which the performance of the mincut is the most different from the others. (a) Ground truth, (b) Average ($\beta_{alt} = 0.35$), (c) Normalized ($\beta_{alt} = 0.46$), (d) Minimum ($\beta_{alt} = 0.68$).
Figure 5.11. Box plot of best performances ($\beta^{\max_1}_{alt}(I^c_k)$) on the 50 images in the test set for the three cut measures (Avg, Nrm, Min; vertical axis β from 0 to 1).
Figure 5.12. Box plot of the difference between the best and the worst performances ($\beta^{\max_1}_{alt}(I^c_k) - \beta^{\max_{100}}_{alt}(I^c_k)$) on a per-image basis for the 50 images in the test set for the three cut measures (Avg, Nrm, Min; vertical axis: change in β, 0 to 1).
Figure 5.13. Box plots of 100 mean performances on the three cut measures with average-cut-trained parameters (500 samples/cut). (Plot titled "Parameters Learned from Average Cut Measure"; Avg, Nrm, Min; vertical axis β from 0 to 1.)
Figure 5.14. Box plots of 100 mean performances on the three cut measures with normalized-cut-trained parameters (500 samples/cut). (Plot titled "Parameters Learned from Normalized Cut Measure"; Avg, Nrm, Min; vertical axis β from 0 to 1.)
Figure 5.15. Box plots of 100 mean performances on the three cut measures with minimum-cut-trained parameters (500 samples/cut). (Plot titled "Parameters Learned from Minimum Cut Measure"; Avg, Nrm, Min; vertical axis β from 0 to 1.)
Figure 5.16. Box plots of the best performance on the three cut measures with average-cut-trained parameters (50 samples/cut). (Plot titled "Parameters Learned from Average Cut Measure"; Avg, Nrm, Min; vertical axis β from 0 to 1.)

Per-image performances: We can also look at the best performance on a per-image basis, instead of the class-wise average performances. For this, we use the 100 trained parameter combinations for one cut measure with the other two cut measures. We pick the best performance, $\beta^{\max_1}_{alt}(I^c_k)$, for each image and plot the spreads. Each box plot is composed of 50 performance measures, one for each of the 50 test images. Here, we observe that the minimum cut performs on par with the other two cut measures, and outperforms them by a small amount when mincut-trained parameters are applied. Also, the average and normalized cut measures again seem to be very similar.

Figure 5.17. Box plots of the best performance on the three cut measures with normalized-cut-trained parameters (50 samples/cut). (Plot titled "Parameters Learned from Normalized Cut Measure"; Avg, Nrm, Min; vertical axis β from 0 to 1.)
Figure 5.18. Box plots of the best performance on the three cut measures with minimum-cut-trained parameters (50 samples/cut). (Plot titled "Parameters Learned from Minimum Cut Measure"; Avg, Nrm, Min; vertical axis β from 0 to 1.)

5.5 Time Issues
Figure 5.19. Box plot of the time (in seconds) taken by the average, normalized and minimum cut over a set of 20 images. (Vertical axis: time, 0 to 350 s; one box each for Average, Normalized, and Minimum.)

Fig 5.19 shows the distribution of the time, in seconds, taken by each cut algorithm to finish the grouping process, including low-level processing and graph construction, which are the same for all the cut measures. The algorithms were run on a Sun UltraEnterprise with a clock speed of 247 MHz. We can clearly see that the time taken to compute the normalized cut varies more than for the other two cuts. This can be attributed to the structure of the normalized Laplacian matrix, all of whose diagonal entries are one and whose other entries are zero or negative, with absolute values less than one. For the average cut, in contrast, the diagonal elements can have values greater than 1. Hence the relative variation between the matrix elements is smaller for the normalized cut, which affects the rate of convergence of the eigenvalue/eigenvector computations.

5.6 Summary
We empirically analyzed the performance of graph-partitioning-based grouping strategies built on three cut measures: average cut, normalized cut, and mincut. We evaluated the performance of the grouping strategies with respect to how much the computed groups can aid constrained-search-based object recognition.
We used a fairly large data set of 100 images, of which 50 were used for training the parameters, and the performance analyses were conducted on the other 50. These images came from 5 classes, with 10 images from each class in each set. We chose the algorithm parameters using a fairly rigorous strategy. Statistical significance of the results was established with Analysis of Variance. Among the conclusions of this study are the following:
1. When considering overall performances, the differences in performances of the three cut measures are very minimal, at best.
2. When considering average performances within an image class, some small but statistically significant differences show up. We observe that for the natural objects indoors and manmade objects outdoors classes, the mean performance of the normalized cut is marginally better than that of the other two cuts. For aerial images, the average cut seems to be better, while for natural objects outdoors the mincut seems to be better. For images of manmade objects indoors, all three cuts have similar performances.
3. We also observe that overall performance, irrespective of the cut type, is the best for natural objects in outdoor settings, followed by manmade outdoors, manmade indoors, aerial images, and natural objects indoors, in that order.
4. When considering per-image best performances, minimum cut performances are very marginally higher than those of the other two cut measures. The differences between average and normalized cut performances are minimal.
5. When considering per-image best performances, we observe that the minimum cut is sensitive to the parameter choice, while the average and normalized cuts are not.
6. When constrained to operate on the same graph, the mean performances of the normalized and average cuts are statistically equivalent. The performance of the mincut varies depending on whether the right parameter set is used. If mincut-trained parameter sets are used, the mean performance over an image class is better; otherwise, the performance drops.
7. The average cut seems to offer the best compromise between performance, stability of performance, and speed, over the other two measures.

CHAPTER 6 DISCUSSIONS AND CONCLUSIONS

In this dissertation, we developed, analytically modeled, and empirically evaluated a graph-spectra-based framework to form large perceptual groups from relations defined over small numbers of image primitives. We formulated a learning framework, based on game theory and learning automata, to adapt the perceptual grouping process to an image domain. Some of the interesting conclusions are:
1. It is possible to perform figure-ground segmentation from a set of local salient relations, such as parallelism, continuity, perpendicularity, proximity, and region similarity, each defined over a small number of primitives.
2. The relative importance of the salient relations depends on the object or domain of interest.
3. Geometric relationships alone are not sufficient for grouping. Photometric attributes such as region similarity play a significant role in grouping extended low-level features.
4. Optimization of none of the three measures considered is guaranteed to result in the correct partitioning of K objects, in the strict stochastic order sense, for all image statistics. Qualitatively speaking, under very restrictive conditions, when the average inter-object feature affinity is very weak compared to the average intra-object feature affinity, the minimum cut measure is optimal. The average cut measure is optimal for graphs whose partition width is less than the mode of the distribution of all possible partition widths. The normalized cut measure is optimal for a more restrictive subclass of graphs whose partition width is less than the mode of the partition width distribution and whose inter-object links are six times weaker than the intra-object links.
5. When considering overall performances, the differences in performances of the three cut measures are very minimal, at best.
6. The average cut seems to offer the best compromise between performance, stability of performance, and speed, over the other two measures.

The implications of the study for research into grouping algorithms are twofold. First, formulating another partitioning measure does not seem to be the most productive line of research, at least from an empirical viewpoint. In fact, some recent studies published after this work concur with our analysis [63]. Richer graphs, say with more than just 2-ary relations, might offer the best next line of attack for performance improvement. Second, the hardest image domain for grouping, at least with graph-partition-based methods, seems to be natural objects in indoor settings. Having ranked the image domains from easiest to hardest, we could perfect the segmentation process on the easiest domain first and then move on to the harder ones. We have clearly demonstrated the power of supervised learning to form large groupings. However, manual ground truth comes at a premium; it is hard to create databases of substantial size. The next level of advancement would be to develop semi-supervised learning using partial ground truths, and ultimately to move on to unsupervised learning without ground truths. For future work, the forms of the affinity relations, rather than just their parameters, could be learned from image statistics for better graph construction. We could extend grouping to video sequences and include depth and motion vectors in the framework.
Another possible direction for future work is to combine stereo images and perform grouping on them.

REFERENCES

[1] N. Ahuja and M. Tuceryan. Extraction of early perceptual structure in dot patterns: Integrating region, boundary and component gestalt. Computer Vision, Graphics and Image Processing, 48(3):304-356, Dec 1989.
[2] A. Amir and M. Lindenbaum. A generic grouping algorithm and its quantitative analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20:168-185, Feb 1998.
[3] J. August, S. Zucker, and K. Siddiqi. Fragment grouping via the principle of perceptual occlusion. In Proceedings of the International Conference on Pattern Recognition, pages 3-8, 1996.
[4] A. Berengolts and M. Lindenbaum. On the performance of connected components grouping. International Journal of Computer Vision, 41(3):195-216, 2001.
[5] R. P. Boppana. Eigenvalues and graph bisection: an average case analysis. In Proceedings of the 28th International Symposium on Foundations of Computer Science, pages 280-285, 1987.
[6] S. Borra and S. Sarkar. A framework for performance characterization of intermediate-level grouping modules. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19:1306-1312, Nov 1997.
[7] K. Boyer, M. Mirza, and G. Ganguly. The robust sequential estimator: A general approach and its application to surface organization in range data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(10):987-1001, Oct 1994.
[8] K. Boyer and S. Sarkar. Perceptual organization in computer vision: Status, challenges, and potential. Computer Vision and Image Understanding, 76(1):1-6, October 1999.
[9] E. Brunswik and J. Kamiya. Ecological cue-validity of proximity and other Gestalt factors. American Journal of Psychology, pages 20-32, 1953.
[10] T. Bui, S. Chaudhuri, T. Leighton, and M. Sipser. Graph bisection algorithms with good average case behavior. In Proceedings of the 25th International Symposium on Foundations of Computer Science, pages 181-192, 1984.
[11] S. Casadei and S. K. Mitter. Hierarchical image segmentation: detection of regular curves in a vector graph. International Journal of Computer Vision, 27(1):71-100, 1998.
[12] R. L. Castano and S. Hutchinson. A probabilistic approach to perceptual grouping. Computer Vision and Image Understanding, 64(3):399-419, Nov 1996.
[13] D. T. Clemens and D. W. Jacobs. Space and time bounds on indexing 3D models from 2D images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13:1007-1017, Oct 1991.
[14] J. Costeira and T. Kanade. A multibody factorization method for motion analysis. In Proceedings of the International Conference on Computer Vision, pages 1071-1076, 1995.
[15] I. J. Cox, S. B. Rao, and Y. Zhong. Ratio regions: a technique for image segmentation. In Proceedings of the International Conference on Pattern Recognition, pages 557-564, 1996.
[16] T. F. Cox and M. A. A. Cox. Multidimensional Scaling. Chapman and Hall/CRC, 2001.
[17] J. Elder and S. Zucker. Computing contour closure. In European Conference on Computer Vision, pages 399-412, 1996.
[18] J. H. Elder, A. Krupnik, and L. Johnson. Contour grouping with prior models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(6):661-674, Jun 2003.
[19] W. D. Ellis, editor. A Source Book of Gestalt Psychology (translated works by Wertheimer). Routledge and K. Paul, London, 1955.
[20] A. Etamadi, J. P. Schmidt, G. Matas, J. Illingworth, and J. Kittler. Low-level grouping of straight line segments. In Proceedings of the British Machine Vision Conference, pages 119-126, 1991.
[21] M. Fiedler. Algebraic connectivity of graphs. Czech. Math. J., 23:298-305, 1973.
[22] M. Fiedler. A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory. Czech. Math. J., 25:619-637, 1975.
[23] R. Fisher. From Surfaces to Objects: Computer Vision and Three Dimensional Scene Analysis. John Wiley and Sons, 1989.
[24] Y. Gdalyahu, N. Shental, and D. Weinshall. Perceptual grouping and segmentation by stochastic clustering. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1:367-374, June 2000.
[25] D. Geiger, K. Kumaran, and L. Parida. Visual organization for figure/ground separation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 155-160, 1996.
[26] K. Gould and M. Shah. The trajectory primal sketch: A multi-scale scheme for representing motion characteristics. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 79-85, 1989.
[27] W. E. L. Grimson. The combinatorics of heuristic search termination for object recognition in cluttered environments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13:920-935, Sept 1991.
[28] G. Guy and G. Medioni. Inferring global perceptual contours from local features. International Journal of Computer Vision, 20:113-133, 1996.
[29] L. Hagen and A. B. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on Computer Aided Design, 11(9):1074-1085, Sep 1992.
[30] P. Havaldar, G. Medioni, and F. Stein. Perceptual grouping for generic recognition. International Journal of Computer Vision, 20(1/2):59-80, 1996.
[31] L. Herault and R. Horaud. Figure-ground discrimination: A combinatorial optimization method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15:899-914, Sept 1993.
[32] D. Hoffman and B. Flinchbaugh. The interpretation of biological motion. Biological Cybernetics, 42(3):195-202, 1982.
[33] A. Hoogs, R. Collins, R. Kaucic, and J. Mundy. A common set of perceptual observables for grouping, figure-ground discrimination, and texture classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(4):458-474, Apr 2003.
[34] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge Univ. Press, 1993.
[35] D. P. Huttenlocher and P. C. Wayner. Finding convex edge groupings in an image. In Proceedings of the International Conference on Computer Vision and Pattern Recognition, pages 406-412, 1991.
[36] H. Ishikawa and D. Geiger. Segmentation by grouping junctions. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 125-131, 1998.
[37] D. W. Jacobs. Robust and efficient detection of salient convex groups. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18:23-37, Jan 1996.
[38] I. H. Jermyn and H. Ishikawa. Globally optimal regions and boundaries as minimum ratio weight cycles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10):1075-1088, Oct. 2001.
[39] K. Koffka. Principles of Gestalt Psychology. Harcourt Brace, 1935.
[40] W. Kohler. Gestalt Psychology. Meridian, 1980.
[41] S. Z. Li. Parameter estimation for optimal object recognition: Theory and application. International Journal of Computer Vision, 21:207-222, Feb 1997.
[42] C. Lin, A. Huertas, and R. Nevatia. Detection of buildings using perceptual groupings and shadows. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 62-69, 1994.
[43] D. Lowe. Three-dimensional object recognition from single two-dimensional images. Artificial Intelligence, 31(3):355-395, Mar 1987.
[44] D. Marr. VISION: A Computational Investigation into the Human Representation and Processing of Visual Information. W. H. Freeman, 1982.
[45] D. W. Matula. Graph theoretic techniques for cluster analysis algorithms. In J. V. Ryzin, editor, Classification and Clustering, pages 95-129, 1977.
[46] J. D. McCafferty. Human and Machine Vision: Computing Perceptual Organization. Ellis Horwood, 1990.
[47] R. Mohan and R. Nevatia. Using perceptual organization to extract 3D structures. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(11):1121-1139, Nov 1989.
[48] K. S. Narendra and M. L. Thathachar. Learning Automata: An Introduction. Prentice-Hall, 1989.
[49] R. Nelson and A. Selinger. A cubist approach to object recognition. In International Conference on Computer Vision, pages 614-621, 1998.
[50] R. Nelson and A. Selinger. Perceptual grouping hierarchy for 3D object recognition and representation. In DARPA Image Understanding Workshop, pages 157-163, 1998.
[51] M. Nicolescu and G. Medioni. Layered 4D representation and voting for grouping from motion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(4):492-501, Apr 2003.
[52] J. Pearl. Probabilistic Inference in Intelligent Systems. Morgan Kaufmann, 1988.
[53] M. Pelillo and M. Refice. Learning compatibility coefficients for relaxation labelling purposes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16:933-945, Sept 1994.
[54] J. Peng and B. Bhanu. Closed loop object recognition using reinforcement learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20:139-154, Feb 1998.
[55] P. Perona and W. Freeman. A factorization approach to grouping. In European Conference on Computer Vision, pages 655-670, 1998.
[56] A. Pothen, H. Simon, and K. P. Liou. Partitioning of sparse matrices with eigenvectors of graphs. SIAM J. Matrix Anal. Appl., 11:430-452, 1990.
[57] S. V. Raman, S. Sarkar, and K. L. Boyer. A graph-theoretic approach to generating structure hypotheses in cerebral magnetic resonance images. Computer Vision, Graphics, and Image Understanding, 57(1):81-98, Jan. 1993.
[58] K. Rao and R. Nevatia. Shape descriptions from imperfect and incomplete data. In International Conference on Pattern Recognition, volume I, pages 125-129, 1990.
[59] I. Rock and S. Palmer. The legacy of Gestalt psychology. Scientific American, pages 84-90, 1990.
[60] P. Rosin and G. West. Segmenting curves into elliptic arcs and straight lines. In International Conference on Computer Vision, pages 75-78, 1990.
[61] P. Rosin and G. West. Extracting surfaces of revolution by perceptual grouping of ellipses. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 677-678, 1991.
[62] G. Roth and M. D. Levine. Geometric primitive extraction using a genetic algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1:901-905, Sept 1994.
[63] V. Roth, J. Laub, M. Kawanabe, and J. M. Buhmann. Optimal cluster preserving embedding of nonmetric proximity data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(12):1540-1551, Dec 2003.
[64] Y. Saab and V. Rao. On the graph bisection problem. IEEE Transactions on Circuits and Systems, 39(9):760-762, Sept. 1992.
[65] H. Saran and V. V. Vazirani. Finding k-cuts within twice the optimal. SIAM Journal on Computing, 24(1):101-108, Feb 1995.
[66] S. Sarkar. Learning to form large groups of salient image features. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 780-786, 1998.
[67] S. Sarkar and K. Boyer. Perceptual organization in computer vision: A review and a proposal for a classificatory structure. IEEE Transactions on Systems, Man, and Cybernetics, 23(2):382-399, 1993.
[68] S. Sarkar and K. L. Boyer. Integration, inference, and management of spatial information using Bayesian networks: Perceptual organization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15:256-274, Mar 1993. Special Section on Probabilistic Reasoning.
[69] S. Sarkar and K. L. Boyer. A computational structure for preattentive perceptual organization: Graphical enumeration and voting methods. IEEE Transactions on Systems, Man, and Cybernetics, 24:246-267, Feb 1994.
[70] S. Sarkar and K. L. Boyer. A computational structure for preattentive perceptual organization: Graphical enumeration and voting methods. IEEE Transactions on Systems, Man, and Cybernetics, 24(2):246-267, Feb 1994.
[71] S. Sarkar and K. L. Boyer. Quantitative measure of change based on feature organization: Eigenvalues and eigenvectors. Computer Vision and Image Understanding, 71(1):110-136, July 1998.
[72] S. Sarkar, D. Majchrzak, and K. Korimilli. Perceptual organization based computational model for robust segmentation of moving objects. Computer Vision and Image Understanding, 86(3):141-170, June 2002.
[73] S. Sarkar and P. Soundararajan. Supervised learning of large perceptual organization: Graph spectral partitioning and learning automata. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(5):504-525, May 2000.
[74] E. Saund. Perceptual organization of occluding contours of opaque surfaces. Computer Vision and Image Understanding, 76(1):70-82, October 1999.
[75] G. L. Scott and H. C. Longuet-Higgins. Feature grouping by relocalisation of eigenvectors of the proximity matrix. In Proceedings of the British Machine Vision Conference, pages 103-108, 1990.
[76] J. Sha'ashua and S. Ullman. Structural saliency: The detection of globally salient structures using a locally connected network. In International Conference on Computer Vision, pages 321-327, 1988.
[77] M. Shah, K. Rangarajan, and P. Tsai. Motion trajectories. IEEE Transactions on Systems, Man, and Cybernetics, 23:1138-1150, 1993.
[78] M. Shaked and J. G. Shantikumar. Stochastic orders and their application. Academic Press Inc., 1994.
[79] J. Shi and J. Malik. Normalized cuts and image segmentation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 731-737, Jun 1997.
[80] A. Shio and J. Sklansky. Segmentation of people in motion. In Proceedings of the IEEE Workshop on Visual Motion, pages 325-332, 1991.
[81] C. Shu and R. Jain. Vector field analysis for oriented patterns. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 673-676, 1992.
[82] P. Soundararajan and S. Sarkar. Empirical evaluation of graph partitioning measures for perceptual organization. In EEMCV01, pages 141-153, 2001.
[83] P. Soundararajan and S. Sarkar. Investigation of measures for grouping by graph partitioning. In Proceedings of the International Conference on Computer Vision and Pattern Recognition, pages 239-246, 2001.
[84] T. Syeda-Mahmood. Data driven and model driven selection using parallel line groups. Computer Vision and Image Understanding, 67(3):205-222, Sep 1997.
[85] T. Syeda-Mahmood. Detecting perceptually salient texture regions in images. Computer Vision and Image Understanding, 76(1):93-108, October 1999.
[86] M. L. Thathachar and P. S. Sastry. Learning optimal discriminant functions through a cooperative game of automata. IEEE Transactions on Systems, Man, and Cybernetics, 17:73-85, Jan 1987.
[87] C. C. Tu and H. Cheng. Spectral methods for graph bisection problems. Computers & Operations Research, 25(7/8):519-530, July 1999.
[88] A. Verri and T. Poggio. Against quantitative optical flow. In International Conference on Computer Vision, pages 171-180, 1987.
[89] D. Wagner and F. Wagner. Between min cut and graph bisection. Technical Report No. 307/1991, Algorithmic Discrete Mathematics, TU Berlin, Sept. 1991.
[90] S. Wang and J. M. Siskind. Image segmentation with minimum mean cut. In Proceedings of the International Conference on Computer Vision, pages 517-524, 2001.
[91] Y. Weiss. Segmentation using eigenvectors: A unifying view. In Proceedings of the International Conference on Computer Vision, volume 2, pages 975-982, 1999.
[92] L. Williams and A. Hanson. Perceptual completion of occluded surfaces. Computer Vision and Image Understanding, 64(1):1-20, Jul 1996.
[93] L. R. Williams and D. W. Jacobs. Stochastic completion fields: A neural model of illusory contour shape and salience. In Proceedings of the International Conference on Computer Vision, pages 408-415, 1995.
[94] L. R. Williams and K. K. Thornber. A comparison of measures for detecting natural shapes in cluttered backgrounds. International Journal of Computer Vision, 34(2/3):81-96, August 1999.
[95] A. Witkin and J. Tenenbaum. On the role of structure in vision. In HMV83, pages 481-543, 1983.
[96] Z. Wu and R. Leahy. An optimal graph theoretic approach to data clustering: theory and application to image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15:1101-1113, Nov 1993.
[97] C. T. Zahn. Graph theoretic methods for detecting and describing Gestalt clusters. IEEE Transactions on Computers, 20:68-86, 1971.
[98] J. Zhong, T. Huang, and R. Adrian. Salient structure analysis for fluid flow. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 310-315, 1994.
[99] S. C. Zhu. Embedding Gestalt laws in Markov random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999.
[100] S. Zucker. Computational and psychophysical experiments in grouping: Early orientation selection. In Human and Machine Vision, pages 545-567, 1983.
[101] S. Zucker. Early orientation selection: Tangent fields and the dimensionality of their support. Computer Vision, Graphics and Image Processing, 32(1):74-103, Oct 1985.
[102] S. Zucker, C. David, A. Dobbins, and L. Iverson. The organization of curve detection: Coarse tangent fields and fine spline coverings. In International Conference on Computer Vision, pages 568-577, 1988.

ABOUT THE AUTHOR

Padmanabhan Soundararajan earned his Bachelor of Engineering degree (Electronics & Communication) in 1995 from Malnad College of Engineering, Mysore University, India.
He also worked at the Supercomputer and Education Research Center (SERC), Indian Institute of Science (IISc), for three years and later joined the Ph.D. program at the University of South Florida in 1998. His research interests include machine vision, machine learning, and speech processing.