Design and implementation of an integrated dynamic vision system for autonomous systems operating in uncertain domains [electronic resource] / by Michail Kontitsis. [Tampa, Fla.]: University of South Florida, 2009.
Title from PDF of title page.
Document formatted into pages; contains 93 pages.
Dissertation (Ph.D.)--University of South Florida, 2009.
Includes bibliographical references.
Text (Electronic dissertation) in PDF format.
Mode of access: World Wide Web.
System requirements: World Wide Web browser and PDF reader.
Co-advisor: Kimon Valavanis, Ph.D.
Co-advisor: Sudeep Sarkar, Ph.D.
Subjects: Unmanned aerial systems; Computer Science and Engineering.
Series: USF Electronic Theses and Dissertations.
Design and Implementation of an Integrated Dynamic Vision System for Autonomous Systems Operating in Uncertain Domains

by

Michail Kontitsis

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Department of Computer Science and Engineering
College of Engineering
University of South Florida

Co-Major Professor: Kimon Valavanis, Ph.D.
Co-Major Professor: Sudeep Sarkar, Ph.D.
Dmitry Goldgof, Ph.D.
Ali Yalcin, Ph.D.
Stephen Wilkerson, Ph.D.

Date of Approval: January 30, 2009

Keywords: unmanned aerial systems, robot vision, helicopter vision, automated surveillance, target tracking, traffic monitoring

Copyright 2009 Michail Kontitsis
Dedicated to my mother's memory and to my father's perseverance
Acknowledgements

I would like to express my gratitude to my mentor Dr. Kimon Valavanis for his support and guidance through all my years in graduate school. I would also like to thank all my labmates and especially Richard Garcia and Konstantinos Dalamagkidis for their invaluable assistance during the implementation phase of this system. Finally, it should be mentioned that the research was partially funded by the Office of Naval Research, the Army Research Office, the Hillsborough County Transportation Division and the National Science Foundation.
Table of Contents

List of Tables iv
List of Figures v
Abstract ix
Chapter 1 Introduction 1
  1.1 Motivation 1
  1.2 Introducing the problem 2
  1.3 Method of solution 5
  1.4 Contributions 6
  1.5 Thesis outline 7
Chapter 2 Literature review 8
  2.1 Related work 8
  2.2 Remarks 12
Chapter 3 Restating and addressing the problem 13
  3.1 Challenges of implementing a vision system for a VTOL 14
  3.2 Module overview 15
  3.3 Complexity 20
  3.4 Tracking system 22
    3.4.1 Tracking system module overview 23
    3.4.2 Target selection module 24
    3.4.3 Matching 25
    3.4.4 Template update module 28
      3.4.4.1 Evaluating the template update module 30
    3.4.5 Pan-Tilt controller 32
Chapter 4 System implementation and performance evaluation 36
  4.1 Applications 36
  4.2 Application 1: Mine detection 36
  4.3 Application 2: Tracking a team of robots on the ground 46
  4.4 Application 3: Traffic monitoring 50
    4.4.1 Introduction 50
    4.4.2 Description of the system 51
  4.5 Additional testing 58
Chapter 5 Conclusion 70
  5.1 Introduction 70
  5.2 Discussion of results 71
  5.3 Discussion of contributions 73
  5.4 Final conclusion 74
Chapter 6 Future research 75
  6.1 Introduction 75
  6.2 Adaptive thresholds and constants 75
  6.3 Active vision 76
  6.4 Consistent target selection 77
  6.5 Multiple UAVs 78
  6.6 Improved hardware / different platforms 80
References 81
Appendices 86
  Appendix A 87
  Appendix B 93
About the Author End Page
List of Tables

Table 2.1 Existing vision systems for VTOL vehicles. 9
Table 2.2 Summary of system characteristics and functionality. 11
Table 3.1 Pseudo-code for raising the alarm. 19
Table 3.2 Computational complexity of the modules used by the object identification system. 21
Table 3.3 Control rules for the Pan-Tilt. 34
Table 4.1 Performance evaluation for flight 1. 46
Table 4.2 Performance evaluation for flight 2. 46
Table 4.3 Performance evaluation for flight 3. 46
Table 4.4 Performance evaluation for flight 4. 47
Table 4.5 Performance evaluation for flight 5. 47
Table 4.6 Performance evaluation for all flights. 48
Table A.1 On-board equipment: payload limitations considered. 88
List of Figures

Figure 3.1 Block diagram overview of the object detection system. 16
Figure 3.2 A block diagram of the tracking system. 25
Figure 3.3 Operation of the tracking system at a given time k. 26
Figure 3.4 The template matching process. 29
Figure 3.5 Tracking results for a = 0.005. 31
Figure 3.6 Tracking results for a = 0.155. 32
Figure 3.7 Tracking results for a = 0.025. 33
Figure 3.8 Plot showing the RMS error for each value of a. 34
Figure 3.9 Plot showing the RMS error for some characteristic values of a. 35
Figure 4.1 Input image and resulting response by the system. 39
Figure 4.2 Another input-output pair of images. 40
Figure 4.3 Another set of input image and resulting response by the system. 40
Figure 4.4 Images that show how the system performs in different lighting conditions. 41
Figure 4.5 Test image under different lighting conditions. 41
Figure 4.6 Correct identification of the object of interest under different conditions. 42
Figure 4.7 A correctly rejected image showing just the background. 43
Figure 4.8 The object is not detected. 43
Figure 4.9 False alarms caused by the presence of the helicopter's exhaust fumes. 44
Figure 4.10 False alarms caused by random background formations. 44
Figure 4.11 False alarm caused by an object with similar hue to the object of interest. 45
Figure 4.12 Identification of a team of four UGVs at a close distance. 48
Figure 4.13 Identification of a team of four UGVs at a medium distance. 49
Figure 4.14 Identification of a team of four UGVs when the image is subject to blur. 49
Figure 4.15 Identification of a team where only three out of four UGVs are visible. 50
Figure 4.16 A block diagram representation of the data extraction mode of the traffic monitoring system. 53
Figure 4.17 The RGB input image in the left is converted to HSI. 54
Figure 4.18 The application of a threshold on the saturation component (left) eliminates most of the pixels belonging to the road (right). 54
Figure 4.19 Applying morphological operators to the binary image (upper left). 55
Figure 4.20 Extraction of regions using connected components and a size filter. 55
Figure 4.21 The output of the traffic load estimator compared to the ground truth. 57
Figure 4.22 A block diagram representation of the manual target designation mode of the traffic monitoring system. 58
Figure 4.23 Tracking of multiple manually selected vehicles. 59
Figure 4.24 Tracking is terminated when targets, such as the two vehicles at the right side of the frame, exit the field of view. 60
Figure 4.25 Tracking is maintained despite the unpredictable motion of the VTOL and the parallax-inducing motion of the vehicles. 60
Figure 4.26 Tracking of vehicles executing maneuvers such as turns and u-turns. 62
Figure 4.27 Tracking of a vehicle making a u-turn. 63
Figure 4.28 Images blurred by the abrupt motion of the carrying platform. 64
Figure 4.29 Severely abrupt motion results in image blur and loss of track. 65
Figure 4.30 Tracking of static objects in an IR sequence. 66
Figure 4.31 Tracking vehicles in an IR sequence. 66
Figure 4.32 Tracking of race cars as they approach the camera. 67
Figure 4.33 Tracking racers as they become partially occluded by the safety fence. 68
Figure 4.34 Severe motion blur results in loss of track. 68
Figure 4.35 Tracking error due to occlusion. 69
Figure A.1 The Raptor 90 SE carrying the first version of the vision system. 88
Figure A.2 The computer on-board the VTOL vehicle (assembled and tested by Richard Garcia). 89
Figure A.3 The Maxxi Joker 2 loaded with equipment. 90
Figure A.4 Details of the Maxxi Joker 2. 91
Figure A.5 The second version of the on-board processing system (assembled and tested by Richard Garcia). 92
Design and Implementation of an Integrated Dynamic Vision System for Autonomous Systems Operating in Uncertain Domains

Michail Kontitsis

ABSTRACT

In recent years unmanned aircraft systems (UAS) have been successfully used in a wide variety of applications. Their value as surveillance platforms has been proven repeatedly in both military and civilian domains. As substitutes for human-inhabited aircraft, they fulfill missions that are dull, dirty and dangerous. Representative examples of successful use of UAS are in areas including battlefield assessment, reconnaissance, port security, wildlife protection, wildfire detection, search and rescue missions, border security, resource exploration and oil spill detection. The reliance of almost every UAS application on the ability to sense, detect, see and avoid from a distance has motivated this thesis, which attempts to further investigate this issue. In particular, among the various types of UAS, small-scale unmanned rotorcraft or Vertical Take-off and Landing (VTOL) vehicles have been chosen to serve as the sensor carrier platforms because of their operational flexibility.

In this work we address the problem of object identification and tracking in a largely unknown dynamic environment under the additional constraint of real-time operation and limited computational power. In brief, the scope of this thesis can be stated as follows: design a vision system for a small autonomous helicopter that will be able to:
- Identify arbitrary objects using a minimal description model and a-priori knowledge
- Track objects of interest
- Operate in real-time
- Operate in a largely unknown, dynamically changing, outdoors environment
under the following constraints:
- Limited processing power and payload
- Low cost, off-the-shelf components.
The main design directives remain those of real-time execution and low-price, high-availability components. It is in a sense an investigation of the minimum hardware and algorithmic complexity required to accomplish the desired tasks. After development, the system was evaluated as to its suitability in an array of applications. The ones that were chosen for that purpose were:
- Detection of semi-concealed objects
- Detection of a group of ground robots
- Traffic monitoring.
Adequate performance was demonstrated in all of the above cases.
Chapter 1
Introduction

1.1 Motivation

In recent years unmanned aircraft systems (UAS) have been successfully used in a wide variety of applications. Their value as surveillance platforms has been proven repeatedly in both military and civilian domains. As substitutes for human-inhabited aircraft, they fulfill missions that are dull, dirty and dangerous. Representative examples of successful use of UAS are in areas including battlefield assessment, reconnaissance, port security, wildlife protection, wildfire detection, search and rescue missions, border security and patrol, resource exploration and oil spill detection, to name just a few. The main common component among all those diverse applications is that they are practically variations of remote sensing and surveillance missions.

The reliance of almost every Unmanned Aerial Vehicle (UAV) application on the ability to sense, detect, see and avoid from a distance has motivated this thesis, which attempts to further investigate this issue. In particular, among the various types of UAS, small-scale unmanned rotorcraft or Vertical Take-off and Landing (VTOL) vehicles have been chosen to serve as the sensor carrier platforms because of their operational flexibility. The advantage of being able to operate from almost anywhere, since they require little to no preexisting infrastructure, outweighs their deficit of speed and endurance when compared to their fixed-wing counterparts. Their ability to hover and fly in relatively confined spaces makes them almost ideal for deployment at low altitude and in urban settings in which the majority of fixed-wing platforms would be challenged to operate. Therefore, and although reported research findings are general enough, the focus of the thesis is on designing and implementing an object detection system for a small unmanned, custom-made VTOL vehicle.

1.2 Introducing the problem

To accomplish the aforementioned tasks/goals autonomously, any UAS must be equipped with the appropriate sensors to collect data and have enough on-board processing power for data interpretation and decision making. It must also employ a collection of algorithms capable of dealing with a variety of tasks. A commonly used sensor suite includes inertial measurement units such as accelerometers, Global Positioning System (GPS) receivers, laser or barometric altimeters and cameras operating at various areas of the electromagnetic spectrum. On several occasions, synthetic aperture radars have also been utilized to provide a more detailed description of the ground below from a relatively high altitude. Although inertial sensor (IMU) and GPS measurements alone provide a relatively good estimation of the UAS's location and pose, these sensors are subject to failures and faulty readings on top of not always being available. The GPS signal is susceptible to interference by naturally
occurring obstructions such as tree tops or clouds, as well as by artificial ones such as buildings, not to mention intentional malicious interference. As a result, readings provided by the GPS unit may be inaccurate or even nonexistent in cases when not enough satellites are visible to the receiver at that particular location. A typical inertial measurement unit can provide acceleration and velocity vectors in 3 axes as well as Euler angles and quaternions at a rate of 5 to 100 Hz depending on the model. The inherent problem of this kind of sensor is the presence of measurement drift. If left uncorrected, the IMU estimate of the location tends to drift away within a few seconds. Several correction mechanisms exist that mitigate the problem, usually by modeling the error, by using GPS measurements for correction, or by a combination of both. Laser or barometric altimeters have been used to provide additional information regarding the position of the UAS, namely the altitude, and they are a good complement to a sensor suite but nevertheless not enough to ensure autonomous navigation in most cases.

Cameras have been used as part of the UAS's sensor suite primarily as data collection equipment rather than navigational aids. Their function usually is to passively sense the environment for the presence of a specifically defined object of interest, and to record and transmit visual data back to a ground station for evaluation. Alongside their role as data collectors, cameras can be very useful in estimating the relative or even absolute position of the carrying vehicle. This method is known as visual odometry or visual simultaneous localization and mapping (VSLAM). It usually amounts to an attempt at reconstructing the three-dimensional environment around the vehicle. The result is a map which, if accurately constructed, allows for autonomous navigation of the vehicle.

Another essential ability of an autonomous aerial vehicle is that of recognizing and tracking objects of interest, thus keeping them within the field of view of the camera while recording their trajectory. This enhances the utility of the unmanned vehicle and facilitates the work of the ground control personnel. It allows the UAV to be used as a surveillance tool that expands the covered area without requiring constant attention. However, identifying arbitrary objects from an overflying moving platform in an uncontrolled environment can be extremely challenging given the variability of the parameters that influence the process. In an outdoors environment, varying lighting conditions, unstructured clutter, motion, random occlusions and visual obstructions must be dealt with by the detection and tracking algorithms.

A very important design directive for an autonomous UAV is the requirement of real-time operation. All the tasks, especially the ones related to navigation, must be completed as fast as possible. In the worst case, the computation time of the decision making components must not exceed the 33 ms barrier that is considered to denote real-time performance.

An additional constraint is imposed on the algorithm by the carrying platform itself. Small aerial vehicles set a bound on the electrical power that can be carried along, which in turn limits the available processing power. With limited
processing power at hand, the complexity of the algorithms becomes an important factor. Between algorithms that accomplish the same task, the one with low complexity is always desirable. In this case, it is crucial that the selected algorithm be able to run in real-time on less than state-of-the-art equipment. Another line of distinction exists between systems designed so that the processing takes place on the ground station and those that use an on-board computer. Obviously the former are not affected by any payload limitations, therefore allowing for more powerful computers to be used.

In this thesis we address the problem of object identification and tracking in a largely unknown dynamic environment under the additional constraint of real-time operation and limited computational power.

1.3 Method of solution

In order to successfully address the aforementioned problem, a series of simple, relatively low-complexity techniques have been employed. Specifically, the object detecting algorithm is based on the application of appropriate thresholds on the image once the latter is converted to the Hue, Saturation, Intensity (HSI) colorspace. The final decision is made by using an accumulator that reflects the number of recent frames in which an object has been detected. When this exceeds a predefined limit, the system produces an alarm and notifies of the object's presence. To address the tracking problem, a simple template matching algorithm based on a similarity measure, such as the sum of absolute differences or the normalized correlation coefficient, was implemented. The template is continuously updated to maintain its relevancy throughout the time period that the object it describes is being tracked. The updated template at any time k is a linear combination of the best matching image patch and the template at time k-1. Finally, the tracking system has been designed so that it can concurrently accept input from both a human operator and an automated source such as another program.

1.4 Contributions

The thesis contribution to the area of vision systems for unmanned aerial systems is the design and implementation of a cost-effective system capable of performing object identification and tracking in real-time that:
- Requires minimal information about the dynamic environment in which it operates
- Uses a single uncalibrated, not stabilized camera
- Tracks multiple objects without requiring a-priori knowledge of or using any assumptions about their trajectories
- Does not require an IMU.
The result is a system that can be assembled from commercially available hardware and can be configured to perform surveillance without calibration of the camera or detailed knowledge of the operating environment. It becomes evident that the use of an uncalibrated, unstabilized camera makes the problem very challenging and to some extent limits the accuracy of obtained results. However, this is one major point addressed in this work: even with an uncalibrated, unstabilized camera, results are sufficient to complete assigned missions.

1.5 Thesis outline

This thesis consists of 6 Chapters and 2 Appendices. The first Chapter introduces the work and briefly describes the problem, the method of solution and the contributions. The second Chapter provides a review of related work and some remarks on it. The third Chapter is devoted to the detailed description of the proposed solution and the implemented system. The performance evaluation is presented in the fourth Chapter along with a description of the actual scenarios where the system was deployed and the specific tasks that it carried out. Concluding remarks follow in the fifth Chapter, and future research topics that can enhance the current implementation are given in Chapter 6. Finally, the Appendices provide details on the hardware and software used to implement the vision system.
Chapter 2
Literature review

2.1 Related work

Vision systems, techniques and algorithms suitable for UAVs range in complexity from simple color segmentation to statistical pattern recognition. This literature review considers a publication as being related to this thesis if the implemented system is specifically designed for use by UAVs. Furthermore, a work is considered directly comparable if the resulting vision system is physically placed on an unmanned VTOL and has been shown to function under real operating conditions in an outdoors environment.

Published related work and proposed machine vision architectures indicate the use of both "on-board" [19] and "on-the-ground" [7-15] processing setups, with the latter being preferred most of the time. For on-board vision systems, due to the limited processing power provided by the on-board computer, derived algorithms have the additional constraint to run in real-time, requiring reduction of the computational load, sometimes accomplished by processing selected image windows instead of the entire image. Table 2.1 presents a summary of machine vision techniques used by University research groups, the main processing unit (on-board, on-the-ground) and the unmanned VTOL vehicle platform they have been implemented
on. Table 2.2 summarizes functionality and capabilities of existing fully operational vision systems, including techniques employed by each one of them.

Table 2.1: Existing vision systems for VTOL vehicles.
Institution | Machine vision techniques used | Processing unit | Vehicle
Berkeley University | No details provided | No details provided | BEAR
Georgia Tech | Edge detectors, morphing, statistical pattern matching | On-board | Rmax by Yamaha
Stanford University | YUV color segmentation, signum of Laplacian of Gaussian (sLoG) | On-the-ground | Hummingbird, Aerospace Robotics Laboratory at Stanford
MIT | Template matching | On-the-ground | Black Star by TSK
Rose-Hulman IT (RHIT) | Template comparison | On-board | Bergen Twin
IT Berlin | No details provided | On-the-ground | MARVIN by SSM Technik
University of Texas | Edge linking, matching | On-the-ground | XCell .60
Swiss Federal Institute of Technology (ETH) | No details provided | On-board, integrated in camera | Huner Technik
Carnegie Mellon University | Template matching and RGB color | On-the-ground | Rmax by Yamaha
USC | Omnidirectional, optic flow | On-board | Bergen Twin
Southern Polytechnic State University | Stereo vision, Sobel edge detector | On-the-ground | Vario Robinson R22
Linkoping University, Sweden (WITAS) | No details provided | On-board | Rmax by Yamaha

The problem of object identification and tracking has been studied extensively in computer vision. Although several methods exist that can identify objects in a controlled setting, special reference will be made to those of them that have been adapted for use in unmanned aerial systems, since it is believed that they relate more closely to the problem at hand. One such example is illustrated in the work of Ludington et al., which presents a method for target tracking using a technique based on particle filters. Each target is described as a four-dimensional particle containing the image coordinates and the dimensions of the rectangle that surrounds it. The assumption for the system to operate is that
the target moves smoothly between frames and that the frame rate remains sufficiently high. The motion is modeled as a Gaussian random walk and the measurement model relies on color and motion cues. Finally, a neural network is responsible for constructing a performance estimate according to which adaptations are made to the particle filter.

Another example is a system where features such as rectangles are first extracted using the Douglas-Peucker algorithm and then tracked using a Kalman filter. Although not explicitly designed for an unmanned vehicle, another system addresses the problem of tracking moving objects in aerial video. It employs two different trackers, one for short and another for long term tracking. The short term tracking is accomplished by first registering successive frames using an affine transformation to correct for background motion and then extracting and matching Kanade-Lucas-Tomasi features between successive image pairs. The long term tracker relies on model matching to follow a specific pattern through the sequence.

The Lucas-Kanade tracker is also utilized by Kanade et al. in conjunction with a motion prediction scheme that relies on Kalman filtering, an image pyramid and two-dimensional affine motion models to deal with large motion in image sequences taken from a micro unmanned aerial vehicle. Motion tracking using low-cost off-the-shelf components has also been investigated with a fixed-wing UAV relaying data back to a ground station where the processing takes place. The authors use a proprietary vision system which, according to their accounts, "lost target track on a regular basis". Tracking salient points has been demonstrated using SIFT features and the RANSAC algorithm, with the authors reporting correct projection in 55.4% to 82.5% of the frames while spending 0.63 s to 1.93 s per frame.

Table 2.2: Summary of system characteristics and functionality. Institutions compared: Berkeley University, Georgia Tech, Univ. of Southern California, COMETS*, WITAS+, CNRS~. Characteristics compared: experimental setup (dynamic observer; dynamic environment; static/man-made environment; known landmarks; natural landmarks; calibrated cameras), capabilities (3D reconstruction/depth; object identification; object tracking) and methods used (optic flow; motion estimation; IMU data; template matching).
*COMETS is a multi-national effort supported by the European Commission
+Wallenberg laboratory for research on Information Technology and Autonomous Systems (WITAS)
~Centre National de la Recherche Scientifique (CNRS) in France

Another area where tracking a ground target is important is that of autonomous landing. As shown in prior work, the helipad is usually marked by an "H" which is extracted from the images using fixed threshold segmentation and
tracked during the landing maneuver by means of second and third order moments of inertia.

2.2 Remarks

For completeness it must be stated again that there is a vast portion of the machine vision literature relating to the problems of object identification, tracking and motion estimation that cannot be presented here, since this work was not intended to be a literature survey. A more detailed presentation of such techniques can be found in the literature. Lastly, it should also be noted that the problem to be addressed in this thesis can be classified as a dynamic vision problem with a moving camera and moving objects, arguably among the most general and difficult ones.
Chapter 3
Restating and addressing the problem

In this Chapter the challenges associated with detecting and tracking objects from an unmanned helicopter are described along with the proposed solution. Before doing that, it is necessary to review the definition of the overall problem this thesis is solving. In brief, it can be stated as follows: design a vision system for a small autonomous helicopter that will be able to:
- Identify arbitrary objects using a minimal description model and a-priori knowledge
- Track objects of interest
- Operate in real-time
- Operate in a largely unknown, dynamically changing, outdoors environment.
The system will operate under the following constraints:
- Limited processing power and payload
- Low cost, off-the-shelf components.
The main design directives remain those of real-time execution and low-price, high-availability components. It is in a sense an investigation of the minimum hardware and algorithmic complexity required to accomplish the desired tasks.
3.1 Challenges of implementing a vision system for a VTOL

Helicopters are attractive as unmanned vehicles due to their ability to take off from almost anywhere without the requirement of a runway. Furthermore, their ability to hover makes them ideal for surveillance since they can keep the on-board sensors pointed towards the same area without having to execute elaborate maneuvers. The price to pay for that flexibility is their lower speed, limited endurance and inherent instability when compared to fixed-wing aircraft. Small unmanned helicopters are even more unstable and susceptible to even modest changes in environmental conditions. This instability affects the images that any on-board camera would acquire and requires either the use of stabilization hardware, which adds weight, or stabilization software, which in turn adds complexity and demands more processing power.

The unmanned aerial vehicles that were used for the purposes of this work are designed to operate outdoors. Such an environment is notoriously difficult for computer vision, primarily due to variations in lighting. Furthermore, there is limited availability of a-priori knowledge of the environment and certainly no three-dimensional map, which leaves little room for helpful assumptions to be made.

The low-cost requirement and the low payload allow only for a single camera to be carried on-board. A second camera could have been utilized to allow for a stereoscopic system and provide additional data for verification purposes, leading to a more robust detection. Its absence can be viewed as an additional design constraint for the vision algorithms.
3.2 Module overview

Taking into consideration the aforementioned design directives and constraints, a system was implemented that alerts of the presence of objects of prescribed color. There were two individual implementations of the design, with the second having a better, more energy-efficient processor and an analog camera with more powerful optics. The computer platforms selected were two different variants of the x86 architecture that fit the profile of low power consumption and off-the-shelf availability. Based around the 1.2 GHz EPIA and later the Pentium M processor, the system processes the data captured by either a firewire IEEE1394 digital camera or an analog color CCD camera connected to a framegrabber for analog-to-digital conversion. Both camera types are widely available, have a low cost and require little power to operate. The operating system of choice was Linux. More details regarding the hardware can be found in Appendix A.

A block diagram overview of the system is given in Figure 3.1. Briefly stated, the system acquires images from either a firewire camera or a framegrabber (see Appendix A), selects the regions of interest on that image, converts them to an appropriate colorspace, selects the areas that meet certain criteria and finally makes a decision on whether to activate an alarm indicating the presence of an object of interest. The overall object detection system consists of the following parts:
- Image acquisition
- Selection of region of interest (ROI)
- Color conversion
- Application of thresholds
- Decision making.
The following sections explain how each of the modules operates.

Figure 3.1: Block diagram overview of the object detection system.

The image acquisition module reads the image from either a firewire port or a framegrabber. It provides access to the appropriate hardware via the open source libraries libdv, libraw1394, libavc1394 and libdc1394 for the firewire port, or Video for Linux (v4l) for the case of the analog camera attached to the framegrabber. It also allows for the setup of various hardware parameters of the cameras such as white balance, resolution, frame rate etc. For our purposes the frame rate was set at the maximum allowable by the hardware (30 fps).

The region of interest (ROI) selection module is employed as a measure to reduce the overall computational load by reducing the number of pixels that are considered as possibly belonging to the object that we are trying to identify. This selection is based on previous decisions of the system. If an area has caused the
decision making module to activate the alarm at frame t, then that particular area is selected for the subsequent Delta_t frames while the rest of the image is not. Every Delta_t frames the image is selected in its entirety to allow for the introduction of new areas. It was experimentally determined that a good compromise between computational load reduction and the ability to incorporate new areas for classification occurred when the value of Delta_t was set to 15.

The design requirements called for a minimal description model and limited a-priori knowledge about the object of interest. To satisfy that, we relied on the color of the object as the sole feature on which to base the classification. This decision was based on the assumption that if an object is sufficiently different from the environment, then the color alone should be adequate to identify it. In their raw form the images are acquired in either YUV or RGB format. Although sufficient for displaying images, these colorspaces do not allow for a simple, direct definition of color as it is perceived by humans. Furthermore, in the case of the RGB model the information about intensity and color is entangled in the triad of values for the Red, Green and Blue components. The colorspace that was deemed appropriate for the task of providing a simple description of color closer to that of human perception was the Hue, Saturation and Intensity (HSI). Theoretically, the Hue and Saturation values should not be affected by variations in lighting. This is a very important property, especially when operating in an uncontrolled environment subject to varying illumination. The conversion from the RGB to the HSI color model is straightforward and described by the following equations:

Intensity = \frac{1}{3}(R + G + B)    (Eq. 3.1)

Saturation = 1 - \frac{3}{R + G + B}\min(R, G, B)    (Eq. 3.2)

Hue = \begin{cases} \theta, & B \le G \\ 360 - \theta, & B > G \end{cases}    (Eq. 3.3)

where \theta = \cos^{-1}\left(\frac{\frac{1}{2}\left[(R - G) + (R - B)\right]}{\sqrt{(R - G)^2 + (R - B)(G - B)}}\right)

and R, G, B are the values of the Red, Green and Blue components.

After the conversion is completed, the Hue, Saturation and Intensity components are forwarded to the threshold application module, which selects the pixels that may belong to an object of interest. The object for which the vision system is searching is defined by a series of upper and lower bounds on Hue, Saturation and Intensity. The image is scanned and pixels that fall within those bounds are selected as belonging to the object in question. From the segmented image we extract connected components likely belonging to objects, and bounding rectangles are drawn that enclose them. From the list of the extracted areas, the ones that fall outside the bounds for size are discarded. Again the main implementation concern is simplicity.

The decision making module is responsible for the final decision regarding the presence of a target. It raises or lowers an alarm signal indicating that something of interest may be present in the image. In detail, if the number of the
selected pixels exceeds a threshold, then the decision mechanism classifies the frame as containing an object and increases a counter by a constant value. In the case that nothing is detected, the same counter is decreased by a quantity relative to the exponential of its current value. When the value of the counter is greater than a certain threshold, the alarm is raised. This can be viewed as a "leaky bucket" that fills gradually every time a frame is found to contain an object of interest and drains rapidly when an object is not present. The operation of the decision making module may be described in pseudo-code as given in Table 3.1.

Table 3.1: Pseudo-code for raising the alarm.
IF object == detected THEN counter = counter + C1
ELSE counter = counter - exp(counter / C2)
IF counter > activation_threshold THEN alarm = ON
ELSE alarm = OFF

The constant C1 is related to the rate at which the counter is increased with each "detection", while C2 controls the descent of the counter's value when an object is not present. By selecting those two constants it is possible to tune the behavior of the decision making mechanism in terms of its tendency to raise the alarm. For the same threshold value, a larger value of C1 will result in an easier activation of the alarm since the counter will be increased by a larger amount. Similarly, a smaller value of C2 will lead to a steeper descent of the counter when an object is not present, resulting in a faster deactivation. Typical values for C1 and C2, found after some experimentation, are 2 and 40, respectively. To avoid extremely high counter values that would make proper deactivation of the alarm almost impossible, an upper bound, typically 100, is introduced, and the counter is not allowed to exceed that bound even if the object of interest is continuously present in the image. There is also a lower bound, usually 0 to 20, below which the counter never goes, in order to avoid very low values that would prevent the alarm from being activated properly. In this way a scale from 0 to 100 is created for the values of the counter, with higher ones corresponding to a higher number of recent frames containing an object of interest.

3.3 Complexity

As is apparent, during the design of the system simplicity has been the primary constraint. This has resulted in an algorithm that applies a threshold on the pixels of the image, making it of order O(n^2) where n is the dimension of a square image. The main computational burden is posed by the conversion of the image into the HSI color space. More specifically, the calculation of the Hue component of the image includes a call to the inverse cosine and the square root function. With the incorporation of the region selection mechanism the number of pixels processed is drastically decreased, which allows the on-board vision system, despite having less computational power than a ground-based computer, to achieve a processing rate of 30 to 80 fps. The region selection algorithm consists of a series of iterations, each of which expands a bounding box around a given pixel. The computational time that it needs depends on the number and the size of the targets present as well as the size of the image. The worst case is again O(n^2), where n is the size of the n x n image. The common case, though, is to have a small number of regions, usually one or two. Since every region is not allowed to exceed a certain size if it is to be considered a valid object, it can be said that the computational time is bounded by a constant. The decision making module is a simple equality test in the on-board system and a slightly more complex leaky-bucket mechanism for the system with the processing unit on the ground. In both cases the delay is negligible and independent of the size of the image. The evaluation of both systems in terms of complexity is summarized in Table 3.2. As will be shown in the next chapter, this simplicity allows real-time execution while exhibiting fairly robust detection.

Table 3.2: Computational complexity of the modules used by the object identification system.
System configuration | Conversion to HSI | Region selection | Decision making
On-board system | O(n^2) | Constant | Constant
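To make the preceding description concrete, here is a minimal Python sketch of the detection pipeline of Sections 3.2 and 3.3: the RGB-to-HSI conversion of Eqs. 3.1-3.3, the HSI threshold test, and the leaky-bucket alarm of Table 3.1. It is an illustration rather than the thesis implementation; only C1 = 2, C2 = 40 and the 0-100 counter scale come from the text, while the activation threshold, the minimum pixel count and all names are assumptions.

import numpy as np

def rgb_to_hsi(rgb):
    # Convert an (H, W, 3) RGB image with values in [0, 1] to HSI (Eqs. 3.1-3.3).
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    eps = 1e-8                                  # guards division and the black singularity
    i = (r + g + b) / 3.0                       # Eq. 3.1
    s = 1.0 - 3.0 * rgb.min(axis=-1) / (r + g + b + eps)   # Eq. 3.2
    theta = np.degrees(np.arccos(0.5 * ((r - g) + (r - b)) /
                                 np.sqrt((r - g) ** 2 + (r - b) * (g - b) + eps)))
    h = np.where(b <= g, theta, 360.0 - theta)  # Eq. 3.3
    return h, s, i

def frame_contains_object(rgb, bounds, min_pixels=50):
    # True if enough pixels fall within the object's HSI bounds (count assumed).
    h, s, i = rgb_to_hsi(rgb)
    (h0, h1), (s0, s1), (i0, i1) = bounds
    mask = (h0 <= h) & (h <= h1) & (s0 <= s) & (s <= s1) & (i0 <= i) & (i <= i1)
    return int(mask.sum()) >= min_pixels

class LeakyBucket:
    # Alarm logic of Table 3.1: fills on detections, drains exponentially.
    def __init__(self, c1=2.0, c2=40.0, activation=60.0, lo=0.0, hi=100.0):
        self.c1, self.c2, self.activation = c1, c2, activation  # activation assumed
        self.lo, self.hi = lo, hi
        self.counter = lo

    def update(self, detected):
        if detected:
            self.counter = min(self.counter + self.c1, self.hi)
        else:
            self.counter = max(self.counter - np.exp(self.counter / self.c2), self.lo)
        return self.counter > self.activation   # alarm ON / OFF

Feeding each frame's frame_contains_object result into LeakyBucket.update reproduces the qualitative behavior described above: single-frame false positives barely move the counter, while a persistently visible object raises the alarm and keeps it raised.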
3.4 Tracking system

Besides being able to detect objects given a minimum description model, the system is also required to be able to track them through time. It is designed in a way that allows the user to designate the object to be tracked during runtime. The main challenges include trajectory prediction as well as occlusion handling and motion segmentation. The demand for low computational cost, real-time execution, and a minimal object description model still applies.

Although the helicopter is perceived to be stationary when hovering and attempting to track ground-bound objects, this is rarely the case. Given the relatively small size of both the Raptor 90 and the Maxxi Joker 2 (see Appendix A for pictures), even slight variations in the wind's direction or speed can result in an unstable flight, profoundly influencing the quality of the acquired images. This translates to relatively high disparities between corresponding objects in subsequent image frames. Furthermore, it makes tracking objects close to the boundaries of the image almost impossible, because they may appear and disappear erratically due to the relative motion of the camera with respect to them.

Occlusions present a significant challenge when attempting to track a specific object in a dynamically changing environment. Objects, be they robots, cars or anything for that matter, are expected to move almost randomly and therefore occlude each other. The background environment, although static, can contribute to this problem whenever it includes obstacles comparable in size with the objects of interest. Tree lines, for example, are such typical obstructions. Also, since the camera is mounted on a moving platform, it is possible for any object to become occluded even by remaining stationary. As one might expect, occlusions are greatly reduced in frequency when the optical axis of the camera is perpendicular to the terrain. However, hovering directly above the area of interest may not always be feasible or desirable due to safety concerns.

Motion or background segmentation is another challenge due to the nature of the environment that the unmanned vehicle operates in. Typical background extraction techniques such as frame differencing or Kalman filtering do not cope well with rapidly varying scenes. In particular, frame differencing degenerates to a crude edge detector when applied to a sequence of images acquired by a moving camera. On the other hand, motion estimation algorithms like the ones used in the MPEG standard, which were also considered, although relatively accurate, were found to be highly demanding in terms of processing power. However, with dedicated hardware that accelerates MPEG encoding, this could be a viable choice for motion estimation.

3.4.1 Tracking system module overview

Having made a review of the typical problems related to object tracking within the context of small unmanned VTOLs, we now describe the operation of our system along with the modules that constitute it. At first the object to be tracked has to be specified. This information can either come from the object
24 detection system that is automatically search ing for a pre-specified object or from a user who activates the tracking system by manually selecting an object in the image. The target selecti on module then creates a descr iption of the selected object and forwards it to the matching module. The la tter will attempt to find a corresponding template in the subsequent image. Once such a match is found the original template that describes the object is updated with information from the most currently acquired frame. Shoul d a possible occlusion be detected, the Occlusion handler resolves it by means of motion prediction. Fi nally the Pan-Tilt controller signals the Pan-Tilt mechanism to move accordingly so that the tracked object is at the center of the image. Briefly st ated the tracking system is comprised of: The target selection module The matching module The template update module The Pan-Tilt controller. A block diagram showing the interc onnections between the aforementioned modules can be seen in Figure 3.2. The operation of each of the modules is described in the following paragraphs as well as depicted in Figure 3.3 below. 3.4.2 Target selection module This component is responsible for re ceiving the user's or the object detection system's input and creating a descrip tion for the object to be tracked. In
25 both cases the input to this module is a series of coordinates ( x, y ) representing the object's position in t he image's column and row coordinate system as seen in Figure 3.4. The design choice has been made to employ a Nt x Nt area taken around the image point ( x, y ) as a template for matching into subsequent image frames. 3.4.3 Matching This module attempts to find a match for the template within the current frame. To reduce the search space for th is match the search is limited within a ( Nsearch x Nsearch) area around the latest previously known position. Assuming continuity of motion for the object under tracking, it is r easonable to expect that it will appear in the next frame relatively close to its current position. Figure 3.2: A block diagra m of the tracking system. Target selectio n Matching Template update Pan-Tilt controller User Object detection module Tracking s y stem To servo motors
Figure 3.3: Operation of the tracking system at a given time k.

The similarity measure for the match is a normalized correlation coefficient between the template and all the Nt x Nt square sub-images within the search space. The correlation coefficient r(d_x, d_y) between the template and the region that is displaced by d_x, d_y from the origin of the search space is given by:

r(d_x, d_y) = \frac{\sum_{x,y}\left[template(x,y) - \overline{template}\right]\left[image_k(x + d_x, y + d_y) - \overline{image_k}\right]}{\sqrt{\sum_{x,y}\left[template(x,y) - \overline{template}\right]^2 \sum_{x,y}\left[image_k(x + d_x, y + d_y) - \overline{image_k}\right]^2}}    (Eq. 3.4)

The output of this module is the row and column position for which the aforementioned similarity measure is maximized:

(d_x, d_y)^{*} = \arg\max_{d_x, d_y} r(d_x, d_y).    (Eq. 3.5)

Alternatively, the sum of absolute differences

SAD(d_x, d_y) = \sum_{x,y}\left|template(x,y) - image_k(x + d_x, y + d_y)\right|    (Eq. 3.6)

can be used with similar results. In that case the row and column position that minimize the similarity measure become the output of this module:

(d_x, d_y)^{*} = \arg\min_{d_x, d_y} SAD(d_x, d_y).    (Eq. 3.7)

Obviously a trade-off exists when selecting the size of the search space. A large value for Nsearch will allow for a larger disparity between the positions of the object in subsequent frames. The penalty for a larger search space is obviously the extra computational cost, which increases with the square of Nsearch. On the other hand, decreasing the search space may save some computing time, but it entails the possibility of not finding a proper match just because the object moved more than Nsearch/2 pixels in any direction. A good compromise was achieved by making a selection based on the maximum expected apparent motion Max_disparity. For our applications we selected Nsearch = 2 * Max_disparity = W/10, where W is the width of the captured image. Our implied assumption is that the apparent motion of the observed object will not exhibit inter-frame displacements of more than W/20 pixels. Figure 3.4 shows the relation between the sliding template, the image and the search space.
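As an illustration of Eqs. 3.4-3.7 (a sketch with assumed names, not the author's implementation), the matching step can be written as an exhaustive scan of the search window that scores each candidate patch with the zero-mean correlation coefficient of Eq. 3.4; the SAD of Eq. 3.6 is shown as the cheaper alternative measure.

import numpy as np

def match_template(frame, template, last_xy, n_search):
    # Exhaustive search of an n_search x n_search window centered on the last
    # known position, maximizing the correlation of Eq. 3.4 (Eq. 3.5).
    nt = template.shape[0]
    half_t, half_s = nt // 2, n_search // 2
    t = template - template.mean()
    t_energy = np.sqrt((t ** 2).sum())
    x0, y0 = last_xy
    best_r, best_xy = -np.inf, last_xy
    for dy in range(-half_s, half_s + 1):
        for dx in range(-half_s, half_s + 1):
            x, y = x0 + dx, y0 + dy
            if x < half_t or y < half_t:          # candidate leaves the image
                continue
            patch = frame[y - half_t:y - half_t + nt, x - half_t:x - half_t + nt]
            if patch.shape != template.shape:     # clipped at the far border
                continue
            p = patch - patch.mean()
            denom = t_energy * np.sqrt((p ** 2).sum())
            r = (t * p).sum() / denom if denom > 0 else -1.0   # Eq. 3.4
            if r > best_r:
                best_r, best_xy = r, (x, y)
    return best_xy, best_r

def sad(template, patch):
    # Sum of absolute differences, Eq. 3.6; minimized rather than maximized (Eq. 3.7).
    return np.abs(template - patch).sum()

With the choice Nsearch = W/10 made above, the inner loop visits on the order of (W/10)^2 candidates per target and frame, which is why the cost of the search grows with the square of Nsearch.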
3.4.4 Template update module

The typical weakness of a tracking system based on template matching is the fact that with time the template may become irrelevant. Objects are moving and their pose with respect to the camera is almost constantly changing. As a result, the projected two-dimensional image of any given object differs significantly within the time span of a few seconds, making any attempt at a match with the original template almost impossible.

To mitigate this effect, the template is updated at every cycle of the algorithm's execution. Every new captured image within which a match was found contributes to the template by introducing information to it, thereby forming a new one. The new template is a linear combination of the existing one and the neighborhood of the best match. In most cases the linear combination of the current template and the best matching patch is sufficient to maintain the relevancy of the template without incurring a significant processing power penalty. If Template(k) is the template at time k and Match(k) the Nt x Nt neighborhood around the coordinates of the best match, then:

Template(k+1) = a \cdot Match(k) + (1 - a) \cdot Template(k),    (Eq. 3.8)

where a \in [0, 1].

The design decision to be made when calculating this new template concerns the amount of new information that will be incorporated versus the amount that will be retained from the "old" template. Apparently there is a trade-off. If one chooses to retain more of the older template, the result will be a slower-changing template unable to accommodate pose variations that happen within the time span of a few frames.

Figure 3.4: The template matching process.

This, however, will make the template impervious to short-duration random illumination variations as well as to the occasional mismatch. On the other hand, if the choice is made to aggressively update the template with new information, then it has a better chance of remaining relevant and being able to cope even with objects whose pose and appearance vary rapidly. The caveat in this case is that the template becomes susceptible to noise, and to the fact that even a single mismatch by the matching module can easily throw off the tracking system by forcing it to follow the new mismatched object. The balance between the new and old information is controlled by the constant a.
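Eq. 3.8 is simply an exponential moving average over matched patches. A one-line sketch (illustrative name, assuming template and match are arrays of equal size) makes the role of a explicit:

import numpy as np

def update_template(template, match, a=0.025):
    # Template(k+1) = a * Match(k) + (1 - a) * Template(k), a in [0, 1] (Eq. 3.8)
    return a * np.asarray(match, dtype=float) + (1.0 - a) * np.asarray(template, dtype=float)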
Lower values of a place more weight on the old template rather than on the newly acquired image, and higher values of course have the opposite effect. After some experimentation, the value a = 0.025 was chosen as the one that yielded the best compromise between robustness and adaptation.

3.4.4.1 Evaluating the template update module

To further investigate the way the values of a influence the output of the tracking system, a series of experiments was conducted. Each time, the system was instructed to track the same object through the sequence while the value of a remained constant. This was repeated for values of a ranging from 0.005 to 0.995 with a step of 0.01. The sequence selected for that purpose is one that contains a car executing a u-turn maneuver. The choice was made because that particular video excerpt contains an object that changes its pose relatively fast, allowing the opportunity to validate the effectiveness of the way the template is updated to accommodate the changing appearance of the target. However, the template must also retain some of the past information to ensure the identity of the target.

During the experiment, it was verified that for low values such as 0.005 the template does not adapt fast enough to maintain the track. Figure 3.5 shows exactly that. Conversely, values above 0.155 forced out of the template enough past information that the tracking system abandoned the initial target, as shown in Figure 3.6.

Finally, the value of a for which the system exhibited the desired behavior was 0.025. As shown in Figure 3.7, the vehicle is consistently tracked through the sequence. Having established this trajectory as the ground truth, the root mean square (RMS) error was calculated for the trajectories produced by each of the 100 different values of a. The results are shown in Figure 3.8. Isolating some characteristic values of a and plotting the error for them yields Figure 3.9. One can notice the sharp increases that correspond to the times that the tracking failed.

Figure 3.5: Tracking results for a = 0.005.
Figure 3.6: Tracking results for a = 0.155.

3.4.5 Pan-Tilt controller

It is usually desirable, if not required, that the tracked object remain in the camera's field of view (FOV). This task is accomplished by the Pan-Tilt controller which, as the name implies, sends control signals to the servos that adjust the pan and tilt angles of the camera. The goal is to keep the tracked object approximately at the center of the captured images. For that reason an error vector is calculated between the center of the image and the image point where the object is located. The vertical and horizontal components of this error vector are then used to adjust the pan and tilt angles so that the error is minimized. The control rules for the Pan-Tilt are shown in Table 3.3. To avoid oscillation and constant corrections, the object is kept within an N x N window centered around the center of the image.

Figure 3.7: Tracking results for a = 0.025.
Table 3.3: Control rules for the Pan-Tilt.
If error_x < -N then pan left by 5 degrees
If error_x > N then pan right by 5 degrees
If error_y < -N then tilt up by 5 degrees
If error_y > N then tilt down by 5 degrees
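The rules of Table 3.3 amount to a dead-band controller. A minimal sketch follows; the value of N, the sign conventions and the names are assumptions, since the thesis does not specify them:

def pan_tilt_command(target_xy, image_wh, n=40, step_deg=5):
    # Re-aim only when the target drifts outside the N x N window around the
    # image center, per Table 3.3; negative pan = left, negative tilt = up.
    (x, y), (w, h) = target_xy, image_wh
    err_x, err_y = x - w // 2, y - h // 2
    pan = -step_deg if err_x < -n else (step_deg if err_x > n else 0)
    tilt = -step_deg if err_y < -n else (step_deg if err_y > n else 0)
    return pan, tilt

The dead band is what prevents the oscillation mentioned above: inside the window no command is issued, so the servos stay put between genuine target excursions.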
Figure 3.8: Plot showing the RMS error for each value of a.

Figure 3.9: Plot showing the RMS error for some characteristic values of a.
Chapter 4
System implementation and performance evaluation

4.1 Applications

After the development of the system we had the opportunity to evaluate its performance in an array of applications. The cases that were chosen for that purpose were:
- Detection of semi-concealed objects such as mines
- Detection of a group of ground robots
- Traffic monitoring.
The design was also evaluated on some sequences of images that were slightly out of the scope for which it was originally intended, such as infra-red video and car racing footage. As will be shown in the following paragraphs, our design coped with all the challenges presented to it while requiring only minor application-specific adjustments and modifications.

4.2 Application 1: Mine detection

The first application for which we evaluated the suitability of our design is based on a hypothetical scenario in which an unmanned VTOL is scouting an area trying to identify semi-buried objects on the ground. The vehicle flew in an unknown outdoors environment with several artificial landmarks that needed to be identified. For the purpose of our experiment, various hemispherical and almost spherical black objects were randomly placed, some half-buried in the ground, imitating mine types. The choice of black was made to test the limits of the threshold application module. The latter, being based on HSI, has a distinct disadvantage when evaluating pixels with low color content. Furthermore, black is, theoretically, a singularity for the Hue component. Figures 4.1, 4.2 and 4.3 show input images containing objects of interest, captured by the camera while the vehicle was flying, as well as the resulting system output. In Figures 4.1, 4.2 and 4.3, pixels belonging to the object are painted white, while all others are black. Shadows and other disturbances had little effect on the overall system performance, since only a few frames were misclassified as containing objects of interest. Therefore, the final output of the decision making module was not affected. The angle at which the camera was directed towards the ground had no effect on the system, as illustrated in Figure 4.2, which shows a view of the object from directly above. Figure 4.4 demonstrates a case where the system erroneously classified some image pixels as belonging to the object of interest. The object of interest was intentionally placed in the shadow of a tree in an attempt to test the system's sensitivity to changes in lighting conditions. This can be remedied by the introduction of a module in the algorithm that will compensate for the various lighting conditions. In Figure 4.5 another test under different lighting conditions is demonstrated. The minimum bounding rectangle containing the object is superimposed over the actual input image.
The overall performance was deemed satisfactory, with a correct identification rate that ranged between 85% and 90%. False positives were present especially when illumination approached either extreme. In very dark images differentiating a black object from its environment is difficult, since pixels not belonging to the object will register as such as their color information is almost lost. In cases where brightness was at or close to maximum, the washout of the colored image caused the algorithm to either miss the object or wrongly identify some other area of the image as being the object of interest. The behavior of the system regarding alarm activation was tuned in a way that maximizes activation sensitivity at the cost of false alarms. During the test it was observed that the helicopter would sometimes fly in an erratic pattern due to the prevailing weather conditions. As a result there were cases where the object would come into the camera's field of view and then exit again within a few tens of milliseconds. To compensate for this variability the alarm deactivation was intentionally delayed. This was performed by adjusting the constants C1 and C2 of the decision making module. It allowed the system to achieve a high identification rate while eliminating the effect of oscillating alarm signals. It is also important to note that operation was always in real-time (30 frames per second). To further evaluate the system's performance a series of flights was conducted. During this experiment the Unmanned Aerial System was tasked to identify and alert for the presence of an object of interest. The latter was a dark
spherical object arbitrarily placed in the field, on which ground vehicles roamed in a way unbeknownst to the VTOL.

Figure 4.1: Input image and resulting response by the system. Pixels belonging to the object of interest are highlighted in the second image.

In addition, the object was moved around randomly between flights to ensure that the detection rate was not the result of a particular placement with respect to sunlight and background. A total of five flights were attempted, in one of which the VTOL was flown over an area that did not contain the object in question (see Table 4.3). The system succeeded in identifying the target in all other four attempts. Sample instances of its operation are shown in Figures 4.6a) through d), where the correctly identified object of interest is enclosed in a red minimum bounding rectangle.
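Regarding the alarm behavior described earlier, the thesis states only that the alarm is raised once the object has been detected in sufficiently many recent frames and that deactivation is delayed through the constants C1 and C2. The counter logic below is therefore one plausible reading in C, not the exact on-board decision rule.

/* Hysteresis-style alarm logic: C1 recent detections arm the alarm,
   and C2 consecutive misses are required before it is dropped.
   This interpretation of C1/C2 is an assumption. */
typedef struct { int hits; int misses; int alarm; } AlarmState;

void alarm_update(AlarmState *st, int detected, int C1, int C2)
{
    if (detected) {
        st->misses = 0;
        if (st->hits < C1) st->hits++;
        if (st->hits >= C1) st->alarm = 1;  /* enough recent detections */
    } else {
        st->hits = 0;
        if (st->alarm && ++st->misses >= C2) {
            st->alarm = 0;                  /* drop only after C2 misses */
            st->misses = 0;
        }
    }
}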
Figure 4.2: Another input-output pair of images. Notice that some pixels at the top of the image have been misclassified as belonging to the object of interest.

Figure 4.3: Another set of input image and resulting response by the system. Pixels belonging to the object of interest are highlighted in the second image.
Figure 4.4: Images that show how the system performs in different lighting conditions. Some pixels are misclassified due to the presence of shadow.

Figure 4.5: Test image under different lighting conditions. A red minimum bounding rectangle is superimposed over the area in which the algorithm detected the object.

Figures 4.6a) and b) show images that were taken at a different time of the day than Figures 4.6c) and d), as evidenced by the length of the shadow the object casts. These samples demonstrate the system's ability to successfully operate even when a significant component of the lighting conditions, such as the direction of
the main source of illumination, varies. Another property illustrated by the quartet of images in Figure 4.6 is that of scale invariance, to some degree. Although the aerial vehicle did not maintain a steady altitude through the course of each flight, the object was identified successfully both in instances where it was relatively close to the camera, such as in Figure 4.6b), and in those in which it was further away, such as in Figures 4.6a), c) and d).

Figure 4.6: Correct identification of the object of interest under different conditions (panels a through d).
As expected, the vast majority of the processed frames did not contain the object of interest and were correctly rejected. Such a frame is shown in Figure 4.7, where simply the background is seen, blurred by a stream of exhaust fumes emanating from the VTOL's internal combustion engine. Unfortunately, there were some cases in which the object was missed, as shown in Figure 4.8. False alarms were not entirely avoided, as seen in Figures 4.9 through 4.11.

Figure 4.7: A correctly rejected image showing just the background.

Figure 4.8: The object is not detected.
In Figure 4.9 the false alarm shown is produced by the exhaust fumes of the helicopter. When sufficiently concentrated, they registered on the camera with a bluish hue that was close to that of the object of interest. Some random background formations, when clustered together, also spurred false alarms, such as the one shown in Figure 4.10.

Figure 4.9: False alarms caused by the presence of the helicopter's exhaust fumes.

Figure 4.10: False alarms caused by random background formations.
Objects that had components sharing color characteristics with the object of interest also caused false alarms to be raised. A typical example was one of the ground robots, which had a blue body and black wheels. Under certain lighting conditions part of the body was darkened enough to be close to the hue value of the object of interest.

Figure 4.11: False alarm caused by an object with similar hue to the object of interest.

The typical hit, miss, false alarm and correct rejection rates are presented for each attempt in Tables 4.1 through 4.5. Lastly, the results from all five flights were aggregated to produce the data displayed in Table 4.6. The overall performance of the system can be characterized as adequate, since it exhibits an average detection rate of 89.60% while producing an acceptably low false alarm rate of 6.40%.
Table 4.1: Performance evaluation for flight 1.

FLIGHT 1     Object present   Object absent
Alarm ON     88.24%           16.23%
Alarm OFF    11.76%           83.77%

Table 4.2: Performance evaluation for flight 2.

FLIGHT 2     Object present   Object absent
Alarm ON     89.66%           1.79%
Alarm OFF    10.34%           98.21%

Table 4.3: Performance evaluation for flight 3.

FLIGHT 3     Object present   Object absent
Alarm ON     N/A              2.22%
Alarm OFF    N/A              97.78%

4.3 Application 2: Tracking a team of robots on the ground

A trend with rising popularity among operators of unmanned vehicles is to use them in teams. Usually there is a team of ground robots assigned to perform some task while an aerial vehicle inspects the environment well beyond the
range of the sensing capabilities of its ground-bound teammates. When operating in such a scenario the UAV should be able to verify the presence of each of the ground robots. To explore the ability of the system to identify multiple targets of different color, we arranged a team of four (4) robots and placed them in a typical outdoors environment. The VTOL flew along with the team of robots and attempted to hover over them.

Table 4.4: Performance evaluation for flight 4.

FLIGHT 4     Object present   Object absent
Alarm ON     100.0%           11.34%
Alarm OFF    0.00%            88.66%

Table 4.5: Performance evaluation for flight 5.

FLIGHT 5     Object present   Object absent
Alarm ON     77.78%           0.55%
Alarm OFF    22.22%           99.45%

The images in Figures 4.12 to 4.15 show the results of the identification process, where each ground vehicle is enclosed in a bounding rectangle that matches its color. What may not be easily conveyed by the still images is the fact
that the helicopter's altitude was fluctuating significantly, constantly changing the point of view of the camera. Despite these abrupt pose changes, the algorithm was able to detect all four of the ground robots.

Table 4.6: Performance evaluation for all flights.

Aggregate of All Flights   Object present   Object absent
Alarm ON                   88.60%           6.40%
Alarm OFF                  11.40%           93.60%

Figure 4.12: Identification of a team of four UGVs at a close distance.
Figure 4.13: Identification of a team of four UGVs at a medium distance.

Figure 4.14: Identification of a team of four UGVs when the image is subject to blur.
Figure 4.15: Identification of a team where only three out of four UGVs are visible.

4.4 Application 3: Traffic monitoring

4.4.1 Introduction

A very interesting application for an unmanned VTOL is that of traffic monitoring. The VTOL is a surprisingly suitable platform, since it can hover over a particular traffic node for varying periods of time depending on configuration. A medium-sized autonomous helicopter such as the Bergen Observer can hover for a period of 45 minutes, and up to more than an hour and a half if equipped with extra fuel tanks. However, the electric-powered Maxxi Joker 2 can only provide a 12 to 15 minute hover, although this is expected to increase as new, higher density battery technology becomes available. Furthermore, a traffic monitoring system based on autonomous or semi-autonomous VTOLs can be deployed very
rapidly in areas that provide little or no infrastructure. Once deployed, it can provide real-time data to operators on the ground or to traffic simulation models, which can then provide more accurate predictions for traffic parameters based on more current observations. The operator of the system can also direct the system to track a certain vehicle by manually selecting it with a pointing device (e.g. a mouse). Additionally, such a system can serve as an ad-hoc replacement for already existing traffic monitoring infrastructure in case the latter has failed or has been accidentally destroyed. In emergencies such as a hurricane evacuation or a large-scale automotive accident, an autonomous VTOL deployed from a first responder's vehicle can provide helpful information. Briefly stated, the autonomous VTOL provides the capability of a traffic monitoring system without the requirement of extensive infrastructure. It has two modes of operation, one automatically extracting traffic data and another tracking manually selected vehicles.

4.4.2 Description of the system

As stated above, one of the objectives of the system is to extract meaningful real-time data from the video stream captured by the onboard camera on the VTOL. Such data include the total number of vehicles that pass through a given part of the road network, the number of vehicles that follow a certain path, and the overall traffic flow. This task can be separated into 3 distinct steps. Initially, the image areas that correspond to vehicles have to be differentiated
from the environment. Secondly, the extracted vehicles must be consistently tracked as they traverse the portion of the road network that is being examined. Lastly, the result of the tracking procedure is converted to meaningful traffic measures. A block diagram showing the succession of these steps is shown in Figure 4.16. The following paragraphs provide a more detailed description of the whole process.

The first step towards extracting the desired traffic data involves identifying the vehicles. In keeping with the goals of this work, the vehicle extraction process must be simple, computationally cheap and require minimal a-priori knowledge. The selected method takes advantage of the fact that the areas of the image corresponding to a paved road usually have extremely low saturation values. A simple, sufficiently low threshold is then applied to suppress the part of the background that is the road. Following that, a pair of erode/dilate morphological operators is used to further distinguish the blobs that correspond to vehicles. A size filter is employed to avoid some residual areas of the background being categorized as vehicles. In particular, for any formation of pixels to be accepted as a vehicle it has to fit within a bounding rectangle no smaller than W/10 x H/10 and no larger than W/3 x H/3, where W and H are the width and height of the image respectively. Its effectiveness is based on the assumption, supported by observation, that areas smaller than W/10 x H/10 usually correspond to noise, whereas areas with dimensions larger than W/3 x H/3 are usually representative of various background formations. The application of such a filter enforces a
measure of scale on the problem, albeit a reasonable one. Figures 4.17 to 4.20 depict the image processing steps leading from the input image to the extracted vehicles; a sketch of the size filter follows the figures.

Figure 4.16: A block diagram representation of the data extraction mode of the traffic monitoring system: thresholding to suppress the background, connected components and structural operators to extract the vehicles, vehicle tracking, and traffic data.

In succession, a set track_{k-1} is constructed containing the centers of gravity of all the regions that pass the size filter at a given time k-1. This set, containing pairs of (i,j) image coordinates, is presented as input to the tracking module described in Chapter 3. The output of the tracking module forms a set track_k that holds the respective matching coordinates at the current frame k. It is worth mentioning that time in this case has been discretized to coincide with the
acquisition rate of the camera, so frame k and time k are interchangeable.

Figure 4.17: The RGB input image on the left is converted to HSI. The saturation component is shown on the right.

Figure 4.18: The application of a threshold on the saturation component (left) eliminates most of the pixels belonging to the road (right).
Figure 4.19: Applying morphological operators to the binary image (upper left): dilation (upper right), then erosion (lower center).

Figure 4.20: Extraction of regions using connected components and a size filter. Notice the rejection of very large and very small blobs.
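The size filter described above reduces to a pair of bounding-box comparisons. A direct C transcription of the stated limits:

/* Size filter of Section 4.4.2: a connected component is kept as a
   vehicle only if its bounding box lies between W/10 x H/10 and
   W/3 x H/3, where W and H are the image dimensions. */
int passes_size_filter(int box_w, int box_h, int img_w, int img_h)
{
    int min_w = img_w / 10, min_h = img_h / 10;
    int max_w = img_w / 3,  max_h = img_h / 3;

    if (box_w < min_w || box_h < min_h) return 0;  /* noise */
    if (box_w > max_w || box_h > max_h) return 0;  /* background formation */
    return 1;                                      /* plausible vehicle */
}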
Having completed the vehicle extraction and tracking, the remaining task is to calculate the traffic parameters of choice. The number of vehicles currently present on the road is simply equal to the number of objects represented in the tracking set. In other words, the cardinality of track_k gives the number of current vehicles. The number of vehicles over any given period of time can be found by integrating the function of vehicles over time. Plotting the function of vehicles over time results in the graph shown in Figure 4.21. The red line represents the number of vehicles that the system estimates are currently in view, the blue line is the actual number of vehicles present, and the green line shows the execution time in milliseconds. On average the system estimated the current traffic with an accuracy of 81%. For comparison, traffic monitoring systems that utilize fixed cameras have reported higher rates of 95% to 97%. They benefit from more consistent background and motion extraction, since the sensors remain static and all apparent motion in the image can only be attributed to vehicles. Note that the execution time remains well below 33 ms, which signifies real-time operation. The occasional spikes correspond to write cycles during which the operating system logged the output of the vision program to the disk. They are not due to executing the core vision code and are not present during normal operation. The computer used in this case was a Pentium 4 at 3 GHz. The system can also function in another mode of operation that relies on user input for target designation. It allows a certain number of manually selected vehicles to be tracked. The operator has only to point and click on the vehicles to
be tracked. The image coordinates are gathered and presented as input to the tracking system described in Chapter 3. The block diagram in Figure 4.22 describes this process.

Figure 4.21: The output of the traffic load estimator compared to the ground truth.

Results of this mode of operation can be seen in Figure 4.23. Figure 4.24 shows some targets being abandoned by the system as they exit the field of view, in order to make the resources of memory and processing power available for reallocation to tracking newly acquired targets. Figure 4.25 shows that the tracking can be maintained despite the unpredictable motion of the VTOL and the
parallax-inducing motion of the vehicles. Further resilience to parallax is demonstrated in Figures 4.26 and 4.27, where vehicles are shown to be tracked while executing turns and u-turns that change their pose with respect to the camera.

Figure 4.22: A block diagram representation of the manual target designation mode of the traffic monitoring system: operator input of image coordinates (i,j), vehicle tracking, and vehicle trajectory output.

4.5 Additional testing

To further examine its capabilities and limitations, the proposed design was also tested on image sequences that were either not representative of the applications for which it was intended or outside the operational envelope as delimited by the scale of objects it can track and the disturbance for which it can compensate. Two image sequences were selected for that purpose. The first is
one taken by an infra-red aerial camera and the other a video clip of a car racing event.

Figure 4.23: Tracking of multiple manually selected vehicles.
Figure 4.24: Tracking is terminated when targets, such as the two vehicles at the right side of the frame, exit the field of view.

Figure 4.25: Tracking is maintained despite the unpredictable motion of the VTOL and the parallax-inducing motion of the vehicles.
Figure 4.25 (Continued).
Figure 4.26: Tracking of vehicles executing maneuvers such as turns and u-turns.

The infra-red video depicts some buildings and static fixtures as well as some moving vehicles. Tracking the latter proved especially challenging, because the apparent size of the moving objects is near the limits of the scale for which the system was designed. In other words, the moving targets are too small to provide the template with enough information. Furthermore, there are some instances where the movement of the camera becomes so sudden and abrupt that the image itself is blurred for several frames (see Figure 4.28), resulting in an
unrecoverable loss of track, as seen in Figure 4.29. Figure 4.30 shows the tracking of several static background objects, while Figure 4.31 depicts the result of the vehicle tracking.

Figure 4.27: Tracking of a vehicle making a u-turn.

The car racing sequence shows some race cars as they approach a camera mounted at approximately the same level as their height; then the view switches to a camera overlooking the track, partially occluded by a safety fence, and finally concludes in a frontal view of the racers. In the first view the apparent
size of the objects increases rapidly, whereas in the second and third the change in scale is more gradual. The system was able to track the selected race cars consistently through the scale variation, as well as when they became partially occluded by the safety fence, as shown in Figures 4.32 and 4.33.

Figure 4.28: Images blurred by the abrupt motion of the carrying platform. The upper left image is unaffected by blur, for comparison.

However, the track was expectedly lost when the vehicles came too close to the camera at a presumably very high speed, resulting in them appearing blurry and covering most of the image, as shown in Figure 4.34. The
system also lost track of vehicles that became heavily occluded for a significant amount of time, as depicted in Figure 4.35. Notice that despite the eventual tracking error, the estimate for the position of the blue racer, denoted by the rectangle, remains in the general area of the occluded vehicle.

Figure 4.29: Severely abrupt motion results in image blur and loss of track.
Figure 4.30: Tracking of static objects in an IR sequence.

Figure 4.31: Tracking vehicles in an IR sequence.
Figure 4.31 (Continued).

Figure 4.32: Tracking of race cars as they approach the camera.
Figure 4.33: Tracking racers as they become partially occluded by the safety fence. Parallax and scale variation are also present.

Figure 4.34: Severe motion blur results in loss of track.
Figure 4.34 (Continued).

Figure 4.35: Tracking error due to occlusion.
Chapter 5

Conclusion

5.1 Introduction

This thesis presented a system for object identification and tracking to be used by unmanned aerial systems. The object detection part is based on image thresholding and exploits the inherent theoretical invariability to lighting conditions of HSI's Hue component. The final decision to raise the alarm signifying the presence of an object of interest is made after the latter has been detected in sufficiently many recent frames. Also in this work, a tracking system has been presented. It is based on template matching using the sum of absolute differences as a similarity measure. A template update occurs in every loop of the algorithm by incorporating new information, so that the template remains relevant, allowing the track to continue despite changes in the object's appearance. Furthermore, the system allows for manual entry of targets by the human operator as well as receiving input from other image processing modules. The following paragraphs further discuss the results and contributions of this thesis and provide some final remarks.
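A compact C sketch of such a tracker is given below. The SAD similarity measure is as stated in the text; the bounded search radius r and the blending-style template update T <- a*I + (1-a)*T are assumptions, although a running-average update of this form is consistent with the weight a whose values (0.155, 0.025) were compared in Chapter 3. The caller is responsible for keeping the search window inside the image.

#include <limits.h>
#include <stdlib.h>

typedef struct { int x, y; } Point;

/* Sum of absolute differences between the template and the image
   patch whose top-left corner is (x, y). */
static long sad(const unsigned char *img, int stride,
                const unsigned char *tpl, int tw, int th, int x, int y)
{
    long s = 0;
    for (int j = 0; j < th; j++)
        for (int i = 0; i < tw; i++)
            s += abs((int)img[(y + j) * stride + (x + i)]
                     - (int)tpl[j * tw + i]);
    return s;
}

/* One tracking step: exhaustive SAD search in [prev +- r], then blend
   the matched patch into the template (assumed update law). */
Point track_step(const unsigned char *img, int stride,
                 unsigned char *tpl, int tw, int th,
                 Point prev, int r, double a)
{
    long best = LONG_MAX;
    Point best_p = prev;
    for (int y = prev.y - r; y <= prev.y + r; y++)
        for (int x = prev.x - r; x <= prev.x + r; x++) {
            long s = sad(img, stride, tpl, tw, th, x, y);
            if (s < best) { best = s; best_p.x = x; best_p.y = y; }
        }
    for (int j = 0; j < th; j++)
        for (int i = 0; i < tw; i++) {
            int pix = img[(best_p.y + j) * stride + (best_p.x + i)];
            tpl[j * tw + i] =
                (unsigned char)(a * pix + (1.0 - a) * tpl[j * tw + i]);
        }
    return best_p;
}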
5.2 Discussion of results

The proposed design was evaluated in several different scenarios and application contexts, including detection of semi-concealed objects, robot teams and traffic monitoring. It performed adequately in all of them, exhibiting a high 89.60% detection rate for the object identification missions while suffering an acceptable 6.40% false alarm rate. Although color has proved a fairly good descriptor for objects, it may not be sufficient and has to be supplemented by features such as shape and texture. Most of the false alarms were due to objects of similar hue to that of the object of interest, while most missed detections can be attributed to specular reflections and shadows. In an outdoors environment reflections are beyond the control of the designer of the system and can occur at any time, given the fact that the observer (the camera-carrying flying platform) moves with respect to the main lighting source (the sun). When enough of the object's surface appears to be covered by these specular reflections it becomes impossible to detect, because the color information is almost lost for that area. Lastly, another parameter affecting the performance is the scale within which the system operates. As the VTOL's altitude varies during the flight, the apparent size of the observed objects can change significantly. Since the design relies on an assumption about the apparent size of objects, misses and false alarms may occur if the object appears smaller than the lower or larger than the upper limit for size. When fixed optics are used, the scale is closely associated with the physical size of the
object and the altitude at which the VTOL flies. Variable optics can provide some independence from those, but again only within the mechanical limits of the zoom mechanism. The tracking system developed in this work was able to correctly track on average 81% of the visible vehicles when assigned to the task of traffic monitoring. It was shown capable of coping with significant disturbance resulting from the nature of the carrying platform and the unstabilized camera, as well as with occasional brief occlusion and change of pose of the tracked object. However, there are specific situations that are beyond the ability of the system to compensate for. A severe disturbance that results in an apparent motion of more than W/20 between successive frames, where W is the width of the frame, will result in a lost track. For an image acquisition rate of 30 fps, this apparent motion amounts to 1.5W pixels/sec, which is arguably a high enough limit. Conversely, assuming that the camera is perfectly stable, only objects capable of covering the largest visible dimension in under 2/3 of a second will avoid being tracked. Another case where the track can be lost is when an occlusion occurs that covers more than 50% of the template corresponding to the object being tracked. In such a circumstance, within a few repetitions of the template update process the template will lose enough information to make finding the proper match problematic.
5.3 Discussion of contributions

This work contributes to the area of vision systems for unmanned aerial systems by proposing a monocular, uncalibrated, non-gyro-stabilized design capable of identifying and tracking multiple objects in real-time, without strict assumptions about their trajectories, relying on minimal information about the environment while forgoing the need for motion compensation, which usually requires an inertial measurement unit (IMU). By selecting a monocular, non-stabilized system the design avoids the extra cost and weight of a binocular configuration along with that of a gyro-stabilized turret. The fact that the camera does not require calibration for the system to operate further simplifies the assembly and setup process. The ability to track multiple objects at a low computational cost allows the design to run in real-time on inexpensive, less powerful, power-efficient computing platforms. This is especially critical for relatively small unmanned systems with limited payload capabilities, for which carrying a large energy reserve to support a powerful computer is impossible. Although the prevailing trends in computer design and battery chemistry promise to alleviate this problem by providing more efficient, powerful hardware and higher energy density batteries respectively, the desire for the minimalist's approach in the design of a vision system will most likely continue to be relevant. This is especially true for the class of unmanned aerial systems that are even smaller than the Maxxi Joker 2 platform that was used in this work.
By keeping the requirement of knowledge about the environment to a minimum, the design is not explicitly required to perform elaborate background extraction, thus further reducing the computational burden. Only the apparent size and color of the object of interest are required for the identification process. No a-priori known landmarks are used to facilitate detection and tracking, and minimal assumptions, such as the one used in the traffic monitoring scenario about the tarmac being gray, are made. This allows for a fairly versatile system capable of performing in different environments with only a few, if any, modifications.

5.4 Final conclusion

In summary, this work produced a design suitable for object identification and tracking tasks in real-time. It demonstrated that it is possible to endow relatively small, inexpensive VTOLs with capabilities usually reserved for more expensive, higher-end unmanned systems.
Chapter 6

Future research

6.1 Introduction

In this chapter the discussion revolves around possible future research related to the work presented so far. Several methods exist that require investigation as to their ability to provide better solutions to the central problem of this thesis. Some of them focus on improving the individual modules of the existing system, while others require major revisions of the approach described here. In the following paragraphs a selection of the potentially beneficial modifications is presented. They have been chosen for their potential to improve the performance of the system while adhering to the real-time operation mandate.

6.2 Adaptive thresholds and constants

The current system relies on a series of manually selected thresholds and constants in order to perform the tasks of recognition and tracking. One logical approach to enhancing its operation is to have these values selected automatically by the algorithm itself. Implementing an efficient algorithm capable of reliably varying the thresholds in the HSI space that are associated with the object of interest will allow for operation under extremely varied lighting
conditions. Several methods already exist that are based on local or global statistics of the intensity values in the image. Furthermore, the information from the GPS sensor that a typical UAS carries can be used to estimate the position of the sun and therefore the direction of the illumination. This will permit the disambiguation between the object of interest and its shadow, as well as the shadows cast by other objects present in the scene.

6.3 Active vision

The availability of relatively inexpensive, lightweight cameras with electronically controlled optics offers the opportunity to further explore the problem of target identification and tracking from a flying platform, which already falls firmly in the area of active vision. Varying the focal length in a known manner can facilitate the calculation of the essential or fundamental matrix for each camera. This will allow for better reconstruction of the three-dimensional space, leading to improved reliability of the trajectory predictions made by the system. Furthermore, varying optics can be employed by an attention-focusing mechanism to extract more information in cases where disambiguation between objects is required. An intelligent mechanism of that sort could zoom in to areas that present some difficulty, such as between objects that are too close or even when one partially occludes the other. Alternatively, the camera can be instructed to zoom out in order to acquire a wider area and provide information about some global reference points or
landmarks. Generally, it can provide the object identification algorithm with information at different levels of scale. This zooming capability can also be of significant help to the human operator, who can employ it to further investigate an object of interest without interfering with the flight of the VTOL.

6.4 Consistent target selection

As previously mentioned, the operator of the unmanned system can manually select objects for tracking. To accomplish that, the images captured by the camera on board the flying platform must be transmitted to a ground station. As with any transmission there is some delay involved. Care should be taken to ensure that the delays caused by the communications channel are such that at any given time the same frame is available to both the human operator on the ground and to the algorithm running on the VTOL. Preferably, the time to transmit a frame to the ground station and have it displayed for the operator to see should not be more than the time elapsed between two consecutive frame acquisition cycles of the camera. Typically, this amounts to 1/30 of a second. Exceeding that can cause the two image sequences to be out of synchronization. As a result, the image coordinates chosen by the operator may not correspond to the intended object in the frame that the algorithm on the VTOL is currently examining. A mechanism should be put in place that detects this event and selects the appropriate object for tracking.
Time stamping the captured frames provides a straightforward solution to the problem. Every frame is given a monotonically varying number that determines its relative position in the stream and can be used to identify the frame in which the operator designated the intended object for tracking. However, time stamping the captured frames is not always possible. Sometimes the video is transmitted to the ground through a channel that bypasses the computer on board the VTOL. In this case, it is safer to assume that there is always a discrepancy between the image viewed by the operator and the one presented to the algorithm. The problem now becomes one of finding an area in a buffer of recently captured images that matches the one selected by the operator.

6.5 Multiple UAVs

Employing a team of unmanned aerial systems instead of just one can improve overall effectiveness in several different ways. Firstly, using several UAS can extend the operational time of the system simply by using one flying platform at a time to cover the same area. Even this straightforward approach presents some planning problems. It requires a decision making mechanism to direct the lift-off and landing cycles of the helicopters. Secondly, a team of VTOLs can cover a far greater area than a single unit could. Depending on their arrangement it is possible to have overlapping areas of coverage for purposes of redundancy. Having a specific area covered by multiple cameras can also help resolve some
occlusion problems, for the simple reason that an object may be occluded when viewed from a specific point in 3D space but visible from another. In general, the deployment of the aerial team can be seen as seeking a 3D configuration that optimizes a set of parameters, such as area coverage, occlusion resolution and endurance, under possible constraints of no-fly areas imposed by the presence of immovable obstacles, forbidden by regulations or denied by an adversary. For any deployment of the UAS team that allows overlapping areas, establishing correct correspondences is important and remains a significant open problem. Regardless of the existence of overlap, the system must be able to maintain a coherent registry of the objects being tracked. When an object leaves the field of view of one sensor and enters that of another, it should be identified and tracked by the system as being the same object. This is known as the handoff problem and it is central in multi-sensor distributed systems. To accomplish that, some prediction and projection of every object's path should be available, either centrally at a ground-based coordinating node or distributed to the computer on board each of the autonomous VTOLs. Every flying platform should be able to expect an area of possible appearance for each of the objects currently tracked by another team member, or at least for those that are more probable, based on their current and predicted trajectory, to migrate between fields of view (areas of coverage). For the predicted trajectories to be relevant they have to be in a common coordinate system. Since the position of the observers varies with time, excellent localization is required for the flying
platforms. In summary, the problem can be seen as one of tracking multiple targets given a multiple-image, time-varying, uncertain geometry.

6.6 Improved hardware / different platforms

One trend that the computer science community is contributing to and has come to rely on is that of improving hardware. Central processing units with ever increasing processing power are becoming available at lower cost. The power consumption penalty usually associated with increased performance has lately been mitigated by the manufacturers' efforts to improve efficiency. The result is a series of powerful, low-consumption processors that can be used in mobile platforms such as unmanned vehicles. This can allow the proposed vision system to operate in real-time on smaller, more efficient processing platforms, thus endowing even smaller VTOLs with object identification and tracking capabilities. Conditional on the availability of low power multi-core systems, a possible future implementation can also explore the parallelization of each component, which will further increase the number of objects that can be tracked simultaneously. During the evaluation process, the system described in this thesis exhibited some hints of versatility by being able to operate in different environments and in a scenario unrelated to the initial purpose for which it was designed. This warrants further investigation, with the ultimate goal being the development of a generalized algorithm capable of operating equally well on a wide variety of vehicles: ground, aerial, marine and submarine.
References

Unmanned Systems Roadmap: 2007-2032, Office of the Secretary of Defense, December 2007.

Vision Hardware, University of Southern California, [online] 2004, http://www-robotics.usc.edu/~avatar/vision_hw.htm (Accessed: 27 January 2005).

L. Mejias, S. Saripalli, G. Sukhatme, P. Cervera, "Detection and Tracking of External Features in an Urban Environment Using an Autonomous Helicopter," submitted to IEEE International Conference on Robotics and Automation, 2005.

S. Hrabar, G. S. Sukhatme, "A Comparison of Two Camera Configurations For Optic-Flow Based Navigation of a UAV Through Urban Canyons," in IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 2673-2680, Sep 2004.

V. Sastry, "Vision based detection of autonomous vehicles for pursuit evasion games," 15th Triennial World Congress, Barcelona, Spain, 2002.

R. Sattigeri, A. J. Calise, "An adaptive approach to vision based information control," Guidance, Navigation and Control Conference, number AIAA-2003-5727, Austin, TX, August 2003.

S. Hrabar and G. S. Sukhatme, "Omnidirectional Vision for an Autonomous Helicopter," Proceedings of IEEE International Conference on Robotics and Automation, pp. 3602-3609, 2003.

S. Saripalli, J. F. Montgomery and G. S. Sukhatme, "Vision-based Autonomous Landing of an Unmanned Aerial Vehicle," Proceedings of IEEE International Conference on Robotics and Automation, pp. 2799-2804, May 2002.

B. R. Woodley, H. L. Jones, E. A. LeMaster, E. W. Frew, and S. M. Rock, "Carrier Phase GPS and Computer Vision for Control of an Autonomous Helicopter," ION GPS-96, Kansas City, Missouri, September 1996.
J. Debitetto, "Modeling and simulation for small autonomous helicopter development," AIAA-1997-3511, Modeling and Simulation Technologies Conference, New Orleans, LA, Aug. 1997.

B. Woodley, H. Jones, E. Frew, E. LeMaster, S. Rock, "A Contestant in the 1997 International Aerial Robotics Competition," [online], http://sun-valley.stanford.edu/papers/WoodleyJFLR:97.pdf (Accessed: 20 December 2008).

N. Holifield, J. Lallinger, G. Underwood, "International Aerial Robotics Competition 2004," University of Texas at Austin IEEE Robot Team Aerial Robotics Project, June 1, 2004, [online] 2004, http://iarc1.ece.utexas.edu/~lynca/final_documentation/utiarc2004.pdf (Accessed: 10 June 2006).

Burleson, Salhany, Hudak, "Southern Polytechnic State University Autonomous Remote Reconnaissance System," [online] 2006, http://arobotics.spsu.edu/SPSU_paper2005.pdf (Accessed: 10 June 2006).

M. Musial, U. W. Brandenburg, G. Hommel, "MARVIN: Technische Universität Berlin's flying robot for the IARC Millennial Event," in Proc. Symposium of the Association for Unmanned Vehicle Systems 2000, Orlando, Florida, USA, 2000.

RMAX Type I / Type II G, Yamaha, [online] 2004, http://www.yamahamotor.co.jp/global/business/sky/lineup/rmax/index.html (Accessed: 27 January 2005).

E. Johnson, S. Mishra, "Flight Simulation for the Development of an Experimental UAV," AIAA Modeling and Simulation Technologies Conference and Exhibit, 5-8 August 2002.

A. Proctor, B. Gwin, S. Kannan, A. Koller, H. Christophersen, E. Johnson, "Ongoing Development of an Autonomous Aerial Reconnaissance System at Georgia Tech," [online] 2004, http://controls.ae.gatech.edu/gtar/iarcpapers/git2004.pdf (Accessed: 1 March 2009).

Johnson, E. N., Calise, A. J., Tannenbaum, A. R., Soatto, S., Hovakimyan, N., and Yezzi, A. J., "Active-Vision Control Systems For Complex Adversarial 3-D Environments," a tutorial, Proceedings of the American Control Conference, 2005.
Enhanced Vision System (EVS) Overview, CMC Electronics, [online] 2001, www.cmcelectronics.ca/En/Prodserv/Commav/commav_evs_overview_en.html (Accessed: 22 January 2005).

E. Johnson, P. DeBitetto, C. Trott, and M. Bosse, "The 1996 MIT/Boston University/Draper Laboratory Autonomous Helicopter System," Proceedings of the 15th Digital Avionics Systems Conference, 1996.

J. Groven, E. Holk, C. Humbert, J. Krall, D. Schue, "Rose-Hulman Institute of Technology Autonomous Helicopter for the 2004 International Aerial Robotics Competition."

J. Chapuis, C. Eck, H. P. Geering, R. Mudra, B. Schneuwly, R. Sommerhalder, "The Swiss Entry into the 1996 International Aerial Robotics Competition," AUVSI '96 Proceedings, pp. 947-953, Orlando, FL, July 1996.

O. Amidi, "An Autonomous Vision-Guided Helicopter," M.A. thesis, Carnegie Mellon University, 1996.

K. Nordberg, P. Doherty, G. Farneback, P-E. Forssen, G. Granlund, A. Moe and J. Wiklund, "Vision for a UAV helicopter," 2002 IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS 2002), Proc. Workshop WS6 Aerial Robotics, pp. 29-34, Lausanne, 2002.

A. Ollero, J. Ferruz, F. Caballero, S. Hurtado, L. Merino, "Motion compensation and object detection for autonomous helicopter visual navigation in the COMETS system," in IEEE International Conference on Robotics and Automation, New Orleans, LA (USA), April 2004.

F. Ruffier, N. Franceschini, "Visually guided micro-aerial vehicle: automatic take off, terrain following, landing and wind reaction," Proceedings ICRA '04, 2004 IEEE International Conference on Robotics and Automation, Volume 3, pp. 2339-2346, Apr 26-May 1, 2004.

Ludington, Reinmann, Vachtsevanos, "Target tracking and adversarial reasoning for unmanned aerial vehicles," Aerospace Conference, 2007 IEEE, pp. 1-17, 3-10 March 2007.

L. Mejias, S. Saripalli, P. Campoy, G. Sukhatme, "Visual Servoing of an Autonomous Helicopter in Urban Areas using feature tracking," Journal of Field Robotics 23(3/4), pp. 185-199, 2006, Wiley Periodicals.
Mejias, L., Correa, J. F., Mondragon, I. and Campoy, P., "COLIBRI: A vision-guided UAV for surveillance and visual inspection," Proceedings IEEE International Conference on Robotics and Automation 2007, pp. 2760-2761, Rome, Italy, 2007.

D. H. Douglas, T. K. Peucker, "Algorithms for the reduction of number of points required to represent a digitized line or its caricature," The Canadian Cartographer, 10(2), pp. 112-122, 1973.

W. Bell, P. Felzenszwalb, D. Huttenlocher, "Detection and long term tracking of moving objects in aerial video," 1999.

J. Shi, C. Tomasi, "Good features to track," Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 593-600, 1994.

T. Kanade, O. Amidi, and Q. Ke, "Real-Time and 3D Vision for Autonomous Small and Micro Air Vehicles," 43rd IEEE Conference on Decision and Control (CDC 2004), December 2004.

Anderson, Dah-Jye Lee, Barrett Edwards, Archibald, and Greco, "Real-time Feature Tracking on an Embedded Vision Sensor for Small Vision-guided Unmanned Vehicles," International Symposium on Computational Intelligence in Robotics and Automation 2007 (CIRA 2007), pp. 55-60.

S. Saripalli, J. F. Montgomery and G. S. Sukhatme, "Visually Guided Landing of an Unmanned Aerial Vehicle," IEEE Transactions on Robotics and Automation, vol. 19, no. 3, pp. 371-380, June 2003.

Keith Price Annotated Computer Vision Bibliography, University of Southern California, [online] 2008, http://iris.usc.edu/VisionNotes/bibliography/contents.html (Accessed: 23 April 2008).

R. Jain, R. Kasturi and B. G. Schunck, Machine Vision, McGraw Hill, 1995.

V. N. Dobrokhodov, I. I. Kaminer, K. D. Jones and R. Ghabcheloo, "Vision-Based Tracking and Motion Estimation for Moving targets using Small UAVs," Proceedings of the 2006 American Control Conference, pp. 1428-1433, Minneapolis, Minnesota, USA, June 14-16, 2006.

M. George, S. Sukkarieh, "Camera Aided Inertial Navigation in Poor GPS Environments," Aerospace Conference, IEEE, pp. 1-12, 2007.
I. F. Mondragon, P. Campoy, J. F. Correa, L. Mejias, "Visual Model Feature Tracking For UAV Control," IEEE International Symposium on Intelligent Signal Processing, pp. 1-6, 2007.

Open Computer Vision Library, [online] 2008, http://sourceforge.net/projects/opencvlibrary (Accessed: 15 December 2008).

Wan, C. L., Dickinson, K. W., "Computer vision and neural networks for traffic monitoring," Road Traffic Monitoring, 1992 (IEE Conf. Pub. 355), pp. 143-, 1992.
Appendix A

This appendix gives a detailed description of the hardware used for this thesis. As mentioned previously in the text, two different helicopter platforms were used, carrying different processing units and cameras. The Raptor 90 SE provided the flying platform for the first development iteration. It has an internal combustion engine that uses a mixture of methanol as fuel. The selection of this model was dictated by considerations regarding size, operating and purchasing cost. The processing system was based around a 1.2 GHz EPIA CPU that provided computing power on par with a Pentium 3 class chip at a similar clock frequency. The payload in this configuration includes (Figure A.2):

1.2 GHz EPIA Processor
Via Embedded motherboard
Unibrain Firewire Camera
Microstrain 3DM-G IMU
1 Gig 266 MHz RAM
1 Gig Compact Flash
Compact Flash to IDE adapter
Motorola M12+ GPS Receiver
8 Channel Servo Controller
200 W Power Supply
11.1 V LiPo Battery
802.11B Cardbus

The Firewire camera can capture images at 30 frames per second, in resolutions ranging from 160x120 full color RGB to 640x480 in YUV422 format.

Table A.1: On-board equipment, payload limitations considered.

Device                    Weight      Power consumption   % of payload
Color CCD camera          0.028 Kg    1.32 W              1.03
Transmitter               0.020 Kg    1.08 W              0.73
GPS                       0.091 Kg    <1 W                3.34
IMU                       0.090 Kg    <1 W                3.30
Pan-Tilt                  0.2 Kg      Approx. 0.5 W       7.33
Flight control computer   0.567 Kg    Approx. 2 W         20.79
Power Supply              0.363 Kg                        13.31

Figure A.1: The Raptor 90 SE carrying the first version of the vision system.
The second implementation was based on the Maxxi Joker 2 helicopter. The main difference between this and the previously used Raptor 90 SE is the use of an electric motor in place of the internal combustion unit.

Figure A.2: The computer on board the VTOL vehicle (assembled and tested by Richard Garcia).

This allows for a cleaner, quieter operation with a significant reduction in engine-caused vibration. Taking advantage of the technology advancements that happened during the first development phase, the on-board computer was updated to a Pentium M at 2 GHz that provided more computing power while consuming less energy than the one it replaced. The camera was also changed for a Sony FXB 98S block camera that has better resolution (approx. 680 Kpixels) and
superior optics that allow optical zooming up to 26x, while maintaining a modest energy consumption. The payload in this configuration includes (Figure A.5):

2.0 GHz Pentium M
Intel 2200 B/G Mini-PCI wireless
Garmin 18 GPS
200 Watt power supply
2 GB RAM
Microbotics Safety Servo Switch
30 fps 4-channel Frame Grabber
11.1 V 4Ahr LiPo Battery (3S2P)

Figure A.3: The Maxxi Joker 2 loaded with equipment.
Figure A.4: Details of the Maxxi Joker 2.
Figure A.5: The second version of the on-board processing system (assembled and tested by Richard Garcia).
Appendix B

This appendix describes the software modules needed for the vision system to function, starting with the operating system that ensures the proper function of, and facilitates communication between, the hardware components and the processes running on the on-board computer. The first implementation of the system used the Slackware 10.0 distribution of Linux as the operating system for the onboard computer. The image acquisition process utilized the Firewire interface via the lib1394 and libraw1394 libraries. The second implementation employed the Gentoo flavor of Linux because it offered faster booting times. Support for the necessary analog video device was provided through the bttv 868 drivers. Lastly, the main algorithm itself was written in the C programming language using the open source computer vision library OpenCV, which offers a substantial gamut of computer vision functions as well as a tangible increase in performance.
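For illustration, a minimal capture-and-threshold loop against the OpenCV 1.x C API of that era might look as follows. Note that OpenCV provides HSV rather than HSI; the two differ in the intensity/value channel but share the hue plane used for detection. The hue limits below are placeholders, not the thesis values, and the loop omits the connected components, size filter and tracking stages.

/* Sketch of a capture-and-threshold loop with the OpenCV 1.x C API.
   Hue band limits (100..130) are illustrative placeholders. */
#include <cv.h>
#include <highgui.h>

int main(void)
{
    CvCapture *cap = cvCaptureFromCAM(0);          /* first video device */
    if (!cap) return 1;

    IplImage *frame = cvQueryFrame(cap);
    IplImage *hsv  = cvCreateImage(cvGetSize(frame), 8, 3);
    IplImage *hue  = cvCreateImage(cvGetSize(frame), 8, 1);
    IplImage *mask = cvCreateImage(cvGetSize(frame), 8, 1);

    while ((frame = cvQueryFrame(cap)) != NULL) {
        cvCvtColor(frame, hsv, CV_BGR2HSV);
        cvSplit(hsv, hue, NULL, NULL, NULL);       /* keep the hue plane */
        cvInRangeS(hue, cvScalar(100, 0, 0, 0),
                        cvScalar(130, 0, 0, 0), mask);
        cvErode(mask, mask, NULL, 1);              /* clean up speckle */
        cvDilate(mask, mask, NULL, 1);
        /* ...connected components, size filter and tracking go here... */
    }
    cvReleaseCapture(&cap);
    return 0;
}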
About the Author

Michail Kontitsis received his Diploma in Electronics and Computer Engineering from the Technical University of Crete, Greece in 2001. From 2001 until 2003 he continued his studies in the Department of Production Engineering and Management, where he received his Master's degree. During that time, he was involved in a research project funded by the Greek Secretariat of Research for the development of a real-time automated vision system for surveillance, using infra-red cameras, to be placed on a UAV. Since January 2004 he has been with the Department of Computer Science and Engineering of the University of South Florida, where he is pursuing a Ph.D. under the guidance of Dr. Kimon Valavanis and, since fall 2008, of Dr. Sudeep Sarkar as well. While at USF he participated in various research projects funded partially by the Office of Naval Research, the Army Research Office and the Hillsborough County Transportation Division.