
Dynamic programming with multiple candidates and its applications to sign language and hand gesture recognition


Material Information

Title:
Dynamic programming with multiple candidates and its applications to sign language and hand gesture recognition
Physical Description:
Book
Language:
English
Creator:
Yang, Ruiduo
Publisher:
University of South Florida
Place of Publication:
Tampa, Fla.
Publication Date:
2008

Subjects

Subjects / Keywords:
Sign language recognition
Movement epenthesis
Hand segmentation
Hidden Markov models
Dynamic time warping
Level building
Dissertations, Academic -- Computer Science and Engineering -- Doctoral -- USF   ( lcsh )
Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Abstract:
ABSTRACT: Dynamic programming has been widely used to solve various kinds of optimization problems. In this work, we show that two crucial problems in video-based sign language and gesture recognition systems can be attacked by dynamic programming with additional multiple observations. The first problem occurs at the higher (sentence) level. Movement epenthesis [1] (me), i.e., the necessary but meaningless movement between signs, can result in difficulties in modeling and scalability as the number of signs increases. The second problem occurs at the lower (feature) level. Ambiguity of hand detection and occlusion will propagate errors to the higher level. We construct a novel framework that can handle both of these problems based on a dynamic programming approach. In the past, me has only been modeled explicitly. Our proposed method handles me in a dynamic programming framework where we model the me implicitly. We call this the enhanced Level Building (eLB) algorithm.
This formulation also allows the incorporation of statistical grammar models such as bigrams and trigrams. Another dynamic programming process that handles the problem of selecting among multiple hand candidates is also included at the feature level. This is different from most of the previous approaches, where a single observation is used. We also propose a grouping process that can generate multiple, overlapping hand candidates. We demonstrate our ideas on three continuous American Sign Language (ASL) data sets and one hand gesture data set. The ASL data sets include one with a simple background, one with a simple background but with the signer wearing short-sleeved clothes, and one with a complex and changing background. The gesture data set contains color-gloved gestures with a complex background. Using the automatically chosen me score, performance is within 5% of that achieved with the manually chosen me score.
At the low level, we first oversegment each frame to get a list of segments. Then we use a greedy method to group the segments based on different grouping cues. We also show that the performance loss is within 5% when we compare this method with manually selected feature vectors.
Thesis:
Dissertation (Ph.D.)--University of South Florida, 2008.
Bibliography:
Includes bibliographical references.
System Details:
System requirements: World Wide Web browser and PDF reader.
System Details:
Mode of access: World Wide Web.
Statement of Responsibility:
by Ruiduo Yang.
General Note:
Title from PDF of title page.
General Note:
Document formatted into pages; contains 132 pages.
General Note:
Includes vita.

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001970339
oclc - 276367524
usfldc doi - E14-SFE0002310
usfldc handle - e14.2310
System ID:
SFS0026628:00001



Full Text
MARC 21 record
Leader: nam Ka
001 001970339
003 fts
005 20081125144814.0
006 m||||e|||d||||||||
007 cr mnu|||uuuuu
008 081125s2008 flu sbm 000 0 eng d
024 E14-SFE0002310
035 (OCoLC)276367524
040 FHM
    FHM
049 FHMM
090 TK7885 (ONLINE)
100 Yang, Ruiduo.
245 Dynamic programming with multiple candidates and its applications to sign language and hand gesture recognition
    [electronic resource] /
    by Ruiduo Yang.
260 [Tampa, Fla.] :
    University of South Florida,
    2008.
502 Dissertation (Ph.D.)--University of South Florida, 2008.
504 Includes bibliographical references.
516 Text (Electronic dissertation) in PDF format.
538 System requirements: World Wide Web browser and PDF reader.
    Mode of access: World Wide Web.
500 Title from PDF of title page.
    Document formatted into pages; contains 132 pages.
    Includes vita.
590 Adviser: Sudeep Sarkar, Ph.D.
653 Sign language recognition.
    Movement epenthesis.
    Hand segmentation.
    Hidden Markov models.
    Dynamic time warping.
    Level building.
690 Dissertations, Academic
    USF
    Computer Science and Engineering
    Doctoral.
773 USF Electronic Theses and Dissertations.
856 http://digital.lib.usf.edu/?e14.2310



Dynamic Programming with Multiple Candidates and its Applications to Sign Language and Hand Gesture Recognition

by

Ruiduo Yang

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Department of Computer Science and Engineering
College of Engineering
University of South Florida

Major Professor: Sudeep Sarkar, Ph.D.
Dmitry B. Goldgof, Ph.D.
Arthur I. Karshmer, Ph.D.
Barbara L. Loeding, Ph.D.

Date of Approval: March 7, 2008

Keywords: Sign Language Recognition, Movement Epenthesis, Hand Segmentation, Hidden Markov Models, Dynamic Time Warping, Level Building

© Copyright 2008, Ruiduo Yang

DEDICATION

To my family

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1 INTRODUCTION
1.1 Overview and Introduction
1.2 Contributions
CHAPTER 2 OBJECTIVES AND RELATED WORKS
2.1 Overview of Objectives
2.2 Background on Sequence Labeling/Classification Algorithm
2.3 Background on Hands Localization
CHAPTER 3 HIGH LEVEL MATCHING
3.1 Problem Formulation
3.2 The Enhanced Level Building Algorithm
3.2.1 Dynamic Programming
3.2.2 Grammar Constraint
CHAPTER 4 SINGLE SIGN MATCHING
4.1 Coupling Groups with Deterministic Matching Algorithm
4.1.1 Formulation of the Matching Process
4.1.2 Dynamic Programming
4.2 Coupling with Hidden Markov Models
4.2.1 Maximal Observation, Summed State
4.2.2 Summed Observation, Summed State
4.2.3 Maximal Observation, Maximal State
CHAPTER 5 FEATURE REPRESENTATION
5.1 Low Level Representation
5.1.1 Detection of Hands
5.1.2 Global Features
5.1.3 Multiple Candidates Representation
5.2 Grouping of Low Level Primitives
5.2.1 Grouping Process
5.2.2 Associating Groups Across Frames
CHAPTER 6 CONDITIONAL MODELS
6.1 Conditional Random Fields for a Sign/Gesture Sequence
6.2 Key Frames Representation and Extraction
6.2.1 Motion Snapshot Representation
6.2.2 Detecting Key Frames
6.3 Conditional Random Fields over Key Frame Sequences
CHAPTER 7 EXPERIMENTS AND RESULTS
7.1 Data Sets and Experiment Setup
7.2 Study 1: eLB vs. LB with Grammar and Parameter Analysis
7.3 Study 2: Comparison with Other Approaches
7.4 Study 3: Global Features vs. Multiple Local Candidates
7.5 Study 4: Short Sleeves vs. Long Sleeves
7.6 Study 5: Grouping Results with the Deterministic Model
7.7 Study 6: Grouping Results with Hidden Markov Models
CHAPTER 8 CONCLUSION AND FUTURE WORK
REFERENCES
APPENDICES
Appendix A Data Collection
Appendix B Groundtruth Tools
Appendix C Text Corpus
ABOUT THE AUTHOR

LIST OF TABLES

Table 2.1 Summary of possible problems in ASL recognition.
Table 2.2 Image constraints we used in our imaging process.
Table 7.1 Summary of the data sets used in this work.
Table 7.2 Outline of study 1 to study 4.
Table 7.3 Outline of study 5 and study 6.
Table 7.4 Comparison of automatically chosen and manually chosen.
Table 7.5 Framewise labeling results.
Table 7.6 List of matched scores for 3 test sentences.
Table 7.7 Matched positions and manually recognized positions.
Table 7.8 Compare results with grouping and without grouping.
Table C.1 Sequences used as the text corpus in D1.
Table C.2 More sequences used as the text corpus in D1.
Table C.3 Sequences used as the text corpus in D2.
Table C.4 Sequences used as the text corpus in D3.

LIST OF FIGURES

Figure 1.1 Example of me frames.
Figure 1.2 Different approaches to handling movement epenthesis (me).
Figure 1.3 Appearance change over time.
Figure 1.4 Overview of the multiple candidates recognition algorithm.
Figure 1.5 All the components in our experiments.
Figure 1.6 Example frame of data set 1.
Figure 1.7 Example frame of data set 2.
Figure 1.8 Example frame of data set 3.
Figure 1.9 Example frame of data set 4.
Figure 3.1 The recursive nature of the Level Building algorithm.
Figure 3.2 The result of the enhanced Level Building matching process.
Figure 4.1 Illustration of the minimization problem.
Figure 4.2 Nodes with the predecessors shown partly for a better view.
Figure 4.3 Illustration of the local constraints.
Figure 4.4 Illustration of the indexed forward process.
Figure 5.1 Intermediate results for the process of hand segmentation.
Figure 5.2 Hand segmentation results using all frames.
Figure 5.3 Hand segmentation results using key background frames.
Figure 5.4 The confidence map and the generated candidate hands.
Figure 5.5 The proposed HMM model.
Figure 5.6 Flowchart of the grouping process.
Figure 5.7 Illustration of local adjacency graph.
Figure 5.8 Example of the generation process of groups.
Figure 5.9 Example of generated multiple candidates.
Figure 6.1 Difference of CRF, HMM and key frame CRF.
Figure 6.2 Illustration of relational distribution representation.
Figure 7.1 Labeling results for three sentences.
Figure 7.2 Sign level error rates using eLB on data set D1.
Figure 7.3 Error rates for eLB and LB.
Figure 7.4 Error rates with trigram and bigram constraints.
Figure 7.5 Choosing the movement epenthesis (me) labeling cost.
Figure 7.6 Key frame detection.
Figure 7.7 The ROC curve for detecting me using HMMs and CRF.
Figure 7.8 Compare global features and part-based candidate hands.
Figure 7.9 Labeling for "FINISH BUY TICKET NOW FINISH".
Figure 7.10 Compare with and without skeleton (medial-axis) detection.
Figure 7.11 The labeling results for the sequence "TABLE THAT".
Figure 7.12 Candidate groups for ASL data set with grouping.
Figure 7.13 Candidate groups for ASL data set without grouping.
Figure 7.14 Matching path of an ASL sentence.
Figure 7.15 The recovered right hand in the test sequence.
Figure 7.16 The test results for ASL data set.
Figure 7.17 HCI data set results.
Figure 7.18 Candidate groups for the gesture data set with grouping.
Figure 7.19 Candidate groups for the gesture data set without grouping.
Figure 7.20 Recognition of hand gestures for five different approaches.
Figure 7.21 Recognition performance of each hand gesture.
Figure 7.22 Recognition performance of each subject separately.
Figure 7.23 The optimal groups corresponding to one of the hands.
Figure 7.24 The tracking results for the first test instance.
Figure A.1 Setup for capturing data sets D2 and D3.
Figure B.1 Overview of the two-pass approach.
Figure B.2 The graphic user interface.
Figure B.3 Performance of the annotation tool.
Figure B.4 Illustration of the multiple candidates.

DYNAMIC PROGRAMMING WITH MULTIPLE CANDIDATES AND ITS APPLICATIONS TO SIGN LANGUAGE AND HAND GESTURE RECOGNITION

Ruiduo Yang

ABSTRACT

Dynamic programming has been widely used to solve various kinds of optimization problems. In this work, we show that two crucial problems in video-based sign language and gesture recognition systems can be attacked by dynamic programming with additional multiple observations. The first problem occurs at the higher (sentence) level. Movement epenthesis [1] (me), i.e., the necessary but meaningless movement between signs, can result in difficulties in modeling and scalability as the number of signs increases. The second problem occurs at the lower (feature) level. Ambiguity of hand detection and occlusion will propagate errors to the higher level. We construct a novel framework that can handle both of these problems based on a dynamic programming approach. In the past, me has only been modeled explicitly. Our proposed method handles me in a dynamic programming framework where we model the me implicitly. We call this the enhanced Level Building (eLB) algorithm.

This formulation also allows the incorporation of statistical grammar models such as bigrams and trigrams. Another dynamic programming process that handles the problem of selecting among multiple hand candidates is also included at the feature level. This is different from most of the previous approaches, where a single observation is used. We also propose a grouping process that can generate multiple, overlapping hand candidates. We demonstrate our ideas on three continuous American Sign Language (ASL) data sets and one hand gesture data set. The ASL data sets include one with a simple background, one with a simple background but with the signer wearing short-sleeved clothes, and one with a complex and changing background. The gesture data set contains color-gloved gestures with a complex background. Using the automatically chosen me score, performance is within 5% of that achieved with the manually chosen me score.

At the low level, we first oversegment each frame to get a list of segments. Then we use a greedy method to group the segments based on different grouping cues. We also show that the performance loss is within 5% when we compare this method with manually selected feature vectors.
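The low-level step summarized in the abstract (oversegment each frame, then greedily group the segments by a grouping cue into possibly overlapping hand candidates) can be illustrated with a small sketch. It is a minimal illustration under stated assumptions, not the author's implementation. Each segment is reduced to a hypothetical `area` and a scalar mean `color`. The only grouping cue is color similarity to the group's area-weighted mean. Seeds are segments above a size threshold. The hand test is a plain area range, and the shape check is omitted. All function and parameter names are illustrative.

```python
def group_mean_color(group, regions):
    """Area-weighted mean color of a set of segment ids."""
    total_area = sum(regions[r]["area"] for r in group)
    return sum(regions[r]["area"] * regions[r]["color"] for r in group) / total_area


def greedy_groups(regions, adjacency, seed_min_area, area_range, max_color_diff):
    """Grow each large segment greedily and keep every hand-sized group.

    Hypothetical input format (an assumption, not from the dissertation):
    regions:        dict id -> {"area": int, "color": float}
    adjacency:      dict id -> set of ids of touching segments
    seed_min_area:  segments at least this large become seeds
    area_range:     (min, max) total area accepted as a hand candidate
    max_color_diff: largest color difference allowed when adding a segment
    """
    min_area, max_area = area_range
    candidates = set()
    seeds = [r for r in regions if regions[r]["area"] >= seed_min_area]
    for seed in seeds:
        group = {seed}
        while True:
            area = sum(regions[r]["area"] for r in group)
            if min_area <= area <= max_area:
                # Keep every intermediate hand-sized group, so groups grown
                # from different seeds or growth stages may overlap.
                candidates.add(frozenset(group))
            if area >= max_area:
                break
            # Segments adjacent to the current group but not yet in it.
            frontier = set().union(*(adjacency[r] for r in group)) - group
            mean = group_mean_color(group, regions)
            scored = [(abs(regions[r]["color"] - mean), r) for r in frontier]
            scored = [item for item in scored if item[0] <= max_color_diff]
            if not scored:
                break
            # Greedy step: absorb the most similar neighboring segment.
            _, best = min(scored)
            group.add(best)
    return candidates
```

Consider four toy segments: a large skin-colored segment 1 (area 50, color 0.80), a smaller skin-colored segment 2 (area 30, color 0.82), a dark background segment 3 (area 40, color 0.20), and a small skin-colored sliver 4 (area 10, color 0.79). Take adjacency `{1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}`, `seed_min_area=35`, `area_range=(60, 100)` and `max_color_diff=0.1`. The function then returns the two overlapping candidates `frozenset({1, 4})` and `frozenset({1, 2, 4})`. Keeping both instead of forcing a single partition is the kind of redundancy that the multiple-candidate matching is meant to exploit.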

CHAPTER 1 INTRODUCTION

1.1 Overview and Introduction

In this work, we propose algorithms, based on dynamic programming, to attack two fundamental problems in video-based sign language/hand gesture recognition. The first problem is the movement epenthesis (me) issue. This problem arises from the transition between two signs. We propose an enhanced Level Building (eLB) algorithm to attack this problem without any explicit modeling of me. The second problem is the low level hand segmentation problem. We propose a grouping algorithm and match the groups with a new decoding process. This algorithm allows us to avoid the need for perfect segmentation at the low level feature extraction step.

The movement epenthesis (me) effect is one problem that occurs in sign language/gesture sequences. In the phonological processes of sign language, a movement segment sometimes needs to be added between two consecutive signs [2]. This is called movement epenthesis (me). Fig. 1.1 shows an example of me frames. These frames do not correspond to any sign and can involve changes in hand shape and movement. They can span many frames, sometimes as many as an actual sign.

The me effect has been considered in prior efforts. The earliest work that explicitly modeled movement epenthesis in a continuous sign language recognition system with dedicated HMMs can be found in [3] by Vogler et al. They also used context-dependent signs [4] to model movement epenthesis and signs together. Similar to their approach, Yuan et al. [5] and Gao et al. [6] also model the movement epenthesis explicitly and perform matching with both the sign model and the movement epenthesis model. The difference is that they first adopt an automatic approach to cluster the movement epenthesis in the training data. In addition, in [7] we used conditional random fields (CRF) to segment the sentence by removing me segments. In this work, we also compare the CRF approach with the newly proposed framework. The experimental results have shown that approaches that explicitly model movement epenthesis yield results superior to both ignoring movement epenthesis effects and context-dependent modeling. Our experimental results also show that discriminative models such as CRF can achieve better results than a generative model like a Hidden Markov Model. However, the major question of scalability still remains, because these approaches need to model the me explicitly. To obtain enough training data to train the models of movement epenthesis with N signs, one may expect the number of movement epenthesis models to be O(N²). Also, to build the model of movement epenthesis, one has to extract the associated frames from a set of specific sentences in the training data, either manually or automatically. Hence, the model can easily be biased toward this set of sentences.

Unlike previous approaches, we take a dynamic programming approach to address the problem of movement epenthesis without explicit modeling of me, building upon the idea in [8, 9]. This matching does not place as many demands on the training data as probabilistic models such as HMMs do. Fig. 1.2 illustrates how our approach differs from one that ignores movement epenthesis and from one that explicitly models it. Fig. 1.2(a) represents a matching procedure that ignores me and matches all model signs in a model base to a test sentence. Note that the movement epenthesis between two signs can be falsely recognized as one of the signs. Fig. 1.2(b), on the other hand, illustrates the process of explicitly modeling all the possible movement epentheses, where the me frames in the test sequence are expected to be matched to the modeled me frames, and not to a sign. Fig. 1.2(c) sketches our approach. We have constructed a model base that consists of all actual model signs, but not movement epenthesis. During the search for the optimal sign sequence in a sentence, we dynamically decide whether a match is reliable or not. If not, we label the test frame as an me. Determining the cost of this labeling is crucial, and we have an effective, automated method for it. Specifically, we use the Bayesian decision boundary between the good matches and the bad matches as the cost of the labeling. The entire process is embedded in a dynamic-programming-based enhanced Level Building (eLB) algorithm coupled with a grammar model. The search process is conducted in a deterministic manner, where we use Dynamic Time Warping (DTW), constrained by a statistical grammar model. The advantage of the proposed matching process is that implicit segmentation of the sentence into signs happens without the need for modeling movement epenthesis. To create the model base, i.e., for training, we only need the sign frames in a continuous sentence without the associated movement epenthesis frames. This process is done manually.

Figure 1.1 Example of me frames. The first frame is the end of the sign "GATE"; the last frame is the start frame of "WHERE". The transition frames in between have no meaning and are known as the me segment.

The second problem we attack with a dynamic programming process is the low level segmentation problem. For a pure video sign language sequence, a frequency domain representation of the frame generally cannot provide enough information for describing hand shape, hand position, orientation, motion, etc. Instead,
preprocessing is usually required to track or segment the hands. Even for a simple background, this can be hard, because in a sign language gesture the two hands may cross each other and may also cross the face.

Figure 1.2 Different approaches to handling movement epenthesis (me). (a) Ignoring me: if the effect of me is ignored while modeling, some me frames will be falsely classified as signs. (b) Explicitly modeling me: building such models will be difficult when the vocabulary grows large. (c) Our approach: the approach adopted in this work does not explicitly model me; we allow for the possibility of an me when no good match can be found.

Due to these complex issues, previous continuous ASL recognition has mostly relied on external devices to obtain feature vectors. For example, Vogler et al. [4, 10, 11] used a 3D tracking system and CyberGloves, Wang and Gao et al. [12] used CyberGloves and a 3D tracker, Starner et al. [13, 14] used color gloves, accelerometers and head/shoulder mounted cameras, and Kadous [15] used power gloves. Although external devices can yield better results, they also make signers uncomfortable and thus change the appearance of a natural sign. Some other approaches use only a single camera without external devices, but with constraints. For example, Bauer and Kraiss [16, 17] used a single color camera without external devices. However, they needed a uniform background to perform the recognition. Cui et al. [18] used a segmentation scheme under a complex background, but their approach worked on image sequences of isolated signs.

For pure video sequences, low level processes are never perfect. Skin color is the most commonly used cue for segmenting image parts of the hand or face in gesture analysis. However, it does not always produce perfect segmentation, and oversegmentation is a particularly hard problem to handle. Fig. 1.3 shows an example of the illumination and shading changes that can be present in a gesture sequence. If the image parts from the finger and the palm are not grouped together, high level matching will be starved of crucial information related to recognizing fingerspelled words in sign language recognition.

To help overcome the problem of oversegmentation, one of our approaches is to use an intermediate grouping process. The goal of this process is to form groups of low level image primitives that most likely form one part of interest participating in the
action being observed, one obvious example being the hands. So as not to shortchange the subsequent recognition process by insisting on disjoint groups, as is the usual practice in grouping, we allow overlapping groups, resulting in redundant sets of groups. At the beginning of such a grouping process, some region patches are selected as seeds based on their size. We then grow these seeds with adjacent regions to generate groups. As the seeds are grown, the groups are checked for whether they could be hands based on size and shape. Grouping can be conducted based on color, position, boundary smoothness or boundary gradient. These basic similarity cues resemble those adopted by Hoogs and Mundy to group region patches [19] for object recognition, where they used spatial intensity, parallelism and perimeter to form an object hypothesis. However, unlike them, we perform the grouping based on each criterion independently of the others. Each criterion results in a set of groups, which we refer to as a grouping layer. Thus we have three grouping layers: the color grouping layer, the proximity grouping layer and the boundary smoothness grouping layer. The grouping strategy is the same for each layer and is discussed later.

From our literature survey, we found previous approaches that have also used multiple hand candidate representations. For example, the combination of top-down and bottom-up approaches in gesture sequence recognition can be found in [20] and [21]. Both used skin and motion cues to generate multiple candidates. However, these previous works are all based on isolated gesture recognition, and they do not have an intermediate grouping scheme to produce enough shape information. Our multiple candidate approach for continuous ASL recognition is put into a Level Building framework in which a continuous sign language sentence can be analyzed. The elegant aspect of the approach is that we can analyze movement epenthesis (me), two-handed signs and single-handed signs in a unified framework. For single
sign/gesture recognition, our grouping scheme can also effectively generate the true hand candidates to prevent errors at the feature detection level.

Figure 1.3 Appearance change over time. Notice the brightness variation of the left hand of the subject.

The overall structure, tests and contributions are illustrated in Fig. 1.5. At the feature extraction level, we have tested both traditional feature vectors and multiple candidate features. For a single feature vector, we tested CRF, HMMs, LB and our proposed eLB algorithm. For multiple feature vectors, our tests were based on HMMs and DTW at the sign matching level and on eLB and LB at the sentence level. We also tested our grouping schemes with HMM and DTW models. Fig. 1.5 also shows where our two core matching algorithms fit: the two modules drawn as shaded boxes are the two main matching algorithms proposed in this work.

We experimented with different kinds of single-view video data sets. Sample frames are shown in Figs. 1.6, 1.7, 1.8 and 1.9: we use three data sets for American Sign Language and one data set for simple hand gestures. Fig. 1.6 shows the first data set, where a clean background is used. Fig. 1.7 shows data set 2, where we have a complex and moving background. Fig. 1.8 shows data
set 3, where the signer wears short-sleeved clothes. Fig. 1.9 shows data set 4, where we have a complex background for a single gesture sequence.

Figure 1.4 Overview of the multiple candidates recognition algorithm. First, the original frames are segmented, and the groups in each frame as well as the links between frames are produced. Then the sequences of linked candidate groups are matched to the model groups in the database.

Figure 1.5 All the components in our experiments. From the bottom, the training data is processed and ground-truthed. From the top, we generate two types of feature vectors in our experiments. For a single feature vector, we tested CRF, HMMs, LB and our proposed eLB algorithm. For multiple feature vectors, we tested HMMs and DTW at the sign matching level and eLB and LB at the sentence level.

Figure 1.6 Example frame of data set 1.

Besides the different data set setups, we also tested different matching algorithms such as the traditional Level Building (LB) approach, conditional random fields, Hidden Markov Models, etc. We will show the effectiveness of our proposed algorithm compared to these standard ones.

Before we proceed, note that we have organized our work to answer the following research questions related to the two fundamental problems we are attacking. We answer these questions in the following chapters.

1. Can we handle the movement epenthesis problem without the need for explicitly modeling me segments? If we do not explicitly model me segments, how can we associate a matching score with each me segment?
2. Not only do we need to detect the existence of each me in an ASL sentence, but we also need to explicitly locate the position and the length of each me, even though we do not know beforehand how many me segments a sentence contains. In addition, an me can happen at any position in a sentence in terms of frame number, and it is not always the same length. So in order to conduct the search,
we must search all the possible start positions, along with all the possible lengths and all the possible occurrences of me. This search space can be huge. How can we limit it?
3. How can a statistical grammar model, such as bigrams and trigrams, be incorporated into the solution approach?
4. How can we handle imperfect segmentation at the low level? How can one use feature grouping processes to overcome segmentation errors?
5. Can our proposed set of algorithms handle a complex background? Can we identify signs made by signers wearing both short and long sleeves, i.e., relax the typical clothing constraints?
6. How well does the recognition rate of the proposed approach match that achieved through manually grouped segmentation?

Among these six research questions, questions 1 to 3 are related to the movement epenthesis problem, which exists in sign language/continuous gesture recognition domains. Questions 4 to 6, however, are related to fundamental computer vision issues that cut across many different application domains, beyond sign language/continuous gestures. We will try to answer these questions in the corresponding chapters in the remainder of the dissertation.

Figure 1.7 Example frame of data set 2.

Figure 1.8 Example frame of data set 3.

Figure 1.9 Example frame of data set 4.

1.2 Contributions

In this work, we strive to solve two fundamental problems in automatic video-based sign language and gesture recognition systems. The first problem is the movement epenthesis (me) problem. This problem results from the transition movements a signer makes between two signs. We propose an enhanced Level Building (eLB) algorithm [8, 9] to attack this problem without any explicit modeling of me. The second problem is the low level hand segmentation problem. We propose a grouping algorithm and match its groups with a new decoding process [22, 23]. This algorithm allows us to work without the need for perfect segmentation at the low level feature extraction step.

Previous approaches tried to explicitly model movement epenthesis, but scalability became a big problem. We are the first to use a boundary to model all of the dynamic effects of me instead of the me itself. We embed it into an optimal framework to produce the labeling and segmentation simultaneously for continuous sign sentences. This approach greatly reduces the effort of modeling the me directly and alleviates the problem of insufficient me training data. Its effectiveness is tested in our experiments.

We have compared our eLB algorithm with other state-of-the-art conditional models, which are also suitable for limited training data sets. We show that conditional random fields (CRF) can work better in a 2-class case [7]. In sign language recognition, where we have large numbers of classes, these methods cannot work effectively. However, the eLB method can still produce good results.

In our experiments, we feed different types of feature vectors to the eLB framework. One of the important contributions in this step is that we use multiple grouped
observations as our input. We are the first to use multiple linked group sequences to match a sequence model. We develop algorithms to couple the group sequences in both a deterministic domain (Dynamic Time Warping) and a probabilistic domain (Hidden Markov Models).

We provide a novel grouping strategy that can generate multiple overlapping groups to reduce the chance of missing the true hands at the low level. We group each frame based on the oversegmentation result, using different grouping cues as the basis. We show that the grouping process can effectively reduce the chance of losing the real observations, which is not possible without a grouping process.

As a byproduct of this research, we have also produced various research tools that can be used by anyone working in gesture recognition [24]. These are included in Appendix B. We share the source code for our algorithms and tools on the website "http://gment.csee.usf.edu/ASL/", so that others can reproduce our results or use our algorithms.

The rest of the dissertation is organized as follows. Objectives and more related works are described in Chapter 2. We discuss the problem of me and the high level DP process in Chapter 3. Chapter 4 describes the low level DP process that handles the ambiguity problem. Chapter 5 covers the feature level processing and the generation of hand candidates. Chapter 6 describes the framework for labeling sign language sequences with conditional random fields, a very popular conditional model for sequence labeling. We then present the experimental results in Chapter 7 and conclude in Chapter 8.
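The central eLB idea described in this chapter can be made concrete. A sentence is explained as a sequence of units. Each unit is either a model sign, matched by DTW to a span of test frames, or an me span, which is matched to no model at all and is instead charged a labeling cost. The best segmentation and labeling are then found together by dynamic programming over levels (number of units). The sketch below illustrates that idea under simplifying assumptions and is not the dissertation's eLB algorithm. Frames are NumPy feature vectors compared with Euclidean distance. Signs are matched with an asymmetric DTW in which each test frame aligns to exactly one model frame and the model index advances by 0, 1 or 2 per frame. The me cost `me_cost` is a fixed per-frame constant supplied by the caller rather than derived from the Bayesian decision boundary. No grammar model or pruning is applied. The function names are hypothetical.

```python
import numpy as np

INF = float("inf")


def sign_costs_from(test, model, start):
    """Asymmetric DTW of one sign model against test[start:], free end.

    Each test frame is aligned to exactly one model frame; the model index
    stays, advances by 1, or advances by 2 per test frame. Returns {end: cost},
    where cost matches test[start:end] to the whole model.
    """
    m = len(model)
    acc = np.full(m, INF)
    acc[0] = np.linalg.norm(test[start] - model[0])
    costs = {}
    if acc[m - 1] < INF:
        costs[start + 1] = float(acc[m - 1])
    for t in range(start + 1, len(test)):
        prev = acc
        acc = np.full(m, INF)
        for j in range(m):
            best_prev = min(prev[j],
                            prev[j - 1] if j >= 1 else INF,
                            prev[j - 2] if j >= 2 else INF)
            if best_prev < INF:
                acc[j] = best_prev + np.linalg.norm(test[t] - model[j])
        if acc[m - 1] < INF:
            costs[t + 1] = float(acc[m - 1])
    return costs


def enhanced_level_building(test, models, me_cost, max_levels):
    """Label test frames as a sequence of signs and me spans.

    best[level][e] is the lowest cost of explaining test[0:e] with exactly
    `level` units; a unit is a model sign (DTW cost) or an me span
    (me_cost per frame, no me model needed).
    """
    n = len(test)
    best = np.full((max_levels + 1, n + 1), INF)
    best[0][0] = 0.0
    back = {}
    sign_cost = {(k, s): sign_costs_from(test, model, s)
                 for k, model in models.items() for s in range(n)}
    for level in range(1, max_levels + 1):
        for s in range(n):
            base = best[level - 1][s]
            if base == INF:
                continue
            options = [(k, e, c) for k in models
                       for e, c in sign_cost[(k, s)].items()]
            options += [("me", e, me_cost * (e - s)) for e in range(s + 1, n + 1)]
            for label, e, c in options:
                if base + c < best[level][e]:
                    best[level][e] = base + c
                    back[(level, e)] = (s, label)
    # Pick the cheapest number of units, then follow back-pointers.
    level = int(np.argmin(best[1:, n])) + 1
    total = float(best[level][n])
    labels, e = [], n
    while level > 0:
        s, label = back[(level, e)]
        labels.append((label, s, e))
        e, level = s, level - 1
    return total, labels[::-1]
```

The step pattern was chosen so that a sign's cost is a sum of exactly one distance per test frame. That puts it on the same per-frame scale as the me cost, so the two can compete directly. Consider one-dimensional test frames `[0, 0, 9, 9, 5, 5]`, sign "A" = `[0, 0]`, sign "B" = `[5, 5]` and `me_cost=1.0`. The result is a total of 2.0 with labels `[("A", 0, 2), ("me", 2, 4), ("B", 4, 6)]`. Raising `me_cost` to 5.0 makes absorbing the outlying frames into "B" cheaper (total 8.0, labels `[("A", 0, 2), ("B", 2, 6)]`). This shows why choosing the me labeling cost matters.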

PAGE 25

CHAPTER2OBJECTIVESANDRELATEDWORKS2.1OverviewofObjectivesInthiswork,ourmainobjectiveistobuildanautomaticsystemforrecognitionofsignlanguages/gesturesfromavideosequence.Althoughmostofthemethodsproposedinthisworkcanworkwithdierentfeaturevectors,suchasthosefeaturesfrommagneticglovesoraccelerometers,thisworkisspecicallytargetingrecognitionofsignlanguage/gesturesequencefromasingleimagesequence,withoutanyexternaldevices.Themotivationofdoingthisisthatusingexternaldeviceswhilesigningmaymakethesignunnatural.Eveninanapplicationofcontinuousgestures,usinganyglove-likeexternaldeviceswillinuencethewaythatgesturesareperformed.Hence,thegestureperformedwillbedierentfromtheonethatisperformedwithoutsuchadevice.Ofcourse,weareawarethatthisparticularproblemisnottrivial.Infact,insuchanautomaticsystem,manymoreproblemsneedtobeconsideredotherthanahandtrackingproblem.InTable2.1,welistsomecommonproblemswhichareassociatedwithsuchasystem.Wealsoindicateifwehaveconsideredtheseparticularproblemsinoursystem.InTable2.1,therstcolumnstatesthespecicproblems.Themove-mentepenthesisproblemmeistheproblemoftheinsertedmesegmentstotransportthehandsbetweentwosuccessivesigns.Notethisproblemisdierentfromtheproblemofcoarticulation[1].Coarticulationinsignlanguagereferstothechanging15

PAGE 26

aspectsofsignswhentheyoverlapintime[1].Thegrammarproblemistheproblemofdeterminingwhetherasequenceismeaninglessornot.Thatis,themeaninglesssentencesneedtobeprunedfromthenalrecognition.Thecoarticulationproblemistheproblemthatasign/gesturemaybeperformeddierentlychanginghandshape,orientation,etc.whileinasentence,thentheoverlappingpartoftwoconsecutivesigns/gestureswillhavechangedrepresentations[1].Thenon-manualproblemisthatthefacialexpressionswillinuencethemeaningofasignsentenceeveniftheman-ualpartofasentenceisthesame.Thesignerindependentproblemisthefactthatthevariationofaperformanceofasign/gestureamongdierentsignerscanresultinmodelingfailureandincorrectresults.Theshortsleeveproblemisthattheshortsleevedclothesmaymakeitdiculttosegmentthehands,evenifoversegmentationisused.Thehandsegmentationproblemisthegeneralproblemthatahandseg-mentation/trackingmayfailforavideosequence.Theviewindependentproblemistheproblemthatthetestingsequencecanbetakenfromadierentviewpointfromthetrainingsequenceandthefeaturevectorusedmustbeabletoaccommodatethis.Inthiswork,wehavedevelopedmethodsformeproblem,grammarproblem,shortsleeveproblem,andthehandsegmentationproblem.Wealsoprovideexperimentalresultsfortheseproblems.Forsignerindependentproblem,wedonotproposeanyspecicmethods,butoneofourtestdatadoeshavesignerindependentcases.Inthisworkwehavenotproposedanymethodsorexperimentalresultstohandlethecoarticulationproblem,thenon-manualproblemortheviewindependentproblem.Ontheotherhand,theseproblemsmustbeattackedunderaspecicimagingrestriction.In[25],severalimagingrestrictionsandconstrainshavebeendiscussed.Ourworkistargetinganapplicationofasignlanguagetranslatorwithintheairportdomain,whichmeanswemayhaveacomplexmovingbackground,butourcameracanbestill.Accordingtothis,Table2.2liststheimagerestrictionswemayhave16
corresponding to the discussions in [25]. From Table 2.2, we can see that we try to use as few restrictions as possible: we only use the first three types of restrictions, and only for some of our data sets. Based on our proposed framework, we have considered both the moving background case and the short sleeved case in our experiments.

Table 2.1 Summary of possible problems in ASL recognition.

Problem | Considered | Novel methods | Experimented
Movement epenthesis problem | Yes | Yes, eLB | Yes
Grammar problem | Yes | Yes, bigram, trigram and sentence with eLB | Yes
Coarticulation problem | No | No | No
Non-manual problem | No | No | No
Signer independent problem | No | No | Yes
Short sleeved clothing problem | Yes | Yes, skeleton with multiple candidates | Yes
Hand segmentation problem | Yes | Yes, using fragments and multiple candidates | Yes
View independent problem | No | No | No

2.2 Background on Sequence Labeling/Classification Algorithm

The final objective of this system is to generate a label sequence for the image sequence. Many algorithms have been proposed to label a sequence, and they have been used to label sign language/gesture sequences. For example, Dynamic Time Warping (DTW), Hidden Markov Models (HMMs), and Conditional Random Fields (CRFs) are the major labeling methods. Extensions of these methods include statistical DTW [26], parallel HMM [27], Maximum Entropy Markov Models (MEMM) [28], Hidden CRF [29], and LDCRF [30]. Among them, the CRF is a statistical discriminative model. It has several advantages, such as its ability to label a sequence in a globally optimal manner and its ability to model the posterior probabilities directly.

Table 2.2 Imaging constraints used in our imaging process. These constraints are discussed in [25].

Constraint | Used | Discussed | Experimented
Long sleeved clothing | Yes, for some data sets | Yes | Yes
Colored gloves | Yes, for some data sets | Yes | Yes
Uniform background | Yes, for some data sets | Yes | Yes
Complex but stationary background | No | No | No
Head/face required to be stationary or have less movement than hands | No | No | No
Constant movement of hands | No | No | No
Fixed body location and pose or specific initial hand location | No | No | No
Left hand or face excluded from the view | No | No | No
Vocabulary restricted or unnatural signs to avoid overlapping hands or hand over face | No | No | No
Field of view restricted to the hand, which is kept at a fixed orientation and distance to the camera | No | No | No

Work on video sign/gesture sequence recognition has borrowed the successful methods of the general sequence recognition domain. This is natural, since a gesture sequence can be regarded as a time series with observations at each time frame, just like a speech or text sequence. Some works used an ordinary classifier instead of a sequence classifier for time series. These works normally focus on classifying hand properties and motion types, such as [31, 32] in sign language and [33-35] in general gesture classification. However, these methods lack the ability to model contextual information in both 2D images and 1D sequences. Hence, they have been overtaken in popularity by sequence classifiers that can accommodate such information, such as HMMs and CRFs. Since sign language/gesture sequences carry highly contextual information, continuous sign language/gesture recognition more often uses a time series classifier such as HMMs, CRFs, or DTW.

Among all of these, the HMM and its variations are the most popular. The HMM produces a model that statistically captures the different states of a sequence and the changing properties among these states. It offers a concise representation of complex sequence models with different variations through the starting probabilities, transition probabilities, and state distributions. It also has the property that a sequence can be implicitly segmented while the labeling is done. Works in sign language recognition [12, 14, 27, 31, 36, 37] and general gesture recognition [20, 38-40] have used the HMM and its variations. Among them, subunits are used in [37], parallel modeling in [27], automatic clustering in [12, 31], a grammar model in [31, 36], coupled modeling in [38], a layered model in [39], multiple candidates in [20], and multi-linked modeling in [40]. The major problem of the HMM is that it is a generative model, which may need a lot of data to train the state distribution. However, the state distribution is not
the final distribution that we are interested in for a general sign/gesture classification problem. Therefore, modeling such a distribution is not strictly necessary, although it is natural. The same problem occurs in the text sequence labeling community, where conditional models have recently been proposed to overcome it. There have not been many works in sign language/gesture recognition that use conditional models to classify gestures. Sminchisescu et al. were the first to use CRFs for gesture recognition, in [41], for human action classification. Variations of CRFs can be seen in [30], which uses multiple states for eye gaze gestures, and in [7], which uses key frames for sign languages. In this work, we also include work that uses conditional random fields to simultaneously segment and label each sign language video sequence. We will show in the experiment section that a CRF can outperform the generative method in a 2-class situation. However, in a multi-class situation where the number of classes is too large, the model can hardly find an optimal boundary during training, because the number of parameters becomes too large.

2.3 Background on Hands Localization

Gesture recognition, and the related area of automated sign language recognition, is a rich area of research (see [25, 42, 43] for reviews) with many different applications and approaches, which nonetheless share some common problems and solutions. Vision-based approaches all share the problems related to the vagaries of low level segmentation. The states in state-space-based gesture representations, such as Hidden Markov Model [3, 44-48], Dynamic Time Warping [35, 49, 50], or Finite State Machine (FSM) [51] approaches, are based on the low level features detected in the image. Motion tracks in trajectory-based gesture recognition approaches [32, 52] depend on the robustness of the tracking process, which in turn depends on the stability of the low level segmentation. This problem of low level segmentation is sometimes
addressed by engineering the imaging setup to ease the segmentation of hands, using controlled lighting, colored gloves, or even non-vision-based aids such as magnetic or optical markers. Pure vision-based solutions usually rely on skin color and/or motion information to detect hands. Skin color based systems include [53-57]. However, approaches based on predefined skin color models are sensitive to changing illumination conditions. Motion-based hand segmentation approaches [54, 58] rely on the assumption that the features important for the gesture will be associated with motion. This is not always true for sign recognition, which includes both movement and hold phases. Fusion [59, 60], multi-modal [61], Haar-like feature [62], accelerometer [63], and 3D [64, 65] approaches can be used to arrive at better segmentation and detection.

However, segmentation will never be perfect. Not only will there be missed detections, but there will also be false alarms, and there is a danger that these errors will be propagated to the recognition stage. In this work, we advocate using an intermediate grouping module, coupled with the recognition module, to handle low level segmentation errors. Such grouping processes have been found useful for object recognition tasks [66-68], but have not been used for gesture and sign recognition. Combinations of top-down and bottom-up approaches in gesture sequence recognition can be found in [20] and [21]. Although these approaches can handle multiple candidate observations, they do not incorporate a grouping process. For example, both [21] and [20] use a sliding window along with a skin color model to obtain the position of the moving hands. However, in real world applications, bad lighting conditions may cause problems for skin color approaches, and a sliding window is not sufficient in applications where the exact hand shape is needed. Apart from hand gestures, Srinivasan et al. also proposed a grouping method [69] to classify human bodies. However, their approach works only for single images.

CHAPTER 3
HIGH LEVEL MATCHING

In this chapter, we will answer research questions 1-3. These questions are related to the movement epenthesis problem in sign languages. We will show how we can implicitly model me, how we solve the combinatorial problem in the search for the optimal sign/me sequence, and how we can incorporate a grammar model into our framework. These three research questions are restated below:

1. Can we handle the movement epenthesis problem without explicitly modeling me segments? If we do not explicitly model me segments, how can we associate a matching score with each me segment?
2. Not only do we need to detect the existence of each me in an ASL sentence, but we also need to locate the position and the length of each me explicitly, even though we do not know beforehand how many me segments a sentence contains. In addition, an me can occur at any position in a sentence in terms of frame number, and it does not always have the same length. So, to conduct the search, we must search all possible start positions, along with all possible lengths and all possible occurrences of me. This search space can be huge. How can we limit it?
3. How can a statistical grammar model, such as bigrams and trigrams, be incorporated into the solution approach?

3.1 Problem Formulation

Our model base of signs consists of instances of individual signs, $\{S_1, S_2, \ldots, S_V\}$. Each instance of a sign, $S_i$, is a sequenced set of individual frames. In addition to these signs, we use me symbols (or labels) of various lengths to represent movement epenthesis. Their lengths vary from 1 to $N_{\max}$, the maximum period over which the movement epenthesis effect can persist. We chose not to have explicit models corresponding to these symbols.

Let a query or test sequence of length $M$ be denoted by $T$. A solution to the matching problem consists of a segmentation of $T$ into signs and movement epenthesis, along with a label for each segment. We denote the segmentation of a sentence by the sequence of indices $\{j_0, j_1, \ldots, j_{L_k}\}$, where $L_k$ denotes the number of segments in the $k$th possible segmentation of the sentence. Thus, the first segment is from index $j_0 = 1$ to $j_1$, the second segment is from $j_1 + 1$ to $j_2$, and so forth. Let $S^k_i$ denote the sign label for the $i$th segment, over frames $t_{j_{i-1}}$ to $t_{j_i}$. Then, the $k$th possible solution sequence is denoted by

$$S^k = \{S^k_1, S^k_2, \ldots, S^k_{L_k}\} \tag{3.1}$$

where $S^k_1$ is the first sign label in the sequence, $S^k_2$ is the second sign label, and so forth. Note that some of these sign labels could be me labels of different lengths. $L_k$ denotes the number of signs in the sequence $S^k$. The total number of such possible label sequences is, of course, exponential. Our objective is to find a sequence of sign and movement epenthesis (me) labels, $S^e$, among all possible sign sequences such that the distance between $S^k$ and $T$ is
minimized. That is, we need to find $S^e$ such that

$$S^e = \arg\min_{k} \mathrm{Cost}\left(S^k, T_{1:j_{L_k}}\right) = \arg\min_{\{j_1,\ldots,j_{L_k}\},\,\{S^k_1,\ldots,S^k_{L_k}\}} \sum_{i=1}^{L_k} D\left(S^k_i, T_{j_{i-1}:j_i}\right) \tag{3.2}$$

where $D(\cdot)$ is the function that computes the single sign matching cost and $T_{j_{i-1}:j_i}$ is the segment of the query sentence between indices $j_{i-1}$ and $j_i$. The nature of this cost function can differ based on the situation at hand. For instance, if we have very good segmentation of hands and faces, one can construct reliable feature vectors for each frame. In such situations, the distance would be computed by Dynamic Time Warping of the segments. If, on the other hand, we do not have reliable extraction of hands, we suggest a more complex solution that involves optimizing over possible hand candidates. We will look into these distance computation methods in Chapter 4. Before that, let us consider how we perform the optimization in Eq. 3.2, assuming that we have been given a distance measure.

3.2 The Enhanced Level Building Algorithm

The solution of Eq. 3.2 ranges over all possible sign sequence candidates, with all possible lengths of each sign, $\{S^k_1, S^k_2, \ldots, S^k_{L_k}\}$. This search space is very large. We structure the search for the optimal solution using dynamic programming, specifically the Level Building approach [70], and enhance it to allow for movement epenthesis (me) labels.

3.2.1 Dynamic Programming

The overall minimization can be expressed recursively as the optimization of one label plus the minimum cost for the remaining sentence. If we structure this optimization
by separating the last label, we have

$$\min \mathrm{Cost}\left(S^k, T_{1:j_{L_k}}\right) = \min_{S^k_{L_k},\, j_{L_k-1}} \left[ D\left(S^k_{L_k}, T_{j_{L_k-1}:j_{L_k}}\right) + \min \mathrm{Cost}\left(S^{L_k-1}, T_{1:j_{L_k-1}}\right) \right] \tag{3.3}$$

where $S^{L_k-1}$ denotes the first $L_k - 1$ labels. Based on this decomposition of the problem, each level of the Level Building approach corresponds to one label, in order, in the test sentence. Thus, the first level is concerned with the first possible label in the sentence, which could cover different possible lengths. The second level is concerned with the second possible label, for the portion of the sentence that begins after the first label ends, and so forth. Each level is associated with a set of possible start and end locations within the sequence, and at each level we store the best possible match for each end point from the previous level. The optimal sequence of signs and me labels is constructed by backtracking.

For each level $l$, we store the optimal cost of matching sign $S_i$ with ending frame $m$ in a 3-dimensional array $A(l, i, m)$, $1 \le l \le L_{\max}$, $1 \le i \le N$, $1 \le m \le M$, where $N$ is the total number of sign and me labels and

$$A(l,i,m) = \begin{cases} D\left(S_i, T_{1:m}\right) & \text{if } l = 1 \\ \min\limits_{k,j} \left[ A(l-1,k,j) + D\left(S_i, T_{j+1:m}\right) \right] & \text{otherwise} \end{cases} \tag{3.4}$$

$T_{j:m}$ denotes the subsequence of the test sequence that starts at the $j$th frame and ends at the $m$th frame. This recursion is pictured in Fig. 3.1. The quantity $A(l, i, m)$ gives the minimum cumulative score for matching $l$ labels, with the $i$th model sign, $S_i$, as the last label, to the test sequence up to the
$m$th frame. The optimal matching score $D^*$ is

$$D^* = \min_{l,i} A(l,i,M) \tag{3.5}$$

Figure 3.1 The recursive nature of the Level Building algorithm. To compute $A(l, i, m)$, we search among all the results of the previous level, $A(l-1, k, j)$, plus the current level's matching cost $D(S_i, T_{j+1:m})$, and find the minimum value.

To reconstruct the optimal sign sequence by backtracking, we use a predecessor array $\psi(l, i, m)$, whose indices correspond to those of $A$, $1 \le l \le L_{\max}$, $1 \le i \le N$, $1 \le m \le M$, where

$$\psi(l,i,m) = \begin{cases} -1 & \text{if } l = 1 \\ \arg\min\limits_{k,j} \left[ A(l-1,k,j) + D\left(S_i, T_{j+1:m}\right) \right] & \text{otherwise} \end{cases} \tag{3.6}$$

Fig. 3.2 illustrates the possible matching sequences searched during the recursive process. At the end of each level, we obtain the best matched sequences. For example, at level 1, all matchings must start at frame 1, and there is a range of possible ending frames. For each possible ending frame, we obtain
a best matching sign, for instance $S_1, S_5, S_2, S_{V+4}, S_2, S_9$, respectively, as shown in the figure. Then, at level 2, we again have a range of possible ending frames, and the starting frame comes after the end of the first level. For each ending frame, we find the best cumulative matching score among all the signs and possible starting frames. We continue this process for all the levels. Each matching that ends at the last frame yields one possible matching sequence, which can be constructed by backtracking from the last frame. Some example sequences shown in the figure are $\{S_9, S_1\}$, $\{S_2, S_8, S_9\}$, and $\{S_1, me, S_2, me\}$. Note that all the signs $S_{V+k}$ are actually me labels.

This process also gives our answer to research question 2. To limit the search space, we use dynamic programming, where the intermediate search result for a partial sequence is used to build up towards the final search result for the whole sequence.

The use of the me label is the essential difference between the classical Level Building formulation for recognizing connected words in speech and our formulation for recognizing connected signs in sign languages. We enhance the classical formulation by allowing for such labels, hence the name enhanced Level Building (eLB). However, allowing for such a label is not equivalent to adding another sign label, since it is not obvious how to choose the cost of an me label when there are no real samples of it. We choose the cost of associating an me label with an observation sequence to be proportional to its length:

$$D\left(S_{V+k}, T_{j+1:m}\right) = \alpha\,(m - j) \tag{3.7}$$

This raises the question of how to choose the proportionality constant $\alpha$. One viewpoint is that it is really a penalty for assigning an me label to a frame. This penalty should be larger than a good match score, since each time we find a good match from our database to a portion of the unknown sequence, we want to keep it. At the same time, the penalty should be smaller than a non-match score, because each time we cannot find any good match, we need to make sure the me label is selected. A non-match score is obtained when matching two different signs, and a match score is obtained when matching different instances of the same sign. To obtain these scores, we consider the distributions of match and non-match scores between signs in the training set, computed using Dynamic Time Warping (discussed later), with the overall distances normalized by the length of the warping path. These distributions typically overlap. We search for a threshold value that classifies these scores into match and non-match ones, and we choose the optimal $\alpha$ to be the optimal Bayesian decision boundary for this classification. However, instead of parametrically modeling each distribution (match and non-match) and then choosing the threshold, we use a histogram-based representation to search for it.

Figure 3.2 The result of the enhanced Level Building matching process. There are 3 complete matched sequences, ending at levels 2 through 4. The best one among these three is returned as the matching result for these levels. Note that all the signs $S_{V+k}$ are actually me labels.

With this, we have answered research question 1. Essentially, we model me implicitly: we do not associate any actual frames with the me, but instead use a boundary score to describe the matching cost of an me against the test frames directly. In contrast, traditional methods that model me explicitly need information from actual me frames in the training data, and the matching score is computed against the test data at testing time.

3.2.2 Grammar Constraint

The explorations at each level can be constrained by statistical grammar information, such as that captured by n-gram statistics. We illustrate this using a bigram model. We use a sample-based model of the bigram instead of a histogram-based one. We
represent it using a relationship matrix $R(i, j)$, $1 \le i \le N$, $1 \le j \le N$, where

$$R(i,j) = \begin{cases} 1 & \text{if } S_i \text{ can be the predecessor of } S_j \\ 0 & \text{if } S_i \text{ cannot be the predecessor of } S_j \end{cases} \tag{3.8}$$

We set $R$ based on observed instances in the training text corpus: an entry is set to 1 if an example is found in the corpus and to 0 otherwise. Note that this is different from the histogram of counts used in traditional n-grams. Due to the limited number of samples, we do not use counts; essentially, if we have some evidence for an occurrence, we set its probability to one. This is a very liberal grammar constraint. To allow for me labels before and after each sign, we use

$$R(i,j) = 1 \quad \text{if } i > V \text{ or } j > V$$

After obtaining $R$, the eLB algorithm can be constrained by the predecessor relationship encoded in the relationship matrix. Note that since we allow an me label to exist between any two signs, local backtracking may need to be performed while enforcing grammar checking. For example, assume that at the current level we are examining the sign $S_i$. If the predecessor found along the optimal path is an me label, we need to backtrack until we find a real sign $S_j$ along the optimal path; the grammar check is then performed between $S_j$ and $S_i$.

We denote the result of the local backtracking for the minimum cumulative distance matrix $A$ as

$$B(i,l,m,k,j) = \sigma \tag{3.9}$$

where $S_\sigma$ is the actual predecessor found using the local backtracking scheme when computing $A(l, i, m)$, along the path where the predecessor is $(l-1, k, j)$.

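Hence, to incorporate a grammar constraint into our system, we can update Eq. 3.4 and Eq. 3.6 as

$$A(l,i,m) = \begin{cases} D\left(S_i, T_{1:m}\right) & \text{if } l = 1 \\ \min\limits_{k,j:\ R(\sigma, i) = 1,\ \sigma = B(i,l,m,k,j)} \left[ A(l-1,k,j) + D\left(S_i, T_{j+1:m}\right) \right] & \text{otherwise} \end{cases} \tag{3.10}$$

and

$$\psi(l,i,m) = \begin{cases} -1 & \text{if } l = 1 \\ \arg\min\limits_{k,j:\ R(\sigma, i) = 1,\ \sigma = B(i,l,m,k,j)} \left[ A(l-1,k,j) + D\left(S_i, T_{j+1:m}\right) \right] & \text{otherwise} \end{cases} \tag{3.11}$$

In this section, we answered research question 3. We use the text corpus to build the grammar constraint and use it to prune sentences that are not meaningful. This is the same as in normal Level Building; the difference is that, in our framework, we need to perform local backtracking to skip me labels and recover the real sentence for the grammar test.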
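
The recursion of Eqs. 3.4 to 3.7, together with the grammar check of Eq. 3.10, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: frames are scalars, the single sign cost D is taken to be an unnormalized DTW with |x - y| local cost, an me label is written ('me', k) for an epenthesis of exactly k frames with cost alpha * k (Eq. 3.7), and the binary bigram matrix R of Eq. 3.8 is replaced by a hypothetical callable `allowed(prev_sign, sign)`. Each cell also records the last real sign on its optimal path, which plays the role of the local backtracking result B of Eq. 3.9.

```python
import math


def dtw_cost(model, segment):
    """Unnormalized DTW cost between two 1-D sequences (assumed local cost |x - y|)."""
    n, m = len(model), len(segment)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for a in range(1, n + 1):
        for b in range(1, m + 1):
            d = abs(model[a - 1] - segment[b - 1])
            D[a][b] = d + min(D[a - 1][b], D[a][b - 1], D[a - 1][b - 1])
    return D[n][m]


def enhanced_level_building(test, models, alpha, max_levels, max_me_len, allowed=None):
    """Sketch of eLB. Returns (best cost, label list); me labels are reported as 'ME'."""
    M = len(test)
    labels = list(range(len(models))) + [("me", k) for k in range(1, max_me_len + 1)]
    # A[l][(label, m)] = (cumulative cost, predecessor key at level l-1, last real sign)
    A = [dict() for _ in range(max_levels + 1)]

    def seg_cost(label, j, m):
        # Cost of `label` covering test frames j+1..m (1-based, inclusive).
        if isinstance(label, tuple):  # me label of fixed length k, Eq. 3.7
            return alpha * (m - j) if m - j == label[1] else math.inf
        return dtw_cost(models[label], test[j:m])

    for l in range(1, max_levels + 1):
        if l == 1:  # level 1 starts from a virtual end point at frame 0
            preds = [(None, 0, 0.0, None)]
        else:       # every reachable cell of the previous level
            preds = [(key, key[1], c, last) for key, (c, _, last) in A[l - 1].items()]
        for label in labels:
            is_me = isinstance(label, tuple)
            for m in range(1, M + 1):
                best = None
                for key, j, c, last in preds:
                    if j >= m:
                        continue
                    # Grammar check against the last real sign; me labels always pass.
                    if allowed and not is_me and last is not None and not allowed(last, label):
                        continue
                    total = c + seg_cost(label, j, m)
                    if total < math.inf and (best is None or total < best[0]):
                        best = (total, key, last if is_me else label)
                if best is not None:
                    A[l][(label, m)] = best

    # Eq. 3.5: best complete match ending at the last frame, over all levels and labels.
    finals = [(A[l][(lab, M)][0], l, (lab, M))
              for l in range(1, max_levels + 1) for lab in labels if (lab, M) in A[l]]
    if not finals:
        return math.inf, []
    cost, l, key = min(finals, key=lambda f: f[0])
    seq = []
    while key is not None:  # backtracking, as in Eq. 3.6
        seq.append("ME" if isinstance(key[0], tuple) else key[0])
        key = A[l][key][1]
        l -= 1
    return cost, seq[::-1]


models = [[0, 0, 0], [5, 5, 5]]  # two toy sign models
test = [0, 0, 0, 9, 5, 5, 5]     # sign 0, one unexplained frame, sign 1
print(enhanced_level_building(test, models, alpha=2, max_levels=4, max_me_len=2))
# (2.0, [0, 'ME', 1])
```

In the toy call, the unexplained frame (value 9) is cheaper to absorb with a one-frame me label (cost 2) than with either sign model, so the recovered sequence is sign 0, me, sign 1. Passing `allowed=lambda a, b: not (a == 0 and b == 1)` forbids that bigram, and the sketch then has to settle for a more expensive labeling.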

CHAPTER 4
SINGLE SIGN MATCHING

To compute the final optimal sequence using the eLB framework, we need to be able to compute the cost between a model sign and a subsequence of the test data, which is expressed mathematically as $D(S^k_i, T_{j_{i-1}:j_i})$ in Eq. 3.2. We consider two scenarios for this matching cost. The first is when we have a single feature vector describing each image frame, and any sign is a sequence of these feature vectors. This is possible when one has a fairly good hand detection capability, obtained by controlling the background and clothing. To compute the single sign matching cost in such situations, we simply compute the Dynamic Time Warping (DTW) cost between the two sequences. As the cost of matching one model frame to one observation frame, we consider a cost function based on the spatial distribution of the image features; we will discuss this cost in a later section.

The second scenario, which is the most common one, arises when we do not have a single detected hand region for each hand in each frame, but many possible hand regions. For each frame, we can detect many possible hand candidate regions and pair them to generate many possible hand candidates. This arises in uncontrolled imaging situations with complex backgrounds and a lack of control over clothing. Here, the use of global features is obviously not reasonable; one has to opt for more part-based representations. In this chapter, we describe our modified algorithm to solve Eq. 3.2. It involves a solution for both the deterministic case and the probabilistic case, which we present separately in Section 4.1
and Section 4.2. In this chapter, we strive to answer the first part of research question 4:

4. How can we handle imperfect segmentation at the low level? How can one use feature grouping processes to overcome segmentation errors?

4.1 Coupling Groups with Deterministic Matching Algorithm

We perform recognition based on multiple observations using both a deterministic approach and a statistical approach. In the deterministic approach, recognition is conducted by matching the groups found in a given sequence to each model sequence. In the model sequence, the real hand group is extracted manually, frame by frame. The goal of the matching is to find the one candidate group sequence, out of the many available and directed by the linked structure, that can best be mapped to the model sequence. This process also allows for time warping and, as we show, is solvable by dynamic programming. After matching against each model sequence, the models with lower distance scores are taken as the recognition results.

4.1.1 Formulation of the Matching Process

Let the $i$th candidate group in the $k$th frame be represented as $G^k_i$, and let $K$ be the number of frames in the test sequence. The motion model consists of a sequence of feature vectors, $M = \{m_1, \ldots, m_{T_m}\}$, which has to be matched to the sequence of candidate groups, with each model feature vector mapped to one group. The matching score is represented by a 3D matrix $S$, where the $(i, j, g)$ element of $S$ is the Mahalanobis distance between the $i$th model feature vector and the feature vector of the $g$th candidate group in the $j$th frame.
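
As a concrete illustration of the 3D matrix S, the sketch below fills S(i, j, g) for a toy model and a ragged list of candidate groups. The Mahalanobis distance is simplified here to a diagonal covariance (a vector `var` of per-dimension variances); that simplification, the function names, and the toy data are assumptions for illustration only. Because different frames can contain different numbers of candidate groups, S is stored as a ragged nested list rather than a dense array.

```python
import math


def mahalanobis_diag(x, y, var):
    """Mahalanobis distance assuming a diagonal covariance (variances in `var`)."""
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(x, y, var)))


def build_cost_volume(model, candidates, var):
    """Return S with S[i][j][g] = distance(model frame i, group g of test frame j).

    model:      list of model feature vectors m_1..m_Tm
    candidates: candidates[j] is the list of group feature vectors in test frame j
    """
    return [[[mahalanobis_diag(m_i, grp, var) for grp in frame] for frame in candidates]
            for m_i in model]


model = [[0.0, 0.0], [1.0, 1.0]]
candidates = [[[0.0, 0.0], [3.0, 4.0]],  # test frame 0: two candidate groups
              [[1.0, 1.0]]]              # test frame 1: one candidate group
S = build_cost_volume(model, candidates, var=[1.0, 1.0])
print(S[0][0])  # [0.0, 5.0]
```

With a full covariance matrix, the only change would be replacing the per-dimension division by a quadratic form with the inverse covariance.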

The warping path is a sequence of elements of $S$ denoting the matching. Since a model group can be mapped to one of the candidate groups in one of the test sequence frames, the warping is conducted in both the time domain and the candidate group domain. If each candidate group feature vector set has cardinality one, establishing this correspondence is trivial and only time warping is needed. Otherwise, we have to select among the possible candidate groups. We cast this as a minimization problem that we solve using dynamic programming. Formally, we have to find a sequence of elements, one from each candidate set, that best matches the model sequence of feature vectors. Let

1. $k(t) = (i, j, g)$ be a multi-valued function that maps the indices of the warping path, denoted by $t$, to the 3D coordinates in $S$ where the model's $i$th frame is matched with the $g$th candidate group of the $j$th frame in the test sequence;
2. $d(m_i, G^j_g)$ represent the cost of matching the model group feature vector from the $i$th image frame, $m_i$, with the $g$th group feature vector from the $j$th image frame, $G^j_g$.

Then the total matching cost can be cast as a minimization problem. More formally,

$$\min_{k(t)} \left( \sum_t d\left(m_i, G^j_g\right) \right) \tag{4.1}$$

Fig. 4.1 illustrates the minimization space. It is a 3D space spanned by the model sequence time index $i$, the given image sequence time index $j$, and the feature vector index into the candidate group sets, $g$. Each point in that space is associated with a cost defined between the corresponding image and model groups. We seek a curve, defined by $k(t)$, that minimizes the total cost function over this curve, subject to the following constraints in both the candidate group domain and the time domain.

Figure 4.1 Illustration of the minimization problem. We have to find the warping path that minimizes the difference between the model sequence and the test sequence. Possible solutions are curves in a 3D space, spanned by the model sequence index $i$, the image sequence index $j$, and the candidate group set index $g$.

When we match a gesture to a gesture, where a model sign of length $T_m$ is matched with a test sign of length $K$, this curve starts at $i = 1, j = 1$ and ends at $i = T_m, j = K$. When we match a sign/gesture to a sign sentence, where a model sign of length $T_m$ is matched with a sentence of length $K$, the curve can start anywhere with $i = 1$ and end anywhere with $i = T_m$. We also enforce a constraint when associating adjacent frames, which defines all the possible predecessors of a node on the warping path. The constraint we use when seeking the curve is illustrated in Fig. 4.2. In the time warping domain, we use the general local constraints [71]. In the candidate group domain, we use the predecessor relationship in Eq. 5.3 as the constraint. Fig. 4.2 shows the nodes in the 3D space, with the predecessors drawn as arrows; only the predecessors of a few nodes are drawn, to make the relationship visible. A local illustration is shown in Fig. 4.3, where $(i, j, g_{11})$ is a
node with 7 predecessors. $(i-1, j, g_{11})$ is the one with the previous model frame but the same test frame and candidate group as $(i, j, g_{11})$. $(i, j-1, g_{21})$, $(i, j-1, g_{22})$, and $(i, j-1, g_{23})$ are the ones with the previous test frame but the same model frame as $(i, j, g_{11})$; they correspond to different candidate groups in the previous test frame. Similarly, $(i-1, j-1, g_{21})$, $(i-1, j-1, g_{22})$, and $(i-1, j-1, g_{23})$ are the ones with both the previous test frame and the previous model frame relative to $(i, j, g_{11})$, again with different candidate groups in the previous test frame.

Figure 4.2 Nodes with the predecessors shown partly for better view.

4.1.2 Dynamic Programming

Dynamic programming can be used to obtain the optimal warping path in our problem. In a 3D matrix $D$, let $D(i, j, g)$ represent the minimum cumulative cost of matching the model sequence $\{m_1, \ldots, m_i\}$ to the candidate group set sequence up to $(i, j, g)$. The optimal substructure of the problem allows the following recursive
formula:

$$D(i,j,g) = d\left(m_i, G^j_g\right) + \min \begin{cases} \min\limits_{r \in \mathrm{Pre}(G^j_g)} D(i, j-1, r) \\ \min\limits_{r \in \mathrm{Pre}(G^j_g)} D(i-1, j-1, r) \\ D(i-1, j, g) \end{cases} \tag{4.2}$$

Figure 4.3 Illustration of the local constraints. (a) All local constraints of one node. (b) Group of predecessors with the same test frame number but different model frame number. (c) Group of predecessors with the same model frame number but different test frame number. (d) Group of predecessors with different model frame and test frame numbers.

Here we use the constraint that the coordinate $(i, j, g)$ in the dynamic programming space depends on the locations $(i, j-1, r)$, $(i-1, j, g)$, and $(i-1, j-1, r)$, with $r \in \mathrm{Pre}(G^j_g)$. This is based on the general local constraints [71]. The minimum cumulative cost $D$ at the end point of the warping path is the solution to our problem. This also answers the first part of research question 4. To obtain the solution without the need for perfect segmentation, we use a multiple observations approach, so at the low level we do not have to make a hard decision about where the important parts (the hands) really are. We also have a new framework, based on dynamic programming, to recognize these multiple observations.
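
The recursion of Eq. 4.2 can be sketched directly on a precomputed cost volume S. In this minimal Python illustration (an assumption-laden sketch, not the implementation used in this work), indices are 0-based, `pre[j][g]` is a hypothetical list of the groups in frame j - 1 that may precede group g of frame j (standing in for Pre(G^j_g)), and the gesture-to-gesture case is assumed, so the path starts at the first model and test frames and ends at the last ones. Back pointers are kept so that the candidate group selected in each frame can be read off the path.

```python
import math


def match_with_candidates(S, pre):
    """Minimum-cost warping path through (model i, test frame j, group g), Eq. 4.2.

    S[i][j][g]: cost of matching model frame i to candidate group g of test frame j.
    pre[j][g]:  groups of frame j-1 allowed to precede group g of frame j (pre[0] unused).
    Returns (cost, path) where path is a list of (i, j, g) triples.
    """
    T, K = len(S), len(S[0])
    n_groups = [len(S[0][j]) for j in range(K)]
    D = [[[math.inf] * n_groups[j] for j in range(K)] for _ in range(T)]
    back = [[[None] * n_groups[j] for j in range(K)] for _ in range(T)]
    for i in range(T):
        for j in range(K):
            for g in range(n_groups[j]):
                if i == 0 and j == 0:  # start of the warping path
                    D[0][0][g] = S[0][0][g]
                    continue
                cands = []
                if j > 0:
                    for r in pre[j][g]:
                        # same model frame, previous test frame, predecessor group r
                        cands.append((D[i][j - 1][r], (i, j - 1, r)))
                        if i > 0:  # previous model frame and previous test frame
                            cands.append((D[i - 1][j - 1][r], (i - 1, j - 1, r)))
                if i > 0:  # previous model frame, same test frame and group
                    cands.append((D[i - 1][j][g], (i - 1, j, g)))
                if cands:
                    best, arg = min(cands, key=lambda c: c[0])
                    D[i][j][g] = S[i][j][g] + best
                    back[i][j][g] = arg
    # End at the last model frame and the last test frame, in the best group.
    g_end = min(range(n_groups[K - 1]), key=lambda g: D[T - 1][K - 1][g])
    path, node = [], (T - 1, K - 1, g_end)
    while node is not None:
        path.append(node)
        node = back[node[0]][node[1]][node[2]]
    return D[T - 1][K - 1][g_end], path[::-1]


S = [[[0, 5], [1], [4, 4]],  # model frame 0 against the groups of test frames 0..2
     [[5, 5], [1], [0, 5]]]  # model frame 1
pre = [None, [[0, 1]], [[0], [0]]]
print(match_with_candidates(S, pre))  # (1, [(0, 0, 0), (1, 1, 0), (1, 2, 0)])
```

The returned path selects group 0 in every test frame, which is exactly the hard per-frame decision that the dynamic program defers until the whole sequence has been seen.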

We will show in the next section that this kind of matching can also be conducted using a probabilistic approach.

4.2 Coupling with Hidden Markov Models

In this section, we show that a multiple candidate sequence can also be matched to a statistical model such as an HMM. While the structure and the training of the HMM are fairly standard, the decoding process, i.e., computing the likelihood of an image sequence given the HMM, is significantly different and new.

Each gesture $g_i$ is modeled using an HMM $\lambda_i$ over $N$ states. The state at frame $k$ is denoted $q_k$, where $q_k \in \{1, \ldots, N\}$, and $a_{ij} = P[q_{k+1} = j \mid q_k = i]$ is the state transition matrix. The initial state distribution is denoted $\pi = \{\pi_i\}$, where $\pi_i = P[q_1 = i]$ is the probability that the state is $i$ at frame 1. The observation probability is modeled as a mixture of Gaussian distributions. The observation vector is denoted $O = [O_1, \ldots, O_K]$, with $K$ the length of $O$. Its probability at state $j$ is computed as $b_j(O) = \sum_{t=1}^{M} c_{jt}\, \mathcal{N}(O; \mu_{jt}, \Sigma_{jt})$, where $\mathcal{N}$ is a Gaussian with mean vector $\mu_{jt}$ and covariance matrix $\Sigma_{jt}$, $c_{jt}$ is the mixture weight, and $M$ is the number of mixture components. At training, we have a set of observation sequences $\{O^j\}$. The parameters $[a_{ij}, \pi_i, c_{jt}, \mu_{jt}, \Sigma_{jt}]$ are found to maximize the likelihood $P(O \mid \lambda)$. We use the Baum-Welch estimation process to train the HMM.

The decoding (matching) process is radically different from that of conventional HMMs. In conventional HMMs, the actual state sequence is unknown, but the observation sequence is unique. In vision-based gesture applications, however, we consider the observation sequence to be non-unique. In conventional HMMs, the input observation feature vector $O = [O_1, \ldots, O_K]$ is known for each frame, and the likelihood $P(O \mid \lambda)$ can be computed using an iterative forward pass. In our framework, however, we do not assume that we know the exact observation vector $O_k$ at each frame $k$. Instead,
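we allow for multiple hypotheses about the observation. At frame $k$, we have the group set $G_k = [G^k_1, \ldots, G^k_{c_k}]$, where each element of $G_k$ is one possible observation and $c_k$ denotes the total number of groups in frame $k$. We assume that only one element of the observation set is the true observation. We do not decide upon the best group for each frame independently of the others; the entire sequence of group sets is used as the input. We will discuss the problem of finding the optimal observation sequence and propose three approaches to compute the matching score with such an input.

4.2.1 Maximal Observation, Summed State

We are given a sequence of group sets:

$$\mathcal{G} = \langle G_1, \ldots, G_K \rangle \tag{4.3}$$

where $G_k = [G^k_1, \ldots, G^k_{c_k}]$, $1 \le k \le K$, is the group set at frame $k$. The optimal observation sequence problem is to find the one group sequence that maximizes the likelihood summed over the possible HMM state transitions, $P_{\mathrm{sum}}(\Gamma \mid \lambda)$, where $\lambda$ is the HMM and

$$\Gamma = \langle \gamma_1, \ldots, \gamma_K \rangle, \quad \gamma_k \in G_k,\ 1 \le k \le K,\ \gamma_{k-1} \in \mathrm{Pre}(\gamma_k) \tag{4.4}$$

We denote the maximum value of the likelihood by

$$P_{\max,\mathrm{sum}}(\mathcal{G} \mid \lambda) = \max_{k = 1, \ldots, \mathcal{K}} P_{\mathrm{sum}}(\Gamma_k \mid \lambda) \tag{4.5}$$

where $\mathcal{K}$ is the number of all possible sequences of groups. The probability $P_{\mathrm{sum}}(\Gamma_k \mid \lambda)$ represents the likelihood of the group sequence, summed over all the possible HMM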
state sequences. For each sequence of groups, $P_{\mathrm{sum}}(\Gamma_k \mid \lambda)$ can be computed using the standard forward-backward algorithm for HMMs. A brute force solution of Eq. 4.5 would enumerate across the sets $G_1, \ldots, G_K$ to obtain all possible observation sequences $[\gamma_1, \ldots, \gamma_K]$, compute the likelihood of each of them, and select the maximum value. Obviously, exhaustive enumeration is computationally expensive. Hence, we resort to an approximation based on incremental construction of the optimal sequence. To find the best group at frame $k$, suppose the observation sequence at frames $1, \ldots, k-1$ has been recovered as $\gamma_1, \ldots, \gamma_{k-1}$. We define the indexed forward variable $\alpha^j_k(i)$ as

$$\alpha^j_k(i) = P\left(\gamma_1, \ldots, \gamma_k, q_k = i, \gamma_k = G^k_j \mid \lambda\right) \tag{4.6}$$

That is, it is the probability of the partial observation sequence $\langle \gamma_1, \ldots, \gamma_k \rangle$ where, at frame $k$, the state is $i$ and the observation vector is $G^k_j$, and $\langle \gamma_1, \ldots, \gamma_{k-1} \rangle$ are the observation vectors already found at times $1, \ldots, k-1$. The variable is initialized as

$$\alpha^j_1(i) = \pi_i\, b_i\left(G^1_j\right) \tag{4.7}$$

and we have

$$\gamma_1 = G^1_p, \quad p = \arg\max_j \sum_{i=1}^{N} \alpha^j_1(i) \tag{4.8}$$

The induction step is

$$\alpha^j_{k+1}(i) = \left[ \sum_{t=1}^{N} \alpha^p_k(t)\, a_{ti} \right] b_i\left(G^{k+1}_j\right), \quad \gamma_k = G^k_p \tag{4.9}$$
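and then $\gamma_{k+1}$ is selected as

$$\gamma_{k+1} = G^{k+1}_p, \quad p = \arg\max_j \sum_{i=1}^{N} \alpha^j_{k+1}(i) \tag{4.10}$$

At frame $K$, the observation vector sequence is $\langle \gamma_1, \ldots, \gamma_K \rangle$. At the same time, the probability of this observation sequence given the HMM can be computed as

$$P\left(\langle \gamma_1, \ldots, \gamma_K \rangle \mid \lambda\right) = \max_j \sum_{i=1}^{N} \alpha^j_K(i) \tag{4.11}$$

Fig. 4.4 illustrates the indexed forward process. The summation of the products of the forward variables and the observation probabilities remains the same as in conventional HMMs. The difference is that we choose the observation vector dynamically, based on the previously decided observations, whereas a traditional HMM has a static, fixed observation vector. Note that the result of Eq. 4.11 is not an exact solution of Eq. 4.5. Instead, it selects the best current observation given the partial observation sequence selected so far.

Figure 4.4 Illustration of the indexed forward process. The horizontal line represents time, the vertical line corresponds to the candidate observations, and the sub-vertical line denotes the $N$ states. Note that at each time step, only one best observation is selected, based on the previously selected observations and the forward results. In this example, the optimally selected observations (circled) are $\langle 1, 2, 2, 3 \rangle$.

4.2.2 Summed Observation, Summed State

Instead of considering the maximum probability over all possible group sequences, we could consider the sum over all possible group sequences. Thus, the probability of interest is

$$P_{\mathrm{sum},\mathrm{sum}}(\mathcal{G} \mid \lambda) = \sum_{k=1}^{\mathcal{K}} P_{\mathrm{sum}}(\Gamma_k \mid \lambda) \tag{4.12}$$

where the possible sequences of groups are $\Gamma_1, \ldots, \Gamma_{\mathcal{K}}$. The probability $P_{\mathrm{sum}}(\Gamma_k \mid \lambda)$ represents the likelihood of the group sequence, summed over all the possible HMM state sequences. As before, for each sequence of groups, $P_{\mathrm{sum}}(\Gamma_k \mid \lambda)$ can be computed using the standard forward-backward algorithm for HMMs. However, we found that summing over all group sequences and summing over all state sequences can be merged efficiently into one dynamic programming process. To do this, we define the grouping forward variable $\tilde{\alpha}^j_k(i)$ as

$$\tilde{\alpha}^j_k(i) = \sum_{\gamma_1, \ldots, \gamma_{k-1}} P\left(\gamma_1, \ldots, \gamma_k, q_k = i, \gamma_k = G^k_j \mid \lambda\right) \tag{4.13}$$

That is, it is the sum of the partial probabilities of all the group sequences that have $\gamma_k = G^k_j$ and $q_k = i$. The initialization is

$$\tilde{\alpha}^j_1(i) = \pi_i\, b_i\left(G^1_j\right) \tag{4.14}$$

The induction step is

$$\tilde{\alpha}^j_{k+1}(i) = \left[ \sum_{p \in \mathrm{Pre}(G^{k+1}_j)} \sum_{t=1}^{N} \tilde{\alpha}^p_k(t)\, a_{ti} \right] b_i\left(G^{k+1}_j\right) \tag{4.15}$$

and the result of Eq. 4.12 is obtained at the end of the process:

$$P_{\mathrm{sum},\mathrm{sum}}(\mathcal{G} \mid \lambda) = \sum_{j=1}^{c_K} \sum_{t=1}^{N} \tilde{\alpha}^j_K(t) \tag{4.16}$$

4.2.3 Maximal Observation, Maximal State

The third quantity of interest is the maximum probability over all possible group sequences and HMM state sequences. Thus, the probability of interest is

$$P_{\max,\max}(\mathcal{G} \mid \lambda) = \max_{\gamma_1, \ldots, \gamma_K}\ \max_{q_1, \ldots, q_K} P\left(\gamma_1, \ldots, \gamma_K, q_1, \ldots, q_K \mid \lambda\right) \tag{4.17}$$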
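where $\gamma_1, \ldots, \gamma_K$ ranges over the possible sequences of groups and $q_1, \ldots, q_K$ is an HMM state sequence. This quantity can again be computed using dynamic programming. We define the max-forward variable $\delta^j_k(i)$ as

$$\delta^j_k(i) = \max_{\gamma_1, \ldots, \gamma_{k-1},\ q_1, \ldots, q_{k-1}} P\left(\gamma_1, \ldots, \gamma_k, q_1, \ldots, q_{k-1}, q_k = i, \gamma_k = G^k_j \mid \lambda\right) \tag{4.18}$$

This is the maximum partial probability among all the group (and state) sequences that have $\gamma_k = G^k_j$ and $q_k = i$. The variable $\phi^j_k(i)$ stores the backtrack index of the observations for the corresponding backtracking pass. The initialization is

$$\delta^j_1(i) = \pi_i\, b_i\left(G^1_j\right) \tag{4.19a}$$

$$\phi^j_1(i) = 0 \tag{4.19b}$$

The induction step is given by

$$\delta^j_{k+1}(i) = \left[ \max_{p \in \mathrm{Pre}(G^{k+1}_j)}\ \max_{1 \le t \le N}\ \delta^p_k(t)\, a_{ti} \right] b_i\left(G^{k+1}_j\right) \tag{4.20a}$$

$$\phi^j_{k+1}(i) = \arg\max_{p \in \mathrm{Pre}(G^{k+1}_j)}\ \max_{1 \le t \le N}\ \delta^p_k(t)\, a_{ti} \tag{4.20b}$$

The sequence $\gamma^*_1, \gamma^*_2, \ldots, \gamma^*_K$ is obtained as the best group sequence over the best state sequence, and this group sequence can be used to compute the matching score.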
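
The three decoding variants of Section 4.2 differ only in whether groups and states are summed or maximized, so they can be sketched side by side. The Python below is a minimal illustration under explicit assumptions, not the implementation used in this work: probabilities are kept in the linear domain (a practical implementation would use scaling or log-probabilities to avoid underflow), emission probabilities are assumed to be precomputed as `emis[k][j][i]` = b_i(G^k_j) (for example, from the Gaussian mixtures), `pre[k][j]` is a hypothetical list of the groups of frame k - 1 allowed to precede group j of frame k (assumed non-empty), and indices are 0-based. The max-max sketch stores the best predecessor state alongside the best predecessor group so that the group sequence can be backtracked unambiguously.

```python
def indexed_forward(pi, A, emis):
    """Maximal observation, summed state (Eqs. 4.7-4.11): greedy group choice per frame."""
    N = len(pi)
    alpha = [[pi[i] * emis[0][j][i] for i in range(N)] for j in range(len(emis[0]))]
    p = max(range(len(alpha)), key=lambda j: sum(alpha[j]))        # Eq. 4.8
    chosen = [p]
    for k in range(1, len(emis)):
        prev = alpha[p]                                             # forward values of chosen group
        alpha = [[sum(prev[t] * A[t][i] for t in range(N)) * emis[k][j][i]
                  for i in range(N)] for j in range(len(emis[k]))]  # Eq. 4.9
        p = max(range(len(alpha)), key=lambda j: sum(alpha[j]))    # Eq. 4.10
        chosen.append(p)
    return sum(alpha[p]), chosen                                    # Eq. 4.11


def summed_forward(pi, A, emis, pre):
    """Summed observation, summed state (Eqs. 4.14-4.16)."""
    N = len(pi)
    alpha = [[pi[i] * emis[0][j][i] for i in range(N)] for j in range(len(emis[0]))]
    for k in range(1, len(emis)):
        alpha = [[sum(alpha[p][t] * A[t][i] for p in pre[k][j] for t in range(N))
                  * emis[k][j][i] for i in range(N)] for j in range(len(emis[k]))]
    return sum(sum(row) for row in alpha)


def max_max_decode(pi, A, emis, pre):
    """Maximal observation, maximal state (Eqs. 4.19-4.20) with backtracking."""
    N = len(pi)
    delta = [[pi[i] * emis[0][j][i] for i in range(N)] for j in range(len(emis[0]))]
    back = []  # back[k-1][j][i] = (best previous group, best previous state)
    for k in range(1, len(emis)):
        new_delta, new_back = [], []
        for j in range(len(emis[k])):
            row, brow = [], []
            for i in range(N):
                score, arg = max(((delta[p][t] * A[t][i], (p, t))
                                  for p in pre[k][j] for t in range(N)), key=lambda x: x[0])
                row.append(score * emis[k][j][i])
                brow.append(arg)
            new_delta.append(row)
            new_back.append(brow)
        delta, back = new_delta, back + [new_back]
    j, i = max(((j, i) for j in range(len(delta)) for i in range(N)),
               key=lambda ji: delta[ji[0]][ji[1]])
    best, groups = delta[j][i], [j]
    for k in range(len(emis) - 1, 0, -1):
        j, i = back[k - 1][j][i]
        groups.append(j)
    return best, groups[::-1]


pi = [1.0, 0.0]
A = [[0.5, 0.5], [0.0, 1.0]]
emis = [[[0.9, 0.1], [0.2, 0.5]],  # frame 0: two candidate groups, b_i for states 0 and 1
        [[0.3, 0.6]]]              # frame 1: one candidate group
pre = [None, [[0, 1]]]
print(indexed_forward(pi, A, emis))      # approximately (0.405, [0, 0])
print(summed_forward(pi, A, emis, pre))  # approximately 0.495
print(max_max_decode(pi, A, emis, pre))  # approximately (0.27, [0, 0])
```

In the toy example, the summed score (about 0.495) equals the greedy score for groups (0, 0) plus the score for the alternative sequence (1, 0), which is what Eq. 4.12 asks for.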


CHAPTER 5 FEATURE REPRESENTATION

We have discussed both the high level matching and the low level matching with multiple candidates in the previous two chapters, where we strive to solve Eq. 3.2. In this chapter, we discuss the different approaches we take to generate the multiple observation feature vectors that are fed into the low level matching framework. We also answer the second part of our research question 4:

4. How can we handle imperfect segmentation at the low level? How can one use feature grouping processes to overcome segmentation errors?

5.1 Low Level Representation

In this section, we describe our low level processes. Many of the modules used are fairly standard, except for the background modeling scheme. Hence, we have placed this section after describing our core contributions, which lie in the matching process. To segment the hands automatically, we use skin color and motion. After segmenting the hands, we construct two kinds of feature vectors: a global feature vector and a part-based feature vector. We compare these two feature types head-to-head in our experiments and also demonstrate that the matching method outlined in this work can be used in conjunction with different feature types.


5.1.1 Detection of Hands

We assume that the hands move faster than other objects in the scene, including the face, and that the hand area can be roughly localized by skin color detection. We use the mixed Gaussian model of Jones et al. [72], with a safe threshold that allows some non-skin pixels to be falsely classified as skin pixels. To segment based on motion, we employ a key-frame-based background model. We represent the possibly changing but slowly varying background using a set of key frames, identified as frames that are sufficiently different from each other. We choose the first frame as a key frame and then sequentially search for the rest of the key background frames. We compute the difference of each frame with the previous key frame. If the largest connected component in the thresholded difference image is large, the frame is labeled as the next key frame. This process continues until the end of the sequence. Then we compute the difference image of each frame with all the key frames. This difference is thresholded and post-processed using morphological operations. The specifics of the approach are outlined below, and some illustrative results are shown in Fig. 5.1, where Step 2(e) generates the motion-skin confidence map and Step 2(f) generates its boundary.

For each sentence $T$ with $N$ frames:

1. Assign the first key frame $k_1 = 1$, and initialize the key frame counter $m = 1$. For frame $i = 2, \ldots, N$:
   (a) Compute the difference image between $T_i$ and $T_{k_m}$. Find the largest connected component in the difference image in terms of its number of valid pixels $N_p$.


Figure 5.1 Intermediate results for the process of hand segmentation. (a) One frame in a sequence. (b) Consecutive frame difference image. (c) Skin pixels found. (d) Frame difference image with key frames. (e) Edges found in (d). (f) After dilating (e). (g) After an AND operation of the mask in (f) with (d). (h) After removing small components in (g). (i) Boundary of the component in (h), which is the final hand.


   (b) If $N_p > T_0$ (a threshold), set $m = m + 1$ and $k_m = i$.
   (c) Set $i = i + 1$. If $i > N$, go to the next step; otherwise repeat the above steps.
2. For frames $i = 1, \ldots, N$, repeat:
   (a) Compute a difference image $S_D = \sum_{j=1}^{m} |S_i - S_{k_j}| / (m - 1)$.
   (b) Mask $S_D$ with the skin likelihood image. Do edge detection on $S_D$ and obtain the edge image $E$.
   (c) Apply a dilation filter to $E$.
   (d) For each valid pixel in $E$, set the corresponding pixel of $S_D$ to 0.
   (e) Remove the small connected components in $S_D$. This step generates the motion-skin confidence map.
   (f) Extract the boundary image $B$.

We have found key-frame-based background subtraction to be more effective than using all the frames to estimate the background, at least for our kinds of sequences. Fig. 5.2 shows the result we get when the key frames are not used and, instead, all the frames are used to compute $S_D$. Some features are blurred when there is slow or repeated motion in a sign. In contrast, with the key frame approach (Fig. 5.3), we locate the hands with stronger confidence.

5.1.2 Global Features

We first generate the feature vectors using the boundary of the motion-skin confidence map obtained above in Step 2(f). Given the hand boundaries, we capture the global spatial structure by considering the distribution histogram of the horizontal


[Panels: Original; S_D at Step 8; Skin; S_D after skin mask; E at Step 9; E at Step 10; S_D at Step 11; S_D at Step 12; B at Step 13]
Figure 5.2 Hand segmentation results using all frames. Hand segmentation results obtained with the full sequence, where each frame is treated as a key frame. We lose some features when the hand moves slowly.


[Panels: Original; S_D at Step 8; Skin; S_D after skin mask; E at Step 9; E at Step 10; S_D at Step 11; S_D at Step 12; B at Step 13]
Figure 5.3 Hand segmentation results using key background frames. We can see that at Step 8 we have stronger confidence about where the hand is (brighter values over hand pixels). $S_D$ is the cumulative difference with the background.


and vertical distances between pairs of pixels in it. We compute the joint relational histogram of the displacements between all pairs of coordinates on the boundary images. We then represent these relational histograms, normalized to sum to one, as points in a space of probability functions (SoPF), like that used in [73]. The SoPF is constructed by performing a principal component analysis of these relational histograms from the model images. The coordinates in the SoPF form the feature vector used in the matching process. We use the Mahalanobis distance as the distance measure.

5.1.3 Multiple Candidates Representation

For cases with controlled background and clothing, as is the case with most sign language databases, the hand detection method outlined above performs reasonably well. However, in uncontrolled cases, where there can be nuisance motion-skin blobs in the background, where the signer is wearing a short-sleeved shirt, or even where the signer's head or face moves a lot, our approach and most hand detection algorithms will generate many false alarms. To handle such cases, we generate multiple hand candidates and then select among them during the matching process, as outlined earlier. To construct the multiple candidates, we first represent the motion-skin confidence map as a collection of connected components. All connected components that are compact and small are selected as hand candidates. Compactness is measured by dividing the number of pixels by the number of boundary pixels, with a threshold $T_1$. Size is measured by the number of pixels, with a threshold $T_2$. The remaining components, which are too large to be a hand, can still arise from the merging of the arm with the hands; the hands in these cases would most likely be at the boundary of these shapes. To find them, we compute the medial axis of each such component by iteratively removing each boundary pixel whose removal does not disconnect the connected component.
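A minimal NumPy sketch of Step 1, Step 2(a), and the relational histogram of Sec. 5.1.2 follows; it is an illustration under our own assumptions, not the code used in this work. It assumes grayscale frames, a hypothetical per-pixel threshold `diff_thresh` for binarizing difference images, 4-connectivity, and absolute horizontal and vertical distances. Skin masking, morphology, the SoPF projection, and Mahalanobis matching are omitted. Frame indices are 0-based, and dividing by max(m − 1, 1) is our guard for the case of a single key frame. All function names are hypothetical.

```python
from collections import deque
import numpy as np

def largest_component(mask):
    """Pixel count of the largest 4-connected component in a boolean mask."""
    seen = np.zeros(mask.shape, dtype=bool)
    h, w = mask.shape
    best = 0
    for r, c in zip(*np.nonzero(mask)):
        if seen[r, c]:
            continue
        seen[r, c] = True
        queue, size = deque([(r, c)]), 0
        while queue:                                   # breadth-first flood fill
            y, x = queue.popleft()
            size += 1
            for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        best = max(best, size)
    return best

def select_key_frames(frames, diff_thresh, t0):
    """Step 1: a frame becomes a key frame if it differs enough from the last one."""
    keys = [0]                                         # the first frame is a key frame
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(float) - frames[keys[-1]]) > diff_thresh
        if largest_component(diff) > t0:               # N_p > T_0
            keys.append(i)
    return keys

def background_difference(frames, keys, i):
    """Step 2(a): S_D = sum_j |S_i - S_{k_j}| / (m - 1), guarded for m = 1."""
    total = sum(np.abs(frames[i].astype(float) - frames[k]) for k in keys)
    return total / max(len(keys) - 1, 1)

def relational_histogram(points, bins, max_dist):
    """Sec. 5.1.2: joint histogram of |dx|, |dy| over all pixel pairs, summing to 1."""
    pts = np.asarray(points, dtype=float)              # (row, col) coordinates
    a, b = np.triu_indices(len(pts), k=1)              # every unordered pair once
    dx = np.abs(pts[a, 1] - pts[b, 1])
    dy = np.abs(pts[a, 0] - pts[b, 0])
    hist, _, _ = np.histogram2d(dx, dy, bins=bins, range=[[0, max_dist], [0, max_dist]])
    return hist / hist.sum()
```

Excluding the current frame's own key frame is what the $m - 1$ divisor suggests: when frame $i$ is itself a key frame, one of the $m$ terms is zero, so the sum averages over the other $m - 1$ key frames.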


[Panels: Original; Confidence map; Candidate hands]
Figure 5.4 The confidence map and the generated candidate hands. The candidate hands are generated correctly even when the background is moving and the signer wears short-sleeved clothes.

Then, we concentrate on all the leaf pixels on the medial axis. These leaf pixels are clustered using a nearest-location-neighbor clustering method with respect to a threshold $T_3$ until we get regions that are small enough to be hands. Fig. 5.4 shows results for sample frames from our three different continuous ASL data sets.

5.2 Grouping of Low Level Primitives

The above algorithm can generate multiple hand candidate pairs. However, the shape information of the hands is still not clear. We therefore provide an alternative method to generate multiple hand candidates, based on a grouping process. This method can provide detailed shape information, but it currently works only for


a single hand, not a hand pair. We experiment with this method using single sign/gesture recognition instead of continuous recognition.

Low level processes are never perfect. Skin color is the most commonly used cue for segmenting image parts belonging to the hand or face in gesture analysis. However, it does not always produce perfect segmentation, and oversegmentation is a particularly hard problem to handle. In our work, we allow for overlapping groups, resulting in redundant sets of groups. This answers the second part of our research question 4: we can use oversegmentation together with a grouping approach to generate redundant observations, which reduces the risk of losing the true hand observation at the feature extraction step.

Our approach is depicted in Fig. 1.4. We use a top-down recognition process and a bottom-up grouping process, integrated in a dynamic programming framework. First, we segment the image into a collection of non-overlapping regions. These non-overlapping regions are our grouping primitives. Some of these primitives are selected as our seed patches. Then, we use a greedy-search-based grouping approach to generate groups representing possible hands: we start from the seed patches, progressively add new adjacent primitives, and check to prune out the bad groups. We generate layers of groups, with each layer based on one attribute, such as color, proximity, or boundary gradient. The generated groups are not disjoint. Notice that the color attributes are used as a similarity measure rather than as a predefined model. The generated groups are then linked across adjacent frames to generate a set of candidate group sequences. Finally, we match each model sequence to the linked group structure, finding the best match and, simultaneously, a matching score between the model sequence and the input sequence. We have shown that this matching can be conducted for both deterministic and statistical models. Based on the matching score, we use a


Figure 5.5 The proposed HMM model. For the HMM, we do not have a unique observation sequence to match. Rather, we have a collection of possible observation sequences, implied by the multiple observations at each frame.

simple nearest neighbor rule to get the recognition result. By using this approach, we significantly reduce the need for perfect segmentation at the first step.

The models we use in our system include both statistical models (HMMs) and deterministic models. For the deterministic model, we simply store sequences of training signs in the database and match them with time warping techniques, which is essentially a dynamic programming process. A similar matching process, with multiple observations but no grouping, can be found in [21]. For HMMs, we show the structure in Fig. 5.5. We match each gesture HMM to the linked group structure to simultaneously compute the matching score and the best possible grouping for each frame. We later show three different ways to conduct the matching, like those shown in [22].

5.2.1 Grouping Process

The low level primitives of the grouping process are constant color (or intensity, for gray level images) region patches. We use the mean shift segmentation algorithm [74],


Figure 5.6 Flowchart of the grouping process.

which is fast and effective, to generate these patches based on color or intensity. Let the set of low level primitives detected in the $k$th image frame be denoted by $S^{k} = \{p^{k}_{1}, \ldots, p^{k}_{N_k}\}$. A grouping $G^{k}_{i}$ of these region primitives represents a subset of these primitives, $\{p^{k}_{i_1}, \ldots, p^{k}_{i_n}\}$. We adopt a greedy approach to form the groups, outlined in the flowchart in Fig. 5.6. From the initial set of primitives $S^{k}$, we select a subset of primitives that are likely to come from a hand, based on the size of the patch. These are our seed patches. Given some knowledge of the approximate size of hands in the sequence, we can eliminate large, non-homogeneous region patches from further consideration. We use a list $L$ to store the possible groups. This list is initialized by choosing each selected primitive to be a singleton group. These groups are then merged to form a


Figure 5.7 Illustration of the local adjacency graph. (a) An image frame. (b) Homogeneous region patches based on intensity alone. (c) Local adjacency graph over the small region patches that correspond to the hand. Each primitive patch is represented by a node. Links denote pairs of primitives that are adjacent to each other.

larger conglomerate:

$$ L = \big\{ \{p^{k}_{x}\} \;\big|\; a_s(p^{k}_{x}) \le t_{size}, \; x = 1, \ldots, N_k \big\}. \tag{5.1} $$

Here $a_s(\cdot)$ is the operator that returns the size of $p^{k}_{x}$. For the entries in $L$, we maintain an adjacency graph whose nodes are the groups in $L$, with links between groups that share a boundary. This graph is incrementally updated at each iteration. Fig. 5.7 shows an example local adjacency graph (look ahead to Fig. 5.8 for a sequence of iterations of this graph). The grouping process starts by picking the first group in $L$, denoted here by $p$, and searches its neighbors $\{N_i(p)\}$. Each neighbor $N_i(p)$ is considered for grouping with $p$ to generate a tentative larger grouping. We select the best local grouping and denote it as $g$. In the color layer, the best neighbor is the one with the smallest Euclidean distance to the base group in RGB space. In the proximity layer, we choose the neighbor that is nearest to the base group according to the image coordinates of


their centers. In the boundary layer, the neighbor that yields the smallest curvature score when grouped with the base group is selected as the best. The group $g$ is further tested to see whether it can possibly represent a hand. This test is based on three attributes, $[a_n, a_s, a_{cur}]$, where $a_n$ is the number of primitives in the group, $a_{cur}$ is the boundary curvature of the group, and $a_s$ is the size of the bounding box. The test (Eq. 5.2) is the conjunction of three threshold comparisons: $a_s$ against $t_{size}$, $a_{cur}$ against $t_{curvature}$, and $a_n$ against $t_{num}$, where $t_{size}$, $t_{curvature}$, and $t_{num}$ are the corresponding thresholds. Here, the boundary curvature is approximated as the integral of the square root of the second order derivative along the curve. If the group $g$ passes this test, it is inserted into the final candidate group list $C$; otherwise, if $a_s \le t_{size}$, it is inserted at the end of the list $L$, to be considered for further grouping.

Fig. 5.8 shows the grouping process based on the adjacency graph; the mechanism is essentially a greedy search over the adjacency graph, starting from a chosen seed. In Fig. 5.8, a solid link represents a grouping between two nodes. Starting from the seed patch $S$, a decision is made to group $S$ with the best neighbor, denoted by $N_c$ according to a layer $c$, generating the new group $G_c$. After grouping, the adjacency graph is updated so that the new neighborhood is the neighborhood of $G_c$, and the process starts again, grouping one of $G_c$'s neighbors with $G_c$ based on the same layer criterion. After detecting the seed primitives, the above process is used to generate three grouping layers, based on color, position, and boundary gradient. This reduces the chance that the group corresponding to the hand is never generated. On the downside, this step triples the time and space complexity. Note that the low level primitives and the groups are formed on a frame by frame basis; there is no tracking or frame-to-frame correspondence. Fig. 5.9 shows the


Figure 5.8 Example of the generation process of groups. The process is repeated with each primitive as a seed.


Figure 5.9 Example of generated multiple candidates. (a) Original frame 1. (b) Segmented frame 1. (c) List of candidate groups in the color grouping layer in frame 1. (d) List of candidate groups in the proximity grouping layer in frame 1. (e) Original frame 2. (f) Segmented frame 2. (g) List of candidate groups in the color grouping layer in frame 2. (h) List of candidate groups in the proximity grouping layer in frame 2. (i) Original frame 3. (j) Segmented frame 3. (k) List of candidate groups in the color grouping layer in frame 3. (l) List of candidate groups in the proximity grouping layer in frame 3. The true hand is shown with a white circle.


grouping results for three different frames in the color grouping layer and the proximity grouping layer. For frame 1 (Fig. 5.9(a)), the color grouping layer (Fig. 5.9(c)) includes the real hand group, shown with a white circle, while the proximity grouping layer (Fig. 5.9(d)) fails to include it. For frame 2, however, the proximity grouping layer gives us the true hand, while the color grouping layer does not. For frame 3, both the color grouping layer and the proximity grouping layer have the true hand group in their lists. We can also see in Fig. 5.9 how the groups differ from each other in terms of missing fingers or added extraneous regions, which can confound the sign recognition process. Also note that we do not restrict ourselves to disjoint groups. Thus, we might have $G^{k}_{i} \cap G^{k}_{j} \neq \emptyset$. This differs from the disjoint-groups constraint usually employed in segmentation and grouping. Allowing overlapping groups lets us avoid making hard decisions about group boundaries.

5.2.2 Associating Groups Across Frames

We denote the $j$th group detected in the $k$th frame as $G^{k}_{j}$. The groups detected in each frame are associated with those detected in the previous frame, resulting in a linked sequence of groups spanning all the frames. This structure helps us propagate constraints during the matching process and reduces the number of possible observation sequences to be searched. We define the predecessor set of each group as

$$ \mathrm{Pre}(G^{k}_{j}) = [\, G^{k-1}_{j_1}, \ldots, G^{k-1}_{j_n} \,], \tag{5.3} $$

where each $G^{k-1}_{j_l}$ ($1 \le l \le n$) is one possible predecessor of $G^{k}_{j}$. The predecessor relationship between groups from different times is based on feature similarity; it captures how likely the groups are to arise from the same underlying cause in the image. Specifically, we test the


difference in location between the two groups, with a liberally chosen threshold value:

$$ \mathrm{Distance}(G^{k}_{g}, G^{k-1}_{r}) \le T_4, \qquad G^{k-1}_{r} \in \mathrm{Pre}(G^{k}_{g}). \tag{5.4} $$
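The greedy grouping of Sec. 5.2.1 and the predecessor test of Eq. 5.4 can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in this work: each primitive patch is summarized by its size, mean RGB color, and centroid; a group's color and centroid are size-weighted means; only the color and proximity layers are shown (the boundary-curvature layer needs patch geometry); and the hand test of Eq. 5.2 and the size test for further growing are passed in as caller-supplied predicates. All names (`Primitive`, `greedy_grouping`, `is_hand`, `keep_growing`, and so on) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class Primitive:
    size: int        # number of pixels in the patch
    color: tuple     # mean (R, G, B)
    center: tuple    # centroid (row, col)

def group_mean(prims, group, attr):
    """Size-weighted mean of a primitive attribute over a group."""
    w = np.array([prims[p].size for p in group], dtype=float)
    v = np.array([getattr(prims[p], attr) for p in group], dtype=float)
    return (w[:, None] * v).sum(axis=0) / w.sum()

def layer_distance(prims, attr):
    """Color layer: attr='color' (RGB distance); proximity layer: attr='center'."""
    return lambda group, n: float(np.linalg.norm(
        group_mean(prims, group, attr) - np.asarray(getattr(prims[n], attr), dtype=float)))

def greedy_grouping(adj, seeds, dist, is_hand, keep_growing):
    """Grow each seed by its best neighbor; the resulting groups may overlap."""
    todo = deque(frozenset([s]) for s in seeds)       # the list L
    candidates, visited = [], set()                   # the list C
    while todo:
        g = todo.popleft()
        neighbors = set().union(*(adj[p] for p in g)) - g
        if not neighbors:
            continue
        best = min(sorted(neighbors), key=lambda n: dist(g, n))  # best local grouping
        g = g | {best}
        if g in visited:
            continue
        visited.add(g)
        if is_hand(g):
            candidates.append(g)
        elif keep_growing(g):
            todo.append(g)                            # reconsider at the end of L
    return candidates

def predecessors(prev_groups, prev_prims, groups, prims, t4):
    """Pre(G^k_j): groups of frame k-1 whose centroid lies within t4 (Eq. 5.4)."""
    prev_c = [group_mean(prev_prims, g, "center") for g in prev_groups]
    return [[r for r, c in enumerate(prev_c)
             if np.linalg.norm(group_mean(prims, g, "center") - c) <= t4]
            for g in groups]
```

Running `greedy_grouping` once per layer (with `layer_distance(prims, "color")`, then `"center"`) and concatenating the candidate lists reproduces the layered, overlapping candidate sets described above; the `visited` set only prevents re-expanding an identical group within one layer.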


CHAPTER 6 CONDITIONAL MODELS

Conditional random fields (CRFs) have become a popular method for modeling and labeling various kinds of sequences, including gesture sequences. A CRF strives to model the posterior probability directly with one global representative function. In this chapter, we present our proposed modifications to the CRF. We compare CRF results with our eLB methods in Chapter 7.

6.1 Conditional Random Fields for a Sign/Gesture Sequence

Unlike a Hidden Markov Model (HMM), which is a generative model based on likelihoods of observations conditioned on states and on prior probabilities of states, a CRF is a discriminative model that directly computes the posterior state probabilities. The HMM requires strict independence assumptions across multivariate features and conditional independence between observations, given the states. However, these independence assumptions are generally violated in sign languages; i.e., observations depend not only on the state but also on past observations. Another disadvantage of using HMMs is that the estimation of the observation parameters requires a large amount of training data. This makes training more difficult, and if any condition of the system changes, retraining the model is harder. Fig. 6.1 depicts the essential differences between HMMs and CRFs. Fig. 6.1(a) shows the structure of HMMs, where the directed links indicate the conditional likelihoods given the state and the state transition probabilities.

[Panels: (a) HMM; (b) CRF; (c) CRF with key frames]
Figure 6.1 Difference between CRF, HMM, and key frame CRF. (a) HMM defined with state and observation pairs using directed links. Multiple consecutive observations in any given sequence can be mapped onto the same state. (b) The CRF model uses pairwise probabilities over states and observations for each time instant. Each observation is associated with a state label. (c) Key frame CRF.

The CRF model is shown in Fig. 6.1(b). A CRF is an undirected graph, allowing for arbitrary dependence among the nodes. Each given observation is associated with a state label. Two consecutive state labels, as well as the state-observation pairs, are jointly modeled. Our key-frame-based CRF approach works essentially like a frame-based CRF, except that the states are assigned to key frames, and each key frame is characterized by a few frames around it to capture the short term motion.


CRFs have been used successfully by [75] to label and segment sequential text data. Recently, Sminchisescu et al. [41] used CRFs to recognize whole body human movement (not sign language). They reported that CRFs outperformed HMMs, especially in strongly context-dependent situations. However, the movements they considered are basically consecutive performances of single gestures with no me effects. Also, unlike their approach, we use CRFs not for recognition but for segmentation.

6.2 Key Frames Representation and Extraction

In our tests of the key-frame-based CRF, before training and testing are conducted, the video frames are preprocessed by a local corner point tracker and then by a key frame detector. We represent each frame by a motion snapshot derived from the tracker result, which is simple and robust compared to the use of external devices or skin color blobs. Then the key frame subsets within each sentence are detected, using a matrix formulation of the frame distances and its eigenvector to indicate the best key frame subsets.

6.2.1 Motion Snapshot Representation

The low level image processing in hand gesture recognition, such as feature tracking and region segmentation, can be facilitated by external devices. Nevertheless, in real world applications, requiring signers to wear gloves or tracking markers on their hands while signing can be annoying and inconvenient. Skin color blobs are widely used to extract the moving hand in simple gesture recognition tasks. However, in ASL the hand movements are more complex, which can result in shadows on the hand, two hands crossing each other, a hand crossing the face, and so on, where skin detection may generate ambiguous blobs. Unlike these approaches, we take as input the plain 2D color video sequence, which consists of American Sign Language sentences. The low level processing consists simply of corner detection, feature correspondence, and the construction of a motion snapshot within a small number of frames. Specifically, for each frame in the input 2D video sequence, we consider good feature points and their mappings to both the previous and the next frame. The classic KLT (Kanade-Lucas-Tomasi) method is used to detect corner points, and then we use the pyramidal implementation of the Lucas-Kanade tracker to estimate the motion of the corner points between adjacent images. The corresponding feature points are connected using the Bresenham line-drawing algorithm to form a motion trajectory map. We only connect those features in the current frame that have a good correspondence in both the previous and the next frame; hence, tracking exists only between neighboring frames. We refer to this representation as a motion snapshot.

Within the motion snapshot, we examine two relational features: the horizontal distance and the vertical distance between all pairs of valid pixels. A joint 2D histogram of these two features is formed. Then, Principal Component Analysis (PCA) is applied to find the dominant vectors of the sign frame space, and each frame is projected onto the resulting eigenspace. We refer to [73] for the details of the method. After this step, the ASL sentence is represented as a sequence of feature vectors $S$.

6.2.2 Detecting Key Frames

A number of key frame and video boundary detection techniques have been proposed. For example, Zhong et al. [76] used an unsupervised approach to detect unusual events in a long video, where a graph is constructed, each small chunk


Figure 6.2 Illustration of the relational distribution representation. The ASL sign "CAN" consists of three frames, shown in the first column. The second column is the tracked result for the local motion trajectory. The third column is a 3D mesh visualizing the relational distribution for each frame.


of video is represented as a node, and the edge weights are associated with frame differences. We use a similar graph representation and eigenvector computation, but our detection is per frame and is conducted in a semi-supervised way. In our approach, we define the key frames of either a sign or an me to be those frames that are most different from the frames of other signs or me. The training set for key frame selection is a set of individual signs that are manually segmented, with the me portion removed. We denote the training set as $T = \{t_1, t_2, \ldots, t_l\}$, where $t_i$ is the $i$th training sign, $l$ is the size of the training set, and $l_i$ is the length of the $i$th training sign. Formally, given a sentence $S$ with $N$ frames, we denote the key frame sequence $K$ as a subsequence of $S$, where $k_i \in \{1, 2, \ldots, N\}$, $i = 1, 2, \ldots, m$, and $k_1$

frames. Specifically, the first eigenvector (the eigenvector with the largest eigenvalue) of $A$ denotes the participation of each frame in the most coherent cluster in $S$. We refer to [77] for the details of this method. Suppose the first eigenvector is obtained as $E$. We find all the local minima of $E$ within a small window, and the corresponding frames are selected as key frames.

6.3 Conditional Random Fields over Key Frame Sequences

We select key frames for each sentence in the training data set. The key frames are manually labeled as a sign or as me. We use a linear chain CRF model, where the observations are denoted as $K$ and the corresponding labels as $L$, with $L_i \in \{SIGN, me\}$. The pair $(K, L)$ is a conditional random field if, when globally conditioned on $K$, $L$ obeys the Markov property in the linear graph. That is,

$$ P(L_i \mid K, L - \{L_i\}) = P(L_i \mid K, N(L_i)), \tag{6.2} $$

where $N(L_i)$ denotes the neighbors of $L_i$. Consider the linear chain graph $G$ constructed over $K$ and $L$, and let $C(K, L)$ denote the set of cliques in $G$. By the fundamental theorem of random fields, the probability of a label sequence $L$, given the observation sequence $K$, can be represented as

$$ P(L \mid K) \propto \exp\Big( \sum_{c \in C(K,L)} \lambda_f \, F(c, K) \Big), \tag{6.3} $$

where $\{F\}$ are the feature functions defined over all the cliques and $\Lambda = \{\lambda_f\}$ is the set of parameters weighting the corresponding feature functions. In a linear chain graph, the cliques can be pairs of adjacent labels and label-observation pairs. For example, at the start of an ASL sentence, usually


the signer lifts the hands up. Let us denote the key frame of this action as $K_0$ and the corresponding label as me. Then a penalty for assigning me to $K_0$ is selected and weighted by the corresponding $\lambda_f$. Similarly, for a transition feature, suppose we have two adjacent key frames $K_0$ and $K_1$ that are both labeled as me; then a penalty for assigning me-me to that edge is selected and weighted by the corresponding $\lambda_f$. Note that, unlike HMMs, where strict independence does not allow us to represent relationships between labels and observations at different times, a CRF can represent them with an arbitrary window $w$, which can be more context dependent and is much more flexible. For training, we consider the labeled ASL key frame sequences $(K_d, L_d)$, $d \in \{1, 2, \ldots, N_s\}$, where $N_s$ is the size of the training database. The parameter set $\Lambda$ can be found by maximizing the log likelihood

$$ \mathcal{L}(\Lambda) = \sum_{d=1}^{N_s} \log P(L_d \mid K_d) = \sum_{d=1}^{N_s} \Big[ \sum_{c \in C(K_d, L_d)} \lambda_f \, F(c, K_d) - \log Z(K_d) \Big], \tag{6.4} $$

where $Z$ is the normalization factor, which depends on the observation sequence. We use a gradient-based approach with a random starting point to seek the maximum of Eq. 6.4. A belief propagation (BP) method is used for inference over the chain structure, and the inference result is our decoded label sequence for the sign language sentence.
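A minimal sketch of the linear-chain quantities in Eqs. 6.3 and 6.4 follows. It assumes (our choice, not necessarily the features used in this work) one weight vector per label applied to each key frame's feature vector and one weight per label transition, so that the clique sum reduces to unary plus transition scores. On a chain, sum-product belief propagation reduces to the forward-backward recursions below, computed in log space. `forward_backward` returns $\log Z(K)$ and the per-frame label marginals, and `log_likelihood` evaluates Eq. 6.4 for a list of (features, labels) pairs. Labels use 0 for SIGN and 1 for me; all names are hypothetical. A gradient step would additionally need expected feature counts, which can be obtained from the same forward and backward tables.

```python
import itertools
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True)), axis=axis)

def score(X, labels, W, trans):
    """Unnormalized log-score: unary cliques plus transition cliques (Eq. 6.3)."""
    U = X @ W.T                                        # U[t, l] = W[l] . x_t
    s = U[np.arange(len(labels)), labels].sum()
    s += sum(trans[a, b] for a, b in zip(labels[:-1], labels[1:]))
    return float(s)

def forward_backward(X, W, trans):
    """Sum-product BP on a chain: returns log Z(K) and the label marginals."""
    U = X @ W.T
    T, L = U.shape
    fwd, bwd = np.zeros((T, L)), np.zeros((T, L))
    fwd[0] = U[0]
    for t in range(1, T):
        fwd[t] = U[t] + logsumexp(fwd[t - 1][:, None] + trans, axis=0)
    for t in range(T - 2, -1, -1):
        bwd[t] = logsumexp(trans + (U[t + 1] + bwd[t + 1])[None, :], axis=1)
    log_z = float(logsumexp(fwd[-1], axis=0))
    return log_z, np.exp(fwd + bwd - log_z)

def log_likelihood(data, W, trans):
    """Eq. 6.4: sum over training pairs of score(K_d, L_d) - log Z(K_d)."""
    return sum(score(X, y, W, trans) - forward_backward(X, W, trans)[0] for X, y in data)

def brute_force_log_z(X, W, trans):
    """Enumerate every label sequence; only for checking short sequences."""
    L = W.shape[0]
    scores = [score(X, list(y), W, trans) for y in itertools.product(range(L), repeat=len(X))]
    return float(logsumexp(np.array(scores), axis=0))
```

Replacing the sums over the previous label by maxima (and keeping backpointers) turns the forward pass into max-product decoding, which yields the single best SIGN/me labeling rather than marginals.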


CHAPTER 7 EXPERIMENTS AND RESULTS

We have conducted extensive experiments with the approaches proposed in this work on the task of recognizing continuous American Sign Language (ASL) sentences and single gestures/signs from image sequences. We present not only visual results of labeling continuous ASL sentences but also quantitative measures of their performance. For continuous ASL sentence experiments, we compare the performance with that obtained by classical Level Building, which does not account for movement epenthesis, and with the frame labeling results obtained from two state-of-the-art methods: conditional random fields (CRF) and Latent Dynamic CRF (LDCRF). We were not able to compare with other explicit model-based approaches to handling movement epenthesis, or with generative methods such as the HMM, since they require a large amount of training data, which is either not available or difficult to acquire; for the vocabulary size used in this work, we would need about 1000 labeled ASL sentences. We also present empirical evidence of the optimality of the choice of the parameter used to decide the me mapping cost, and we present the impact of the grammar model on recognition. For single sign/gesture recognition, we experiment with both deterministic sample-based models (DTW) and statistical models (HMMs) with our proposed decoding processes. We tested the grouping algorithm coupled with both of these models. We show the results compared to the methods without a grouping approach and the


Table 7.1 Summary of the data sets used in this work.

Name                               | D1       | D2                   | D3                  | D4
Resolution                         | 460x290  | 640x480              | 640x480             | 320x240
Color                              | Yes      | Yes                  | Yes                 | Yes
Frame rate (fps)                   | 30       | 30                   | 30                  | 24
# of training sentences            | 100      | 15                   | 10                  | 280
# of distinct training sentences   | 25       | 15                   | 10                  | 7
# of testing sentences             | 25       | 22                   | 20                  | 210
# of distinct training signs       | 40       | 17                   | 17                  | 7
# of two-handed signs              | 21       | 9                    | 7                   | 7
Same sentences in train and test?  | Yes      | No                   | No                  | Yes
Background                         | Uniform  | Complex with motion  | Textured and static | Textured and static
Short sleeves                      | No       | No                   | Yes                 | No

methods that use manually selected groups. We also show the tracking results of our true groups as byproducts of our decoding algorithm. In this chapter, we answer the following research questions:

5. Can our proposed set of algorithms handle complex backgrounds? Can we identify signs made by signers wearing both short and long sleeves, i.e., relax the typical clothing constraints?
6. How well does the recognition rate of the proposed approach match that achieved through manually grouped segmentation?

7.1 Data Sets and Experiment Setup

We have used four data sets, summarized in Table 7.1. Example frames from these data sets are shown in Figs. 1.6, 1.7, 1.8, and 1.9. As we can see, the data sets vary in terms of the background. The background in data set D1 is uniform, static, and without texture, which is typical of sign language data sets. The background in


D3 is static, but it is textured. The lighting in this data set is not directly on the subject, so this data set is harder than D1 in terms of illumination and background conditions. It is not typical of sign language data sets, especially in the use of short sleeves. The data set D2 is the toughest one, with a complex background and people moving in the background; there are several patches in the background with skin color. D4 is the data set of isolated gesture sequences [78]; it has a complex but static background and two views. For each frame in data sets D2, D3, and D4, we have multiple hand candidates. Only for D1 can we use global features.

The training and test sets are structured as follows. In D1, we have five samples per sentence. We perform 5-fold cross validation experiments, with four samples of each sentence for training and one for testing. For D2 and D3, the training and testing sets contain different sentences, so methods that explicitly or implicitly rely on me models will have a hard time. The gesture data set D4 in our experiments is a 7-gesture hand gesture data set. It consists of 280 training sequences (40 for each gesture) and 210 test sequences from three subjects (10 for each subject and each gesture). Since we have enough training data for D4, we show results based on the Hidden Markov Model. D4 has 24 fps, a complex background, and colored gloves with long sleeves.

To quantify the performance, we manually labeled the frames corresponding to the signs in the sentences. We also used the tool in [24] to manually generate the true hand groups for the model signs. We refer the reader to Appendix A for the data collection process and Appendix B for the annotation process. To evaluate the results quantitatively, we use the error measures advocated in [79]. If the recognized sentence inserts a sign that does not actually exist, one insertion error


is counted. If, however, the recognized sentence omits a sign that actually exists, one deletion error is counted. If the recognized sentence reports a wrong sign, we count it as a substitution error. We computed these errors automatically by computing the Levenshtein distance, using a dynamic programming approach [80], between the actual results and the manually labeled ground truth. We call this measure the "word level rate". We also evaluate the frame-wise labeling result, that is, the number of correctly labeled frames divided by the total number of frames. We call this measure the "frame level rate".

We use the same set of thresholds for all the experiments, set to liberal values based on heuristics: $T_0 = 100$ pixels, $T_1 = 2$, $T_2 = 4000$ pixels, $T_3$ = image height/8, and $T_4 = 300$ pixels.

While high recognition rates, on the order of >90%, have been reported for isolated ASL signs and isolated fingerspelled signs, reported performances for recognition in continuous sentences vary quite a bit, up to 99% [81], depending on vocabulary size, length of sentences, and possibly other factors yet to be explored, such as the degree to which humans can recognize each sign under various conditions such as a complex background.

We conducted six studies. In the first study, we focused on the analysis of the eLB algorithm and the estimation of its me mapping cost parameter. We used global features in this study. We tested using both bigram and trigram grammars built from a text corpus of 150 sentences; the entire text corpus is shown in Appendix C. The performance was measured using the word level rate. In the second study, we compared our labeling approach with the CRF/LDCRF approaches. Since CRF/LDCRF only produce a frame level result, we used the frame level rate as the performance measure for this study. In the third study, we compared the results obtained with global features to those of the part-based candidate hands approach, experimenting on both D1 and D2. In this study, we


used a sentence-based grammar, which is stronger than just bigrams and trigrams. In the fourth study, we used D1, D2, and D3 with eLB and part-based candidate hands, and we show the results under variations of the algorithm used to detect hands. In the fifth study, we conducted experiments with D1, showing the grouping algorithm coupled with a deterministic sample-based model. In the sixth study, we conducted experiments with D4 using the grouping algorithm coupled with three different HMM decoding processes. The details of the setup of Study 1 to Study 4 are listed in Table 7.2, and the setup of Study 5 and Study 6 in Table 7.3. These last two studies consist of single sign analysis and grouping analysis.

For the eLB setup, we assigned the parameter values $L_{max} = 20$ and $N_{max} = 145$, which means that we allowed one sentence to have a maximum of 20 signs, and the maximum duration of movement epenthesis (me) to be 145 frames. We used the first 7 coefficients of the Space of Probability Functions (SoPF) representation as the global feature vector [73]. In our experiments, we have found these choices to be stable: varying them did not change the performance significantly (within 1%).

7.2 Study 1: eLB vs. LB with Grammar and Parameter Analysis

The primary focus of this set of experiments is to test the effectiveness of the eLB algorithm in overcoming the me problem. We also studied the choice of the me labeling cost. We conducted the studies using data set D1, where background-related issues are least likely to confound the movement epenthesis recognition problem. The labeling results for three sentences are presented in Fig. 7.1. Each horizontal bar represents a sentence and is partitioned into sign or me blocks. The size of each block is proportional to the number of frames corresponding to that label. For each sentence, we present the ground truth as determined by an ASL expert and the


Table 7.2 Outline of Study 1 to Study 4. The table shows the different matching algorithms, features, grammars, and error measurements used in our four tests on continuous ASL sentences.

Name                | Study 1                          | Study 2                         | Study 3                                                          | Study 4
Purpose             | Analysis of eLB and its me cost parameter | Comparison between eLB and CRF | Comparison between eLB using global features and eLB using part-based candidates | Comparison between using and not using the skeleton
Data sets used      | D1                               | D1                              | D1 and D2                                                        | D1, D2, and D3
Matching algorithms | eLB and LB                       | eLB, CRF, and LDCRF             | eLB                                                              | eLB
Features            | Global                           | Global                          | Part-based candidates and global                                 | Part-based candidates
Grammar             | Bigram and trigram               | Trigram                         | Sentence                                                         | Sentence
Text corpus         | Extended sentences               | Extended sentences              | Non-extended (same number as the test sentences)                 | Non-extended (same number as the test sentences)
Error measurement   | Word level rate                  | Frame level rate                | Word level rate                                                  | Word level rate

Table 7.3 Outline of Studies 5 and 6. The table shows the different matching, feature, grammar, and error measurements used in our two tests on the single sign/gesture test data sets.

Name | Study 5 | Study 6
Purpose | Analysis of grouping using a deterministic sample-based model | Analysis of grouping using Hidden Markov Models
Datasets used | D1 | D4
Matching algorithms | 3D DTW | HMMs with max-max, max-sum, sum-sum approaches
Features | Grouping results, automatic and manual | Grouping results, automatic and manual
Grammar | None | None
Text corpus | None | None
Error measurements | Word level rate | Word level rate

Figure 7.1 Labeling results for three sentences. Each horizontal bar represents a sentence that is partitioned into sign and me labels. The length of the horizontal bar is proportional to the number of frames in the sentence. For each sentence we present the ground truth partitioning and the algorithm output.

results from the algorithm. It is obvious that the signer signs at a different speed for each sign. For instance, the sign I is spread over a large number of frames. The framework can easily handle such cases. Apart from a 1 to 2 frame mismatch at the beginning and at the end, the labeling matches fairly well. Fig. 7.2 shows the sign level error rates we obtained with the optimal α (more on this later) for each test set in the 5-fold cross-validation experiments, using a trigram model on data set D1. The sign level error rate for each test set ranges between 9% and 28%. On average, the error rate is 17%, with a corresponding correct recognition rate of 83%. In Fig. 7.3, we present the results of a head-to-head comparison of the error rates obtained using the enhanced Level Building algorithm presented here and classical Level Building, which does not account for movement epenthesis. We found the insertion error has been

Figure 7.2 Sign level error rates using eLB on data set D1, broken into insertion, deletion, and substitution errors. The results are for each test set in the 5-fold cross-validation.

decreased by the proposed method from 10% to 4%. At the same time, the substitution error is reduced from 63% to 5%. Next, we studied the need for the grammar model. Fig. 7.4 shows side by side the error rates we obtained by using a trigram model and a bigram model. We constructed the grammar models based on a text corpus of 150 sentences. These sentences did not all have corresponding video data. By using the trigram model, the average error rate dropped from 32% to 17%. The constraint imposed by a bigram model is more relaxed than that imposed by a trigram model. It may be reiterated that we are using a 0-1 representation of the n-grams, i.e., for any instance of a relationship in the corpus, the corresponding count is set to 1; otherwise it is zero. By far, the most important parameter is the me labeling cost, α. As described earlier, we select the value of α to be the optimal Bayesian decision boundary between match and non-match scores. Fig. 7.5(a) shows the match and non-match scores on the training set in data set D1 for one of the 5-fold experiments. As we can see, a matched score usually averages approximately

Figure 7.3 Error rates for eLB and LB. The result is based on data set D1.

Figure 7.4 Error rates with trigram and bigram constraints.

Figure 7.5 Choosing the movement epenthesis (me) labeling cost α. The result is for one of the 5-fold experiments. (a) shows the match and non-match distance scores in the training set used to choose the optimal α. The optimal value is 0.89. (b) shows the variation of the errors with different choices of α.

Table 7.4 Comparison of automatically chosen and manually chosen α. Error rates with eLB on data set D1, with the automatically chosen α (Auto) and the α that minimizes the error on the test set (Opt.).

Test | Insertion Auto | Insertion Opt. | Deletion Auto | Deletion Opt. | Substitution Auto | Substitution Opt. | Total Auto | Total Opt.
1 | 4% | 8% | 7% | 4% | 6% | 3% | 17% | 15%
2 | 4% | 0% | 0% | 3% | 5% | 5% | 10% | 8%
3 | 3% | 1% | 8% | 5% | 10% | 10% | 21% | 16%
4 | 3% | 3% | 4% | 4% | 4% | 4% | 11% | 11%
5 | 7% | 3% | 8% | 1% | 13% | 13% | 28% | 17%
Avg. | 7% | 3% | 2% | 3% | 7% | 7% | 17% | 14%

0.4, while a non-matching score is centered around 1.4. The optimal value of α for this training data set is 0.89. How good are the trained me labeling costs, α? To study this, we computed the best α that minimized the overall error rate on the test set. Fig. 7.5(b) shows the variation of the errors with different α for one of the test sets. We see that the automatically chosen value of 0.89 is near the minimum of the actual error plots. In Table 7.4, we list the errors with the automatically chosen α for each of the 5-fold experiments and compare them with the actual possible minimums. The errors are within 4%. This shows that our method for choosing the optimal α is fairly robust.

7.3 Study 2: Comparison with Other Approaches

In this study, we first use the CRF model to effectively detect me segments using the algorithms described in Chapter 6. We take one of the 5 shots of sentences as the test data. The individual signs are taken out from the other 4 shots to form the training space. The corner detection method usually generates 50-100 feature points. The relational features are counted in 32 × 32 bins. With the feature sequences, we use a window size of w = 7 to find the key frames. Fig. 7.6 shows the result of
key frame detection. Note that we do not restrict ourselves to one key frame for each sign or me part. Rather, multiple distinctive frames may be chosen. For example, in Fig. 7.6, we have 2 key frames detected to indicate the starting portion and the ending portion of the sign "GATE". Fig. 7.7 shows the ROC curve for detecting the me points in the key frame sequences, where 4 shots are used as training data, each having 25 sentences. We also run an HMM detector as the baseline algorithm. Additionally, we use window sizes of 1, 3, and 5 to incorporate adjacent observations. Note that it is difficult for the HMM to use these observations because of its independence assumption. We then compare the performance of our approach with two state-of-the-art methods: conditional random fields [82] and Latent Dynamic CRF [30]. We use the code from [30] to generate our results. These particular models have been developed in a gesture recognition context, where we have labels corresponding to the gestures (signs). The posterior probability is maximized or estimated directly in training and testing. As the number of labels increases, the model could have a large number of parameters to estimate, depending on the selection of feature functions. For both methods, we use a chain structure; for LDCRF, we have 3 hidden states for each label. Although CRF [82] and LDCRF [30] have shown improved results for a limited number of labels, in our experiments we had to use them for 40+ labels. We quantify performance using the frame level error rate, i.e., the percentage of frames that are wrongly classified in the test set. Table 7.5 lists the results. As we can see, CRF and LDCRF perform quite poorly. This is because the number of parameters that need to be estimated for these models is huge compared to our method, which makes the training unstable. Also, both CRF and LDCRF implicitly model me as a single class, which is not realistic. From the results in this section, we can see that CRF works well in the 2-class case, actually outperforming HMMs. However, in the case of a high number of classes, which is

(b) Detected key frames: K3: me, K8: me, K16: GATE, K12: GATE, K27: GATE, K32: me, K38: me, K48: WHERE, K57: WHERE, K68: WHERE, K76: me, K85: me

Figure 7.6 Key frame detection. (a) shows the plot of the elements of the first eigenvector for the sentence, where key frames are detected by selecting the local minima. Detected key frames are marked as either a sign or me, as shown in (b).

Figure 7.7 The ROC curve for detecting me using HMMs and CRF.

Table 7.5 Framewise labeling results. The comparison of eLB, LB, CRF, and LDCRF is included.

Methods | eLB | LB | CRF | LDCRF
Parameters | 1 | 0 | 1968 | 15990
Classes | 41 | 40 | 41 | 41
Data set used | D1 | D1 | D1 | D1
Grammar model | Trigram | Trigram | N/A | N/A
Total test frames | 2234 | 2234 | 2234 | 2234
Correctly labeled frames | 1530 | 406 | 642 | 460
Error rate | 31% | 82% | 71% | 89%

generally true in sign language recognition, a traditional CRF model has too many parameters to estimate. The accuracy of the model decreases considerably, and it may not perform as well. In this case, however, our eLB method can still obtain correct sentence recognition results.

7.4 Study 3: Global Features vs. Multiple Local Candidates

The eLB framework can handle both global features, which are computed from the whole image frame, and part-based features. Fig. 7.8 shows the results we obtained for both D1 and D2 using these two feature types. For D1, the global feature vector method works well since there is not too much background noise (0% error). However, when we use global feature vectors on D2, where a complex and changing background exists, the error increases significantly (3% error). With a part-based candidate approach, the corresponding error rate is 36%. Note that although the part-based approach has a 47% error on D1, this is because the vocabulary size of D1 is almost twice that of D2. We can see that the part-based candidate approach provides a more robust solution to the low level uncertainty problem. Fig. 7.9 shows a visual example of how the candidate hands are selected along with the eLB algorithm. It shows a side-by-side sub-sampled image sequence from one continuous sentence. Each side-by-side image shows the detected hands on the original image on the left side and all the candidate hands on the right side. Note that if the frame is labeled as me, no hand candidate will be selected. From Fig. 7.9, we can see that the candidate hands are selected at the same time as the frame is labeled. It even works when a second person is working behind the signer, which generates more noisy hand candidates. In this case, a global feature would definitely fail. It is also interesting to see that for the sentence in Fig. 7.9, although the sentence recognition is correct (which is what we want), the framewise
labeling is not completely right. This is due to the fact that we only use very coarse features, such as position and moving direction, to conduct the match. The signs in between can easily be mixed up with each other. However, the eLB framework can still make the final recognition for the sentence correct based on the text corpus and the best matched sign sequence. The result in this section also answers the first part of our research question 5: our approach does improve the result when there is a complex and moving background. We accomplish this by using multiple observations instead of a single observation, which reduces the chance of losing an important observation at the low level. This gives us a more stable result when we apply our algorithms to both simple and complex backgrounds.

Figure 7.8 Comparison of global features and part-based candidate hands. The results are based on data sets D1 and D2.
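The studies above all rest on eLB's ability to leave frames that no sign model explains well unmatched, labeling them me at cost α. A minimal sketch of that idea as a segmentation dynamic program follows. It is an illustration under stated assumptions, not the eLB implementation: `sign_cost(sign, start, end)` is a hypothetical callable assumed to return the total matching distance of a sign model to a frame span, α is assumed to be charged per me frame, all segment boundaries are searched exhaustively, and the grammar and nested hand-candidate selection are omitted. `l_max` and `n_max` play the roles of Lmax and Nmax.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Segment = Tuple[str, int, int]  # (label, start frame, end frame), end exclusive
State = Tuple[int, int, int]    # (signs used so far, frame, last segment is me)


def elb_segment(
    num_frames: int,
    signs: Sequence[str],
    sign_cost: Callable[[str, int, int], float],
    alpha: float,
    l_max: int = 20,
    n_max: int = 145,
) -> Tuple[float, List[Segment]]:
    """Minimum-cost labeling of frames [0, num_frames) into signs and me gaps.

    A sign on frames [u, t) costs sign_cost(sign, u, t). An me gap costs alpha
    per frame, lasts at most n_max frames and may not directly follow another
    me gap. At most l_max signs may be used.
    """
    # best[state] = (cost, back-pointer (previous state, label) or None)
    best: Dict[State, Tuple[float, Optional[Tuple[State, str]]]] = {(0, 0, 0): (0.0, None)}

    def relax(key: State, cost: float, prev: State, label: str) -> None:
        if cost < best.get(key, (math.inf, None))[0]:
            best[key] = (cost, (prev, label))

    # Segments only move forward in time, so every state ending at frame u
    # is final before it is extended.
    for u in range(num_frames):
        for level in range(l_max + 1):
            for last_me in (0, 1):
                state = (level, u, last_me)
                if state not in best:
                    continue
                base = best[state][0]
                for t in range(u + 1, num_frames + 1):
                    if level < l_max:  # try every sign model on frames [u, t)
                        for s in signs:
                            relax((level + 1, t, 0), base + sign_cost(s, u, t), state, s)
                    if not last_me and t - u <= n_max:  # or label [u, t) as me
                        relax((level, t, 1), base + alpha * (t - u), state, "me")

    finals = [k for k in best if k[1] == num_frames]
    if not finals:
        return math.inf, []
    end = min(finals, key=lambda k: best[k][0])
    segments: List[Segment] = []
    state = end
    while best[state][1] is not None:  # follow back-pointers to frame 0
        prev, label = best[state][1]
        segments.append((label, prev[1], state[1]))
        state = prev
    segments.reverse()
    return best[end][0], segments


# Toy example: "x" marks transition frames that neither sign explains.
truth = ["A", "A", "x", "x", "B", "B"]


def toy_cost(sign: str, u: int, t: int) -> float:
    return 0.3 + sum(0.1 if truth[f] == sign else 2.0 for f in range(u, t))


print(elb_segment(6, ["A", "B"], toy_cost, alpha=0.5)[1])
# [('A', 0, 2), ('me', 2, 4), ('B', 4, 6)]
```

With α = 0.5 per frame, covering the two transition frames with either sign (2.0 per frame) is more expensive than labeling them me, which is exactly the trade-off α controls.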

[Frame labels, in order: me, me, FINISH, FINISH, BUY, BUY, TICKET, NOW, FINISH, FINISH, me, me]

Figure 7.9 Labeling for "FINISH BUY TICKET NOW FINISH". Each side-by-side image shows the detected hands on the original image on the left side and all the candidate hands on the right side. The actual label is below the image. Note that for me frames, no hand candidate will be selected.
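Study 1 notes that the bigram and trigram models are used in a 0-1 form: an n-gram either occurs in the text corpus (1) or it does not (0). A minimal sketch of building such tables and checking a candidate sign sequence against them is given here. How eLB applies the check inside its search is not shown, and the three-sentence corpus (including the gloss CAR) is hypothetical.

```python
from typing import Iterable, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def build_ngrams(corpus: Iterable[Sequence[str]], n: int) -> Set[NGram]:
    """0-1 n-gram table: an n-gram is present (1) if it occurs at least once."""
    table: Set[NGram] = set()
    for sentence in corpus:
        for i in range(len(sentence) - n + 1):
            table.add(tuple(sentence[i:i + n]))
    return table


def ngram_allowed(signs: Sequence[str], table: Set[NGram], n: int) -> bool:
    """True if every n-gram of the candidate sign sequence appears in the table."""
    return all(tuple(signs[i:i + n]) in table for i in range(len(signs) - n + 1))


corpus = [
    ["I", "BUY", "TICKET"],
    ["FINISH", "BUY", "TICKET", "NOW"],
    ["YOU", "BUY", "CAR"],
]
bigrams, trigrams = build_ngrams(corpus, 2), build_ngrams(corpus, 3)
candidate = ["YOU", "BUY", "TICKET"]
print(ngram_allowed(candidate, bigrams, 2))   # True: YOU-BUY and BUY-TICKET both occur
print(ngram_allowed(candidate, trigrams, 3))  # False: YOU-BUY-TICKET never occurs
```

The candidate passes the bigram check but fails the trigram check, which illustrates why the trigram model imposes the stricter constraint.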

7.5 Study 4: Short Sleeves vs. Long Sleeves

In most sign language data sets, clothing is controlled. The signer usually wears long-sleeved shirts so that just the hands can be segmented using skin color. However, with short-sleeved shirts, the hand region can get merged with the arm. Merging can also happen with long-sleeved shirts when the hands cross each other or when a hand crosses the face. Sometimes we can lose the real hands due to over-segmentation. We use the medial-axis-guided detection approach described in Section 5.1.3 to address this problem. We tested on all 3 data sets using the eLB algorithm, with and without the medial-axis-based detection approach. The results are shown in Fig. 7.10. A significant improvement (30%) can be observed over not using the medial-axis-based approach. Fig. 7.11 shows a visual example of how the candidate hands are selected along with the eLB algorithm. It shows a side-by-side sub-sampled image sequence from one continuous sentence. Each side-by-side image shows the detected hands on the original image on the left side and all the candidate hands on the right side. Note that if the frame is labeled as me, no hand candidate will be selected. From Fig. 7.11, we can see that the candidate hands are selected at the same time as the frame is labeled. The medial-axis representation of the candidate has the advantage of separating merged arms and hands, so it can work in cases where the signer is wearing short-sleeved clothes. These candidates cannot be effectively generated without this approach, and errors would propagate to the core matching algorithm. The result in this section also answers the second part of our research question 5: our approach does improve the result when the signer wears short-sleeved clothes. This is accomplished by using our skeleton representation, with which we can segment the hand from the arm. Hence, we can avoid losing the hand observation
when short-sleeved clothes are used, which leads to a more stable result when we apply our algorithms to both short-sleeved and long-sleeved cases.

Figure 7.10 Comparison with and without skeleton (medial-axis) detection.

7.6 Study 5: Grouping Results with the Deterministic Model

In this experiment, we use D1 with the deterministic approach. The objective of this experiment is to test the grouping method coupled with the deterministic approach shown in Chapter 4. The model sign data set is formed from four of the five instances of each sentence. Specifically, for each sign we have 4 examples. We manually select the groups of region patches that come from each hand, frame by frame. The sequences of these manually selected groups form the model sequences. Since the number of training samples is limited and it would be hard to estimate an HMM accurately, we use deterministic matching in this experiment. For feature vectors, we fit each hand group with an ellipse in a least-squares manner. Suppose the ellipse has a major axis a, a minor axis b, and the angle between the major axis and the x axis is θ. We then have a 10-dimensional feature vector to represent

[Frame labels, in order: me, me, me, TABLE, TABLE, TABLE, me, me, THAT, THAT, me, me]

Figure 7.11 The labeling results for the sequence "TABLE THAT". Each side-by-side image shows the detected hands on the original image on the left side and all the candidate hands on the right side. The actual label is below the image. Note that if the frame is labeled as me, no hand candidate will be selected.

the hand group: x position, y position, motion displacement in the x direction, motion displacement in the y direction, length of the major axis a, length of the minor axis b, sine of 2θ, cosine of 2θ, eccentricity of the ellipse, and area of the ellipse. Fig. 7.12 and Fig. 7.13 show examples of the generated groups for one frame. As we can see, even with a simple background and simple clothes, the hand can be very fragmented. Fig. 7.12 has more than 100 candidate groups, from which the real hands can be generated during the grouping process. Without grouping, we cannot guarantee that the real hand is in the candidate list, as shown in Fig. 7.13. The matched signs are ranked according to their matching scores. Table 7.6 shows the actual lists after ranking the matching scores for a few sentences. The signs with a check mark (√) are actually in the sentence, with the scores listed beside them. The correct signs tend toward lower ranks, which is what we want. Note that this result was obtained without using a higher level grammar. Table 7.7 shows the results for the same sentences, but with the matched starting and ending points listed. Ground truth starting and ending points are in brackets. Fig. 7.14 shows one match result for the test sequence PEOPLE LONGLINE WAIT ANGRY. Fig. 7.14(a) is the warped path in the 3D space, where the warping comes from both the candidate hand selection and the time warping. Fig. 7.14(b) shows the projection of the same data onto the X-Y plane, which is essentially only the time warping process. Fig. 7.14(c) is the projection of the same data onto the X-Z plane, which reveals the detected hand's X coordinates. Fig. 7.15 shows the recovered X coordinates of the hand and the hand movements in the test sequence. The result is shown in four parts, each of which corresponds to a sign in the sentence. These results show that the real hand position is recovered from the recognition result, even when the hand is crossing the face. This is a particularly hard problem to overcome in gesture recognition. The overall recognition result for this database is shown in Fig. 7.16. In
this result, 125 sentences are counted, with a total of 348 signs. Each model word is matched to the test sequence, and the results are ranked according to their matching scores. A word that is in the original sentence but has a rank larger than 6, 7, 8, 9, or 10 is counted as one error for rank 6, 7, 8, 9, or 10, respectively. Fig. 7.16 shows the performance for different numbers of primitives allowed in one group: 1, 5, 10, and 20. The upper curve is the recognition rate achieved using manually selected hands. We can see that the recognition rate increases when the number of primitives increases from 1 to 5 to 10. There is a slight drop at 20, which is caused by the additional noisy groups. The overall performance drop compared with the manually selected hands is within 1%-5%. Notice that the experiment is done without any predefined hand model, and the modeling of sign dynamics is very weak with the simple nearest neighbor rules. At rank 6, we achieved a recognition rate of around 90% to 94%.

(a) Original (b) Segmented (c) Boundary (d) Candidate groups from grouping algorithm

Figure 7.12 Candidate groups for the ASL data set with grouping. While grouping, we set tnum = 10. In (d), there are 125 groups generated; the groups with a circle are the real left and right hands.

(a) Original (b) Segmented (c) Boundary (d) Candidate groups without grouping

Figure 7.13 Candidate groups for the ASL data set without grouping. While grouping, we set tnum = 1, which is essentially no grouping. In (d), there is no grouping process, just a segmentation, and we can see that the hands are highly fragmented. Without grouping, we cannot get the real hand in the list.
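The 10-dimensional hand-group descriptor of Study 5 can be sketched as follows. One substitution is made openly: the thesis fits the ellipse by least squares, whereas this sketch uses the second moments of the group's pixel coordinates, a common alternative in which, for a filled ellipse, each semi-axis equals twice the square root of a covariance eigenvalue. Treating a and b as semi-axes and measuring the motion displacement against the previous frame's centroid are also assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple


def ellipse_features(
    pixels: Sequence[Tuple[float, float]],
    prev_centroid: Optional[Tuple[float, float]] = None,
) -> List[float]:
    """10-D descriptor of a hand group from its pixel coordinates (moment-based fit)."""
    n = len(pixels)
    if n == 0:
        raise ValueError("empty group")
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    cxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    cyy = sum((y - cy) ** 2 for _, y in pixels) / n
    cxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    # Eigenvalues of the 2x2 covariance matrix in closed form.
    mid = (cxx + cyy) / 2
    root = math.sqrt(((cxx - cyy) / 2) ** 2 + cxy ** 2)
    a = 2 * math.sqrt(mid + root)            # semi-major axis
    b = 2 * math.sqrt(max(mid - root, 0.0))  # semi-minor axis
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)  # major axis vs. x axis
    ecc = math.sqrt(1 - (b / a) ** 2) if a > 0 else 0.0
    if prev_centroid is None:
        dx, dy = 0.0, 0.0
    else:
        dx, dy = cx - prev_centroid[0], cy - prev_centroid[1]
    return [cx, cy, dx, dy, a, b,
            math.sin(2 * theta), math.cos(2 * theta), ecc, math.pi * a * b]


# A horizontal 11 x 3 block of pixels centred on the origin.
pixels = [(float(x), float(y)) for x in range(-5, 6) for y in range(-1, 2)]
feats = ellipse_features(pixels, prev_centroid=(1.0, 0.0))
```

Encoding the orientation as sin 2θ and cos 2θ avoids a jump between θ = 90° and θ = -90°, since an ellipse looks the same after a 180° rotation.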

Figure 7.14 Matching path of an ASL sentence. (a) The matched sequences in the candidate-time space of the test sentence PEOPLE LONGLINE WAIT ANGRY. (b) The matched sequences (time warped) of the test sentence. (c) The recovered hand position (x coordinates).
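Fig. 7.14(a) describes a warping path through a three-dimensional space of model frame, test frame, and hand candidate. A minimal sketch of that idea is DTW in which every test frame offers several candidate feature vectors and the path also selects one candidate per frame. It is an illustration under assumptions rather than the Chapter 4 algorithm: features are plain tuples compared with Euclidean distance, and the optional `smooth` weight, which penalizes jumps between the candidates chosen in consecutive frames, is a hypothetical addition.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, ...]


def dtw_with_candidates(
    model: Sequence[Vec],
    candidates: Sequence[Sequence[Vec]],
    smooth: float = 0.0,
) -> Tuple[float, List[int]]:
    """Align a model sequence to test frames that each offer several candidate
    observations. Returns the total cost and the chosen candidate per test frame."""
    m, n = len(model), len(candidates)
    # D[i][t][k]: best cost with model frame i matched to test frame t via candidate k
    D = [[[math.inf] * len(candidates[t]) for t in range(n)] for _ in range(m)]
    back: List[List[List[Optional[Tuple[int, int, int]]]]] = [
        [[None] * len(candidates[t]) for t in range(n)] for _ in range(m)
    ]
    for i in range(m):
        for t in range(n):
            for k, c in enumerate(candidates[t]):
                local = math.dist(model[i], c)
                if i == 0 and t == 0:
                    D[0][0][k] = local
                    continue
                best, arg = math.inf, None
                # Steps that advance the test frame may switch candidates.
                for pi, pt in ((i - 1, t - 1), (i, t - 1)):
                    if pi < 0 or pt < 0:
                        continue
                    for pk, pc in enumerate(candidates[pt]):
                        cost = D[pi][pt][pk] + smooth * math.dist(pc, c)
                        if cost < best:
                            best, arg = cost, (pi, pt, pk)
                # Advancing only the model frame keeps the same candidate.
                if i > 0 and D[i - 1][t][k] < best:
                    best, arg = D[i - 1][t][k], (i - 1, t, k)
                D[i][t][k] = local + best
                back[i][t][k] = arg
    k_end = min(range(len(candidates[-1])), key=lambda k: D[m - 1][n - 1][k])
    chosen = [0] * n
    node: Optional[Tuple[int, int, int]] = (m - 1, n - 1, k_end)
    while node is not None:  # every test frame is visited on the way back
        i, t, k = node
        chosen[t] = k
        node = back[i][t][k]
    return D[m - 1][n - 1][k_end], chosen


model = [(0.0,), (1.0,), (2.0,)]
cands = [[(5.0,), (0.0,)], [(1.1,), (9.0,)], [(7.0,), (2.0,)]]
cost, picks = dtw_with_candidates(model, cands)
print(picks)  # [1, 0, 1]: the candidate closest to the model in each frame
```

The per-frame candidate indices returned with the path are what make it possible to read the hand position back from the match, as in Fig. 7.15.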

Figure 7.15 The recovered right hand in the test sequence. (a) PEOPLE; (b) the recovered right hand position (x coordinates) for the sign PEOPLE. (c) LONGLINE; (d) the recovered right hand position (x coordinates) for the sign LONGLINE. (e) WAIT; (f) the recovered right hand position (x coordinates) for the sign WAIT. (g) ANGRY; (h) the recovered right hand position (x coordinates) for the sign ANGRY.

Figure 7.16 The test results for the ASL data set, panels (a) and (b). Each curve represents the result when one shot is used as the test sequence. The horizontal axis denotes the different ranks; the vertical axis denotes the recognition rates.
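The rank-based recognition rate plotted in Fig. 7.16 follows the counting rule given in Study 5: a sign of the true sentence whose rank exceeds k counts as one error at rank k. A minimal sketch, assuming lower matching scores are better (as in Table 7.6), is shown here. The usage lines reuse the top five entries of the first and third test sentences of Table 7.6.

```python
import math
from typing import Sequence, Tuple


def recognition_rate_at_rank(
    results: Sequence[Tuple[Sequence[Tuple[str, float]], Sequence[str]]],
    k: int,
) -> float:
    """results: one (list of (sign, matching score), true signs) pair per test sentence."""
    total = correct = 0
    for scored, truth in results:
        ranked = [sign for sign, _ in sorted(scored, key=lambda p: p[1])]  # lower is better
        rank_of = {sign: r for r, sign in enumerate(ranked, start=1)}
        for sign in truth:
            total += 1
            if rank_of.get(sign, math.inf) <= k:
                correct += 1
    return correct / total if total else 0.0


table_7_6 = [
    ([("BUY", 1.51), ("I", 2.24), ("WAIT", 2.35), ("TICKET", 2.44), ("FINISH", 3.23)],
     ["TICKET", "BUY", "FINISH"]),
    ([("GATE", 1.74), ("SUITCASE", 3.59), ("NOT", 3.94), ("PHONE", 4.09), ("WHERE", 4.48)],
     ["GATE", "WHERE"]),
]
print(recognition_rate_at_rank(table_7_6, 1))  # 0.4: only BUY and GATE rank first
print(recognition_rate_at_rank(table_7_6, 5))  # 1.0
```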

Table 7.6 List of matched scores for 3 test sentences. The signs marked with √ are actually in the sentence, with the scores listed beside them.

Rank | Test: TICKET BUY FINISH | Test: PEOPLE LONGLINE WAIT ANGRY | Test: GATE WHERE
1 | √ Buy 1.51 | √ Wait 0.52 | √ Gate 1.74
2 | I 2.24 | √ LongLine 1.56 | Suitcase 3.59
3 | Wait 2.35 | Buy 3.07 | Not 3.94
4 | √ Ticket 2.44 | √ People 3.07 | Phone 4.09
5 | √ Finish 3.23 | I 3.66 | √ Where 4.48
6 | You 6.89 | √ Angry 3.87 | No 4.74
7 | Have 7.07 | Not 4.16 | Need 5.28
8 | Mad 7.4 | Have 4.26 | Have 5.81
9 | Need 7.5 | Again 4.26 | ThatOne 6.12
10 | Phone 8.38 | Airplane 4.66 | Postpone 6.35
11 | Mean 8.87 | Ticket 4.79 | Yes 6.37
12 | Suitcase 9.11 | Lipread 4.82 | Can 6.81
13 | Cannot 9.84 | Gave 5.11 | Again 6.85
14 | It 10.1 | Phone 6.54 | Mad 7.63
15 | Can 10.52 | Finish 6.64 | Understand 7.71
16 | People 11.17 | Just 7.21 | Key 8.34
17 | Again 11.58 | Key 7.24 | Table 8.45
18 | Gave 11.95 | Gate 7.73 | Lipread 8.8
19 | Not 12.62 | Understand 8.52 | Angry 9.59
20 | Gate 13.68 | Need 8.9 | I 9.62

7.7 Study 6: Grouping Results with Hidden Markov Models

To study the effect of grouping with HMM-based matching, we need a data set that supports HMM learning. The ASL data sets do not have a sufficient number of repetitions per sign to allow this. Hence, we use the Human Computer Interaction (HCI) data set recently collected by another research group, Just and Marcel [78], which we also refer to as D4. The data set is for recognizing 7 hand actions: push, rotate front, rotate back, rotate left, rotate right, rotate up, and rotate down. The authors of the data have explicitly separated the training and test data, where the training data consist of 4 subjects, each of whom performed the 7 actions
10 times, with 5 of them recorded in one session and 5 in the other. The test data have the same shots but with 3 different subjects. The total number of test sequences is 210. The data set has shots from 2 fixed cameras, one from the left side and the other from the right side. We used the joined results of the two views in this work. For this experiment, since we have a sufficient amount of training data, we use HMMs to conduct the matching instead. Since this data set was collected with yellow and blue colored gloves, it allows us to make comparisons with color-based hand segmentation schemes. As a baseline performance comparison, we considered (i) manually segmented hands and (ii) hands segmented using information about the color of the gloves. For color-based hand segmentation, each glove color is modeled as a mixture of 3 Gaussians in the color space. For the proposed approach, we considered just region segmentation patches, detected as outlined earlier. Note that although we use color for segmentation and grouping, we do not use the knowledge that a specific color corresponds to the hand. Fig. 7.17 shows examples of region segmentation and groups. Note that some hypotheses correspond to non-hand parts of the image or to other hands that might be present. We used the same feature vector as in the experiment of Section 7.6. Fig. 7.18 and Fig. 7.19 show all the candidate groups for one frame. We can see that the list in Fig. 7.18 consists of the real hand group and the group

Table 7.7 Matched positions and manually recognized positions. The manually recognized positions are in brackets.

Rank | Test: TICKET BUY FINISH (sign, start, end) | Test: PEOPLE LONGLINE WAIT ANGRY (sign, start, end) | Test: GATE WHERE (sign, start, end)
1 | Ticket 182121 | People 1838 | Gate 1130
2 | Buy 283131 | Long Line 5267 | Where 4868
3 | Finish 424747 | Wait 7179 |
4 | | Angry 99107 |

Figure 7.17 HCI data set results: candidate groups of regions generated for some frames. Notice that there are 3 hands in the frame. (a) Original frame. (b) Segmented image boundary. (c) Segmented image. (d) Primitives around the third hand. (e) Primitives around the left hand. (f) Primitives around the right hand. (g) The candidate groups for the third hand. (h) The candidate groups for the left hand. (i) The candidate groups for the right hand.

Table 7.8 Comparison of results with and without grouping.

Low level | Grouping | No grouping | Grouping | No grouping
Matching | Sum-sum | Sum-sum | Max-max | Max-max
# of total test frames | 7249 | 7249 | 7249 | 7249
# of groups per frame and view | 94 | 22 | 94 | 22
# of total sample seq. | 210 | 210 | 210 | 210
# of correct sample seq. | 193 | 58 | 190 | 60

that generates it. In Fig. 7.19, these groups do not exist because no grouping is done. Table 7.8 shows the number of groups per frame and the number of correctly recognized gestures. Without grouping, all the candidate groups are singleton groups, and the real hand normally is not included because the hand area is fragmented. Grouping is a necessary process, even for such a data set with colored gloves. We consider recognition with each of the three probabilistic measures outlined earlier. The correct recognition rates are shown in Fig. 7.20. The 5 approaches (the two baselines and the three HMM-based ones) give recognition rates of 79%, 94%, 91%, 92%, and 91%. This result answers research question 6. From this result we can also observe:

1. For each frame, above 95% of the groups generated were noisy, with some being just random patches. However, their contribution to the final overall sequence is quite small, since they were not well linked across frames. Our approach allows us to recover from such errors. However, for the commonly used color-based hand segmentation approach, if any one frame has noisy hands, the recognition might fail. This is why recognition with hands segmented using just color information results in low performance.
2. Our approach, which accommodates imperfect segmentation, has only a 2% performance loss compared with the approach with manual segmentation.
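The three HMM measures can be illustrated with a forward recursion in which each frame offers several candidate groups. This is a simplified sketch, not the decoder used in this work: it assumes the candidates of different frames are chosen independently (the thesis links groups across frames in a graph) and that all candidates are equally likely a priori, and `emission` is a hypothetical callable returning the probability of a group given a state. Under those assumptions, summing over states and groups gives the total probability over all state and group sequences, and maximizing over both gives the single best pair. Which mixed setting corresponds to P_{max,sum} is not asserted here. Probabilities are multiplied directly; a practical version would rescale or work in log space.

```python
from typing import Callable, Hashable, Sequence

REDUCE = {"sum": sum, "max": max}


def candidate_forward(
    init: Sequence[float],
    trans: Sequence[Sequence[float]],
    emission: Callable[[int, Hashable], float],
    candidates: Sequence[Sequence[Hashable]],
    state_op: str = "sum",
    group_op: str = "sum",
) -> float:
    """Score a sequence whose frame t offers the candidate groups candidates[t]."""
    s_red, g_red = REDUCE[state_op], REDUCE[group_op]
    n = len(init)

    def emit(j: int, t: int) -> float:
        # Combine the candidate groups of frame t for state j.
        return g_red(emission(j, g) for g in candidates[t])

    alpha = [init[j] * emit(j, 0) for j in range(n)]
    for t in range(1, len(candidates)):
        alpha = [s_red(alpha[i] * trans[i][j] for i in range(n)) * emit(j, t)
                 for j in range(n)]
    return s_red(alpha)


init = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
E = {0: {"a": 0.5, "b": 0.1, "c": 0.4}, 1: {"a": 0.1, "b": 0.6, "c": 0.3}}
frames = [["a", "c"], ["b"], ["b", "a", "c"]]  # candidate groups per frame
p_sum_sum = candidate_forward(init, trans, lambda j, g: E[j][g], frames, "sum", "sum")
p_max_max = candidate_forward(init, trans, lambda j, g: E[j][g], frames, "max", "max")
```

Because the group choices are independent per frame in this sketch, the sum (or max) over group sequences factors into a per-frame sum (or max) inside the emission term, which is what keeps the recursion as cheap as an ordinary forward pass.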

(a) Original (b) Segmented (c) Boundary (d) Candidate groups from the grouping algorithm

Figure 7.18 Candidate groups for the gesture data set with grouping. While grouping, we set tnum = 10. In (d), there are 118 groups generated; the group with a rectangle is the real hand, and the group with a circle is the one from which the real hand is generated.
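The role of tnum in Figs. 7.12, 7.13, 7.18, and 7.19 can be illustrated by enumerating every spatially connected set of at most tnum region primitives. This exhaustive enumeration is a simplified stand-in assumed here for illustration: the grouping algorithm of this work may prune or order groups differently, and exhaustive enumeration grows quickly with tnum. The primitives are given as a hypothetical adjacency map from patch id to the ids of touching patches.

```python
from typing import Dict, FrozenSet, Hashable, List, Set


def connected_groups(adjacency: Dict[Hashable, Set[Hashable]], t_num: int) -> List[FrozenSet]:
    """All connected sets of 1..t_num primitives; t_num = 1 means no grouping."""
    seen: Set[FrozenSet] = {frozenset([p]) for p in adjacency}
    frontier = list(seen)
    while frontier:
        grown = []
        for group in frontier:
            if len(group) >= t_num:
                continue
            # Extend the group by any primitive adjacent to one of its members.
            border = set().union(*(adjacency[p] for p in group)) - group
            for p in border:
                bigger = group | {p}
                if bigger not in seen:
                    seen.add(bigger)
                    grown.append(bigger)
        frontier = grown
    return sorted(seen, key=lambda g: (len(g), sorted(map(str, g))))


# Patches 0-1-2 touch in a chain; patch 3 is isolated.
adj = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}
print([len(connected_groups(adj, t)) for t in (1, 2, 3)])  # [4, 6, 7]
```

The overlapping groups ({0, 1}, {1, 2}, {0, 1, 2}) are what give the matcher a chance to find a whole hand when the segmentation has split it into several patches.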

(a) Original (b) Segmented (c) Boundary (d) Candidate groups without grouping

Figure 7.19 Candidate groups for the gesture data set without grouping. While grouping, we set tnum = 1. In (d), since there is no grouping process, just a segmentation, we can see that the hands are highly fragmented; without grouping, we cannot get the real hand in the list.

Figure 7.20 Recognition of hand gestures for five different approaches. The first two are based on manual and color-based segmentation of the hands. The next three do not use knowledge of the hand color and take fragmented observations into account. These three correspond to the three different kinds of probabilities that can be computed, P_{max,sum}, P_{sum,sum}, and P_{max,max}, using the HMM proposed in this work.

Figure 7.21 Recognition performance for each hand gesture, obtained using the summed-summed approach and manual segmentation.

Fig. 7.21 and Fig. 7.22 show the recognition rate on a per-gesture and per-subject basis. We can see that the majority of errors come from one subject and from three gestures that can easily be mixed up. Subject 1 performed each gesture with larger motion than the other subjects in the training data. Such a case is hard to improve by using only position features; hence, subject 1 produced the majority of the errors. Among the gestures, rotate front, push, and rotate right all have motions moving forward and backward, with only subtle orientation changes in the palm. Hence, these actions produced the majority of the errors. However, the performance measure of interest for this work is how well the recognition rate with multiple grouped observations matches that with perfect segmentation. On this account, the performance is quite strong. Fig. 7.23 shows a visual example of the optimal groups selected for the best match corresponding to the rotate back action. There are two parts to the movement, backwards and forwards. Fig. 7.23(a) and (b) show the selected groups for these two parts overlaid on each other. Fig. 7.23(c) shows the X (horizontal) coordinates
of the revealed hand, obtained with the optimal state and sequence pair approach; we can see that the change in X coordinates matches the hand positions. The indexed forward approach produces similar results. Finally, Fig. 7.24 shows the tracking results of the other 6 gestures.

Figure 7.22 Recognition performance of each subject separately, obtained using the summed-summed approach and manual segmentation.

Figure 7.23 The optimal groups corresponding to one of the hands, found using the HMM. (a) The first part of the "rotate back" gesture. (b) The second part of the "rotate back" gesture. (c) The computed horizontal position of the hand.

[Panels, one row per gesture (DOWN, FRONT, LEFT, PUSH, RIGHT, UP): first frame, tracking of the first part, tracking of the second part, last frame]

Figure 7.24 The tracking results for the first test instance. These test instances are from the other 6 gestures, and the results are generated by the max-max method. The first column is the first frame of the sequence. The second column is the tracking of the first part, followed by the tracking of the second part in the third column, finally arriving at the last frame of the sequence in column 4.
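The sign level error rates of Study 1 are broken into insertion, deletion, and substitution errors. A common way to obtain such a breakdown, assumed here because the alignment procedure is not restated in this chapter, is a minimum edit-distance alignment between the ground-truth and recognized sign sequences, with the counts then divided by the number of ground-truth signs. A minimal sketch:

```python
from typing import Dict, Sequence


def sign_error_counts(reference: Sequence[str], hypothesis: Sequence[str]) -> Dict[str, int]:
    """Substitution/deletion/insertion counts from a minimum edit-distance alignment."""
    n, m = len(reference), len(hypothesis)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete every reference sign
    for j in range(m + 1):
        d[0][j] = j  # insert every recognized sign
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Walk back through the table to attribute each edit to one error type.
    i, j = n, m
    counts = {"substitution": 0, "deletion": 0, "insertion": 0}
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and reference[i - 1] != hypothesis[j - 1]
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + mismatch:
            counts["substitution"] += int(mismatch)
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            counts["deletion"] += 1
            i -= 1
        else:
            counts["insertion"] += 1
            j -= 1
    return counts


ref = ["FINISH", "BUY", "TICKET"]
hyp = ["FINISH", "WAIT", "TICKET", "NOW"]
c = sign_error_counts(ref, hyp)
print(c)  # {'substitution': 1, 'deletion': 0, 'insertion': 1}
print(sum(c.values()) / len(ref))  # sign level error rate, 2/3 here
```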

CHAPTER 8
CONCLUSION AND FUTURE WORK

In this work, we strove to attack two fundamental problems in automatic video-based sign language and gesture recognition systems. The first problem is the movement epenthesis (me) problem. This problem is due to our need to exclude from analysis the extra movements (i.e., movement epenthesis) that signers naturally have to make to transition, or move their hands, from one sign to the next. If our recognition system also analyzed these extra movements, it might be misled and generate extra signs where there were none. We proposed an enhanced Level Building algorithm (eLB) to attack this problem without any explicit modeling of me. The second problem is the low level hand segmentation problem. Ambiguity in hand detection is always possible. If the hand is not detected correctly, the high level matching process will be misled and generate wrong matching results. We proposed a grouping algorithm and matched the groups with several new decoding processes. This algorithm allowed us to avoid the need for perfect segmentation at the feature extraction level, and the grouping algorithm effectively reduced the chance of losing the true hand.

Initially, we presented the enhanced Level Building algorithm, built around dynamic programming, to address the problem of movement epenthesis in continuous sign sentences. Our approach did not need to explicitly model movement epenthesis. Hence, the demand for annotated training video data was low. We compared the performance of enhanced Level Building with the classical Level Building algorithm,
which has been used for connected word recognition in speech, and found significant improvements. To overcome the low level hand segmentation errors, we incorporated another dynamic programming process, nested within the first one, to optimize over possible choices from part-based multiple hand candidates. Our results have shown that the part-based candidate approach is more stable for a complex and changing background. Our extensive experiments demonstrated the effectiveness of the matching between the test data and the training data. We demonstrated this by extensively testing the important parameters, such as the automatically chosen α and the number of primitives in a group. In the context of ASL, we moved forward the area of recognition of signs in sentences while accounting for movement epenthesis. We also contributed towards the ability to handle general backgrounds and the relaxation of clothing restrictions. The developed enhanced Level Building algorithm solved the general problem of recognizing motion patterns from streams of compositions of motion patterns with portions for which we did not have any model. Such situations could also arise in human computer interaction, where one has to consider compositions of individual gestures, or in long-term monitoring of a person performing multiple activities. We also compared the eLB algorithm with state-of-the-art labeling algorithms such as conditional random fields (CRF). We first used the CRF formulation along with the concept of key frames, capturing frames with distinctive short-term motion, to detect and label me in a sign language sentence. The CRF had the advantage of directly modeling the posterior probability and could allow any dependence between the states and observations, which is desirable for labeling a sequence with highly related context such as ASL sentences. Our experiments found that the CRF-based approach significantly outperformed an HMM-based one. However, this was a 2-class case. We then did an experiment based on 40-class
models to compare the performance of CRF and eLB, where we could see that CRF did not work properly. This was because CRF had a large number of parameters, and the training could not guarantee a good point of convergence when searching for the best parameter set. Our eLB, however, was more effective and straightforward. We only had 1 parameter to train, which was used to model the boundary between a match and a non-match. We discovered that the boundary found in our test was very effective at separating the sign and me segments, which led to correctly labeled results in the end. For the low level segmentation problem, we proposed a new grouping method for gesture and sign recognition from videos that does not rely on skin color models and can work with imperfect segmentation of scenes. We addressed the hard problem of hand segmentation by coupling it with recognition via an intermediate grouping process. The grouping process generated layers of overlapping groups that were linked across time in a graph structure. We showed how the search for the optimal sequence of groups could be carried out with different matching models. For HMMs, we showed how three different kinds of probabilities could be computed, based on maximization and averaging over the underlying states and groups. We demonstrated its effectiveness for HCI hand action recognition tasks using a publicly available data set spanning multiple subjects and actions, against complex backgrounds. The recognition rates were very close (91% compared to 93%) to those achieved by manual segmentation and much better (91% compared to 79%) than that achieved by color-based hand segmentation. As a byproduct of the recognition process, we also segmented the hand in each frame. We demonstrated its effectiveness for sign recognition and HCI hand action recognition tasks. As our results show, using the coupled framework, we were also able to provide an overall solution based on the segmentation and matching results, and could improve the results when the hand segmentation was not successful.
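The single trained parameter mentioned here, the me labeling cost α, is chosen as the Bayesian decision boundary between match and non-match scores (Study 1). A minimal sketch, assuming each score population is modeled by one Gaussian and the class priors are taken from the sample counts (both are assumptions; the density model is not restated in this chapter), locates the boundary by bisection between the two means. The toy scores are centered at the 0.4 and 1.4 reported in Study 1 but are otherwise invented, so the result is not expected to reproduce the 0.89 found there.

```python
import math
from statistics import NormalDist
from typing import Sequence


def bayes_boundary(match: Sequence[float], nonmatch: Sequence[float], iters: int = 60) -> float:
    """Score at which 'match' and 'non-match' are equally probable a posteriori."""
    p_match = len(match) / (len(match) + len(nonmatch))
    dm, dn = NormalDist.from_samples(match), NormalDist.from_samples(nonmatch)

    def log_pdf(d: NormalDist, x: float) -> float:
        return -0.5 * ((x - d.mean) / d.stdev) ** 2 - math.log(d.stdev * math.sqrt(2 * math.pi))

    def margin(x: float) -> float:  # > 0 where 'match' is the more probable class
        return (math.log(p_match) + log_pdf(dm, x)) - (math.log(1 - p_match) + log_pdf(dn, x))

    lo, hi = dm.mean, dn.mean  # match scores are distances, so they sit lower
    if not (margin(lo) > 0 > margin(hi)):
        raise ValueError("margin does not change sign between the two means")
    for _ in range(iters):  # bisection keeps margin(lo) > 0 >= margin(hi)
        mid = (lo + hi) / 2
        if margin(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


alpha = bayes_boundary([0.3, 0.4, 0.5], [1.3, 1.4, 1.5])
print(round(alpha, 3))  # 0.9 for equal spreads and equal counts
```

With unequal spreads or unequal sample counts the boundary moves away from the midpoint, which is why a trained value such as 0.89 need not sit exactly halfway between the two score centers.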

In this work, we have focused on the two proposed matching algorithms. We have fully investigated the important parameters for the matching process, such as α in the eLB algorithm and the number of primitives in the grouping algorithm. In the future, we will also need to investigate the parameters of the imaging process and the low level processing. These parameters include the temporal and spatial resolutions, lighting conditions, edge detection parameters, and segmentation parameters. In addition, we focused in this work on solving two problems in a continuous sign language recognition system. In order to build a more comprehensive and robust system, we will need to address other important problems in the future. We have discussed some of these problems in Chapter 2. These problems include, but are not limited to, the recognition of non-manual aspects of signed sentences, the recognition of signs made by different signers or filmed from different view angles, and the recognition of signs that are made slightly differently depending on which signs precede or follow them (coarticulation). Beyond these, our system currently cannot process real-time video. Our future work may include an investigation into speeding up the matching process so that we can work towards applications that run in real time. One way to speed up the process is by using a more representative model, such as a probabilistic model. Note that although our current eLB matching process is conducted under an example-based deterministic approach, the eLB algorithm can be extended to a probabilistic model to further capture the variations among the data, as a normal LB does. We have shared the source code of our algorithms, including the eLB algorithm, Hidden Markov Models with groups, and the annotation tools, online. The reader can refer to "http://figment.csee.usf.edu/ASL/" for more details.

REFERENCES

[1] T. H. Gineke, H. Petra, and A. Tjeerd, "Why don't you see what I mean? Prospects and limitations of current automatic sign recognition research," Sign Language Studies, vol. 6, no. 4, pp. 416-437, 2006.
[2] C. Valli and C. Lucas, Linguistics of American Sign Language: A Resource Text for ASL Users. Gallaudet Univ. Press, 1992.
[3] C. Vogler and D. Metaxas, "A framework for recognizing the simultaneous aspects of American Sign Language," Journal of Computer Vision and Image Understanding (CVIU), vol. 81, no. 81, pp. 358-384, 2001.
[4] C. Vogler and D. Metaxas, "ASL recognition based on a coupling between HMMs and 3D motion analysis," in IEEE International Conference on Computer Vision (ICCV), 1998, pp. 363-369.
[5] Q. Yuan, W. Gao, H. Yao, and C. Wang, "Recognition of strong and weak connection models in continuous sign language," in IEEE International Conference on Pattern Recognition (ICPR), 2002, pp. 75-78.
[6] W. Gao, G. Fang, D. Zhao, and Y. Chen, "Transition movement models for large vocabulary continuous sign language recognition," in IEEE International Conference on Automatic Face and Gesture Recognition (FGR), 2004, pp. 553-558.
[7] R. Yang and S. Sarkar, "Detecting coarticulation in sign language using conditional random fields," in IEEE International Conference on Pattern Recognition (ICPR), 2006, pp. 108-112.
[8] R. Yang, S. Sarkar, and B. L. Loeding, "Enhanced Level Building algorithm for the movement epenthesis problem in sign language recognition," in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2007.
[9] R. Yang, S. Sarkar, and B. L. Loeding, "Handling movement epenthesis and hand segmentation ambiguities in continuous sign language recognition using nested dynamic programming," submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2007.

[10] C. Vogler and D. Metaxas, "Handshapes and movements: Multiple-channel ASL recognition," in Lecture Notes in Artificial Intelligence, 2004, pp. 247–258.
[11] C. Vogler, H. Sun, and D. Metaxas, "A framework for motion recognition with application to American Sign Language and gait recognition," in Workshop on Human Motion (WHM), 2000, pp. 33–38.
[12] C. Wang, W. Gao, and S. Shan, "An approach based on phonemes to large vocabulary Chinese Sign Language recognition," in IEEE International Conference on Automatic Face and Gesture Recognition (FGR), 2002, pp. 393–398.
[13] H. Brashear, V. Henderson, K. Park, H. Hamilton, S. Lee, and T. Starner, "American Sign Language recognition in game development for deaf children," in ACM International Conference on Computers and Accessibility (ASSETS), 2006, pp. 79–86.
[14] T. Starner and A. Pentland, "Visual recognition of American Sign Language using Hidden Markov Models," in IEEE International Conference on Automatic Face and Gesture Recognition (FGR), 1995, pp. 189–194.
[15] M. Kadous, "Machine translation of AUSLAN signs using powergloves: Towards large lexicon-recognition of sign language," in Workshop on the Integration of Gesture in Language and Speech (WIGLS), 1996, pp. 165–174.
[16] B. Bauer, H. Hienz, and K. Kraiss, "Video-based continuous sign language recognition using statistical methods," in IEEE International Conference on Pattern Recognition (ICPR), vol. 2, 2000, pp. 2463–2466.
[17] B. Bauer and K. Kraiss, "Video-based sign recognition using self-organizing subunits," in IEEE International Conference on Pattern Recognition (ICPR), vol. 2, 2002, pp. 434–437.
[18] Y. Cui and J. Weng, "Appearance-based hand sign recognition from intensity image sequences," Journal of Computer Vision and Image Understanding (CVIU), vol. 78, no. 2, pp. 157–176, 2000.
[19] A. Hoogs and J. Mundy, "An integrated boundary and region approach to perceptual grouping," in IEEE International Conference on Pattern Recognition (ICPR), 2000, pp. 284–290.
[20] Y. Sato and T. Kobayashi, "Extension of Hidden Markov Models to deal with multiple candidates of observations and its application to mobile-robot-oriented gesture recognition," in IEEE International Conference on Pattern Recognition (ICPR), 2002, pp. 515–519.

[21] J. Alon, V. Athitsos, Q. Yuan, and S. Sclaroff, "Simultaneous localization and recognition of dynamic hand gestures," in IEEE Workshop on Motion and Video Computing (WACV/MOTION), vol. 2, 2005, pp. 254–260.
[22] R. Yang and S. Sarkar, "Gesture recognition using Hidden Markov Models from fragmented observations," in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2006.
[23] R. Yang and S. Sarkar, "Coupled grouping and matching for sign and gesture recognition," submitted to Journal of Computer Vision and Image Understanding (CVIU), 2007.
[24] R. Yang, S. Sarkar, B. L. Loeding, and A. I. Karshmer, "Efficient generation of large amount of training data for sign language recognition: A semi-automatic tool," in International Conference on Computers Helping People with Special Needs (ICCHP), 2006.
[25] C. Sylvie and S. Ranganath, "Automatic sign language analysis: A survey and the future beyond lexical meaning," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 27, no. 6, pp. 873–891, 2005.
[26] C. Bahlmann and H. Burkhardt, "Measuring HMM similarity with the Bayes probability of error and its application to online handwriting recognition," in International Conference on Document Analysis and Recognition (ICDAR), 2001.
[27] C. Vogler and D. Metaxas, "Parallel Hidden Markov Models for American Sign Language recognition," in IEEE International Conference on Computer Vision (ICCV), 1999, pp. 116–122.
[28] A. McCallum, D. Freitag, and F. Pereira, "Maximum entropy Markov models for information extraction and segmentation," in IEEE International Conference on Machine Learning (ICML), 2000.
[29] A. Quattoni, S. Wang, L. Morency, M. Collins, and T. Darrell, "Hidden conditional random fields," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 29, pp. 1848–1852, 2007.
[30] L. Morency, A. Quattoni, and T. Darrell, "Latent-dynamic discriminative models for continuous gesture recognition," in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2007, pp. 1–8.
[31] W. Gao, J. Ma, J. Q. Wu, and C. L. Wang, "Sign language recognition based on HMM/ANN/DP," International Journal on Pattern Recognition and Artificial Intelligence (IJPRAI), vol. 14, no. 5, pp. 587–602, 2000.

[32] M. Yang, N. Ahuja, and M. Tabb, "Extraction of 2D motion trajectories and its application to hand gesture recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 24, no. 8, pp. 1061–1074, 2002.
[33] A. Efros, A. Berg, G. Mori, and J. Malik, "Recognizing action at a distance," in IEEE International Conference on Computer Vision (ICCV), 2003.
[34] Z. Manor and M. Irani, "Event-based analysis of video," in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2001.
[35] A. Bobick and A. Wilson, "A state-based approach to the representation and recognition of gesture," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 19, no. 12, pp. 1325–1337, 1997.
[36] C. Vogler, "American Sign Language recognition: Reducing the complexity of the task with phoneme-based modeling and parallel Hidden Markov Models," Ph.D. dissertation, Univ. of Pennsylvania, 2003.
[37] B. Bauer and K. F. Kraiss, "Towards an automatic sign language recognition system using subunits," in International Gesture Workshop, 2002, pp. 64–75.
[38] M. Brand, N. Oliver, and A. Pentland, "Coupled Hidden Markov Models for complex action recognition," in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 1996.
[39] C. Bregler, "Learning and recognizing human dynamics in video sequences," in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 1997.
[40] S. Gong and T. Xing, "Recognition of group activities using dynamic probabilistic networks," in IEEE International Conference on Computer Vision (ICCV), 2003.
[41] C. Sminchisescu, A. Kanaujia, Z. Li, and D. Metaxas, "Conditional random fields for contextual human motion recognition," in IEEE International Conference on Computer Vision (ICCV), 2005, pp. 1808–1815.
[42] K. Derpanis, "A review of vision-based hand gestures," 2004, http://www.cvr.yorku.ca (visited Jun. 2005).
[43] V. Pavlovic, R. Sharma, and T. Huang, "Visual interpretation of hand gestures for human-computer interaction: A review," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 19, no. 7, pp. 677–695, 1997.
[44] J. Yamato, J. Ohya, and K. Ishii, "Recognizing human action in time-sequential images using Hidden Markov Models," in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 1992, pp. 379–385.

[45] M. Yang and N. Ahuja, "Recognizing hand gestures using motion trajectories," in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 1999, pp. 466–472.
[46] J. Schlenzig, E. Hunter, and R. Jain, "Recursive identification of gesture inputs using Hidden Markov Models," in Workshop on Applications of Computer Vision (WACV), 1994, pp. 187–194.
[47] J. Siskind and Q. Morris, "A maximum-likelihood approach to visual event classification," in European Conference on Computer Vision (ECCV), 1996, pp. 347–360.
[48] T. Starner and A. Pentland, "Real-time American Sign Language recognition from video using Hidden Markov Models," in Symposium on Computer Vision (SCV), 1995, pp. 265–270.
[49] T. Darrell and A. Pentland, "Space-time gestures," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1993, pp. 335–340.
[50] E. Keogh and M. Pazzani, "Derivative Dynamic Time Warping," in SIAM International Conference on Data Mining, 2001.
[51] P. Hong, M. Turk, and T. Huang, "Gesture modeling and recognition using finite state machines," in IEEE International Conference and Gesture Recognition (ICGR), 2000, pp. 410–415.
[52] C. Rao, A. Yilmaz, and M. Shah, "View-invariant representation and recognition of actions," International Journal of Computer Vision (IJCV), vol. 50, no. 2, 2002.
[53] K. Imagawa, S. Lu, and S. Igi, "Color-based hand tracking system for sign language recognition," in IEEE International Conference on Automatic Face and Gesture Recognition (FGR), 1998, pp. 462–467.
[54] T. Starner, J. Weaver, and A. P. Pentland, "Real-time American Sign Language recognition using desk and wearable computer based video," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 20, no. 12, pp. 1371–1375, 1998.
[55] J. Zieren, N. Unger, and S. Akyol, "Hands tracking from frontal view for vision-based gesture recognition," in DAGM Symposium on Pattern Recognition, 2002.
[56] J. Terrillon, A. Pilpre, Y. Niwa, and K. Yamamoto, "Robust face detection and hand posture recognition in color images for human-machine interaction," in IEEE International Conference on Pattern Recognition (ICPR), 2002.

[57] Y. Xiong, B. Fang, and F. Quek, "Extraction of hand gestures with adaptive skin color models and its applications to meeting analysis," in International Symposium on Multimedia, 2006.
[58] C. Huang and W. Huang, "Sign language recognition using model-based tracking and a 3D Hopfield neural network," Journal of Machine Vision Application, vol. 10, no. 5-6, pp. 292–307, 1998.
[59] Y. Azoz, L. Devi, M. Yeasin, and R. Sharma, "Tracking the human arm using constraint fusion and multiple-cue localization," Machine Vision and Applications, vol. 13, no. 5-6, pp. 286–302, 2003.
[60] Y. Chen and T. Tseng, "Multiple-angle hand gesture recognition by fusing SVM classifiers," in IEEE International Conference on Automation Science and Engineering, 2007.
[61] H. Graf, E. Cosatto, D. Gibbon, M. Kocheisen, and E. Petajan, "Multi-modal system for locating heads and faces," in IEEE International Conference on Automatic Face and Gesture Recognition (FGR), 1996, pp. 88–93.
[62] Q. Chen, N. Georganas, and E. Petriu, "Real-time vision-based hand gesture recognition using Haar-like features," in IEEE Conference on Instrumentation and Measurement Technology, 2007.
[63] X. Chen, X. Zhang, Z. Zhao, J. Yang, V. Lantz, and K. Wang, "Hand gesture recognition research based on surface EMG sensors and 2D-accelerometers," in IEEE International Symposium on Wearable Computers (ISWC), 2007.
[64] L. Ding and A. Martinez, "Recovering the linguistic components of the manual signs in American Sign Language," in IEEE Conference on Advanced Video and Signal-based Surveillance (AVSS), 2007.
[65] H. Guan, R. Feris, and M. Turk, "The isometric self-organizing map for 3D hand pose estimation," in IEEE International Conference on Automatic Face and Gesture Recognition (FGR), 2006.
[66] W. Grimson, Object Recognition by Computer: The Role of Geometric Constraints. Boston, MA: MIT Press, 1991.
[67] D. Clemens and D. Jacobs, "Space and time bounds on indexing 3D models from 2D images," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 13, no. 10, pp. 1007–1017, 1991.
[68] Z. Tu, X. Chen, A. Yuille, and S. Zhu, "Image parsing: Unifying segmentation, detection, and recognition," in IEEE International Conference on Computer Vision (ICCV), 2003, pp. 18–25.

[69] P. Srinivasan and J. Shi, "Bottom-up recognition and parsing of the human body," in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2007, pp. 1–8.
[70] L. Rabiner and B. Juang, Fundamentals of Speech Recognition. PTR Prentice Hall, 1993.
[71] H. Silverman and D. Morgan, "The application of dynamic programming to connected speech recognition," IEEE ASSP Magazine, vol. 26, no. 6, pp. 575–582, 1990.
[72] M. Jones and J. Rehg, "Statistical color models with application to skin detection," International Journal of Computer Vision (IJCV), vol. 46, no. 1, pp. 81–96, 2002.
[73] I. Robledo and S. Sarkar, "Representation of the evolution of feature relationship statistics: Human gait-based recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 25, no. 10, pp. 1323–1328, 2003.
[74] D. Comaniciu and P. Meer, "Mean shift: A robust approach toward feature space analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 24, pp. 603–619, 2002.
[75] J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in IEEE International Conference on Machine Learning (ICML), 2001.
[76] H. Zhong, M. Visontai, and J. Shi, "Detecting unusual activity in video," in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), vol. 2, 2004, pp. 819–826.
[77] S. Sarkar and K. Boyer, "Quantitative measures of change based on feature organization: Eigenvalues and eigenvectors," in IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 1996, pp. 478–483.
[78] A. Just and S. Marcel, "Two-handed gesture recognition," IDIAP Research Institute, CH-1920, Tech. Rep., 2005.
[79] H. Brashear, T. Starner, P. Lukowicz, and H. Junker, "Using multiple sensors for mobile sign language recognition," in IEEE International Symposium on Wearable Computers (ISWC), 2003, pp. 45–52.
[80] V. Levenshtein, "Binary codes capable of correcting deletions, insertions and reversals," Doklady Akademii Nauk SSSR, vol. 1634, pp. 845–848, 1965.

[81] B. L. Loeding, S. Sarkar, A. Parashar, and A. I. Karshmer, "Progress in automated computer recognition of sign language," in Lecture Notes in Computer Science, vol. 3118, 2004, pp. 1079–1087.
[82] C. Sminchisescu, A. Kanaujia, and D. Metaxas, "Conditional models for contextual human motion recognition," Journal of Computer Vision and Image Understanding (CVIU), vol. 104, no. 2, pp. 210–220, 2006.
[83] A. Parashar, "Representation and interpretation of manual and non-manual information for automated American Sign Language recognition," Master's thesis, University of South Florida, 2003.
[84] T. Volkmer, J. R. Smith, and A. Natsev, "A web-based system for collaborative annotation of large image and video collections: An evaluation and user study," in ACM International Conference on Multimedia, 2005, pp. 892–901.
[85] L. V. Ahn and L. Dabbish, "Labeling images with a computer game," in ACM International Conference on Human Factors in Computing Systems (CHI), 2004, pp. 319–326.
[86] T. Pfund and S. M. Maillet, "A dynamic multimedia annotation tool," in Proceedings of SPIE Photonics West, Electronic Imaging 2002, 2002, pp. 216–224.
[87] D. Doermann and D. Mihalcik, "Tools and techniques for video performance evaluation," in IEEE International Conference on Pattern Recognition (ICPR), 2000, pp. 167–170.
[88] B. Marcotegui and P. Correia, "A video object generation tool allowing friendly user interaction," in IEEE International Conference on Image Processing (ICIP), 1999, pp. 391–395.

APPENDICES

Appendix A Data Collection

Figure A.1 Setup for capturing data sets D2 and D3.

In our experiments, D1 was obtained from [83], and D4 was captured and shared by [78]. Fig. A.1 shows the capturing process for D2 and D3. We only used the data from camera G in the experiments shown in this work. While the videos were captured, the signer stood at the center. 2500-watt white bulbs were used with a white umbrella to sustain the illumination. A side-view camera was set up on the main-hand side of the signer. E was a Dragonfly camera for the face sequence, set up at the signer's eye level at a small angle. G was another Dragonfly camera for the front-view sequence, also set up at the signer's eye level. F, a stereo Bumblebee camera, was set up just in front of G at the signer's neck level. The videos were streamed onto a SCSI drive using an MPEG-4 encoder with a 2 Mbit/s bit rate; the frame rate was set to 30 frames per second, and the camera resolution was set to 640x480 pixels with 24-bit depth. A PGR synchronizer was used to synchronize the 2 Dragonfly cameras and the 2-view stereo Bumblebee camera.
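As a quick worked check of what these capture settings imply, assuming that 2 Mbit/s means 2 × 10^6 bits per second and that the 24-bit depth describes the uncompressed frames, the raw video rate of one camera and the resulting compression ratio of the MPEG-4 stream are:

```latex
\begin{align*}
R_{\text{raw}} &= 640 \times 480~\text{pixels} \times 24~\tfrac{\text{bits}}{\text{pixel}} \times 30~\tfrac{\text{frames}}{\text{s}}
               = 221{,}184{,}000~\text{bits/s} \approx 221.2~\text{Mbit/s},\\
\frac{R_{\text{raw}}}{R_{\text{MPEG-4}}} &= \frac{221.184~\text{Mbit/s}}{2~\text{Mbit/s}} \approx 110.6 .
\end{align*}
```

That is, under these assumptions the encoder reduces each camera's data rate by roughly two orders of magnitude.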

Appendix B Ground Truth Tools

We provided a semi-automatic annotation tool to ground truth ASL image sequences. A candidate hand generator was built from the mean shift image segmentation algorithm and a greedy seed-growing algorithm. After a number of hand candidates are generated, the user can reduce the number of candidates through simple interaction (a mouse click). The tool also provided a hand tracking function for faster processing and a face detection function for ground truthing non-manual signals. In addition, unlike other ground truthing tools that make only one pass, we provided a two-pass ground truthing scheme. Our first pass was automatic, and the second pass was semi-automatic, based on the first pass's result.

We were aware of many other video annotation tools for ground truthing purposes. However, most of them focus on scene segmentation or key frame detection, e.g., IBM EVA [84] and the ESP Game [85]. Only a few of them combine local feature extraction and temporal tracking. For example, the VIPER annotation tool proposed by Pfund [86] provided image segmentation, temporal segmentation, and annotation together. ViPER-GT, proposed by Doermann [87], can detect multiple objects and track them automatically using bounding boxes. Marcotegui et al. proposed VOGUE [88], in which a number of image and video segmentation techniques were incorporated for object annotation. All of these tools are standalone applications that provide semi-automatic ground truthing functions with a friendly user interface.

Our annotation tool, SignGT, was a by-product of our vision-based ASL recognition system. It also provided a semi-automatic scheme for efficient ground truthing. However, its main purpose was to segment the hand pixels frame by frame. Instead of using existing segmentation and tracking algorithms as the existing tools do, we advocated a candidate hand generator approach, which was more reliable during hand shape changes and when the hand crossed the face. Unlike the existing tools, where only one pass is conducted, we offered a two-pass scheme for faster processing, where the first pass generated the candidate hands automatically.

Fig. B.1 illustrates the two-pass scheme. In Fig. B.1(a), the ASL video frames were first segmented into seed primitives. These primitives were grouped by a grouping engine to generate overlapping candidate hand groups, where each group may consist of one or more primitives. This step was automatic, and no user interaction was involved. In Fig. B.1(b), the second pass, the grouped results were loaded back for the user's examination. Note that the number of generated groups could be huge. Hence, we allowed the user to click on the hand region to reduce the number of candidates to be examined. At the same time, a tracking method was incorporated across adjacent frames to improve efficiency. After the candidate groups and their links were generated, user interaction was needed to select the best group and check the tracking result.

We provided a set of functionalities that work specifically with sign sentences. These functionalities were built upon the candidate generator discussed in Chapter 5, a simple tracking technique that works with the links between the candidate groups, a face detector for non-manual information analysis, a glossing tool, and various elements that facilitate hand ground truthing. The application was coded in the Microsoft Visual Studio environment, using MFC classes, OpenCV libraries, and related Windows APIs. Fig. B.2 illustrates the GUI. The important ASL-related functions supported were:
1. Three views of the current frame being processed: the first is the dominant hand view, the second is the non-dominant hand view, and the third is the view of both hands.

Figure B.1 Overview of the two-pass approach. (a) First pass: grouping. (b) Second pass: selection.

2. Click to select: one can click on the hand area to indicate which groups should be shown.
3. Missing hand check box: one can mark the current hand as missing if the hand is out of the scene.
4. Glossing text box: one can input the gloss for the current frame. The gloss is propagated to the next frame automatically.
5. Two hand list boxes: the two list boxes below the views show the lists of candidate dominant hands and candidate non-dominant hands. Each list is ranked by boundary smoothness and the tracking result.
6. Face detector: an automatic face detector, whose output is shown as a blue bounding box.
7. Play, stop, and step buttons: pressing the play button automatically tracks all of the hands and saves the result. The stop button stops the tracking. The step button lets the application track one frame and then wait for the user's response; this was the mode used most often.
8. Redo check box: lets one re-detect the current sequence.

Figure B.2 The graphical user interface.

We ran SignGT on 2 data sets with different parameter settings. Both consisted of ASL sign sentences. The first data set had a simple background, a resolution of 460x290, and 10675 frames. On average, 100 candidate hands were generated for this data set, and it took us less than 8 hours to finish ground truthing both hands. The second data set had a complex background with 640x480 resolution, and 500 candidate hands were generated. We ground truthed 500 frames within 1 hour. Note that the time we refer to here is the user interaction time, that is, the time of the second pass.

In Fig. B.3(a), we show the performance of the two passes on the 2 data sets. We used a Pentium 4 2.4 GHz CPU with 4 GB of memory. The number shown is the time taken per frame. Our first pass took relatively longer since it incorporated automatic segmentation, face detection, and the greedy seed-growing algorithm. However, it was run completely off-line. The second pass was done by reloading the candidate results and took much less time. Fig. B.3(b) shows the time taken for different configurations of the tool. We chose 500 frames from the simple background set for this experiment. Here "A" refers to using only the generated result, "B" refers to using the click-to-select method to reduce the candidate set, "C" refers to using the candidates with the tracking method, and "D" refers to using the candidates with both the click-to-select and tracking methods. Tracking contributed substantially because it exploited the temporal relationship between frames. The click-to-select method helped especially when tracking failed, for example, when large motion happened, the hand shape changed drastically, or occlusion occurred.

Figure B.3 Performance of the annotation tool. (a) Performance of the first pass and second pass. (b) Performance of different method sets.

Fig. B.4 shows some visual results of the generated candidate hands. Our method can extract the hand shape even when there is significant occlusion and overlap. In particular, Fig. B.4(c) shows the result where the face crosses the hand, and Fig. B.4(f) shows where the two hands cross each other.
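The tracking step is described only as a simple technique that works on the links between candidate groups in adjacent frames. A minimal sketch of one plausible way to score such links follows, assuming that each candidate group is a set of pixel coordinates and that a link is scored by overlap (intersection over union); the functions `iou`, `track_next`, and `box`, the IoU criterion, and the `min_iou` threshold are hypothetical and not taken from SignGT.

```python
from typing import FrozenSet, List, Optional, Tuple

Pixel = Tuple[int, int]
Group = FrozenSet[Pixel]  # hypothetical: a candidate hand group as a set of (row, col) pixels


def box(r0: int, c0: int, r1: int, c1: int) -> Group:
    """Helper that builds a rectangular pixel set [r0, r1) x [c0, c1)."""
    return frozenset((r, c) for r in range(r0, r1) for c in range(c0, c1))


def iou(a: Group, b: Group) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def track_next(selected: Group, candidates: List[Group],
               min_iou: float = 0.3) -> Optional[int]:
    """Index of the next-frame candidate most strongly linked to the selected group.

    Returns None when no link reaches min_iou, i.e. a point where an
    annotator could fall back to click-to-select.
    """
    if not candidates:
        return None
    scores = [iou(selected, c) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best if scores[best] >= min_iou else None
```

For instance, a 4x4 hand region that shifts by one pixel diagonally keeps an IoU of 9/23 with its previous position and is linked, whereas a distant region is rejected. A failed link of this kind loosely mirrors the situations above (large motion, drastic hand shape change, occlusion) in which click-to-select was most useful.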

Figure B.4 Illustration of the multiple candidates. (a) Original frame 1. (b) Segmented frame 1. (c) List of candidate groups in frame 1. (d) Original frame 2. (e) Segmented frame 2. (f) List of candidate groups in frame 2.
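The first pass groups seed primitives into overlapping candidate hand groups of one or more primitives, using what this appendix calls a greedy seed-growing algorithm, but the growth criterion is not spelled out here. The sketch below shows one hypothetical greedy variant. It assumes that each primitive has an area and a set of adjacent primitives, and that a group is grown from every seed by repeatedly adding the neighbor that brings the total area closest to a target hand area. The area criterion, `target_area`, `max_size`, and `grow_candidates` are illustrative assumptions, not the grouping engine of Chapter 5.

```python
from typing import Dict, FrozenSet, List, Set


def grow_candidates(areas: Dict[int, float], adj: Dict[int, Set[int]],
                    target_area: float, max_size: int = 4) -> List[FrozenSet[int]]:
    """Grow a group from every seed primitive; the resulting candidates may overlap.

    Every intermediate group along a growth path is kept, so one seed can
    contribute groups of size 1..max_size.
    """
    candidates: Set[FrozenSet[int]] = set()
    for seed in areas:
        group = {seed}
        area = areas[seed]
        candidates.add(frozenset(group))
        while len(group) < max_size:
            # primitives adjacent to the current group but not yet in it
            frontier = set().union(*(adj[p] for p in group)) - group
            if not frontier:
                break
            # greedy step: the neighbor whose addition comes closest to target_area
            nxt = min(frontier, key=lambda p: (abs(area + areas[p] - target_area), p))
            if abs(area + areas[nxt] - target_area) >= abs(area - target_area):
                break  # any addition would move the group away from the target size
            group.add(nxt)
            area += areas[nxt]
            candidates.add(frozenset(group))
    # deterministic order: by group size, then by sorted primitive ids
    return sorted(candidates, key=lambda g: (len(g), sorted(g)))
```

With four primitives in a chain (areas 30, 40, 50, 200) and a target area of 120, seeds 0, 1, and 2 all converge on the group {0, 1, 2}, while the singletons and the pairs {0, 1} and {1, 2} remain as additional overlapping candidates. Keeping these partial groups is what lets the user or a later matching stage choose among several hypotheses per frame.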

Appendix C Text Corpus

Here we show the text sequences we used when recognizing the continuous sentences in data sets D1, D2, and D3. Tables C.1 and C.2 show the extended set of 150 sentences we used as the text corpus corresponding to D1. "Extended" means that we had an extra number (5) of sentences compared to the original number of test sentences. Tables C.3 and C.4 show the non-extended sentences we used as the text corpora corresponding to D2 and D3.

Table C.1 Sequences used as the text corpus in D1.
LIPREADCANILIPREADAGAINCANIUNDERSTANDCANIWAITCANIWAITAGAINCANIMYTICKETGIVECANIMYTICKETGIVEAGAINCANIMYIDPAPERGIVECANIMYIDPAPERGIVEAGAINCANIMYPHONEGIVECANIMYPHONEGIVEAGAINCANITICKETBUYAGAINCANITICKETBUYCANISUITCASEPACKAGAINCANISUITCASEPACKCANIFINISHCANIPHONEBUYAGAINCANIPHONEBUYCANILIPREADCANNOTILIPREADAGAINCANNOTIUNDERSTANDCANNOTIWAITCANNOTIWAITAGAINCANNOTIMYTICKETGIVECANNOTIMYTICKETGIVEAGAINCANNOTIMYIDPAPERGIVECANNOTIMYIDPAPERGIVEAGAINCANNOTITICKETBUYAGAINCANNOTITICKETBUYCANNOTISUITCASEPACKAGAINCANNOTISUITCASEPACKCANNOTIFINISHCANNOTIPHONEBUYAGAINCANNOTIPHONEBUYCANNOTISUITCASESUITCASEMEANWHATIUNDERSTANDYOUUNDERSTANDYOUUNDERSTANDMEIUNDERSTANDYOUIUNDERSTANDTHATYOUUNDERSTANDTHATSUITCASEIPACKFINISHWAITIFINISHMYTICKETJUSTGIVEIFINISHMYTICKETGIVEFINISHTICKETIGIVEFINISHMYPHONEIJUSTGIVEFINISHIGIVEMYTICKETFINISHIJUSTGIVEMYTICKETFINISHIJUSTGIVETICKETFINISHIJUSTGIVEMYPHONEFINISHSUITCASEMOVEIFINISHIFINISHMOVESUITCASESUITCASEIMOVEFINISHSUITCASEIFINISHMOVEDONTKNOWIDONTKNOWWHEREIIDONTKNOWWHEREIDONTKNOWIDPAPERSWHEREIDONTKNOWTABLEWHEREIDONTKNOWKEYWHEREIDONTKNOWPHONEWHEREIDONTKNOWSUITCASEWHEREIDONTKNOWAIRPLANEWHEREIDONTKNOWTICKETWHEREIDONTKNOWPEOPLEWHEREIDONTKNOWKEYWHEREDONTKNOWWHEREIDONTKNOWIDPAPERSWHEREIDONTKNOWTABLEWHEREIDONTKNOWKEYWHEREIDONTKNOWPHONEWHEREIDONTKNOWSUITCASEWHEREIDONTKNOWAIRPLANEWHEREIDONTKNOWTICKETWHEREIDONTKNOWPEOPLEWHEREIDONTKNOWKEYWHEREIINOTHAVEKEYINOTHAVESUITCASE

Table C.2 More sequences used as the text corpus in D1.
INOTHAVEPHONEINOTHAVEIDPAPERINOTHAVETICKETTHATMYTHATSUITCASEMYTHATTICKETMYITPHONEMYITIDPAPERMYITINEEDTHATIINEEDMYPHONEINEEDMYSUITCASEINEEDMYTICKETINEEDMYIDPAPERINEEDPHONEINEEDSUITCASEINEEDTICKETINEEDIDPAPERMYPHONEINEEDMYSUITCASEINEEDMYTICKETINEEDMYIDPAPERINEEDPHONEINEEDSUITCASEINEEDTICKETINEEDIDPAPERINEEDMYPHONENEEDMYPHONENEEDIMYSUITCASENEEDMYTICKETNEEDMYIDPAPERNEEDPHONENEEDSUITCASENEEDTICKETNEEDIDPAPERNEEDWHYMEANSUITCASEWHERETICKETWHEREIDPAPERWHEREPHONEWHEREAIRPLANEWHEREMYSUITCASEWHEREMYTICKETWHEREMYIDPAPERWHEREMYPHONEWHERESUITCASEMOVECANITICKETGIVECANIAIRPLANEPOSTPONEAGAINMADIAIRPLANEPOSTPONEAGAINIMADAIRPLANEAGAINPOSTPONEIMADAIRPLANEAGAINPOSTPONEMADITICKETBUYFINISHPEOPLELONGLINEWAITANGRYGATEWHERESUITCASEYESNOMYTICKETJUSTGIVEMYIDPAPERJUSTGIVETICKETJUSTGIVEIDPAPERJUSTGIVEIDPAPERWHEREIDPAPERTABLEPHONETHATTABLETHATSUITCASETHATTABLETHATIDTHATTABLETHATPAPERTHATTABLETHATPHONETHATTABLETHATTICKETTHATTABLETHATIDPAPERTHATTABLETHAT

Table C.3 Sequences used as the text corpus in D2.
NOWIBUYTICKETFINISHIWANTBUYTICKETIWHEREWHYSHOULDIBUYTICKETWHYWHYCANNOTTHATBUYTICKETWHYFINISHBUYTICKETIFINISHBUYTICKETFROMAGENTFINISHWHERECANIBUYTICKETWHERENOWIBUYTICKETFINISHFINISHBUYTICKETFROMAGENTWHERECANIBUYTICKETWHERENOWIBUYTICKETFINISHFINISHBUYTICKETNOWFINISHYOUCANBUYTHATFORTHATICANBUYTICKETFORTHATITHATFINISHBUYTICKETTHATWHYCANNOTTHATBUYTICKETTHATBUYTICKETNOWFINISHYOUCANBUYALSOFORTHATWHERECANIBUYTICKETWHEREWANTBUYTICKETIWHEREAGENTWHEREWHYCANNOTIBUYTICKETITHATFINISHBUYTICKETTHAT

Table C.4 Sequences used as the text corpus in D3.
CANIBUYTICKETFINISHBUYMYTICKETICANUNDERSTANDYOUIIDPAPERTHATTABLETHATIDPAPERTHATTABLEIHAVEPAPERTICKETINOTUNDERSTANDYOUIIUNDERSTANDYOUIMYTICKETTABLETHATNEEDBUYTICKETNOTUNDERSTANDITABLETHATTABLEWHERETICKETTHATTABLEUNDERSTANDIWHEREMYIDPAPERWHEREYOUNOTUNDERSTANDIYOUYOUUNDERSTANDI
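The abstract of this dissertation notes that the eLB formulation can incorporate statistical grammars such as bigrams. Assuming a corpus like the one in Tables C.1 to C.4 were used for that purpose, bigram probabilities could be estimated from the gloss sentences roughly as sketched below. The sentence-boundary tokens `<s>` and `</s>`, the optional add-k smoothing, the function name `bigram_model`, and the example sentences (written with assumed word boundaries) are assumptions of this sketch.

```python
from collections import Counter
from typing import Callable, List


def bigram_model(sentences: List[str], k: float = 0.0) -> Callable[[str, str], float]:
    """Return P(w2 | w1) estimated from whitespace-tokenized gloss sentences.

    With k > 0, add-k smoothing is applied over the observed vocabulary
    (including the end-of-sentence token </s>).
    """
    contexts: Counter = Counter()
    bigrams: Counter = Counter()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        contexts.update(toks[:-1])                 # every token that precedes another
        bigrams.update(zip(toks[:-1], toks[1:]))   # adjacent (w1, w2) pairs
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    V = len(vocab)

    def prob(w1: str, w2: str) -> float:
        denom = contexts[w1] + k * V
        return (bigrams[(w1, w2)] + k) / denom if denom else 0.0

    return prob


p = bigram_model(["I NEED TICKET", "I NEED PHONE", "TICKET WHERE"])
print(p("I", "NEED"), p("NEED", "TICKET"))  # 1.0 0.5
```

Unsmoothed estimates assign zero probability to any gloss pair absent from a corpus this small, which is why a smoothing term such as `k` is usually added before such scores are combined with matching costs.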

ABOUT THE AUTHOR

Ruiduo Yang received a Bachelor of Science degree in Computer Science from Peking University, Beijing, China in 2001 and a Master of Philosophy degree in Computer Science and Engineering from the Hong Kong University of Science and Technology, Hong Kong, China in 2003. He is currently a PhD candidate in the Department of Computer Science and Engineering at the University of South Florida. His research interests include sign language and gesture recognition, machine learning, sequence recognition, video analysis, and video coding. He has authored or coauthored more than 10 publications in the fields of pattern recognition and video processing. He has served as a reviewer for the International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI) and the Journal of Pattern Recognition.