
Representation and learning for sign language recognition


Material Information

Title:
Representation and learning for sign language recognition
Physical Description:
Book
Language:
English
Creator:
Nayak, Sunita
Publisher:
University of South Florida
Place of Publication:
Tampa, Fla
Publication Date:
2008
Subjects

Subjects / Keywords:
American Sign Language
Gestures
Human-human interaction
Probabilistic distances
Motion
Dissertations, Academic -- Computer Science and Engineering -- Doctoral -- USF   ( lcsh )
Genre:
non-fiction   ( marcgt )

Notes

Summary:
ABSTRACT: While recognizing some kinds of human motion patterns requires detailed feature representation and tracking, many of them can be recognized using global features. The global configuration or structure of an object in a frame can be expressed as a probability density function constructed using relational attributes between low-level features, e.g. edge pixels, that are extracted from the regions of interest. The probability density changes with motion, tracing a trajectory in the latent space of distributions, which we call the configuration space. These trajectories can then be used for recognition using standard techniques such as dynamic time warping. Can these frame-wise probability functions, which usually have high dimensionality, be embedded into a low-dimensional space so that we can still estimate various meaningful probabilistic distances in the new space? Given these trajectory-based representations, can one learn models of signs in an unsupervised manner? We address these two fundamental questions in this dissertation. Existing embedding approaches do not extend easily to preserve meaningful probabilistic distances between the samples. We present an embedding framework that preserves probabilistic distances such as Chernoff, Bhattacharya, Matusita, KL, and symmetric KL, based on dot products between points in this space, which results in computational savings. We experiment with the five probabilistic distance measures and show the usefulness of the representation in three different contexts: sign recognition of 147 different signs (a large number of possible classes), gesture recognition with 7 different gestures performed by 7 different persons (person variations), and classification of 8 different kinds of human-human interaction sequences (segmentation problems). Currently, researchers in continuous sign language recognition assume that the training signs are already available, often manually selected from continuous sentences, which consumes a lot of human time and is tedious. We present an approach for automatically learning signs from multiple sentences, using a probabilistic framework to extract the parts of a sign that are present in most of its occurrences and are robust to variations produced by adjacent signs. We show results by learning 10 signs and 10 spoken words from 136 sign language sentences and 136 spoken sequences, respectively.
Thesis:
Dissertation (Ph.D.)--University of South Florida, 2008.
Bibliography:
Includes bibliographical references.
System Details:
Mode of access: World Wide Web.
System requirements: World Wide Web browser and PDF reader.
Statement of Responsibility:
by Sunita Nayak.
General Note:
Title from PDF of title page.
General Note:
Document formatted into pages; contains 85 pages.
General Note:
Includes vita.
General Note:
Advisor: Sudeep Sarkar, Ph.D.

Record Information

Source Institution:
University of South Florida Library
Holding Location:
University of South Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 002007397
oclc - 403813184
usfldc doi - E14-SFE0002362
usfldc handle - e14.2362
System ID:
SFS0026680:00001




Full Text

PAGE 16

(b) Continuous sentence `YOU CAN BUY THIS FOR HER'. Figure 1.2. Movement epenthesis in sign language sentences.

[...] a number of times. The number of times ICM is run is decided based on the average length of all the sentences. The most frequently occurring solution from all the ICM runs is considered as the final solution. We call these extracted common patterns from the sentences signemes. They can be used for spotting or recognition of signs in continuous sign language sentences using either Hidden Markov Models or dynamic time warping. Sign language experts can also use the extracted sets of frames for teaching or studying variations between instances of signs in continuous sign language sentences, or in automated sign language tutoring systems. The signeme extraction algorithm can also be applied to audio data. [...]
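To make the selection step above concrete, here is a minimal Python sketch of running ICM several times and keeping the most frequent converged solution. The run_icm stand-in and the exact restart count are illustrative assumptions, not the dissertation's implementation.

```python
from collections import Counter
import random

def run_icm(sentences, seed):
    """Stand-in for one ICM run. A real run would sweep over the
    sentences, updating each (a_i, w_i) to its conditional mode given
    the others until convergence; here a random candidate is returned
    so the selection logic below is runnable."""
    rng = random.Random(seed)
    return tuple((rng.randrange(max(1, len(s) - 4)), 5) for s in sentences)

def extract_signemes(sentences):
    # The number of ICM restarts is tied to the average sentence
    # length, as described in the text.
    n_runs = sum(len(s) for s in sentences) // len(sentences)
    solutions = [run_icm(sentences, seed) for seed in range(n_runs)]
    # Keep the most frequently occurring converged solution.
    best, _ = Counter(solutions).most_common(1)[0]
    return best
```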

PAGE 21

Approaches using direct measuring devices such as magnetic trackers or data gloves (method; vocabulary and sign language; recognition accuracy %):

Fang et al. [75]: HMM, self-organizing map, recurrent NN; 208 signs (CSL); 92.1%
Kadous [76]: instance-based learning; 95 signs (Auslan); 80%
Murakami & Taguchi [77]: recurrent neural networks; 10 signs (JSL); 96%
Wang et al. [5]: HMM modeling sequential subunits; 5119 signs (CSL); 92.8%
Wu & Gao [78]: dynamic Gaussian mixture models; 274 signs (CSL); 97.4%

Vision-based approaches:

Assan & Grobel [69]: HMM, used color gloves; 262 signs (SLN); 91.3%
Bauer & Kraiss [71]: HMM-based subunits, used color gloves; 100 signs (GSL); 92.5%
Cui & Weng [65]: recursive PCA, motion-based hand segmentation; 28 signs (ASL); 93.2%
Huang & Huang [66]: 3D Hopfield NN, model-based hand tracking; 15 signs (TSL); 96%
Matsuo et al. [79]: rule-based, used gloves and stereo; 38 signs (JSL); 76-79%
Starner et al. [62]: HMM, color cameras at angular views; 40 signs (ASL); 92-98%
Tanibata et al. [80]: HMM, used only correctly extracted face and hands; 65 signs (JSL); 100%
Yang et al. [42]: NN, correspondence between consecutive frames; 40 signs (ASL); 96.2%
Imagawa et al. [60]: PCA + clustering, hand tracking; 33 signs (JSL); 72-94%
Yang et al. [81]: relational histograms + PCA; 39 signs (ASL); 83%
Parashar [82]: relational histograms + PCA; 39 signs (ASL); 88-95%
Our work [83]: relational histograms + probabilistic distance preserving embedding; 147 signs (ASL); 80.3%

CSL: Chinese Sign Language; SLN: Sign Language of the Netherlands; GSL: German Sign Language; TSL: Taiwanese Sign Language; JSL: Japanese Sign Language; NN: neural networks.

PAGE 29

Bhattacharya distance [35]: $-\log \sum_a P_1^{1/2}(a)\, P_2^{1/2}(a)$, which in terms of the transformed functions is $-\log \langle f_1(P_1), f_2(P_2) \rangle$, and after embedding becomes $-\log \langle Q_1, Q_{N+2} \rangle$.

Matusita distance [36]: $\sqrt{\sum_a \left( P_1^{1/2}(a) - P_2^{1/2}(a) \right)^2}$, which in terms of the transformed functions is $\sqrt{\langle f_1(P_1), f_1(P_1)\rangle - 2\langle f_1(P_1), f_2(P_2)\rangle + \langle f_2(P_2), f_2(P_2)\rangle}$, and after embedding becomes $\sqrt{\langle Q_1, Q_1\rangle - 2\langle Q_1, Q_{N+2}\rangle + \langle Q_{N+2}, Q_{N+2}\rangle}$.
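As a numerical sanity check of the two definitions above, the following NumPy sketch evaluates both distances directly on two discrete densities and via dot products of their square-root transforms. The function names are mine, and the transformed functions here are simply elementwise square roots, as the definitions suggest.

```python
import numpy as np

def bhattacharya(p1, p2):
    # -log sum_a P1^(1/2)(a) P2^(1/2)(a)
    return -np.log(np.sum(np.sqrt(p1) * np.sqrt(p2)))

def matusita(p1, p2):
    # sqrt( sum_a (P1^(1/2)(a) - P2^(1/2)(a))^2 )
    return np.sqrt(np.sum((np.sqrt(p1) - np.sqrt(p2)) ** 2))

rng = np.random.default_rng(0)
p1 = rng.random(50); p1 /= p1.sum()     # two discrete densities
p2 = rng.random(50); p2 /= p2.sum()

# Dot-product forms with q1 = sqrt(p1), q2 = sqrt(p2):
q1, q2 = np.sqrt(p1), np.sqrt(p2)
assert np.isclose(bhattacharya(p1, p2), -np.log(q1 @ q2))
assert np.isclose(matusita(p1, p2) ** 2,
                  q1 @ q1 - 2 * (q1 @ q2) + q2 @ q2)
```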

PAGE 35

$\frac{\partial}{\partial z_1} E(z_1, z_2^t)$ and $\frac{\partial}{\partial z_2} E(z_1^{t+1}, z_2)$, have to be zero. These conditions translate to

$$\begin{aligned}
\frac{\partial}{\partial z_1} E(z_1, z_2^t)
&= \frac{\partial}{\partial z_1}\left[ (A z_1 - c_1)^T (A z_1 - c_1) + (z_2^T z_1 - c_{12})^2 \right] \\
&= \frac{\partial}{\partial z_1}\left[ z_1^T A^T A z_1 - 2 c_1^T A z_1 + c_1^T c_1 + z_1^T z_2 z_2^T z_1 - 2 z_2^T z_1 c_{12} + c_{12}^2 \right] \\
&= 2 z_1^T A^T A - 2 c_1^T A + 2 z_1^T z_2 z_2^T - 2 z_2^T c_{12} = 0 \qquad (3.12)
\end{aligned}$$

This condition can be rewritten in a compact form by augmenting $A$ with $z_2$ as $B_2^T = [A^T \; z_2]$ and using $d_1^T = [c_1^T \; c_{12}]$:

$$B_2^T B_2 z_1 = B_2^T d_1 \qquad (3.13)$$

or

$$z_1 = \left(B_2^T B_2\right)^{-1} B_2^T d_1 = B_2^{\dagger} d_1 \qquad (3.14)$$
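The half-step in (3.13)-(3.14) is an ordinary linear least-squares solve on the augmented system. A minimal NumPy sketch, with all dimensions and values chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 6                                  # embedding dimension (illustrative)
A = rng.standard_normal((p, p))        # fixed part of the system
c1 = rng.standard_normal(p)
z2 = rng.standard_normal(p)            # held fixed during this half-step
c12 = 0.3

# Augment A with z2: B2^T = [A^T  z2], d1^T = [c1^T  c12].
B2 = np.vstack([A, z2])
d1 = np.append(c1, c12)

# Solve the normal equations B2^T B2 z1 = B2^T d1 (Eq. 3.13).
z1 = np.linalg.lstsq(B2, d1, rcond=None)[0]

# Same result via the pseudo-inverse form z1 = B2^dagger d1 (Eq. 3.14).
assert np.allclose(z1, np.linalg.pinv(B2) @ d1)
```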

PAGE 41

[...] $a_1 = 1/3$ and $a_2 = 2/3$. In addition, we use PCA on the relational distributions, followed by a Euclidean distance metric, as another possible distance computation method. Depending on the probabilistic measure, each training sequence is represented by either a single sequence of coordinates (for Matusita and Bhattacharya) or two sequences of coordinates (for Chernoff, KL and symmetric KL) in the low-dimensional space. A test sequence similarly results in either one or two sequences of coordinates, obtained by embedding the transformations of relational distributions corresponding to frames in the test sequence onto the low-dimensional space. This embedding process needs the choice of a set of embedding points to which we compute the distances. To generate recognition performance statistics, i.e. mean and standard deviation of accuracy, we consider 25 possible random choices of this set for each test sequence. We match the test and each train embedding using dynamic time

PAGE 42

[Table 3.2: recognition performance on the ASL sign dataset. In this and the following tables, the first three data columns are with the low-dimensional embedding; the last two are without dimensionality reduction.]

Distance measure | Accuracy % (std. dev.) | Time (s) | # dims (p) | Accuracy % | Time (s)
Matusita | 74.6 (0.5) | 0.97 | 50 | 74.8 | 2.69
Bhattacharya | 80.3 (0.7) | 0.61 | 50 | 81.0 | 1.30
Chernoff | 75.5 (1.1) | 1.05 | 68 | 76.9 | 1.75
Symmetric KL | 77.2 (1.4) | 2.33 | 96 | 78.2 | 2.60
KL | 54.9 (3.9) | 1.98 | 96 | 61.2 | 2.25
Euclidean (PCA) | 76.2 | 1.11 | 283 | 76.9 | 2.80

Table 3.3. Recognition performance on the two-handed test gesture sequences, based on 98 test sequences.

Distance measure | Accuracy % (std. dev.) | Time (s) | # dims (p) | Accuracy % | Time (s)
Matusita | 98 (1.03) | 3.11 | 40 | 99 | 8.92
Bhattacharya | 95 (0.79) | 1.78 | 40 | 97 | 4.96
Chernoff | 96 (0.66) | 2.88 | 49 | 96 | 6.51
Symmetric KL | 94 (1.27) | 3.29 | 40 | 95 | 8.83
KL | 64 (9.34) | 3.04 | 40 | 86 | 8.24
Euclidean (PCA) | 96 | 3.61 | 246 | 96 | 10.11

warping based on the distances computed in the low-dimensional space and classify using the nearest-neighbor classifier. Tables 3.2, 3.3 and 3.4 list the recognition performances on the three datasets, for various distance measures, and with and without the low-dimensional embedding. We can make several observations.
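The matching procedure described above, dynamic time warping over embedded trajectories followed by a nearest-neighbor decision, can be sketched as follows. Euclidean frame-to-frame costs are an assumption here, standing in for whichever embedded distance is in use.

```python
import numpy as np

def dtw(seq_a, seq_b):
    """Classic O(len_a * len_b) dynamic time warping between two
    trajectories given as (frames x dims) arrays."""
    la, lb = len(seq_a), len(seq_b)
    D = np.full((la + 1, lb + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, la + 1):
        for j in range(1, lb + 1):
            cost = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1],
                                 D[i - 1, j - 1])
    return D[la, lb]

def classify(test_traj, train_trajs, train_labels):
    # Nearest-neighbor rule on DTW distances to all training trajectories.
    dists = [dtw(test_traj, t) for t in train_trajs]
    return train_labels[int(np.argmin(dists))]
```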

PAGE 43

[Table 3.4: recognition performance on the human interaction sequences (number of correctly classified sequences).]

Distance measure | # correct (std. dev.) | Time (s) | # dims (p) | # correct | Time (s)
Matusita | 6 (0) | 8.42 | 55 | 6 | 17.66
Bhattacharya | 6 (0) | 5.45 | 55 | 6 | 9.30
Chernoff | 7.5 (0.5) | 8.55 | 64 | 7 | 13.63
Symmetric KL | 6.6 (0.5) | 7.67 | 53 | 7 | 20.55
KL | 4.8 (0.5) | 7.18 | 53 | 5 | 17.04
Euclidean (PCA) | 6 | 8.78 | 310 | 6 | 21.95

1. The drop in recognition rates when using a low-dimensional embedding is small. This is evident when we compare recognition rates with and without any dimensionality reduction.

2. There is about a 2 to 3 times speedup in performance with the low-dimensional embedding. The time reported in the table and at other places is the time required to embed all relational distributions of a test sequence into the configuration space and to recognize the test trajectory by matching it to all the training trajectories using dynamic time warping. The time is averaged over all the test sequences. All the reported times indicate the CPU time on a 3 GHz Xeon workstation with 2 GB of memory.

3. The representation also holds promise for recognition across persons. From the two-handed gesture dataset, we used 2 sessions of each of the 7 types of gestures performed by 4 persons for training, and 10 sessions of each of the 7 types of gestures performed by 3 other persons for testing. The persons used for testing were not used for training. Altogether, we used 56 training sequences and 210 test sequences. We obtained an average recognition accuracy of 73% with a standard deviation of 1.1% across the 25 [...]

PAGE 46

Figure 3.2. Embedding probability density functions. (a) Three 1D probability density functions, a, b and c, each being 50-dimensional, are embedded into a 6-dimensional space. The symmetric KL distances between them, both before and after the dimensionality reduction, are shown as images. It should be noted that all the pairwise distances are preserved in the embedding. (b) A new probability density function, d. Two transformed functions of d, f1(d) and f2(d), are computed and embedded onto the 6-dimensional space. It should be noted that the distances of d from the existing functions a, b, c in the low-dimensional space are very close to those in the original space before embedding.

PAGE 47

Figure 3.3. Five ASL signs (best viewed in color): (b) LOST, (c) MISS, (d) AGAIN, (e) PROHIBITED. The complete test dataset consists of 147 different signs. For display purposes, intermediate frames have been skipped for some of the signs shown above, and the displayed frames are reduced in size.

PAGE 48

Figure 3.4. Seven different types of two-handed gestures (best viewed in color): (b) Rotate Front, (c) Rotate Back, (d) Rotate Up, (e) Rotate Down, (f) Rotate Right, (g) Rotate Left. For display purposes, intermediate frames have been skipped and the displayed frames are reduced in size. (Source: IDIAP Research Institute, Switzerland.)

PAGE 49

Figure 3.5. Eight different types of human interaction sequences (best viewed in color): (b) A sits, B pulls A, (c) Chicken Dance, (d) Walk together, (e) Walk away from each other, (f) Walk towards each other, (g) One gets up after a scramble for the last seat, (h) One saves the other. For display purposes, intermediate frames have been skipped and the displayed frames are reduced in size. (Source: Carnegie Mellon University Graphics Lab.)

PAGE 50

Figure 3.6. Extracted contours for the sequences in Figure 3.5: (b) A sits, B pulls A, (c) Chicken Dance, (d) Walk together, (e) Walk away from each other, (f) Walk towards each other, (g) One gets up after a scramble for the last seat, (h) One saves the other. It should be noted that, due to the presence of similarly colored objects in the background, the blobs are not perfectly segmented.

PAGE 51

Figure 3.7. Effect of the number of reference points used during embedding. (a) The average accuracy obtained over 25 test runs, (b) the standard deviation of the accuracy, and (c) the average time taken to embed and recognize a given series of relational distributions.

PAGE 52

Figure 3.8. Effect of the percentage of energy retained with various distance measures. (a) The average accuracy obtained over 25 test runs, (b) the number of dimensions for each type of distance measure over a range of the percentage of energy retained.

PAGE 54

Figure 4.1. Example of relational distributions constructed for the edge image in (a) using (b) exhaustive sampling and (c) random sampling. (d) The change in entropy with random sampling iterations.

[...] respectively, for the edge image in (a). Fig. 4.1(d) shows the change in entropy with each iteration. Obviously, there is some fall in the fidelity of representation with sampling. Does this loss impact final recognition rates? For this, we studied the impact on the recognition rates on the ASL dataset, and the time to estimate the relational distribution, as a function of the number of sampling iterations. We varied m from 1% to 5% of the total number of bins in the histogram (51 x 51 = 2601 in our experiments), and k was varied from 5 to 30 iterations at intervals of 5 iterations. Using exhaustive sampling, the average time taken to compute a [...]
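A sketch of the random-sampling construction: m pixel pairs are drawn per iteration, their relational attributes are accumulated into a 51 x 51 histogram, and the entropy is tracked to monitor convergence. The choice of (|dx|, |dy|) as the relational attributes and the bin-range normalization are assumptions for illustration, not the dissertation's exact attributes.

```python
import numpy as np

def relational_distribution(edge_xy, n_iters=30, m=130, bins=51, seed=0):
    """Approximate the relational distribution of edge pixels
    (an N x 2 array of coordinates) by randomly sampling m pixel
    pairs per iteration. Returns the normalized histogram and the
    entropy after each iteration."""
    rng = np.random.default_rng(seed)
    max_d = float(edge_xy.max() - edge_xy.min()) + 1.0  # crude range assumption
    hist = np.zeros((bins, bins))
    entropies = []
    for _ in range(n_iters):
        i = rng.integers(0, len(edge_xy), size=m)
        j = rng.integers(0, len(edge_xy), size=m)
        dx, dy = np.abs(edge_xy[i] - edge_xy[j]).T  # relational attributes
        bx = np.minimum((dx / max_d * bins).astype(int), bins - 1)
        by = np.minimum((dy / max_d * bins).astype(int), bins - 1)
        np.add.at(hist, (bx, by), 1.0)
        p = hist / hist.sum()
        entropies.append(-np.sum(p[p > 0] * np.log2(p[p > 0])))
    return hist / hist.sum(), entropies
```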

PAGE 56

Figure 4.3. Illustration of the robustness of relational distributions with respect to parameters. (a) Edges extracted with different parameters in a frame from a sequence. (b) Bhattacharya distance of the relational distribution of each frame in a sequence from that of the first frame. Each plot corresponds to a different edge detection parameter.

[...] the relational distributions are sparse matrices, and all existing efficient implementations for sparse matrices could be used for them. This would result in further savings.

4.2 Robustness with Imperfect Edges

Here, we discuss how robust the relational distributions are to the edge detection parameters used to find the edges. This study of the robustness of the shape descriptors is important for contour-based shape recognition algorithms [107]. In Fig. 4.3 we show an illustration of the stability of the representation for a motion sequence representing one ASL sentence. We compute the Bhattacharya distance between the relational distribution for each frame and that for the first frame in the sequence. We repeat this plot for each edge detection parameter, in this case the Canny edge high threshold, which controls the amount of clutter at any given scale. Different plots are obtained by varying the Canny edge high threshold value from 50/255 to 200/255 in intervals of 5/255. If the representation is stable, the plots should cluster well, and the variation between the curves at each frame should be lower [...]
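The stability experiment amounts to a loop over Canny high thresholds, recomputing each frame's distance to the first frame. A sketch using OpenCV and the relational_distribution helper from the previous sketch; setting the low threshold to half the high one is an assumption, as is using cv2.Canny's absolute threshold convention rather than the /255 fractions quoted in the text.

```python
import numpy as np
import cv2

def stability_curves(frames, thresholds=range(50, 201, 5)):
    """For each Canny high threshold, compute the Bhattacharya distance
    of every frame's relational distribution from the first frame's.
    frames: list of grayscale uint8 images. Tightly clustered curves
    across thresholds indicate a stable representation."""
    curves = {}
    for t in thresholds:
        dists, ref = [], None
        for img in frames:
            edges = cv2.Canny(img, t // 2, t)   # low = high/2, an assumption
            ys, xs = np.nonzero(edges)
            rd, _ = relational_distribution(np.column_stack([xs, ys]))
            if ref is None:
                ref = rd                        # first frame is the reference
            dists.append(-np.log(np.sum(np.sqrt(ref * rd)) + 1e-12))
        curves[t] = dists
    return curves
```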

PAGE 57

[...]/255 for all the train sequences and vary the threshold values for the test sequences. The recognition accuracies obtained using the same set of 2p reference points (3.4) were 76%, 77%, 80%, 78%, 67%, 64%, and 60% for threshold values of 10/255, 30/255, 50/255, 75/255, 100/255, 125/255, and 150/255, respectively. This shows a graceful degradation in accuracy as the difference between the threshold values used for the train and test sequences is increased.

4.3 Interpolation

In many gesture-related applications, it is desirable to be able to match two gesture sequences performed at different speeds or captured at different frame rates. In some cases, the sampling rate might be very sparse, or the frames of two motion sequences performed at the same speed might not be aligned; the sequences might have an offset between them. Dynamic time warping is a commonly used technique in such situations, but its results are more accurate if the two sequences being compared are at nearby temporal scales. In this section, we describe an approach to normalize the gesture sequences with respect to varying speeds by interpolating the series of points in the configuration space. For this, we perform cubic spline interpolation in the configuration space. Interpolation between two configurations lets us arrive at a continuous representation of the motion in terms of the change in the underlying structure. For further matching purposes, the motion is then indexed by arc length along the interpolated curve. The spline interpo- [...]
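A sketch of the arc-length normalization with SciPy's parametric splines: splprep parameterizes the fitted curve by normalized cumulative chord length, so resampling uniformly in that parameter approximates uniform spacing along the curve's arc length. The function name and sample count are illustrative, and the trajectory must have more points than the spline order.

```python
import numpy as np
from scipy.interpolate import splprep, splev

def resample_by_arclength(traj, n_samples=50):
    """Fit an interpolating cubic spline through a configuration-space
    trajectory (a frames x dims array) and resample it uniformly in
    the chord-length parameter, i.e. approximately by arc length."""
    tck, _ = splprep(traj.T, s=0, k=3)      # interpolating cubic spline
    u_new = np.linspace(0.0, 1.0, n_samples)
    return np.array(splev(u_new, tck)).T

# Two recordings of the same motion at different frame rates resample
# to comparable trajectories before dynamic time warping.
```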

PAGE 59

Figure 4.4. Example of interpolation in configuration space. (a) The first three coordinates corresponding to image frames from two sequences, each representing a part of a motion sequence. Alternate frames are dropped in one sequence to simulate differences in speed/frame rate. The original image frames are shown with their corresponding points. (b) Resampled sequences after cubic spline interpolation along the arc length of each curve, without knowledge of the existing speed differences.

PAGE 61

Figure 4.5. Recognition results on the two-handed gesture dataset at various temporal scales (x to 5x). Experiments include spline-interpolated data along the arc length without using any temporal scale information, spline-interpolated data along the temporal axis using temporal scale information, and non-interpolated data. (a) The recognition scores; (b) the average time taken to recognize a sequence of relational distributions. (The legend in (a) applies to (b) as well.)

PAGE 64

Figure 5.2. Overview of our approach, along with samples of intermediate representations. Each of the $n$ sentences is represented as a sequence in the Space of Relational Distributions, and the common pattern is extracted using Iterated Conditional Modes (ICM). The parameter set $\{a_1, w_1, \ldots, a_n, w_n\}$ is initialized using uniform random sampling, and the conditional density corresponding to each sentence is updated in a sequential manner. $(a_1^i, w_1^i, \ldots, a_n^i, w_n^i)$ denotes the parameter set at the end of the $i$-th iteration of an ICM run. Multiple ICM runs are made, each with a different starting parameter vector, and the most frequently occurring solution represents the set of common patterns, i.e. signemes.

PAGE 65

We formulate the signeme extraction problem in a probabilistic framework. Table 5.1 defines the notation used in this chapter. We cast signeme extraction as finding the most recurring pattern among a set of $n$ sentences $\{\tilde{S}_1, \ldots, \tilde{S}_n\}$ that have one common sign present in all of them. The commonality concept underlying the definition of a signeme can be expressed in terms of distances. Let $\tilde{s}_{a_i}^{w_i}$ represent a substring of the sequence $\tilde{S}_i$ consisting of the points with indices $\{a_i, \ldots, a_i + w_i - 1\}$, and let $d(\tilde{x}, \tilde{y})$ denote the distance between two substrings $\tilde{x}$ and $\tilde{y}$ based on dynamic time warping. We define the set of signemes to be the set of substrings, denoted $\{\tilde{s}_{a_1}^{w_1}, \ldots, \tilde{s}_{a_n}^{w_n}\}$, that is most likely among all possible substrings from the given set of sentences. Let $\theta = \{a_1, w_1, \ldots, a_n, w_n\}$ denote the parameter set representing a set of substrings, and $\theta_m$ [...]
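Under this formulation, any candidate parameter set $\theta = \{a_1, w_1, \ldots, a_n, w_n\}$ can be scored by how tightly its substrings cluster under the DTW distance $d$. The sketch below uses a plain sum of pairwise distances as that score; the dissertation's actual likelihood is not reproduced here, and the function names are mine.

```python
from itertools import combinations

def substring(seq, a, w):
    # Points with indices a, ..., a + w - 1 of one sentence.
    return seq[a:a + w]

def score(theta, sentences, dtw):
    """Sum of pairwise DTW distances between the candidate substrings;
    lower means the substrings are more alike, i.e. a more plausible
    signeme set. theta: list of (a_i, w_i) pairs, one per sentence;
    dtw: any distance on subsequences, e.g. the earlier DTW sketch."""
    subs = [substring(s, a, w) for s, (a, w) in zip(sentences, theta)]
    return sum(dtw(x, y) for x, y in combinations(subs, 2))
```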

PAGE 80

Figure 5.9. Signemes extracted from sentences: (b) CANT, (c) DEPART, (d) FUTURE, (e) MOVE, (f) PASSPORT, (g) SECURITY.

PAGE 81

Figure 5.10. Signemes extracted from sentences: (i) TIME, (j) TABLE.

Figure 5.11. Some partially correct and bad extractions: (b) CANT, partially correct (the last two frames do not belong to the sign); (c) CANT, bad extraction (no part of the extracted pattern belongs to the sign CANT).

PAGE 95

Figure A.1. A screenshot of the web page showing the common pattern extraction results.

