Microexpression Spotting in Video Using Optical Strain

by

Sridhar Godavarthy

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science
Department of Computer Science and Engineering
College of Engineering
University of South Florida

Major Professor: Dmitry B. Goldgof, Ph.D.
Sudeep Sarkar, Ph.D.
Rangachar Kasturi, Ph.D.

Date of Approval: July 1, 2010

Keywords: Computer vision, FACS, Optical flow, Facial deformation, Datasets

Copyright 2010, Sridhar Godavarthy
DEDICATION

To the Omnipotent, Omnipresent and Omniscient one... What are we but for him?
ACKNOWLEDGEMENTS

First and foremost, God, for allowing me to tread this path. My parents, Subrahmanya Sarma and Raja Rajeswari, my brother, Dr. Srinivasa Sastry, and my sister-in-law, Sarada, but for whose support, trust and perseverance I would have been just another baccalaureate. A special thanks to Dr. Dmitry Goldgof for accepting me as his student; I have never worked under a more understanding professor. My sincere gratitude goes to Dr. Rangachar Kasturi, the first person from USF I spoke to; if not for his encouraging words, I might have missed this golden opportunity of working with so many intelligent minds. I am also grateful to Dr. Sudeep Sarkar for his timely suggestions and ideas. I am greatly indebted to all three of them (each a brilliant mind overflowing with ideas, of which I was able to glean a few) for their guidance and motivation. Dr. Vasant Manohar, whose spark started this whole work; Matthew Shreve, for sharing his knowledge and for his technical support and guidance; Joshua Candamo, my inspiration. I must thank my supervisor, Daniel Prieto, an amazing person, for understanding my constraints and allowing me to work flexibly. Parameswaran Krishnan and Ramya Parameswaran (unrelated to each other), my prime movers and pillars of support, who will never willingly accept my thanks. Ravikiran Krishnan, for being my constant companion, for those midnight trips to Starbucks and for the brainstorming sessions. Lucy Puentes, for her pep talks and motivation in my times of need. Ekta Karia, for her delightful company and, more importantly, for sticking by me despite all the pestering I subjected her to. Pavithra Bonam, for her confidence in me and all those delightful dinners. Ravi Panchumarthy, a gem of a person, for his support and company. Valentine Gnana Selvi, for being that friend who blindly believes in you. Last but by no means least, Aruna Pandian, for showing me what life is all about.
There are several others who helped me to get to where I am. I am thankful to every one of them. Portions of the research in this paper use the Canal 9 Political Debates Database made available by A. Vinciarelli at the Idiap Research Institute, Switzerland.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER 1 INTRODUCTION
1.1 Microexpression
1.2 Applications of Automatic Microexpression Detection
1.3 FACS
1.4 Expression Segmentation
1.5 Aim of This Research
1.6 Proposed Method
1.7 Thesis Organization
CHAPTER 2 PREVIOUS WORK
2.1 Microexpressions
2.2 Automatic Facial Expression Recognition Systems
2.2.1 Face Detection
2.2.2 Feature Extraction
2.2.3 Expression Analysis
2.3 Datasets
CHAPTER 3 BACKGROUND
3.1 Optical Flow
3.2 Stress and Strain
3.3 Strain Magnitude Characteristics
3.4 Computation of Strain
3.5 The OpenCV Haar Classifier
CHAPTER 4 ALGORITHM AND IMPLEMENTATION
4.1 Overview
4.2 Face Alignment
4.3 Optical Flow
4.4 Computation of Optical Strain
4.5 Split into Regions
4.6 Thresholding
CHAPTER 5 DATASETS
5.1 USF Dataset
5.1.1 Feigned Dataset
5.1.2 Live Questioning Dataset
5.2 Canal 9 Dataset
5.3 Found Videos
CHAPTER 6 EXPERIMENTAL RESULTS
6.1 Determination of Threshold
6.2 Results with Only Macroexpressions
6.3 Microexpression Spotting Results
CHAPTER 7 CONCLUSION AND FUTURE WORK
7.1 Conclusions
7.2 Future Work
REFERENCES
LIST OF TABLES
Table 1.1 Comparison of various facial expression coding schemes
Table 2.1 Select automatic facial expression recognition systems in literature
Table 4.1 List of parameters used
Table 5.1 Characteristics of datasets used for evaluation
Table 6.1 Results of thresholding on the training set with 22 microexpressions
Table 6.2 Microexpression spotting results
LIST OF FIGURES
Figure 1.1 Sample microexpressions
Figure 3.1 Errors in optical flow caused by change in illumination
Figure 3.2 Errors in optical flow due to rotation of the object
Figure 3.3 Types of motion in an object when acted upon by an external force
Figure 3.4 Sample frames from the stability test video
Figure 3.5 Optical flow and optical strain with out-of-plane movement
Figure 3.6 Representations of some Haar-like features from the OpenCV library
Figure 4.1 Flowchart of proposed algorithm
Figure 4.2 Illustration of the face alignment process
Figure 4.3 Dense optical flow on two frames of the Alex Rodriguez video
Figure 4.4 Computed dense optical flow on two frames of the USF dataset
Figure 4.5 Strain maps for the optical flow computed in Figure 4.3
Figure 4.6 Strain map for the optical flow computed in Figure 4.4
Figure 4.7 Split of face into separate regions
Figure 4.8 Algorithm for thresholding strain magnitudes
Figure 4.9 Example strain thresholding
Figure 5.1 Canal 9 dataset
Figure 5.2 Found video dataset
Figure 6.1 ROC curve used for threshold determination
Figure 6.2 Typical strain pattern for a microexpression
Figure 6.3 Strain map and thresholding with only macroexpressions
Figure 6.4 Strain plots for select USF sequences
Figure 6.5 Microexpression spotting results
Microexpression Spotting in Video Using Optical Strain

Sridhar Godavarthy

ABSTRACT

Microexpression detection plays a vital role in applications such as lie detection and psychological consultations. Current research is progressing towards automating microexpression recognition by classifying microexpressions in terms of FACS Action Units. Although high detection rates are being achieved, the datasets used to evaluate these systems are highly restricted: they are limited in size (usually still pictures or extremely short videos), motion constrained, contain only a single microexpression, and do not contain negative cases where microexpressions are absent. Only a few of these systems run in real time, and even fewer have been tested on real-life videos.

This work proposes a novel method for automated spotting of facial microexpressions as a preprocessing step to existing microexpression recognition systems. By identifying and rejecting sequences that do not contain microexpressions, longer sequences can be converted into shorter, constrained, relevant sequences that each comprise a single microexpression; these can then be passed as input to existing systems, improving their performance and efficiency. The method uses the short temporal extent of microexpressions to identify them. This extent is determined by the period for which strain, caused by the non-rigid motion of facial movement, acts on the facial skin. The face is divided into regions and facial strain is calculated for each region; the strain patterns in the individual regions are used to identify the subtle changes that facilitate the detection of microexpressions. The strain magnitude is calculated using the central difference method over the robust, dense optical flow field of each subject's face. The computed strain is then thresholded using a variable threshold. If the duration for which the strain stays above the threshold corresponds to the duration of a microexpression, a detection is reported.

The datasets used for algorithm evaluation comprise a mix of natural and enacted microexpressions. The results were promising, with a true detection rate of up to 80%. The increased number of false positive spots in the Canal 9 dataset can be attributed to the subjects talking, which causes fine movements in the mouth region. Performing speech detection to identify sequences where the subject is talking, and excluding the mouth region during those periods, could help reduce the number of false positives.
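The duration-based detection rule summarized above (report a spot when the strain stays above the threshold for a microexpression-length interval) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: it takes a single per-frame strain magnitude series for one facial region, uses a fixed threshold instead of the variable threshold used in this work, and the duration bounds `min_dur` and `max_dur` are hypothetical parameters the caller must supply.

```python
from typing import List, Sequence, Tuple


def spot_microexpressions(strain: Sequence[float], threshold: float, fps: float,
                          min_dur: float, max_dur: float) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges (end exclusive) whose above-threshold
    duration in seconds lies in [min_dur, max_dur].

    Hypothetical sketch: fixed threshold, one region, bounds chosen by caller.
    """
    spots = []
    start = None
    # A trailing -inf sentinel closes a run that lasts until the final frame.
    for i, value in enumerate(list(strain) + [float("-inf")]):
        if value > threshold and start is None:
            start = i                      # strain rises above the threshold
        elif value <= threshold and start is not None:
            duration = (i - start) / fps   # length of the run in seconds
            if min_dur <= duration <= max_dur:
                spots.append((start, i))   # short enough to be a microexpression
            start = None                   # longer runs (macroexpressions) are dropped
    return spots
```

For example, at 30 fps with bounds of 0.1 s and 0.5 s, a 5-frame burst above the threshold is reported, while a 60-frame (2 s) burst, more typical of a macroexpression, is rejected.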
CHAPTER 1 INTRODUCTION

Decision making is an integral part of our day-to-day life. Some of the decisions we make are based on discernible facts and are well thought out, whilst others are made instantaneously, considering only the immediately available evidence. In the latter scenario, the human mind does not consciously weigh the various contributing factors; instead, it just recognizes that there is … decisions we make are related to humans and are made when in direct visual contact with them, most often when listening to them. When an individual is lying, trying to deceive or attempting to hide his or her natural emotions, his or her body exhibits subtle anomalies that can be perceived and used by our minds in the decision-making process. These anomalies include, but are not limited to, a partial scorn, a shrug of the shoulder and a raised eyebrow. They are so consistent and so full of information that researchers in psychology have even christened them microexpressions.

1.1 Microexpression

A microexpression can be described as an involuntary pattern of the human body that is significant enough to be observable, but is too brief to convey an emotion. Microexpressions are very subtle and easy for the untrained eye to miss. A typical microexpression lasts only a fraction of a second, as opposed to a regular expression, which can last for several seconds; hence the term microexpression. Microexpressions are not restricted to the face and can occur in any part of the body. Based on the type of modification to an emotional expression, microexpressions can be classified into three types:
i. Simulated expressions: a microexpression that is not accompanied by a genuine expression.
ii. Neutralized expressions: a genuine expression that is suppressed, so that the face remains neutral.
iii. Masked expressions: a genuine expression that is completely masked by a falsified expression.

Of the above, type 1 is the easiest to detect, as these expressions are straightforward and carry no interference from other expressions. An example of a type 1 microexpression is a half scorn, which is indicative of contempt (Figure 1.1a). Type 3 expressions are the hardest to detect, as the microexpression can easily be missed in the presence of a larger and longer expression.

Figure 1.1 Sample microexpressions: a) a partial, clearly visible scorn lasting about 0.06 seconds; b) a microexpression of thrill (duping delight) visible along with a partial smile.

Microexpressions can span the entire human body, but in this work we focus only on the microexpressions exhibited in the facial region, also known as facial microexpressions.

1.2 Applications of Automatic Microexpression Detection

The most prominent application of automatic microexpression detection is in security, as a lie-detection aid. Although microexpression detection as a lie detection technology is not yet a reality, it is finding use in a variety of other fields. Researchers are working on using microexpressions to detect pain in patients, especially autistic and anaesthetized ones, and to measure nerve damage in facial regions. Social signal processing research is progressing towards using microexpressions to detect boredom, which may be used by commercial market researchers to measure the impact of their television commercials. Research has also begun on using automated microexpression detection in psychological counseling, such as
marriage counseling and others. Another interesting application is in classrooms, to determine the effectiveness of a lecture from the faces of the students.

1.3 FACS

As opposed to the expressions for the six standard emotions (anger, disgust, fear, happiness, sadness and surprise), which are universally recognizable and can be unambiguously associated with their corresponding emotions, microexpressions are harder to discern and even harder to decipher. This is because microexpressions are not mere visible cues; they can contain much more information, which, if properly analyzed, could even describe the mood or the intentions of the individual exhibiting them. The human mind is a very subjective observer of expressions, whereas deriving maximum information from a microexpression requires a system of quantitative representation and objective measurement. Several such measurement systems exist, viz. the Facial Affect Scoring Technique (FAST), the Facial Action Coding System (FACS), the Maximally Discriminative Facial Movement Coding System (MAX), facial electromyography (EMG) and Emotion FACS (EMFACS). A comparison of these coding schemes is given in Table 1.1; for a detailed description of the terminology involved, the reader is referred to the literature on these coding schemes. FACS is the only coding system that is both observational and anatomical, and it is hence capable of representing, and thereby measuring, any visible facial movement. Because of this, FACS is one of the most extensively used coding systems today. FACS represents each movement of the human face in terms of an Action Unit (AU). The system comprises 24 unique Action Units defined in terms of muscular movements and 19 Action Units defined in terms of actions performed. Every action performed by the human face is expressed as a combination of AUs. For example, a genuine smile is always a combination of AU 12 (Lip Corner Puller) and AU 6 (Cheek Raiser).
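To make the AU notation concrete, the sketch below represents a FACS coding as a set of AU numbers and tests whether an observed coding contains a known combination. Only the genuine-smile example (AU 6 + AU 12) comes from the text above; the helper names are hypothetical, and a real coder would also need the full AU table and intensity scores, which this sketch omits.

```python
from typing import Dict, FrozenSet, Iterable

# Only the two AUs named in the text; a complete FACS table is assumed elsewhere.
AU_NAMES: Dict[int, str] = {6: "Cheek Raiser", 12: "Lip Corner Puller"}

# Genuine smile = AU 6 + AU 12, as stated in the text.
GENUINE_SMILE: FrozenSet[int] = frozenset({6, 12})


def contains_combination(observed: Iterable[int], combination: FrozenSet[int]) -> bool:
    """True if every AU of `combination` appears in the observed coding."""
    return combination <= set(observed)


def describe(aus: Iterable[int]) -> str:
    """Render a coding in the 'AU n (Name) + ...' style used in the text."""
    return " + ".join(f"AU {au} ({AU_NAMES.get(au, 'unlisted')})" for au in sorted(aus))
```

With this representation, `describe({12, 6})` yields "AU 6 (Cheek Raiser) + AU 12 (Lip Corner Puller)", and a coding containing AU 12 alone fails the genuine-smile test, which is the kind of deviation from a defined combination that recognition methods look for.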
The AU combinations for known emotional expressions have already been defined. Any deviation from these implies that the expression is not natural. This is the primary principle behind any method used in the recognition of microexpressions, be it manual or automated. In addition to the standard emotions, several AU
combinations for deceptive behavior have also been determined, simplifying the detection of some frequently occurring microexpressions.

Table 1.1 Comparison of various facial expression coding schemes

Name | Measurement method | Derivation type | Measurement content | Advantages | Disadvantages
FAST | Observational | Theoretical | Selective | One of the first methods to be developed | Used only for negative emotions
FACS | Observational | Anatomical | Comprehensive | Covers all muscles; allows for discovery of new configurations | -
MAX | Observational | Theoretical | Selective | Faster performance | Codes only predefined configurations
EMG | Obtrusive | Anatomical | Comprehensive | Able to detect muscular activity invisible to the naked eye | Self-consciousness; interference from nearby muscles
EMFACS | Observational | Theoretical | Selective | Selective version of FACS; faster performance | Can only be applied to certain emotion expressions

1.4 Expression Segmentation

Although there has been a lot of research in the area of spontaneous expression recognition, and many advances have been made towards a clearer definition and recognition of microexpressions, most of this research has been in the field of psychology. All the deciphering and FACS coding was done manually by experts observing video recordings. Despite the innumerable possible benefits, efforts to automate expression recognition have only recently started gaining momentum. Expression segmentation is the first step in any expression recognition system and is vital for many applications that analyze temporal facial changes, including micro- and macroexpression recognition, genuine and feigned emotion detection, animation and artificial
expression generation, among others. A number of problems, such as head movements (translation and out-of-plane rotation) and uneven lighting conditions, pose challenges to expression segmentation. Solutions to these problems have been presented for specific scenarios, such as extremely short sequences, restricted movement and constant lighting conditions, but a panacea is yet to be discovered. Further, only a few of the algorithms developed run in real time, and most of them involve preprocessing and training. Present algorithms take far too long to recognize Action Units even in clips that are just a couple of seconds long, and they have not been tested on longer sequences that may or may not contain microexpressions.

1.5 Aim of This Research

The aim of this research is to design a preprocessing system that is capable of spotting microexpressions by filtering out non-microexpressions, thereby helping existing recognition algorithms perform better on video sequences of longer durations. It is unrealistic to assume that the subject will remain still during a sequence, so the system should be able to handle small head movements.

1.6 Proposed Method

A two-step approach to segmenting temporal microexpressions from face videos is proposed. First, facial strain maps are calculated from the facial deformation observed in video sequences, using the methods described in [11, 12]. Next, the strain magnitude is calculated and used to segment microexpressions from the video. This approach has the advantage of eliminating motion vectors due to in-plane head movement, which makes the method more stable in real-world scenarios. One way to analyze the observed facial tissue deformation is to calculate optical strain patterns under the application of a force, a condition that naturally occurs during facial expressions. The reasoning behind this approach is threefold:
i. The strain pattern can easily be used in conjunction with existing face recognition methods;
ii. The strain pattern is related to the biomechanical properties of facial tissues; and
iii. As has been demonstrated empirically, the strain pattern is more stable under lighting variations than other methods that use only single frames.

1.7 Thesis Organization

Chapter 2 describes previous work on automated facial expression recognition, along with the constraints and advantages of each method. Chapter 3 gives a brief introduction to the concepts involved in the proposed method. The proposed algorithm, the assumptions made, the reasoning behind them and the actual implementation are described in detail in Chapter 4. The datasets we used and the characteristics of their videos are presented in Chapter 5. Chapter 6 discusses the results of our approach, and finally, Chapter 7 gives a brief insight into the road ahead.
CHAPTER 2 PREVIOUS WORK

2.1 Microexpressions

Microexpressions are exhibited when people try to control a natural expression, and they may have existed even before the beginning of verbal communication. Humans have been able to identify the meaning behind them, most often subconsciously. Although the discovery of microexpressions and the first published work on them are attributed to Haggard and Isaacs, Ekman et al. note that Darwin, in his epic work, had inadvertently described the features of a microexpression and their characteristic involuntary nature. Research on microexpressions has progressed in two major directions:
i. Research in psychology is progressing towards determining the reasons behind the occurrence of microexpressions and the implications of an exhibited microexpression.
ii. Research in computer vision and pattern recognition is progressing towards automated expression recognition and its applications.

Dr. Paul Ekman is widely considered the father of microexpressions, and his inspiration to work on them could well have started with an interest in face reading. Since then, Dr. Ekman and his collaborator Dr. Wallace Friesen have published several works on expressions, discussing their occurrence, their implications and their meaning in a cross-cultural context [1, 17-19]. Studies have also shown that although the emotions may be the same, the expression of emotion is hereditary and depends on one's upbringing; hence, some expressions may vary with geographic location. Research has also shown that the extent of microexpressions, or even their presence or absence, is highly dependent on the consequences involved. For example, a subject who has been caught lying to a parent is less likely to exhibit discernible microexpressions than a subject who may
be involved in illegal activities. Training and the frequency of deceit also affect the expression of microexpressions: a trained spy is less likely to be caught exhibiting microexpressions than an amateur. The prime characteristic of a microexpression is that it is involuntary and cannot be forced. Hence, collecting datasets of microexpressions is a very complex task. Informed consent would nullify the actual purpose, and even if videos were recorded first and consent requested afterwards, obtaining consent to use that data is hard, because microexpressions usually occur in situations that could be embarrassing to the subject. Several psychologists use data collected from their counseling sessions and hence are unable to share datasets with engineers working on automated expression recognition.

2.2 Automatic Facial Expression Recognition Systems

With the growing demand for faster, easier and more intuitive interfaces between humans and computers, fueled by ever-improving computing technology, automatic analysis of the human face is being looked to for a solution. Analysis of human facial expressions has been around for a long time, albeit manually. Earlier attempts at developing an automatic facial expression recognition system were not very successful because of insufficient technology and expertise in supporting areas such as image processing, computer vision and pattern analysis, and because of the lack of hardware powerful enough to handle the data. According to the literature, the entire process of automatic facial expression recognition can be divided into three main phases:
i. Detection of the face in the image
ii. Extraction of the required features from the face detected in (i)
iii. Analysis of the extracted features
2.2.1 Face Detection

Stan Z. Li et al. give a detailed insight into the various methods of face detection. Although their book deals with face detection in the context of face recognition, the same methods are used for facial expression recognition. Face detection comes with its own challenges, internal and external. Internal factors are those inherent in the face, such as facial hair, expressions and texture, while external factors are those not under the control of the subject, such as illumination, occlusions, camera quality, and the pose and rotation of the face. Several approaches exist for face detection.

Knowledge-based methods emulate the human method of face detection by converting the properties that identify a face into rules, which are then used for detection. These methods are further classified into top-down and bottom-up approaches. Several works based on this kind of method have been published, using varying rules; Guangzheng and Thomas propose one such hierarchical rule-based classifier.

Feature-invariant methods work on the principle that the human face has several features, used by humans for detection, that are invariant to outside factors. Yow and Cipolla [25-27] use multiple facial features, while [28, 29] use facial texture. Some works use various color representations to determine skin models.

Template matching is another approach to face detection, in which a predefined face model is available and the correlation between the template model and the presented image is calculated and used to detect the location of the face. Several works based on this method have been published, but it has been shown that template-based methods do not produce the best results.

In appearance-based methods, a classifier such as a neural network or an SVM is usually trained on some input images and then used to classify the test images.
These methods vary in the features selected for the classifier and may even include features from the knowledge-based and feature-invariant methods. The Viola-Jones face detector, arguably the fastest and most widely used face detector, belongs to this category.
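The template-matching approach described above can be illustrated with zero-mean normalized cross-correlation, one common choice of correlation measure (the works referred to above may use others). The NumPy sketch below slides a template exhaustively over a grayscale image; it ignores scale and rotation, which a practical face detector would have to handle, and `match_template_ncc` is a hypothetical name.

```python
import numpy as np


def match_template_ncc(image, template):
    """Return ((row, col), score) for the window that best matches `template`
    under zero-mean normalized cross-correlation (score in [-1, 1])."""
    img = np.asarray(image, dtype=float)
    tpl = np.asarray(template, dtype=float)
    th, tw = tpl.shape
    t = tpl - tpl.mean()                   # zero-mean template
    t_norm = np.sqrt((t ** 2).sum())
    best_score, best_pos = -np.inf, (0, 0)
    for r in range(img.shape[0] - th + 1):
        for c in range(img.shape[1] - tw + 1):
            patch = img[r:r + th, c:c + tw]
            p = patch - patch.mean()       # zero-mean image window
            denom = np.sqrt((p ** 2).sum()) * t_norm
            if denom == 0:
                continue                   # flat window: correlation undefined
            score = (p * t).sum() / denom
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score
```

Because the score is normalized, it is insensitive to a uniform brightness offset and a positive contrast scaling of a window, and an exact copy of the template scores 1.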
2.2.2 Feature Extraction

To automate feature extraction, a model of the face is developed, and the visual information available from the detected face is correlated with this model. Several models exist today. The most prominent are the point-based model, in which certain critical points on the face are identified and the distances between them are used to identify the expression on the face, and 3D mesh mapping of the whole face [32, 33]. There are also approaches that combine both. A comprehensive and detailed discussion of this topic is available in the literature. Most of the challenges presented in the face detection step are obstacles in this step too, and care should be taken to overcome them without altering the available information.

2.2.3 Expression Analysis

Once the features have been extracted, the next step is to interpret the expressions. Recent works follow one of two paths. The first is to classify the expression into an emotion based on a prior definition of the expression. However, this is not as easy as it sounds, because the six basic emotions described by Dr. Ekman are not objectively defined, leaving their interpretation to the individual. Another concern is that these six emotions are by no means all-encompassing: several other expressions, such as pain and boredom, have been described in recent works [4, 5, 35-37], and an attempt has been made to differentiate between pain and crying in newborns. The other approach is to code the expression in terms of Action Units as defined in the Facial Action Coding System (FACS). This is an objective measurement, and the resulting Action Units can then be interpreted or classified into emotions or non-emotions.
Most recent works attempt to code the expression automatically in terms of Action Units [4, 40-43]. The system of Essa and Pentland was one of the earliest automatic facial expression recognition systems; it used optical flow and achieved a very high accuracy rate of 98%. Its only limitation was that it required the absence of rigid body movements.
2.3 Datasets

The selection of a dataset is an important criterion in the evaluation of any algorithm. Initially, due to the lack of suitable datasets, most algorithms used facial recognition datasets despite their being ill-suited for the task. As research progressed, the algorithms became too sophisticated for these generic datasets to provide meaningful results, and researchers soon came up with their own datasets to suit their testing scenarios. The Cohn-Kanade and JAFFE datasets are among the first datasets generated specifically for facial expression analysis. Both comprise static images of subjects. While all the expressions in the JAFFE dataset are enacted, the Cohn-Kanade dataset comprises both enacted and natural expressions that have been FACS coded. The HID and CMU PIE datasets are other static image datasets that have been used in the literature. The MMI dataset is among the most recent datasets for expression evaluation and comprises FACS-coded static images and video sequences of prolonged enacted expressions. According to one study, at least 50 examples are needed to train an automatic facial expression recognition system (AFERS) to achieve even moderate accuracy levels of 80%. However, most of the datasets discussed above do not have enough samples of expressions to meet this requirement. Further, research has shown that enacted expressions differ significantly from natural, spontaneously emitted expressions, not only because they are generated by different parts of the brain, but also in the extent of deformation and in duration. Since most of the datasets discussed above consist of enacted sequences, they would not produce accurate evaluations of AFERS. Further, systems trained on synthetic expressions would not fare well when used to detect natural expressions.
The final hurdle in using existing datasets to evaluate the current algorithm is the availability of spontaneous microexpressions. The Rochester/UCSD Facial Action Coding System Database 1 (RUFACS1) has been claimed to consist of several video sequences, each comprising several FACS-coded microexpressions. The videos are recordings of
subjects attempting to lie for monetary rewards. The subjects are unaware that their expressions are the primary focus. Although this dataset seems very promising, it has not yet been made publicly available.
Table 2.1 Select automatic facial expression recognition systems in literature

Work | Dataset name | Dataset type | # Subj./Seq. | Face detection technique | Features used | Expression analysis method | Output | Accuracy
Pantic and Patras | MMI | Image sequences (posed) | 19/740 | Manual | 15 facial points | Fast direct chaining inference procedure | Action Units | 87%
Bartlett et al. | Cohn-Kanade | Videos (posed) | 100/313 | Enhanced Viola-Jones method | Holistic face | SVM | Action Units | 94.8%
Rothwell, Bandar et al. | Self | Image sequences (spontaneous) | 39/39 | Artificial neural networks | Holistic face and shoulders | Artificial neural networks | Emotion | 80%; real time
Pantic and Rothkrantz | Self | Still images, front and profile (posed) | */40 | Skin color based segmentation | Fiducial points on face | Fast direct chaining inference procedure | Action Units | 86%
Cohen et al. | Cohn-Kanade | Image sequences (posed) | 100/313 | Piecewise Bezier volume deformation | Head motion and local deformation | HMM | Action Units, Emotion | 90%
Tsalakanidou and Malassiotis | Self | 3D image sequences | 52/832 | Self, using active shape models | Geometric distances from 81 landmarks and surface deformation | Rule based | Action Units, Emotion | 85.5%
Tian et al. | Cohn-Kanade and Ekman-Hager | Image sequences (posed) | 210/1917 | Neural network based | Hybrid | Neural networks | Action Units | 93.3%
Essa and Pentland | Self | Image sequences | 8/52 | View based and modular eigenspace | Holistic face | Optical flow | Action Units + temporal | 98%

* Not applicable / not provided
CHAPTER 3 BACKGROUND

3.1 Optical Flow

Optical flow is the apparent motion of brightness patterns between images in a sequence. The motion in question may be caused by actual motion of the object, a change in camera position, or a change in illumination that mimics object motion. Optical flow is a popular motion estimation technique that represents temporal changes in an image using multiple vectors. Each vector begins at the location of a pixel in the initial image and ends at the corresponding pixel location in the time-shifted image. Optical flow uses spatial and temporal changes of intensity to find a matching pixel in the target image. Because this method depends heavily on intensities, it is highly sensitive to changes in lighting, and any such change would be perceived as actual movement. Hence, optical flow only works under the assumption of constant lighting. Optical flow can be determined by several techniques; one well-known technique is the gradient-based technique. In this approach, the optical flow equation can be expressed as:

$$I(x, y, t) = I(x + dx,\; y + dy,\; t + dt) \qquad (3.1)$$

where I(x, y, t) is the intensity of a pixel at location (x, y) at time t, dx is the displacement of the pixel along the x direction, dy is the displacement of the pixel along the y direction, and dt is the time elapsed between the two frames.
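Gradient-based methods usually work with a linearized form of this brightness-constancy equation. The short derivation below is the standard one, assuming the displacements are small enough for a first-order Taylor expansion to hold. Expanding the right-hand side of I(x, y, t) = I(x + dx, y + dy, t + dt):

$$I(x + dx,\; y + dy,\; t + dt) \approx I(x, y, t) + \frac{\partial I}{\partial x}\,dx + \frac{\partial I}{\partial y}\,dy + \frac{\partial I}{\partial t}\,dt$$

Substituting this into the brightness-constancy equation cancels I(x, y, t):

$$\frac{\partial I}{\partial x}\,dx + \frac{\partial I}{\partial y}\,dy + \frac{\partial I}{\partial t}\,dt = 0$$

Dividing by dt and writing u = dx/dt, v = dy/dt, I_x = ∂I/∂x, I_y = ∂I/∂y and I_t = ∂I/∂t gives the optical flow constraint:

$$I_x\,u + I_y\,v + I_t = 0$$

This is a single equation in the two unknowns u and v at each pixel, which is why gradient-based methods add further constraints, such as global smoothness of the flow (Horn-Schunck) or a displacement shared over a local window (Lucas-Kanade).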
A direct consequence of the above equation is that flow vectors cannot be determined in regions of constant brightness.

Figure 3.1 Errors in optical flow caused by a change in illumination: a) original image; b) the same image with one source of illumination changed; c) the optical flow generated between a) and b). Note the large optical flow due to the apparent motion caused by the change in illumination, even though there is no motion between the two images.

Figure 3.2 Errors in optical flow due to rotation of the object: a) original image; b) the object in a) rotated by 180°; c) the optical flow generated between a) and b). Note that the only optical flow is at the center, because of the lettering, whereas there should have been a rotational flow over the entire object.

Again, several implementations exist for the gradient-based technique, and a good insight into, and comparison of, some of these methods is available in the literature. Some of the methods discussed are the Horn-Schunck method, the Lucas-Kanade method and the Nagel method. After experimenting with the various methods available, we decided upon robust optical flow for its reliable and consistent results.
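This work uses robust optical flow; the sketch below instead illustrates the simpler Lucas-Kanade idea mentioned above, under the assumption that a single displacement (u, v) is shared by every pixel of the window (here, the whole array). It solves the linearized constraint I_x u + I_y v + I_t = 0 over the window by least squares, using NumPy central differences for the spatial gradients and a frame difference for the temporal one. It returns `None` when the normal equations are singular, which is exactly the constant-brightness case noted above. The function name is hypothetical.

```python
import numpy as np


def lucas_kanade_window(frame0, frame1):
    """Least-squares estimate of one (u, v) for a whole window.

    Solves Ix*u + Iy*v = -It over every pixel. Returns None when the
    normal equations are singular (e.g., constant brightness).
    """
    f0 = np.asarray(frame0, dtype=float)
    f1 = np.asarray(frame1, dtype=float)
    Iy, Ix = np.gradient(f0)     # derivative along rows (y) first, then columns (x)
    It = f1 - f0                 # temporal derivative as a frame difference
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    ATA = A.T @ A                # 2x2 normal-equation matrix
    if np.linalg.matrix_rank(ATA) < 2:
        return None              # flow undetermined in this window
    return np.linalg.solve(ATA, A.T @ b)
```

On two synthetic frames whose content is shifted by (0.5, 0.25) pixels, the estimate comes out close to (0.5, 0.25); on a flat image the function returns None.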
3.2 Stress and Strain

To completely define a system in which a force acts on a body, we need to define two components: the force acting on the body, and the changes undergone by the object while experiencing the force. The force acting on the object could be external or internal. Stress is defined as the force per unit area and is made up of normal and shear components. Normal stress acts in the direction normal to the plane of the object and causes a change in volume. Shear stress acts in the plane of the object, in both directions, and tends to deform the object being acted upon. The changes undergone by a system are of two types: rigid body motion and deformation. Rigid body motion is one in which the body is translated or rotated without any change in its shape, i.e., the distance between any two points on the surface of the object remains the same before and after the application of the force. Deformation occurs when there is a change in the shape of the object, i.e., the separation between at least two points on the surface of the object is different after the application of the force.

Figure 3-3 Types of motion in an object when acted upon by an external force. The original object is shown in dashed lines in the background. a) Original position. b) Translation. c) Rotation. d) Deformation.
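The distinction between rigid body motion and deformation can be checked directly from the definition above: a motion is rigid when every pairwise distance between points is preserved. The sketch below is a small illustration only. The unit square, the 30° rotation and the 20% stretch are made-up examples.

```python
import numpy as np

def pairwise_distances(points):
    """Matrix of Euclidean distances between all pairs of 2-D points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def is_rigid(before, after, tol=1e-9):
    """True if every pairwise distance is unchanged (translation and/or rotation)."""
    return np.allclose(pairwise_distances(before), pairwise_distances(after), atol=tol)

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

translated = square + np.array([2.0, 3.0])
rotated = square @ R.T
stretched = square * np.array([1.2, 1.0])   # 20% stretch along x: a deformation

print(is_rigid(square, translated), is_rigid(square, rotated), is_rigid(square, stretched))
# True True False
```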
The relative amount of deformation of the material is known as the strain of the material. Strain is a property of the material and is independent of the shape of the object. If the amount of displacement of a point on the material is known, the strain can be computed as:

$$ \varepsilon = \frac{L - L_0}{L_0} = \frac{\Delta L}{L_0} \qquad (3.2) $$

where $L_0$ is the original length and $L$ is the deformed length. If $\mathbf{u} = [u, v]^T$ is the displacement vector that describes a deformed object, then an infinitesimal strain tensor can be defined as

$$ \varepsilon = \frac{1}{2}\left[\nabla \mathbf{u} + (\nabla \mathbf{u})^{T}\right] \qquad (3.3) $$

which can be expanded to:

$$ \varepsilon = \begin{bmatrix} \varepsilon_{xx} & \varepsilon_{xy} \\ \varepsilon_{yx} & \varepsilon_{yy} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial u}{\partial x} & \dfrac{1}{2}\left(\dfrac{\partial u}{\partial y} + \dfrac{\partial v}{\partial x}\right) \\ \dfrac{1}{2}\left(\dfrac{\partial v}{\partial x} + \dfrac{\partial u}{\partial y}\right) & \dfrac{\partial v}{\partial y} \end{bmatrix} \qquad (3.4) $$

where $(\varepsilon_{xx}, \varepsilon_{yy})$ are the normal strain components and $(\varepsilon_{xy}, \varepsilon_{yx})$ are the shear strain components.

3.3 Strain Magnitude Characteristics

As was discussed in Section 3.1, using optical strain to determine deformation results in a more stable system compared to other methods that use static images. To verify the advantage of optical strain over optical flow, videos were shot with subjects moving back and forth while enacting an expression. These videos are not part of the dataset and were used only to verify the effectiveness of strain over optical flow. Figure 3-4 shows a sample of these sequences and Figure 3-5 shows the resulting optical flow and optical strain. An inspection of the optical flow plots reveals little about the start or end of the expressions. In contrast, a simple visual examination of the optical strain plot clearly shows the peaks corresponding to the three expressions. By selecting an appropriate threshold, we could easily detect the beginning, peak and end of each expression.

Figure 3-4 Sample frames from the stability test video. a) Neutral frame. b) Face comes forward while performing an expression. c) Face goes back to the neutral position. d) Face remains in the neutral position, but becomes expressive.

Figure 3-5 Optical flow and optical strain with out-of-plane movement. a) The optical flow is highly erratic and inconclusive. b) The optical strain shows peaks where the expression occurs and is more readily interpretable.
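The infinitesimal strain tensor (3.3) is the symmetric part of the displacement gradient. This is why it separates deformation from small rigid rotations: a small rotation has an antisymmetric displacement gradient, whose symmetric part is zero. The sketch below, an illustration with made-up numbers, evaluates (3.3) for a uniform 5% stretch along x and for a small rotation. For the stretch, $\varepsilon_{xx} = 0.05$ agrees with (3.2) for a bar of length $L_0 = 1$ stretched to $L = 1.05$.

```python
import numpy as np

def infinitesimal_strain(grad_u):
    """Equation (3.3): symmetric part of the 2x2 displacement gradient
    grad_u = [[du/dx, du/dy], [dv/dx, dv/dy]]."""
    grad_u = np.asarray(grad_u, dtype=float)
    return 0.5 * (grad_u + grad_u.T)

# Uniform 5% stretch along x: u = 0.05 x, v = 0.
stretch = infinitesimal_strain([[0.05, 0.0],
                                [0.0, 0.0]])

# Small rigid rotation by theta (linearized): u = -theta*y, v = theta*x.
theta = 0.01
rotation = infinitesimal_strain([[0.0, -theta],
                                 [theta, 0.0]])

print(stretch)    # eps_xx = 0.05, every other component 0
print(rotation)   # all components 0: no deformation
```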
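3.4 Computation of Strain

The strain described in Section 3.2 varies depending on the material being deformed. Human skin is an elastic material, an elastic material being one that returns to its original state once the external force has been removed. An important property of an elastic material is its modulus of elasticity $E$, the ratio of stress $\sigma$ to strain $\varepsilon$:

$$ E = \frac{\sigma}{\varepsilon} \qquad (3.5) $$

Once the stress is known, we can calculate the strain by solving the inverse problem. Solving inverse problems is computationally expensive because inverse problems are ill-posed and highly nonlinear. Another approach is to estimate strain directly from (3.3). In its general form, this approach involves computing higher-order derivatives of the displacement vector, but we can simplify it by approximating it with first-order derivatives. Let $\mathbf{u} = [u, v]^T$ be the displacement vector obtained from the optical flow as described in (3.1), and let $p$ and $q$ be the flow components along $x$ and $y$. Then

$$ p = \frac{dx}{dt} \approx \frac{\Delta x}{\Delta t} = \frac{u}{\Delta t}, \qquad q = \frac{dy}{dt} \approx \frac{\Delta y}{\Delta t} = \frac{v}{\Delta t} \qquad (3.6) $$

where $\Delta t$ is the elapsed time between two image frames. For a constant frame interval, $\Delta t$ is a constant, and hence we can obtain the partial derivatives of (3.6) as

$$ \frac{\partial u}{\partial x} = \frac{\partial p}{\partial x}\,\Delta t, \qquad \frac{\partial u}{\partial y} = \frac{\partial p}{\partial y}\,\Delta t \qquad (3.7) $$

$$ \frac{\partial v}{\partial x} = \frac{\partial q}{\partial x}\,\Delta t, \qquad \frac{\partial v}{\partial y} = \frac{\partial q}{\partial y}\,\Delta t \qquad (3.8) $$

The displacement derivatives are computed by finding the slope of the tangent of the displacement function using the finite difference approximation as: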
$$ \frac{\partial u}{\partial x} \approx \frac{u(x + \Delta x, y) - u(x - \Delta x, y)}{2\Delta x} \qquad (3.9) $$

$$ \frac{\partial v}{\partial y} \approx \frac{v(x, y + \Delta y) - v(x, y - \Delta y)}{2\Delta y} \qquad (3.10) $$

where $(\Delta x, \Delta y)$ are preset distances of 2–3 pixels; $\partial u/\partial y$ and $\partial v/\partial x$ are computed analogously. The magnitude of the strain is given by:

$$ \varepsilon_m = \sqrt{\varepsilon_{xx}^2 + \varepsilon_{yy}^2 + \varepsilon_{xy}^2 + \varepsilon_{yx}^2} \qquad (3.11) $$

3.5 The OpenCV Haar Classifier

The OpenCV Haar classifier is an efficient implementation of the Viola-Jones boosted cascade classifier. It is a supervised classifier, capable of detecting a specific class of objects, based on an expressive overcomplete set of basis functions. The features used are thresholds applied to sums and differences of rectangular image regions of varying sizes. A basis set is complete if there is no linear dependence between the basis features and it has the same number of elements as the dimension of the image space. If the number of elements is greater than the dimension of the image space, the basis set is overcomplete. In this case, the number of elements is much greater than the dimension of the image space. OpenCV uses the integral image representation [30, 65], an intermediate image representation that enables faster computation of the rectangular features. The value at a location $(i, j)$ of the integral image is equal to the sum of all the pixels above and to the left of $(i, j)$, inclusive, such that

$$ ii(i, j) = \sum_{i' \le i,\; j' \le j} I(i', j') \qquad (3.12) $$

Using the integral image speeds up the computation by reducing the number of references needed to calculate the sums and differences of pixels within a rectangular region.
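The integral image of (3.12) turns any rectangle sum into at most four lookups, whatever the size of the rectangle. The sketch below illustrates the idea with numpy cumulative sums and a two-rectangle (edge-like) Haar feature. It is not OpenCV's implementation. OpenCV's own integral image, for example, is padded with an extra leading row and column. The 5x5 test image and the feature layout are arbitrary examples.

```python
import numpy as np

def integral_image(img):
    """Equation (3.12): ii[i, j] = sum of img[0..i, 0..j], inclusive."""
    return np.asarray(img, dtype=np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top..bottom, left..right] (inclusive) using at most four lookups."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return int(total)

img = np.arange(1, 26).reshape(5, 5)
ii = integral_image(img)

# Two-rectangle Haar-like feature over a 4x4 window: left half minus right half.
left = rect_sum(ii, 0, 0, 3, 1)
right = rect_sum(ii, 0, 2, 3, 3)
print(left, right, left - right)   # 72 88 -16
```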
Further, the integral image only has to be calculated once, in a single pass over the image.

Figure 3-6 Representations of some Haar-like features from the OpenCV library

Although the adopted approach allows fast computation of the features, the total number of rectangular features for the whole image is prohibitively large. To reduce the number of features and speed up the process, the AdaBoost learning algorithm is used to select a small subset of features that best separates the positive and negative samples. These classifiers are known as weak classifiers, as they have a true positive rate of about 99.9% with a false positive rate of about 50%. These weak classifiers are organized in the form of a rejection cascade, wherein a series of classifiers is applied to subregions of varying size in the image. Only regions that make it through the entire cascade are considered detections. The false positive rate of a single classifier is high, but the overall detection rate after the cascade of classifiers is about 98%, with a false positive rate of 0.0001%. The nodes are arranged such that the initial classifier is the least complex, with complexity increasing along the cascade.
This further reduces the computation, as the number of candidate regions forwarded to the more complex classifiers decreases at each node level.
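The overall figures quoted above follow from the per-stage rates. A window is accepted only if it passes every stage, so, treating the stages as independent, the per-stage rates multiply. The text does not state the number of stages. The value of 20 below is an assumption, chosen because it reproduces both quoted numbers. The sketch is plain arithmetic, not OpenCV code.

```python
def cascade_rates(stage_tpr, stage_fpr, n_stages):
    """Overall (detection rate, false positive rate) of a rejection cascade,
    assuming every stage acts independently: per-stage rates multiply."""
    return stage_tpr ** n_stages, stage_fpr ** n_stages

# Assumed: 20 stages, each with 99.9% detection and 50% false positives.
tpr, fpr = cascade_rates(0.999, 0.5, 20)
print(f"detection rate ~ {tpr:.3f}, false positive rate ~ {fpr * 100:.6f}%")
# detection rate ~ 0.980, false positive rate ~ 0.000095%
```

With 20 stages, 0.999^20 ≈ 0.98 and 0.5^20 ≈ 9.5 × 10^-7, i.e. roughly the 98% detection rate and 0.0001% false positive rate quoted above.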
CHAPTER 4 ALGORITHM AND IMPLEMENTATION

4.1 Overview

The primary input to the system is a video sequence of the face of the subject, which need not contain microexpressions. Constant illumination is assumed for the duration of the sequence. The system can handle small rotational and translational movements of the face, but out-of-plane rotation must be avoided. The system expects only one face at a time, but it can be modified to handle more than one face by adjusting the face alignment step. The input video sequence is split into frames. The initial frame is assumed to show a neutral expression, and all deformations are measured with respect to this image. The algorithm comprises the following steps:
i. Face alignment and registration.
ii. Optical flow computation between the neutral face and all other images.
iii. Computation of optical strain from the displacements obtained in step ii.
iv. Splitting of the face into regions based on face visibility.
v. Computation and normalization of strain for each region.
vi. Thresholding based on magnitude, duration and spatial locality of the strain pattern.
Table 4-1 provides a list of the parameters used, and Figure 4-1 shows the flowchart of the proposed algorithm.

Table 4-1 List of parameters used
Sl. No. | Parameter | Implemented as | Automated / Manual
1 | Threshold | % of local peak strain magnitude | Automated
2 | Neutral frame | 1st frame in sequence | Manual
3 | Maximum number of frames | 9 | Manual
4 | Minimum number of frames | 2 | Manual
Figure 4-1 Flowchart of the proposed algorithm
4.2 Face Alignment

The images in the input sequence are usually not aligned because of movement of the face. The faces must be aligned so that the computed flow and strain reflect true deformation rather than rigid body movement. We use the Viola-Jones face detector to detect and crop the faces from the whole frames. This offers two advantages: faster computation and removal of translational motion. If the input contains more than one face, or other objects that may prove a hindrance, they can be handled in this step. All our videos contain only a single full-frontal close-up of the face, and hence this has not been incorporated in the implementation. Next, to remove rotation of the face, we perform the following steps (Figure 4-2):
i. Determine the location of the eyes using the OpenCV Haar classifier.
ii. Calculate the centroid of each eye location.
iii. Construct the line joining the centroids of the two detected eyes.
iv. Rotate the image so as to align this line with the corresponding line in the neutral image.
Every subsequent image is cropped to the same size as the initial (neutral) image after matching the first detected skin pixel at the top-left corner of the image. An existing skin detection implementation is used for the skin pixel determination. Manual oversight is required to correct errors that may occur in the process. Some issues to look out for are:
i. The eye detection algorithm may produce erroneous eye detections, such as a wrong location, more than two eyes, or fewer than two eyes.
ii. Some clothing or background pixels may be thresholded as skin pixels, or a change in angle may impede the visibility of skin pixels in subsequent frames.
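Steps ii to iv of the rotation correction reduce to comparing the angle of the eye line in the current frame with the angle in the neutral frame. The sketch below computes that angle from two eye detections, each given as an `(x, y, w, h)` box, which is the usual bounding-box layout returned by detectors such as OpenCV cascades. The function names and the example boxes are hypothetical. The sketch stops at the angle; the sign convention for actually rotating the image depends on the rotation routine used.

```python
import math

def eye_centroid(rect):
    """Centre of an eye detection given as (x, y, w, h)."""
    x, y, w, h = rect
    return (x + w / 2.0, y + h / 2.0)

def eye_line_angle(left_eye, right_eye):
    """Angle (degrees, image coordinates) of the line joining the two eye centroids."""
    (x1, y1), (x2, y2) = eye_centroid(left_eye), eye_centroid(right_eye)
    return math.degrees(math.atan2(y2 - y1, x2 - x1))

def alignment_rotation(neutral_eyes, current_eyes):
    """Rotation (degrees) that makes the current eye line parallel to the neutral one."""
    return eye_line_angle(*neutral_eyes) - eye_line_angle(*current_eyes)

# Hypothetical detections: level eyes in the neutral frame,
# right eye 10 px lower in the current frame.
neutral = ((40, 60, 20, 10), (100, 60, 20, 10))
tilted = ((40, 60, 20, 10), (100, 70, 20, 10))
print(round(alignment_rotation(neutral, tilted), 2))   # -9.46
```

The "more than two eyes or fewer than two eyes" failure mode listed above would surface here as the wrong number of boxes. That is one natural place for the manual check.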
Figure 4-2 Illustration of the face alignment process. The detected eyes are in red rectangles. The centroids of the detections are in green and the line joining the two centroids is in white. The lines and centroids have been drawn for illustration and are not exact measurements. a) Initial image. b) Face is rotated. c) Cropped face from a). d) Cropped face from b). e), f) Interim images in the alignment process. g) The second image is rotated to align the two lines.
4.3 Optical Flow

The dense optical flow between the neutral and the target frame is computed using the Black and Anandan method described in Section 3.1. Computing optical flow after aligning the images reduces the chance of it failing because of rigid head motion. Determination of optical flow involves solving robust formulations using optimization techniques. When the spatial information used for optical flow determination spans a motion boundary, it may violate both the constant illumination assumption and the assumption that the motion between two frames is smooth. These issues are overcome by using the Lorentzian and Geman-McClure error norm functions, as they provide a gradual transition across motion boundaries. Further, the approach uses a coarse-to-fine strategy, determining the optical flow in levels. The algorithm begins at the coarse, spatially filtered level and uses the flow estimate to warp the image to the next, finer level until full resolution is reached. Hence, the Black and Anandan method generates a robust, dense flow of vectors corresponding to the facial skin deformation. Despite the robust computation, the optical flow can still fail for a few pixels. These pixels must be handled properly and excluded from the strain computation in the next step. Figure 4-3 and Figure 4-4 give the dense optical flow for some input frames.

4.4 Computation of Optical Strain

Once we have the displacement vectors from the optical flow data, we can compute the optical strain. The optical strain is calculated using the Finite Difference Method, as described in Section 3.4. The strain maps for the optical flow computed in Figure 4-3 are shown in Figure 4-5, and those for Figure 4-4 are shown in Figure 4-6.
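As a rough illustration of the Finite Difference Method step of Section 3.4, the sketch below computes a per-pixel strain magnitude (3.11) from a dense flow field. It uses central differences over a preset distance of `d` pixels, as in (3.9)–(3.10). Because $\Delta t$ is constant, it works directly with the displacements $u$ and $v$, as allowed by (3.7)–(3.8). This is not the thesis implementation, which used Matlab. Several details are assumptions: the array layout (rows = y, columns = x), leaving border pixels as NaN, and marking pixels where the optical flow failed as NaN so that they drop out of later averages.

```python
import numpy as np

def optical_strain_magnitude(u, v, d=2):
    """Per-pixel strain magnitude, equation (3.11), from a dense flow field.

    u, v : 2-D arrays of horizontal / vertical displacement (rows = y, cols = x).
    d    : preset finite-difference distance in pixels (the text uses 2-3).
    Pixels within d of the border, or next to NaN flow values, come out as NaN.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)

    def ddx(f):   # central difference along x (columns), eq. (3.9)
        out = np.full_like(f, np.nan)
        out[:, d:-d] = (f[:, 2 * d:] - f[:, :-2 * d]) / (2 * d)
        return out

    def ddy(f):   # central difference along y (rows), eq. (3.10)
        out = np.full_like(f, np.nan)
        out[d:-d, :] = (f[2 * d:, :] - f[:-2 * d, :]) / (2 * d)
        return out

    eps_xx = ddx(u)
    eps_yy = ddy(v)
    eps_xy = 0.5 * (ddy(u) + ddx(v))   # shear; eps_yx equals eps_xy by symmetry
    return np.sqrt(eps_xx ** 2 + eps_yy ** 2 + 2 * eps_xy ** 2)

yy, xx = np.mgrid[0:20, 0:20].astype(float)
stretch = optical_strain_magnitude(0.03 * xx, np.zeros_like(xx))   # uniform 3% stretch along x
spin = optical_strain_magnitude(-0.01 * yy, 0.01 * xx)             # small rigid rotation
print(np.nanmin(stretch), np.nanmax(stretch))   # both approximately 0.03
print(np.nanmax(spin))                          # approximately 0: rotation is not strain
```

The second example mirrors the motivation of Section 3.2. A small rigid rotation produces sizeable optical flow but, to first order, no strain.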
Figure 4-3 Dense optical flow on two frames of the Alex Rodriguez video. a) The neutral frame. b) The peak frame corresponding to the microexpression. c) Illustration of the dense optical flow computed by the Matlab implementation provided by M. J. Black, where each colour corresponds to a different direction and the saturation corresponds to the intensity of the motion. d) The normalized optical flow, showing only magnitude. As can be seen, the peak flow is at the mouth area to the right.
Figure 4-4 Computed dense optical flow on two frames of the USF dataset. a) The neutral frame. b) The peak frame corresponding to the microexpression. c) Illustration of the dense optical flow computed by the Matlab implementation provided by M. J. Black, where each colour corresponds to a different direction and the saturation corresponds to the intensity of the motion. d) The normalized optical flow, showing only magnitude. As can be seen, the peak flow is at both sides of the mouth area.
Figure 4-5 Strain maps for the optical flow computed in Figure 4-3. a) The neutral frame. b) The peak frame corresponding to the microexpression. c) The vertical strain, computed along the y axis. d) The horizontal strain, computed along the x axis. e) The combined strain magnitude.
Figure 4-6 Strain map for the optical flow computed in Figure 4-4. a) The neutral frame. b) The peak frame corresponding to the microexpression. c) The combined optical strain magnitude.
4.5 Split into Regions

As discussed in Section 1.3, FACS, the most commonly used facial expression coding system, represents expressions in terms of Action Units. Most of these Action Units are based on muscular movement. We divide the face into eight regions: forehead, left of eye, right of eye, left cheek, right cheek, left of mouth, right of mouth and below mouth. Together, these regions cover 37 of the 43 Action Units. Including the eye region would have let us cover more of the Action Units, but eye blinks would first have to be accounted for. To account for every possible Action Unit, we would have to measure the deformation at every facial muscle. However, we have observed in our experiments that the regions we have covered are sufficient to spot a majority of the microexpressions. Also, for faces with only partial visibility, not all eight regions are available. Presently, we watch the video sequence and decide which regions to use. The region classification is automated, with manual intervention as required. We gather the strain components calculated in the previous step for each region, normalize them within the region and arrive at a single strain magnitude value for each region.

Figure 4-7 Split of the face into separate regions. Each colored area corresponds to a region. We do not use the eye and nose area, corresponding to the bricked region in the figure.
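The text does not spell out how the strain values inside a region are normalized into a single number per frame. The sketch below shows one plausible choice, labelled as an assumption: the mean strain over the region's valid (non-NaN) pixels. Using the mean removes the effect of regions having different pixel counts. Regions are given as boolean masks. How those masks are placed on the face, for example from the eight regions of Figure 4-7, is not reproduced here, and the region names are hypothetical.

```python
import numpy as np

def region_strain_values(strain_map, region_masks):
    """Collapse a per-pixel strain map into one value per facial region.

    region_masks : dict of region name -> boolean array, same shape as strain_map.
    Assumption: the per-region value is the mean strain over the region's
    valid (non-NaN) pixels; NaN marks pixels where flow/strain failed.
    """
    values = {}
    for name, mask in region_masks.items():
        pixels = strain_map[mask]
        pixels = pixels[~np.isnan(pixels)]
        values[name] = float(pixels.mean()) if pixels.size else float("nan")
    return values

# Toy 4x4 strain map with one failed pixel (NaN) and two hypothetical regions.
strain = np.array([[0.1, 0.1, 0.3, 0.3],
                   [0.1, 0.1, 0.3, 0.3],
                   [0.0, 0.0, 0.02, 0.02],
                   [0.0, 0.0, 0.02, np.nan]])
masks = {"upper_face": np.zeros((4, 4), dtype=bool),
         "lower_face": np.zeros((4, 4), dtype=bool)}
masks["upper_face"][:2] = True
masks["lower_face"][2:] = True

vals = region_strain_values(strain, masks)
print(vals)   # upper_face: mean of 8 pixels; lower_face: mean of the 7 valid pixels
```

Computed once per frame, this yields the per-region strain curves that the thresholding step of Section 4.6 operates on.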
4.6 Thresholding

The strain plots obtained for each region are then thresholded in order to spot the microexpressions. There are two criteria for selecting a frame sequence as a microexpression:
i. The strain must be significantly larger than in the surrounding regions.
ii. The duration for which the strain is larger than in the surrounding regions must be less than 1/5th of a second.
To arrive at a threshold that satisfies both conditions, we extracted a portion of the dataset. This subset consisted entirely of USF data, as it is cleaner and contains less movement than the other datasets. We ran the strain computation algorithm on the ground-truthed sequences and checked which regions of the strain plots satisfied the above conditions. We concluded that the best results were achieved when the threshold was selected such that:
i. Strain Magnitude > upper_threshold × Local Peak Strain Magnitude, and
ii. Strain Magnitude > lower_threshold × Global Average Strain Magnitude.
The first condition ensures that the duration is not measured at the base of the peak, which would cause microexpressions to be missed. The second condition ensures that the strain is large enough to signify a deformation. If both conditions are met for a sequence of no more than nine frames, we declare that sequence a microexpression. The thresholding algorithm is given in Figure 4-8. Figure 4-9 shows the plot for the Alex Rodriguez sequence. The strain magnitude for each region is plotted in a different colour, each colour corresponding to a region of the face. The hexagons at the top indicate the peak of the microexpression as detected by the proposed algorithm. When a microexpression is detected in more than one region, the detection is shown for the first detected region only. Multiple detections within an interval of 6 frames are consolidated into a single detection.
1. Input: Strain magnitude for all regions of the face
2. Output: Peak frame of sequences corresponding to microexpressions
3. Avg ← Average strain for current region
4. Detect peak
5. While peak_mag > 2 × Avg
6.     While strain_mag > upper_threshold × peak_mag and strain_mag > lower_threshold × Avg
7.         Increment counter
8.         Move to next frame
9.     End
10.    Move back to peak
11.    While strain_mag > upper_threshold × peak_mag and strain_mag > lower_threshold × Avg
12.        Increment counter
13.        Move to previous frame
14.    End
15.    If counter > 1 and counter < 10
16.        Declare microexpression at peak
17.    Endif
18.    Remove visited frames
19.    Detect next peak
20. End
21. Merge and manage results from all regions
Figure 4-8 Algorithm for thresholding strain magnitudes
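The algorithm of Figure 4-8 can be sketched for a single region as follows. The peak frame is counted once, the counter is reset for every peak, and a run is accepted when it spans 2 to 9 frames, matching Table 4-1. `upper=0.35` follows the threshold selected in Chapter 6. The value of `lower_threshold` is not given in the text, so `lower=1.0` is an assumed placeholder. Merging across regions uses the 6-frame consolidation interval described in Section 4.6. This is a hedged sketch, not the original implementation.

```python
import numpy as np

def spot_microexpressions(strain, upper=0.35, lower=1.0, peak_factor=2.0,
                          min_frames=2, max_frames=9):
    """Peak-and-duration test of Figure 4-8 for ONE region's strain curve.

    strain : 1-D array, one strain magnitude per frame.
    upper  : fraction of the local peak magnitude (35% per Chapter 6).
    lower  : multiple of the region's average strain (assumed value).
    Returns the frame indices of accepted peaks.
    """
    strain = np.asarray(strain, dtype=float)
    avg = strain.mean()
    visited = np.zeros(strain.size, dtype=bool)
    peaks = []
    while True:
        candidates = np.where(visited, -np.inf, strain)
        peak = int(np.argmax(candidates))                 # detect (next) peak
        peak_mag = candidates[peak]
        if peak_mag <= peak_factor * avg:                 # line 5: stop when peaks are small
            break

        def above(i):
            return strain[i] > upper * peak_mag and strain[i] > lower * avg

        start = end = peak
        while end + 1 < strain.size and above(end + 1):   # lines 6-9: walk forward
            end += 1
        while start - 1 >= 0 and above(start - 1):        # lines 11-14: walk backward
            start -= 1
        count = end - start + 1                           # frames in the run, peak counted once
        if min_frames <= count <= max_frames:             # line 15
            peaks.append(peak)
        visited[start:end + 1] = True                     # line 18: remove visited frames
    return sorted(peaks)

def merge_detections(frames, gap=6):
    """Line 21: consolidate detections (from all regions) within `gap` frames."""
    merged = []
    for f in sorted(frames):
        if not merged or f - merged[-1] > gap:
            merged.append(f)
    return merged

# Synthetic strain curve: a 3-frame spike (micro) and a 16-frame plateau (macro).
sig = np.full(60, 0.01)
sig[20:23] = [0.1, 0.2, 0.1]
sig[40:56] = 0.2
print(spot_microexpressions(sig))        # [21]: the long plateau is rejected
print(merge_detections([21, 25, 40]))    # [21, 40]
```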
Figure 4-9 Example of strain thresholding. a) Strain plots for various regions of the face and the detected microexpression at frame number 802 for the right-of-mouth region (shown in black in the graph), marked with a hexagon at the top of the graph. The dashed lines have been added for illustration. The legend shows the regions as described in Section 4.5. b) The neutral image for the sequence. c) The frame showing the peak of the detected microexpression.
CHAPTER 5 DATASETS

Because appropriate databases for evaluating our algorithm were not available, we have gathered our own database of microexpressions. Our database comprises three datasets. This chapter provides a brief description of each dataset, along with its features and collection methodology. Table 5-1 gives the statistics of the datasets used.

5.1 USF Dataset

The USF dataset comprises two datasets, which are described below.

5.1.1 Feigned Dataset

This dataset comprises 12 videos, each containing eight microexpressions, for a total of 96 microexpressions. The microexpressions span the entire facial region. The subjects were aware of the theory behind microexpressions and were shown some real-life videos and TV shows containing microexpressions. They were then asked to repeat them in any order they liked, while avoiding out-of-plane rotation of the head. Sample frames from the USF dataset are shown in Figure 4-2.

5.1.2 Live Questioning Dataset

Another experimental dataset comprises four videos. Each video contains the questioning sequence of one subject. This dataset contains 4 sequences with one microexpression each, for a total of 4 microexpressions. The following instructions were given to the subject:
i. There are four envelopes on the table in front of you.
ii. One of those contains a letter. We do not know which one contains the letter. The contents of the letter are irrelevant.
iii. You are to pick one envelope and check the contents without revealing them.
iv. You will later be questioned. If you are able to convince us that the envelope you picked contained the letter, you will receive a gift card for $10.
The subjects were later questioned to see if they had received the letter. The session was video recorded and lasted 55–65 seconds. The subjects would all claim to have received the letter for the monetary benefit, thereby generating spontaneous microexpressions. The subjects were unaware that their expressions were the focus of this experiment. All the videos in the USF dataset were recorded using a Panasonic camcorder in either SD or HD. The lighting and background vary slightly between subjects but are constant for the entirety of each video. The videos show a full frontal face, with the camera at a distance of 2 meters from the subject. The subjects were asked to avoid out-of-plane rotations of the face. The ground truthing involved identifying the microexpressions and their region of occurrence in the videos. The task was performed by 2 students in the group who were well versed in microexpressions, though not formally trained.

5.2 Canal 9 Dataset

The Canal 9 political debate corpus is a collection of 72 political debates recorded by the Canal 9 local TV station and broadcast in Valais, Switzerland. This corpus includes a total of roughly 42 hours of edited, high-quality audiovisual recordings. Each debate involves up to five French-speaking participants (one moderator and up to four guest participants) and is focused on a single question with a straight yes/no answer, such as "Are you favorable to the new education laws?" The guest participants firmly state their positions at the beginning of the debate, forming two clearly defined groups of opponents.
The videos were carefully screened for microexpressions and divided into sequences of about 6 seconds each, with each sequence containing one microexpression, for a total of 24 microexpressions. The videos contain faces at a variety of angles and rotations. The clips were chosen such that the position of the face did not change much within the sequence.

Figure 5-1 Canal 9 dataset

5.3 Found Videos

Dr. Paul Ekman, a renowned researcher in microexpressions, has mentioned several examples of microexpressions in public scenarios and speeches in his books, talks and blogs. These are classic examples of microexpressions, and we have obtained some of them from the internet. These include: English spy Kim … Katie Couric, where he denies taking drugs. This dataset comprises a total of 4 microexpressions. Each of these clips is about two seconds long. The use of these videos is covered by the fair use act that allows use of 10% of publicly available videos for non-commercial research.
Figure 5-2 Found video dataset. a) A microexpression of scorn from the O.J. Simpson trial. b) … interview with a microexpression of duping delight. c) Alex Rodriguez interview with Katie Couric.

Table 5-1 Characteristics of datasets used for evaluation
Dataset name | Number of sequences | Approximate duration per sequence (s) | Microexpressions per sequence | Total microexpressions | Resolution
USF feigned | 12 | 140 | 8 | 96 | SD / HD
USF questioning | 4 | 65 | 1 | 4 | HD
Canal 9 dataset | 6 | 300–400 | 4 | 24 | HD
Found videos | 3 | 30–40 | 1 | 4 | Low
CHAPTER 6 EXPERIMENTAL RESULTS

6.1 Determination of Threshold

The USF dataset was used to determine the threshold. About 20% of the ground-truthed microexpressions were used in the determination. The threshold selection process is fully automated. Varying thresholds are applied to the training set, and the strain threshold value that detects the maximum number of microexpressions is selected as the threshold value. The Receiver Operating Characteristic (ROC) curve for the applied threshold values and the resulting true detections is shown in Figure 6-1, and the values are tabulated in Table 6-1.

Figure 6-1 ROC curve used for threshold determination. The % true positive detection peaks at 77.2% and corresponds to a threshold of 35% of the peak strain value.
[Plot: Threshold Selection. x axis: threshold as % of peak strain (75, 50, 35, 30, 25); y axis: % true positive (0–100).]
Table 6-1 Results of thresholding on the training set with 22 microexpressions
Sl. No. | Threshold as % of peak strain | % True positives | % False positives
1 | 75 | 31.8 | 0
2 | 50 | 50 | 0
3 | 35 | 77.2 | 22.7
4 | 30 | 54.5 | 36.4
5 | 25 | 13.6 | 0

Figure 6-2 Typical strain pattern for a microexpression. Thresholds that are too high or too low both result in erroneous detection. A threshold of 35% gives the optimum detection rate.
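With only 22 training microexpressions, each percentage in Table 6-1 corresponds to a small whole number of detections. The sketch below converts the table back to counts and applies the stated selection rule: keep the threshold that spots the most microexpressions. It assumes that the false positive percentage is also expressed relative to the 22 ground-truth microexpressions. The text does not say so, although every row then works out to a whole number.

```python
# Table 6-1: threshold (% of peak strain) -> (% true positives, % false positives)
table_6_1 = {75: (31.8, 0.0), 50: (50.0, 0.0), 35: (77.2, 22.7),
             30: (54.5, 36.4), 25: (13.6, 0.0)}
N_TRAIN = 22   # ground-truthed microexpressions in the training set

# Assumption: both percentages are relative to the 22 training microexpressions.
counts = {thr: (round(tp / 100 * N_TRAIN), round(fp / 100 * N_TRAIN))
          for thr, (tp, fp) in table_6_1.items()}
for thr, (tp, fp) in counts.items():
    print(f"threshold {thr}%: {tp}/{N_TRAIN} spotted, {fp} false spots")

# Selection rule from Section 6.1: maximise the number of spotted microexpressions.
best = max(table_6_1, key=lambda thr: table_6_1[thr][0])
print("selected threshold:", best)   # 35
```

Read this way, the selected 35% threshold spots 17 of the 22 microexpressions at the cost of 5 false spots.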
When the threshold is set high (75% of the detected peak), we spot only about 32% of the microexpressions, with no false spots. The low number of spots arises because at most one or two frames, corresponding to the peak of the expression, have the peak strain, while the remaining frames have lower strain values. The run of frames above a high threshold is therefore often shorter than the two-frame minimum. This explanation is supported by the fact that the number of spotted microexpressions increases as the threshold is reduced. The spotted microexpression count peaks at a threshold of 35%, with a significant number of false spots. The false spots are caused by noise introduced by slightly improper alignment of the images, which is equivalent to a false rigid body movement as far as optical flow determination is concerned. Reducing the threshold further makes the number of spots drop sharply. This is because the threshold falls almost to the average strain level, which extends the duration of the sequence well beyond the limits of a microexpression. Figure 6-2 shows a sample strain pattern of a single microexpression, extracted from the training set to illustrate the impact of threshold selection. It shows various thresholds and the number of frames covered at each threshold. As can be seen, the duration is too short at high threshold values and too long when the threshold is very low. An optimal value of 35% is therefore chosen for the threshold.

6.2 Results with Only Macroexpressions

To test the ability of the proposed algorithm to reject macroexpressions, we ran our algorithm on two videos from the USF expression dataset. The algorithm was run on the entire face instead of splitting it into regions. The algorithm did not spot any microexpressions. This shows that when an expression is exhibited exclusively and explicitly, as a macroexpression, the proposed algorithm successfully rejects that sequence. Figure 6-3 shows the strain map and thresholding results for one of the sequences.
Figure 6-3 Strain map and thresholding with only macroexpressions. As can be seen, no sequence is flagged as a microexpression, validating this aspect of the proposed algorithm.

6.3 Microexpression Spotting Results

So far, we have seen the suitability and stability of optical strain for microexpressions compared to other methods. We now present the results of evaluating the proposed algorithm on the datasets described in the previous chapter.

Table 6-2 Microexpression spotting results
Dataset name | Number of sequences | Number of microexpressions | % True positive | % Missed | % False positive
USF enacted | 12 | 96 | 80.2 | 19.8 | 37.5
USF questioning | 4 | 4 | 75 | 25 | 0
Political | 6 | 24 | 50 | 50 | 75
Found videos | 3 | 4 | 75 | 25 | 0
Total | 25 | 128 | 70.2 | 29.8 | 42.18
Figure 6-4 Strain plots for select USF sequences. Each color corresponds to a region of the face. The hexagons at the top indicate the peak of the microexpression as detected by the proposed algorithm. When a microexpression is detected in more than one region, the detection is shown for the first detected region only. Multiple detections within an interval of 6 frames are consolidated into a single detection.
Figure 6-4 shows strain plots for select sequences in the USF dataset. Table 6-2 gives the results of the proposed spotting algorithm on the various datasets. The true positive and false positive detection percentages were calculated from the obtained results (Figure 6-5).

Figure 6-5 Microexpression spotting results
[Plot: Spotting Results. % True Positive and % False Positive for USF enacted, USF questioning, Canal9 and Found Videos.]

The USF enacted dataset has the highest true positive rate (80.2%), with a false positive rate of 37.5%. This was expected, given the constant illumination and stable posture with very little head movement. Also, the only movements were the macro- and microexpressions, which did not include any talking. The questioning dataset showed accurate results with no false positives; however, the number of subjects is low. The political dataset resulted in a large number of false positives, owing to talking and excessive head movements that occluded regions of the face. Talking causes quick, fine movements of the skin along the jaw line in the mouth region, and these were misclassified as microexpressions. The found video dataset shows promising results, with a 75% detection rate. The algorithm performed worst when the subject was talking, which led to a high false positive rate. Performing voice detection to identify when the subject is talking, and excluding the mouth region during talking sequences, could help reduce the number of false positives.
CHAPTER 7 CONCLUSION AND FUTURE WORK

7.1 Conclusions

A novel method for spotting facial microexpressions in video sequences using optical strain has been presented. This method determines the optical flow between two frames and uses it to compute the strain in facial skin tissue. This strain has been shown to be a measure of deformation. The principle behind using deformation is that maximum deformation occurs at the peak of the microexpression. By detecting deformation that lasts for the duration of a microexpression, we are able to identify the boundaries of the microexpression. Because we compute the dense optical flow, we are able to use the Finite Difference Method approach described in Section 3.4 to determine the optical strain. Finding appropriate datasets containing spontaneous microexpressions was a big challenge. We conducted experiments on three separate datasets: the USF dataset, containing both spontaneous and enacted sequences; the Canal 9 dataset, which was gleaned for microexpressions; and the found video dataset, which comprises videos from the internet. The results on the USF dataset are very promising, with very good detection accuracy. There is a significant false positive rate. This was anticipated, because the threshold arrived at during the training phase was selected liberally so as not to miss any true microexpressions, at the cost of a high false positive rate. The found video dataset comprises real-life videos and hence is the true test of the system. Most of its sequences contained only one microexpression, and none contained more than two. The detection rate was high for this dataset. The algorithm performed poorly on the Canal 9 dataset because of frequent talking and out-of-plane rotation of the head, which resulted in partial occlusion of the face.
Resolution and frame size had no impact on the accuracy of the system as long as the face was clearly visible. Processing time was affected, however, as the optical flow took longer to calculate for the HD images. Adding macroexpression spotting reduced the time taken, as it helped shorten the sequence to be tested. This was only run as a trial and needs to be fully integrated into the system to obtain significant and meaningful results.

7.2 Future Work

Availability of datasets was the biggest challenge faced. Hence, future work will begin with collecting an appropriate number of spontaneous video sequences. The approach described in this work, as well as those used by other researchers, will be used for this purpose. The main reasons for failure on the Canal 9 dataset are frequent talking and out-of-plane movement of the face. Performing voice detection to identify when the subject is talking and excluding the region around the mouth for those durations could help reduce the number of false positives. Increasing the effectiveness of the alignment should help increase the overall accuracy of the algorithm. An interesting approach, which uses the optical flow vectors of rigid points on the face to determine rotation and translation and then uses them to register and align the face, has been discussed in prior work. Detecting the face and splitting it into regions currently involves some manual intervention. An algorithm for automatically partitioning faces into regions has been presented in prior work; incorporating this method into our algorithm could help reduce the amount of manual intervention.
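The rigid-point alignment idea can be sketched as a least-squares fit of a 2-D rotation and translation (the Kabsch/Procrustes solution) to optical flow sampled at points assumed not to deform. This is an illustration under that assumption, not the method from the work referred to above; the function names and the choice of rigid points (for example, the nose bridge) are hypothetical.

```python
import numpy as np


def estimate_rigid_motion(points, flow):
    """Fit q ~= R @ p + t, where q = p + flow, in the least-squares sense.

    points: (n, 2) array of (x, y) positions of rigid facial points (hypothetical)
    flow:   (n, 2) array of optical-flow vectors sampled at those points
    Returns the 2x2 rotation R and the translation t (Kabsch/Procrustes).
    """
    p = np.asarray(points, dtype=float)
    q = p + np.asarray(flow, dtype=float)
    p_mean, q_mean = p.mean(axis=0), q.mean(axis=0)
    # 2x2 cross-covariance of the centred point sets
    h = (p - p_mean).T @ (q - q_mean)
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so R is a rotation, not a reflection
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, d]) @ u.T
    t = q_mean - r @ p_mean
    return r, t


def remove_rigid_motion(points, flow, r, t):
    """Subtract the displacement explained by (r, t) from a flow field.

    points: (m, 2) positions where the flow is sampled (e.g. every face pixel)
    flow:   (m, 2) flow vectors at those positions
    """
    p = np.asarray(points, dtype=float)
    rigid_displacement = p @ r.T + t - p
    return np.asarray(flow, dtype=float) - rigid_displacement
```

Subtracting the fitted rigid displacement from the whole flow field before computing strain leaves only the residual, non-rigid motion; with noisy flow, the fit simply returns the best rotation and translation in the least-squares sense.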
REFERENCES
P. Ekman, Telling Lies: Clues to Deceit in the Marketplace, Politics, and Marriage (Revised and Updated Edition). W. W. Norton & Company, 2001.
P. Ekman, E. T. Rolls, D. I. Perrett and H. D. Ellis, "Facial Expressions of Emotion: An Old Controversy and New Findings [and Discussion]," Philosophical Transactions: Biological Sciences, vol. 335, pp. 63-69, Jan. 29, 1992.
S. Porter and L. ten Brinke, "Reading Between the Lies," Psychological Science, vol. 19, pp. 508-514, May 2008.
G. C. Littlewort, M. S. Bartlett and K. Lee, "Automatic coding of facial expressions displayed during posed and genuine pain," Image Vision Comput., vol. 27, pp. 1797-1803, Nov. 2009.
B. Schuller, R. Muller, F. Eyben, J. Gast, B. Hornler, M. Wollmer, G. Rigoll, A. Hothker and H. Konosu, "Being bored? Recognising natural interest by extensive audiovisual integration for real-life application," Image Vision Comput., vol. 27, pp. 1760-1774, 2009.
T. Ehrenfeld, "What's In Your Face: Are 'Microexpressions' The Key To Better Security?" Newsweek, Jun. 9, 2003.
M. Gladwell, "The naked face," The New Yorker, p. 38, 2002.
P. Ekman and E. L. Rosenberg, What the Face Reveals: Basic and Applied Studies of Spontaneous Expression using the Facial Action Coding System (FACS). Oxford University Press, 1997.
K. Scherer and P. Ekman, "Methods for measuring facial action," in Handbook of Methods in Nonverbal Behavior Research. Cambridge University Press, 1982, pp. 45-90.
M. Shreve, S. Godavarthy, V. Manohar, D. Goldgof and S. Sarkar, "Towards macro- and micro-expression spotting in video using strain patterns," in Applications of Computer Vision (WACV), 2009 Workshop on, 2009, pp. 1-6.
V. Manohar, D. Goldgof, S. Sarkar and Y. Zhang, "Facial strain pattern as a soft forensic evidence," in Applications of Computer Vision, 2007. WACV '07. IEEE Workshop on, 2007, pp. 42.
V. Manohar, M. Shreve, D. Goldgof and S. Sarkar, "Finite element modeling of facial deformation in videos for computing strain pattern," in Pattern Recognition, 2008. ICPR 2008. 19th International Conference on, 2008, pp. 1-4.
V. Manohar, "Facial skin motion properties from video: Modeling and applications," Ph.D. dissertation, University of South Florida, 2009.
E. A. Haggard and K. S. Isaacs, "Micromomentary facial expressions as indicators of ego mechanisms in psychotherapy," in Methods of Research in Psychotherapy, p. 154, 1966.
P. Ekman, "Darwin, Deception, and Facial Expression," Ann. N. Y. Acad. Sci., vol. 1000, pp. 205-221, 2003.
C. Darwin, The Expression of the Emotions in Man and Animals. Oxford University Press Inc, 2002.
P. Ekman and W. Friesen, "Constants across cultures in the face and emotion," J. Pers. Soc. Psychol., vol. 17, pp. 124-129, 1971.
P. Ekman and W. V. Friesen, Unmasking the Face: A Guide to Recognizing Emotions from Facial Clues. Prentice Hall, 1975.
P. Ekman, "Lie catching and micro expressions," in The Philosophy of Deception, C. Martin, Ed. Oxford University Press, 2009.
G. Peleg, G. Katzir, O. Peleg, M. Kamara, L. Brodsky, H. Hel-Or, D. Keren and E. Nevo, "Hereditary family signature of facial expression," Proceedings of the National Academy of Sciences, vol. 103, pp. 15921-15926, Oct. 24, 2006.
M. Pantic, A. Pentland, A. Nijholt and T. Huang, "Human computing and machine understanding of human behavior: A survey," in ACM SIGCHI Proceedings Eighth International Conference on Multimodal Interfaces, 2006, pp. 239-248.
S. Z. Li and A. K. Jain, Handbook of Face Recognition. Springer, Berlin, 2005.
M.-H. Yang, D. J. Kriegman and N. Ahuja, "Detecting faces in images: a survey," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, pp. 34-58, 2002.
G. Yang and T. S. Huang, "Human face detection in a complex background," Pattern Recognit., vol. 27, pp. 53-63, Jan. 1994.
K. C. Yow and R. Cipolla, "A probabilistic framework for perceptual grouping of features for human face detection," in Automatic Face and Gesture Recognition, 1996., Proceedings of the Second International Conference on, 1996, pp. 16-21.
S. A. Sirohey, "Human face segmentation and identification," Master's thesis, University of Maryland, 1993.
K. C. Yow and R. Cipolla, "Feature-Based Human Face Detection," Image Vision Comput., vol. 15, pp. 713-735, 1996.
S. Fahlman and C. Lebiere, "The cascade-correlation learning architecture," in Advances in Neural Information Processing Systems 2, 1990, pp. 524-532.
Y. Dai, "Face texture model based on SGLD and its application in face detection in a color scene," Pattern Recognit., vol. 29, pp. 1007-1017, June 1996.
P. Viola and M. Jones, "Robust Real-Time Face Detection," International Journal of Computer Vision, vol. 57, pp. 137-154, May 1, 2004.
H. Kobayashi and F. Hara, "Facial interaction between animated 3D face robot and human beings," in Systems, Man, and Cybernetics, 1997. 'Computational Cybernetics and Simulation', 1997 IEEE International Conference on, 1997, pp. 3732-3737, vol. 4.
D. Terzopoulos and K. Waters, "Analysis and synthesis of facial image sequences using physical and anatomical models," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 15, pp. 569-579, 1993.
M. Black and Y. Yacoob, "Recognizing Facial Expressions in Image Sequences Using Local Parameterized Models of Image Motion," International Journal of Computer Vision, vol. 25, pp. 23-48, 1997.
M. Pantic, "Automatic Analysis of Facial Expressions: The State of the Art," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 1424-1445, Dec. 1, 2000.
A. B. Ashraf, S. Lucey, J. F. Cohn, T. Chen, Z. Ambadar, K. M. Prkachin and P. E. Solomon, "The painful face: Pain expression recognition using active appearance models," Image Vision Comput., vol. 27, pp. 1788-1796, 2009.
P. Lucey, J. Cohn, S. Lucey, S. Sridharan and K. M. Prkachin, "Automatically detecting action units from faces of pain: Comparing shape and appearance features," 2009, pp. 12-18.
P. Lucey, J. Cohn, S. Lucey, I. Matthews, S. Sridharan and K. M. Prkachin, "Automatically detecting pain using facial actions," in Affective Computing and Intelligent Interaction and Workshops, 2009. ACII 2009. 3rd International Conference on, 2009, pp. 1-8.
G. Lu, X. Li and H. Li, "Facial expression recognition for neonatal pain assessment," in Neural Networks and Signal Processing, 2008 International Conference on, 2008, pp. 456-460.
P. Ekman and W. Friesen, Facial Action Coding System: A Technique for the Measurement of Facial Movement. Palo Alto: Consulting Psychologists Press, 1978.
M. S. Bartlett, G. Littlewort, M. Frank, C. Lainscsek, I. Fasel and J. Movellan, "Recognizing facial expression: Machine learning and application to spontaneous behavior," in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, 2005, pp. 568-573, vol. 2.
M. Pantic and I. Patras, "Dynamics of facial expression: recognition of facial actions and their temporal segments from face profile image sequences," Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 36, pp. 433-449, 2006.
M. Pantic and L. Rothkrantz, "Facial Action Recognition for Facial Expression Analysis from Static Face Images," IEEE Trans. Syst. Man Cybern., vol. 34, pp. 1449-1461, 2004.
G. Donato, M. Bartlett, J. Hager, P. Ekman and T. Sejnowski, "Classifying Facial Actions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, pp. 974-989, 1999.
I. A. Essa and A. P. Pentland, "Coding, analysis, interpretation, and recognition of facial expressions," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 19, pp. 757-763, 1997.
J. P. Skelly, Experiments in Expression Recognition. Massachusetts Institute of Technology, 2005.
T. Kanade, J. F. Cohn and Y. Tian, "Comprehensive database for facial expression analysis," in Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on, 2000, pp. 46-53.
M. Lyons, S. Akamatsu, M. Kamachi and J. Gyoba, "Coding facial expressions with Gabor wavelets," in Automatic Face and Gesture Recognition, 1998. Proceedings. Third IEEE International Conference on, 1998, pp. 200-205.
A. J. O'Toole, J. Harms, S. L. Snow, D. R. Hurst, M. R. Pappas, J. H. Ayyad and H. Abdi, "A video database of moving faces and people," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 27, pp. 812-816, 2005.
T. Sim, S. Baker and M. Bsat, "The CMU pose, illumination, and expression database," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 25, pp. 1615-1618, 2003.
M. Pantic, M. Valstar, R. Rademaker and L. Maat, "Web-based database for facial expression analysis," in Multimedia and Expo, 2005. ICME 2005. IEEE International Conference on, 2005, pp. 5.
M. Pantic, "Machine analysis of facial behaviour: naturalistic and dynamic behaviour," Philosophical Transactions of the Royal Society B: Biological Sciences, vol. 364, pp. 3505-3513, Dec. 12, 2009.
J. Rothwell, Z. Bandar, J. O'Shea and D. McLean, "Silent talker: a new computer-based system for the analysis of facial cues to deception," Applied Cognitive Psychology, vol. 20, pp. 757-777, 2006.
I. Cohen, N. Sebe, L. Chen, A. Garg and T. Huang, "Facial expression recognition from video sequences: Temporal and static modelling," in Computer Vision and Image Understanding, 2003, pp. 160-187.
H. Tao and T. S. Huang, "Connected vibrations: A modal analysis approach for non-rigid motion tracking," in Computer Vision and Pattern Recognition, 1998. Proceedings. 1998 IEEE Computer Society Conference on, 1998, pp. 735-740.
F. Tsalakanidou and S. Malassiotis, "Real-time 2D+3D facial action and expression recognition," Pattern Recognit., vol. 43, pp. 1763-1775, May 2010.
Y. Tian, T. Kanade and J. F. Cohn, "Recognizing action units for facial expression analysis," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 23, pp. 97-115, 2001.
Y. Wang, J. Ostermann and Y. Zhang, Video Processing and Communications. Prentice Hall, 2002.
R. Jain, R. Kasturi and B. G. Schunck, Machine Vision. McGraw-Hill Inc., 1995.
J. L. Barron, D. J. Fleet, S. S. Beauchemin and T. A. Burkitt, "Performance of Optical Flow Techniques," International Journal of Computer Vision, vol. 12, pp. 43-77, 1994.
B. Horn and B. Schunck, "Determining Optical Flow," Artificial Intelligence, vol. 17, pp. 185-203, 1981.
B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," Vancouver, BC, Canada, 1981, pp. 674-679.
H. Nagel, "On the estimation of optical flow: Relations between different approaches and some new results," Artif. Intell., vol. 33, pp. 299-324, Nov. 1987.
M. Black and P. Anandan, "The robust estimation of multiple motions: parametric and piecewise-smooth flow fields," Comput. Vis. Image Underst., vol. 63, pp. 75-104, 1996.
G. Bradski and A. Kaehler, Learning OpenCV. O'Reilly Media, 2008.
P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, 2001, pp. I-511-I-518, vol. 1.
Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," Journal of Computer and System Sciences, vol. 55, pp. 119-139, 1997.
C. O. Conaire, N. E. O'Connor and A. F. Smeaton, "Detector adaptation by maximising agreement between independent data sources," in Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on, 2007, pp. 1-6.
P. Anandan, "A computational framework and an algorithm for the measurement of visual motion," International Journal of Computer Vision, vol. 2, pp. 283-310, Jan. 21, 1989.
A. Vinciarelli, A. Dielmann, S. Favre and H. Salamin, "Canal9: A database of political debates for analysis of social interactions," in Affective Computing and Intelligent Interaction and Workshops, 2009. ACII 2009. 3rd International Conference on, 2009, pp. 1-4.
A. S. M. Sohail and P. Bhattacharya, "Detection of Facial Feature Points Using Anthropometric Face Model," Signal Processing for Image Enhancement and Multimedia Processing, pp. 189-200, 2008.
Microexpression spotting in video using optical strain [electronic resource] / by Sridhar Godavarthy.
[Tampa, Fla] : University of South Florida.
Title from PDF of title page.
Document formatted into pages; contains X pages.
Thesis (MSCS)--University of South Florida, 2010.
Includes bibliographical references.
Text (Electronic thesis) in PDF format.
Mode of access: World Wide Web.
System requirements: World Wide Web browser and PDF reader.
ABSTRACT: Microexpression detection plays a vital role in applications such as lie detection and psychological consultations. Current research aims to automate microexpression recognition by classifying microexpressions in terms of FACS Action Units. Although high detection rates are being achieved, the datasets used to evaluate these systems are highly restricted: they are limited in size (usually still pictures or extremely short videos), motion constrained, contain only a single microexpression, and do not contain negative cases where microexpressions are absent. Only a few of these systems run in real time, and even fewer have been tested on real-life videos. This work proposes a novel method for automated spotting of facial microexpressions as a preprocessing step to existing microexpression recognition systems. By identifying and rejecting sequences that do not contain microexpressions, longer sequences can be converted into shorter, constrained, relevant sequences, each containing only a single microexpression, which can then be passed as input to existing systems, improving their performance and efficiency. This method exploits the small temporal extent of microexpressions to identify them. The extent is determined by the period during which strain, due to the non-rigid motion caused by facial movement, is imparted to the facial skin. The subject's face is divided into sub-regions, and facial strain is calculated for each of these regions. The strain patterns in individual regions are used to identify subtle changes that facilitate the detection of microexpressions. The strain magnitude is calculated using the central difference method over the robust, dense optical flow field of each subject's face. The computed strain is then thresholded using a variable threshold. If the duration for which the strain stays above the threshold corresponds to the duration of a microexpression, a detection is reported.
The datasets used for algorithm evaluation comprise a mix of natural and enacted microexpressions. The results were promising, with a true detection rate of up to 80%. The increased number of false positive spots in the Canal 9 dataset can be attributed to subjects talking, which causes fine movements in the mouth region. Performing speech detection to identify sequences where the subject is talking and excluding the mouth region during those periods could help reduce the number of false positives.
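The detection rule described in the abstract (threshold the per-region strain, then report a detection only when the above-threshold period has a microexpression-like duration) can be sketched as follows. The default threshold of mean + k·std over the sequence is an assumed stand-in for the variable threshold, and the duration bounds are illustrative values rather than the settings used in this work; the function name is hypothetical.

```python
import numpy as np


def spot_microexpressions(strain, fps, threshold=None, k=2.0,
                          min_dur=1 / 25, max_dur=0.5):
    """Return (first_frame, last_frame) runs whose above-threshold duration
    looks like a microexpression.

    strain:    1-D array, per-frame strain magnitude for one facial region
    fps:       frame rate of the video
    threshold: scalar or per-frame array; if None, mean + k * std of the
               sequence is used (assumed stand-in for the variable threshold)
    min_dur, max_dur: accepted duration in seconds (illustrative values)
    """
    s = np.asarray(strain, dtype=float)
    if threshold is None:
        threshold = s.mean() + k * s.std()
    above = s > np.asarray(threshold, dtype=float)

    # Collect maximal runs of consecutive above-threshold frames
    runs, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(above) - 1))

    # Keep only runs whose length in frames fits the duration window
    min_len, max_len = min_dur * fps, max_dur * fps
    return [(a, b) for a, b in runs if min_len <= b - a + 1 <= max_len]
```

Runs that are too long are treated as macroexpressions or sustained motion, and runs that are too short as noise; both are rejected, which mirrors the duration test stated in the abstract.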
Advisor: Dmitry B. Goldgof, Ph.D.
Computer Science & Engineering
USF Electronic Theses and Dissertations.