Automatic Facial Action Units Recognition

 

What are Action Units?

 

Action units (AUs) represent the muscular activity that produces facial appearance changes, as defined in the Facial Action Coding System by Ekman and Friesen [1].

 

Figure 1. Examples of some action units extracted from Cohn and Kanade’s database [2]

 

Why recognize facial actions?

 

Many approaches to facial expression analysis attempt to recognize only a small set of prototypic emotion categories such as happiness, sadness, surprise, disgust, fear, and anger. However, human facial behavior consists of thousands of expressions, which differ in subtle changes of a few facial features or in the blending of several emotions. The Facial Action Coding System (FACS) decomposes facial behavior into 46 action units (AUs), each of which is anatomically related to the contraction of a specific set of facial muscles.

 

FACS provides a powerful means for detecting and measuring a large number of facial expressions by detecting the small set of muscular actions that comprise each expression. Over 7000 AU combinations have been observed.

 

Why do we need an automatic system to detect AUs?

 

Training human experts and manually scoring AUs is expensive and time-consuming. An automatic system for detecting action units could be applied in many fields, including automated tools for behavioral research, videoconferencing, affective computing, perceptual human-machine interfaces, and 3D face reconstruction and animation.

 

AU Relationship Modeling by a Dynamic Bayesian Network

 

Developing an automatic system for AU recognition is challenging due to the richness, ambiguity, and dynamic nature of facial actions. In this research, we propose a novel approach that systematically accounts for the relationships among AUs and their temporal evolution for AU recognition.

 

In spontaneous facial behavior, there are several relationships among AUs:

*     Groups of AUs often appear together to show a meaningful expression

*     Co-occurrence relationships, such as between AU1 (inner brow raiser) and AU2 (outer brow raiser)

*     Mutually exclusive relationships, e.g., AU24 (lip presser) and AU25 (lips apart)

 

We use a dynamic Bayesian network (DBN) to model the relationships among different AUs. The DBN, shown in Figure 2, provides a coherent and unified hierarchical probabilistic framework for representing the probabilistic relationships among the AUs and for accounting for the temporal changes in facial action development. Within our system, robust computer vision techniques are used to obtain AU measurements, and these measurements are then applied as evidence to the DBN for inferring the various AUs.

 

Figure 2. The Dynamic BN for AU modeling. The self-arrow at each AU node indicates the temporal relationship of a single AU from the previous time step to the current time step. The arrow from AUi at time t-1 to AUj (i ≠ j) at time t indicates the temporal relationship between different AUs. The shaded circle indicates the measurement for each AU.
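The fragment below is a minimal, self-contained sketch (in Python, not the code of this system) of the idea behind the DBN: a two-slice model over just two binary AUs, whose prior and transition matrix encode the co-occurrence and temporal-persistence relationships, and whose per-frame measurement scores enter as evidence during filtering. All of the numbers, the two-AU restriction, and the function names are illustrative assumptions.

    import numpy as np

    # Joint state over two binary AUs, e.g. AU1 (inner brow raiser) and AU2
    # (outer brow raiser); states are ordered (AU1, AU2) in {00, 01, 10, 11}.
    # The prior encodes the co-occurrence relationship: the two AUs appear
    # together far more often than either appears alone.
    prior = np.array([0.70, 0.05, 0.05, 0.20])

    # Transition matrix P(state_t | state_t-1) encodes temporal persistence:
    # an AU that is active tends to stay active over neighboring frames.
    T = np.array([
        [0.90, 0.03, 0.03, 0.04],
        [0.10, 0.80, 0.02, 0.08],
        [0.10, 0.02, 0.80, 0.08],
        [0.02, 0.04, 0.04, 0.90],
    ])

    def likelihood(scores):
        """scores: dict AU name -> probability-like measurement in [0, 1],
        standing in for the per-frame AdaBoost outputs used as evidence."""
        lik = np.ones(4)
        for i, (au, s) in enumerate(scores.items()):
            for state in range(4):
                active = (state >> (1 - i)) & 1   # bit of this AU in the joint state
                lik[state] *= s if active else (1.0 - s)
        return lik

    def filter_step(belief, scores):
        """One DBN filtering step: predict with T, then update with evidence."""
        predicted = T.T @ belief
        posterior = predicted * likelihood(scores)
        return posterior / posterior.sum()

    belief = prior
    for frame_scores in [{"AU1": 0.8, "AU2": 0.4},
                         {"AU1": 0.9, "AU2": 0.5}]:
        belief = filter_step(belief, frame_scores)
        # filtered AU2 belief after fusing both scores with the relations and dynamics
        print("P(AU2 present) =", belief[1] + belief[3])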

 

 

Automatic Facial Action Unit Recognition by Exploiting the Dynamic and Semantic Relationships Among Action Units

 

Figure 3 gives the flowchart of our automatic AU recognition system, which consists of an offline training phase and an online AU recognition phase.

*     The system training includes training an AdaBoost classifier for each AU and learning a dynamic Bayesian network that correctly models the AU relationships. Advanced learning techniques are applied to learn both the structure and the parameters of the DBN from the training data and domain knowledge.

*     The online AU recognition consists of two independent but collaborative components: AU measurement extraction by AdaBoost classification and DBN inference. First, the face and eyes are detected automatically in live video. The face region is then divided into upper and lower parts, which are aligned based on the detected eye positions and convolved with a set of multi-scale, multi-orientation Gabor filters to produce wavelet features for each pixel. Next, the AdaBoost classifier combines the wavelet features to produce a measurement score for each AU. These scores are finally used as evidence in the dynamic Bayesian network for AU inference (a sketch of this step follows Figure 3).

 

Figure 3. The flowchart of the real-time AU recognition system: (a) the offline system training for AdaBoost classifier and dynamic Bayesian network, and (b) the online AU recognition
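As a rough illustration of the measurement-extraction step, the sketch below computes multi-scale, multi-orientation Gabor features for an aligned face patch and scores them with an AdaBoost classifier; the resulting probability-like score plays the role of the per-frame AU measurement that is passed to the DBN as evidence. It assumes OpenCV and scikit-learn are available; the function names, filter-bank settings, patch size, and dummy training data are hypothetical and not taken from the original system.

    import cv2
    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier

    def gabor_features(patch, wavelengths=(4, 8, 16), orientations=8):
        """Convolve an aligned grayscale float32 face patch with a bank of
        multi-scale, multi-orientation Gabor filters; return response magnitudes."""
        responses = []
        for lam in wavelengths:
            for k in range(orientations):
                theta = k * np.pi / orientations
                kernel = cv2.getGaborKernel((15, 15), 0.56 * lam, theta, lam, 0.5, 0)
                responses.append(np.abs(cv2.filter2D(patch, cv2.CV_32F, kernel)))
        return np.concatenate([r.ravel() for r in responses])

    # Offline: one AdaBoost classifier per AU, trained on Gabor features of frames
    # with (y = 1) and without (y = 0) the target AU.  Random data keeps this runnable.
    rng = np.random.default_rng(0)
    train_patches = rng.normal(size=(40, 32, 32)).astype(np.float32)  # stand-ins for face crops
    y_train = rng.integers(0, 2, size=40)
    X_train = np.stack([gabor_features(p) for p in train_patches])
    au_classifier = AdaBoostClassifier(n_estimators=50).fit(X_train, y_train)

    # Online: score a new aligned face patch; this probability-like score is the
    # per-frame AU measurement used as evidence in the DBN.
    patch = rng.normal(size=(32, 32)).astype(np.float32)
    score = au_classifier.predict_proba(gabor_features(patch)[None, :])[0, 1]
    print("AU measurement score:", score)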

 

Experimental Results:

 

(1) Evaluation on Cohn-Kanade Database

 

We first evaluate our system on Cohn and Kanade's DFAT-504 database [2], using leave-one-subject-out cross-validation for the 14 target AUs. Cohn and Kanade's database consists of more than 100 subjects covering different races, ages, and genders. It was collected under controlled illumination and background and has been widely used for evaluating facial action unit recognition systems. In order to extract the temporal relationships, the Cohn-Kanade database is coded into AU labels frame by frame in our work. The positive samples are the training images containing the target AU at any intensity level, and the negative samples are images without the target AU, regardless of the presence of other AUs. Figure 4 shows the performance of generalization to novel subjects in the Cohn-Kanade database using the AdaBoost classifiers alone and using the dynamic BN, respectively. With only the AdaBoost classifiers, our system achieves an average recognition rate of 91.2% with a positive rate of 80.7% and a false alarm rate of 7.7% over the 14 AUs, where the average rate is defined as the percentage of examples recognized correctly. With the dynamic BN, the system achieves an overall average recognition rate of 93.33%, with a significantly improved positive rate of 86.3% and a reduced false alarm rate of 5.5%.
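For concreteness, the short sketch below (Python, with hypothetical toy arrays) shows how the reported quantities can be computed under a leave-one-subject-out protocol: the average recognition rate is the overall frame accuracy, the positive rate is the true-positive rate, and the false alarm rate is the false-positive rate for a given AU. The use of scikit-learn's LeaveOneGroupOut and the toy data are illustrative assumptions, not part of the original implementation.

    import numpy as np
    from sklearn.model_selection import LeaveOneGroupOut

    def au_metrics(y_true, y_pred):
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        accuracy = np.mean(y_true == y_pred)               # average recognition rate
        positive_rate = np.mean(y_pred[y_true == 1] == 1)  # true-positive rate
        false_alarm = np.mean(y_pred[y_true == 0] == 1)    # false-positive rate
        return accuracy, positive_rate, false_alarm

    # subjects[i] identifies the person in frame i; each fold holds out one subject
    # so that the reported rates measure generalization to novel subjects.
    X = np.zeros((6, 4))                                   # placeholder feature matrix
    y = np.array([1, 0, 1, 1, 0, 0])                       # per-frame AU labels
    subjects = np.array([1, 1, 2, 2, 3, 3])
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subjects):
        pass  # train the AU classifier on train_idx, apply au_metrics on test_idx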

 

As shown in Figure 4, the improvements are most impressive for the AUs that are difficult for the AdaBoost classifier to recognize, which demonstrates exactly the benefit of using the DBN. For example, recognizing AU23 (lip tightener) and AU24 (lip presser) is difficult, since the two actions occur rarely and the appearance changes they cause are relatively subtle. Fortunately, the probability that these two actions co-occur is very high, since they are produced by the same set of facial muscles. By employing this relationship in the DBN, the positive recognition rate of AU23 increases from 54% to 80.6% and that of AU24 increases from 63.4% to 82.3%. Similarly, by employing the co-absence relationship between AU25 (lips apart) and AU15 (lip corner depressor), the false alarm rate of AU25 is reduced from 13.3% to 5.7%.
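The toy calculation below (invented numbers, not from the paper) illustrates why such a co-occurrence link helps when the AU23 measurement alone is weak: the same measurement likelihood yields a much higher posterior once the prior is conditioned on a detected AU24.

    # Bayes-rule illustration of the AU24 -> AU23 co-occurrence benefit.
    p_au23 = 0.10                 # prior probability that AU23 is present
    p_au23_given_au24 = 0.70      # co-occurrence prior once AU24 has been detected
    p_score_given_au23 = 0.60     # weak AdaBoost evidence: P(score | AU23 present)
    p_score_given_not = 0.30      #                         P(score | AU23 absent)

    def posterior(prior):
        num = p_score_given_au23 * prior
        return num / (num + p_score_given_not * (1.0 - prior))

    print("P(AU23 | score alone)         =", round(posterior(p_au23), 3))            # ~0.182
    print("P(AU23 | score, AU24 present) =", round(posterior(p_au23_given_au24), 3)) # ~0.824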

 


Figure 4. Comparison of AU recognition results on novel subjects in Cohn-Kanade database using the AdaBoost classifier and dynamic BN respectively: (a) average positive rates (b) average false positive rates.

 

(2) Experimental Results under Real-world Conditions

 

The system is also evaluated on the ISL database to demonstrate its robustness in a real-world environment. The ISL database consists of 42 image sequences from 10 subjects displaying facial expressions that undergo a neutral-apex-neutral evolution. The subjects were instructed to perform single AUs and AU combinations as well as the six basic expressions. The database was collected under real-world conditions with uncontrolled illumination and background as well as moderate head motion. The image sequences were recorded at 30 frames per second and are also manually coded into AU labels frame by frame.

 

The system performance is reported in Figure 5. The average recognition rate with DBN inference is 93.27%, with an average positive rate of 80.8% and a false alarm rate of 4.47%. Compared with frame-by-frame AdaBoost classification, the AU recognition is improved significantly: the overall correct recognition rate rises by 5.1%, with an 11% increase in the positive recognition rate and a 4.4% decrease in the false alarm rate. The improvement is especially large for the AUs that are difficult to recognize. For example, the recognition rate of AU7 (lid tightener) increases from 84% to 94.8%, that of AU15 (lip corner depressor) improves from 71.5% to 82.9%, and that of AU23 (lip tightener) increases from 82.3% to 94.4%.

 


Figure 5. Comparison of AU recognition results on novel subjects under real-world conditions using the AdaBoost classifier and the dynamic BN, respectively: (a) average positive rates, (b) average false positive rates.

 

 

 

Publications:

 

1.     Yan Tong, Wenhui Liao , and Qiang Ji, “Facial Action Unit Recognition by Exploiting Their Dynamic and Semantic Relationships”, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 29, No. 10, pp. 1683-1699, October 2007.

2.     Yan Tong, Wenhui Liao, and Qiang Ji, "Inferring Facial Action Units with Causal Relations", IEEE Conference on Computer Vision and Pattern Recognition (CVPR06), New York City, June 2006.

 

 

Demos

 

 

 

Useful Links for AU Recognition:

 

 

 

Facial expression image database

 

 

 

References:

    1. P. Ekman and W. Friesen, Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press, Palo Alto, CA, 1978.
    2. T. Kanade, J. F. Cohn, and Y. Tian, Comprehensive Database for Facial Expression Analysis, Proc. of the IEEE International Conference on Automatic Face and Gesture Recognition (FG'00), pp. 46-53, 2000.