A major task for the Human-Computer Interaction (HCI) community is to equip the computer with the ability to recognize the user's affective states, intentions and needs from a set of non-verbal cues. Such capabilities can significantly enhance the interaction between humans and computers. By using video cameras together with computer vision techniques to interpret and understand human behavior, vision-based human sensing technology offers the advantages of non-intrusiveness and naturalness. Since the human face is a rich and powerful source of communicative information about human behavior, it has been studied extensively. Eye gaze, which identifies a user's focus of attention, can provide useful visual cues about the user's needs. Head gestures, a form of non-verbal communication, can also reveal the user's feelings and cognitive states. Facial expressions, another form of non-verbal communication, can convey the user's emotional state directly.
This research concerns the development of non-intrusive computer vision techniques for analyzing the human face with video cameras. By analyzing video images of the face with the proposed computer vision techniques, a set of useful visual information about the face, such as eye gaze, face orientation and facial expression, can be extracted accurately. Based on this extracted visual information, the computer can identify and understand its users, so that more effective and friendly human-computer interfaces can be built.
First, a new real-time eye detection and tracking methodology that works under variable and realistic lighting conditions and various face orientations is proposed. Second, an accurate gaze estimation method is developed so that gaze can be estimated accurately under natural head movements. Third, a novel visual tracking framework based on Case Based Reasoning with Confidence is proposed so that the face can be tracked under significant facial expressions and various face orientations. Fourth, twenty-eight prominent facial features are detected and tracked in real time. Fifth, based on the set of detected facial features, a framework is proposed to recover the rigid and non-rigid facial motions from a monocular image sequence. Subsequently, a Dynamic Bayesian Network is utilized to model and recognize the six basic facial expressions from the recovered non-rigid facial motions.
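Gaze estimation is commonly illustrated with a polynomial mapping from an eye-image measurement (e.g. the pupil-glint vector) to screen coordinates, fitted from a short calibration session. The sketch below is a toy version of that standard idea under entirely synthetic data; the function name `design` and all numbers are invented for illustration, and this is not the thesis's head-movement-compensated method.

```python
import numpy as np

# Toy sketch of polynomial gaze-mapping calibration: fit screen coordinates
# as a quadratic function of a 2-D pupil-glint vector. All data is synthetic
# and the coefficients are hypothetical; this only illustrates the idea.

def design(v):
    """Quadratic design matrix for an (N, 2) array of pupil-glint vectors."""
    x, y = v[:, 0], v[:, 1]
    return np.stack([np.ones_like(x), x, y, x * y, x**2, y**2], axis=1)

# Synthetic calibration: 9 pupil-glint vectors and their screen points,
# generated from a made-up ground-truth coefficient matrix.
rng = np.random.default_rng(0)
vectors = rng.uniform(-1.0, 1.0, size=(9, 2))
true_coeff = np.array([[100, 400, 20, 5, 3, 1],
                       [200, 10, 300, 2, 1, 4]], dtype=float).T  # (6, 2)
screen = design(vectors) @ true_coeff

# Least-squares fit of the mapping from the calibration samples.
coeff, *_ = np.linalg.lstsq(design(vectors), screen, rcond=None)

# Predict the gaze point for a new pupil-glint vector.
pred = design(np.array([[0.2, -0.3]])) @ coeff
```

Because the synthetic screen points are generated exactly from the quadratic model, the fit recovers the mapping; real calibration data would leave residual error, and handling natural head movement requires the additional compensation the thesis addresses.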
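As a simplified illustration of probabilistic expression recognition from facial-motion cues, the toy classifier below uses a static naive-Bayes model, a degenerate special case of a Bayesian network, over hypothetical binary cues. All cue names and probabilities are invented for illustration; the thesis's actual model is a Dynamic Bayesian Network over the recovered non-rigid facial motions, not this static sketch.

```python
# Toy naive-Bayes posterior over the six basic expressions given binary
# facial-motion cues. Cue names and probabilities are hypothetical.

EXPRESSIONS = ["happiness", "sadness", "surprise", "anger", "fear", "disgust"]

# P(cue present | expression); every number is made up for illustration.
LIKELIHOOD = {
    "happiness": {"lip_corner_pull": 0.90, "brow_raise": 0.20, "jaw_drop": 0.10},
    "sadness":   {"lip_corner_pull": 0.05, "brow_raise": 0.30, "jaw_drop": 0.05},
    "surprise":  {"lip_corner_pull": 0.10, "brow_raise": 0.95, "jaw_drop": 0.90},
    "anger":     {"lip_corner_pull": 0.05, "brow_raise": 0.10, "jaw_drop": 0.20},
    "fear":      {"lip_corner_pull": 0.10, "brow_raise": 0.80, "jaw_drop": 0.50},
    "disgust":   {"lip_corner_pull": 0.10, "brow_raise": 0.10, "jaw_drop": 0.10},
}

def classify(observed):
    """Return the normalized posterior over expressions given binary cues."""
    prior = 1.0 / len(EXPRESSIONS)  # uniform prior over expressions
    scores = {}
    for expr in EXPRESSIONS:
        p = prior
        for cue, present in observed.items():
            like = LIKELIHOOD[expr][cue]
            p *= like if present else (1.0 - like)
        scores[expr] = p
    total = sum(scores.values())
    return {e: s / total for e, s in scores.items()}

# Raised brows and a dropped jaw without a lip-corner pull point to surprise.
posterior = classify({"lip_corner_pull": False, "brow_raise": True, "jaw_drop": True})
best = max(posterior, key=posterior.get)
```

A Dynamic Bayesian Network extends this idea by also linking the hidden expression state across successive video frames, so that temporal continuity of the facial motion informs the inference.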
All of these techniques are tested with subjects of different ethnic backgrounds, genders and ages, with and without glasses, and under different illumination conditions. Experimental studies show that our techniques improve significantly on existing ones.