Wenhui (Wendy) Liao

I am currently a research scientist in R&D at Thomson Reuters. I received my PhD from the Department of Electrical, Computer, and Systems Engineering at Rensselaer Polytechnic Institute in 2006. My primary research interests are probabilistic graphical models, machine learning, and their applications in information retrieval, information extraction, information fusion, and computer vision.

Research Interests:
  • Uncertainty in Artificial Intelligence: Probabilistic graphical models and their applications in modeling, reasoning, learning, and decision-making under uncertainty
  • Machine learning
  • Information Retrieval, Information Extraction
  • Computer Vision, Human Computer Interaction
  • Active Information Fusion, Sensor Modeling and Selection

    Research Projects

    Probabilistic Graphical Models

      Learning Bayesian Networks (BN) when Data is Incomplete

    Many real applications of BNs need to learn BN parameters automatically from data, because setting the parameters by hand is difficult and time-consuming. However, when data is incomplete, which happens frequently in real-world applications, even state-of-the-art learning techniques such as the Expectation-Maximization (EM) algorithm and Gibbs sampling can fail: EM suffers from too many local maxima, and Gibbs sampling suffers from slow convergence. This motivates us to propose a BN parameter-learning algorithm that escapes local maxima by systematically incorporating domain knowledge during learning. The key idea is to impose two kinds of qualitative constraints on EM: the relative relationships between local parameters, and the ranges of local parameters. In addition, sensitivity analysis is used to further refine the search space. The proposed algorithm achieves state-of-the-art performance in various real applications.
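    As a rough illustration of the range-constraint idea only, the sketch below runs EM on a hypothetical two-node network A -> B in which some records are missing A, and clamps each estimated parameter to an expert-supplied range in the M-step. The network, the function name em_with_ranges, the data, and the ranges are all assumptions made for this example; it does not reproduce the proposed algorithm, and it omits the relative-relationship constraints and the sensitivity analysis.

```python
def clamp(x, lo, hi):
    """Project x onto the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def em_with_ranges(records, ranges, iterations=50):
    """EM for a toy BN A -> B (both binary) where A may be missing (None).

    records: list of (a, b) pairs with a in {0, 1, None} and b in {0, 1}.
    ranges:  {"pa": (lo, hi), "pb": {0: (lo, hi), 1: (lo, hi)}}, hypothetical
             expert ranges lying strictly inside (0, 1).
    Returns (P(A=1), {a: P(B=1 | A=a)}).
    """
    pa, pb = 0.5, {0: 0.4, 1: 0.6}  # arbitrary starting point
    for _ in range(iterations):
        # E-step: expected counts, using P(A=1 | b) for records missing A.
        total = {0: 0.0, 1: 0.0}
        b_is_one = {0: 0.0, 1: 0.0}
        for a, b in records:
            if a is None:
                like1 = pa * (pb[1] if b else 1 - pb[1])
                like0 = (1 - pa) * (pb[0] if b else 1 - pb[0])
                w1 = like1 / (like1 + like0)
            else:
                w1 = float(a)
            for value, weight in ((1, w1), (0, 1 - w1)):
                total[value] += weight
                b_is_one[value] += weight * b
        # M-step: unconstrained estimates, then projection onto the ranges.
        pa = clamp(total[1] / len(records), *ranges["pa"])
        pb = {
            a: clamp(b_is_one[a] / total[a], *ranges["pb"][a]) if total[a] > 0 else pb[a]
            for a in (0, 1)
        }
    return pa, pb


# Hypothetical data: the last three records are missing A.
records = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1),
           (None, 1), (None, 1), (None, 0)]
ranges = {"pa": (0.2, 0.8), "pb": {0: (0.05, 0.5), 1: (0.7, 0.9)}}
pa_hat, pb_hat = em_with_ranges(records, ranges)
print(pa_hat, pb_hat)
```

    For a single binary parameter, clamping the unconstrained M-step estimate to an interval gives the maximizer of the expected log-likelihood over that interval, which is why simple projection is enough in this toy case.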

    Non-myopic Value-of-Information Computation

    Influence diagrams (IDs) have been widely used to model decision-making under uncertainty. A common scenario in decision-making modeled by an ID is that a decision maker must decide whether some information is worth collecting, and what information should be acquired first, given that several information sources are available. Each set of information sources is usually evaluated by its value-of-information (VOI). However, because computing the exact VOI of multiple information sources takes exponential time, decision analysts and expert-system designers focus on the myopic VOI, which requires certain assumptions that are not satisfied in most applications. This motivates us to propose an approximate algorithm that computes non-myopic VOI efficiently by exploiting the central-limit theorem. The efficiency and accuracy of the algorithm make it a feasible approach in a variety of applications where a large number of information sources must be evaluated efficiently.
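    To make the VOI quantity concrete, the sketch below computes the myopic VOI of one information source for a toy decision problem: the expected utility of the best decision after observing the source, averaged over its possible readings, minus the expected utility of the best decision made without it. The hypothesis names, sensor model, and utilities are invented for this example, and the code covers only the single-source case, not the non-myopic approximation described above.

```python
def expected_utility(belief, utility, decision):
    """Expected utility of one decision under a belief over hypotheses."""
    return sum(belief[h] * utility[decision][h] for h in belief)


def best_expected_utility(belief, utility):
    """Expected utility of the best available decision."""
    return max(expected_utility(belief, utility, d) for d in utility)


def myopic_voi(prior, likelihood, utility):
    """VOI of a single source: E_o[max_d EU(d | o)] - max_d EU(d).

    prior:      {hypothesis: P(h)}
    likelihood: {hypothesis: {observation: P(o | h)}}
    utility:    {decision: {hypothesis: utility value}}
    """
    observations = list(next(iter(likelihood.values())))
    eu_with_info = 0.0
    for o in observations:
        joint = {h: prior[h] * likelihood[h][o] for h in prior}
        p_o = sum(joint.values())
        if p_o == 0.0:
            continue  # this reading can never occur
        posterior = {h: joint[h] / p_o for h in joint}
        eu_with_info += p_o * best_expected_utility(posterior, utility)
    return eu_with_info - best_expected_utility(prior, utility)


# Hypothetical toy problem: decide whether to repair a machine.
prior = {"fault": 0.3, "ok": 0.7}
likelihood = {"fault": {"alarm": 0.9, "quiet": 0.1},
              "ok": {"alarm": 0.2, "quiet": 0.8}}
utility = {"repair": {"fault": 80.0, "ok": -20.0},
           "ignore": {"fault": -100.0, "ok": 0.0}}
voi = myopic_voi(prior, likelihood, utility)
print(round(voi, 6))
```

    Evaluating a set of sources exactly requires the same expectation over every joint combination of their readings, which is where the exponential cost mentioned above comes from.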

    Bayesian Network Inference

    This research is motivated by the need to find efficient inference algorithms for several real-world applications ranging from medical diagnosis and situation assessment to object tracking and user affect recognition. The BNs describing these applications have one thing in common: both the query variables and the evidence variables are restricted to a subset of the variables. We exploit this fact to develop a factor tree inference (FTI) algorithm. FTI combines the advantages of the two most popular BN inference algorithms: the variable elimination (VE) algorithm, which allows one to remove irrelevant nodes, and the junction-tree algorithm, which allows one to share computations among multiple queries. The efficiency of the FTI algorithm makes it well suited to online BN inference.
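    For readers unfamiliar with VE, one of the two algorithms that FTI builds on, here is a minimal sketch of plain variable elimination over binary variables. It is not the FTI algorithm; the Factor class, the helper names, and the three-node chain A -> B -> C with its probabilities are assumptions chosen for illustration.

```python
from itertools import product


class Factor:
    """A table over binary variables: maps a tuple of 0/1 values to a number."""

    def __init__(self, variables, table):
        self.variables = tuple(variables)
        self.table = dict(table)


def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    variables = f.variables + tuple(v for v in g.variables if v not in f.variables)
    table = {}
    for values in product((0, 1), repeat=len(variables)):
        assign = dict(zip(variables, values))
        fv = f.table[tuple(assign[v] for v in f.variables)]
        gv = g.table[tuple(assign[v] for v in g.variables)]
        table[values] = fv * gv
    return Factor(variables, table)


def sum_out(f, var):
    """Marginalize var out of factor f."""
    keep = tuple(v for v in f.variables if v != var)
    table = {}
    for values, p in f.table.items():
        assign = dict(zip(f.variables, values))
        key = tuple(assign[v] for v in keep)
        table[key] = table.get(key, 0.0) + p
    return Factor(keep, table)


def restrict(f, var, value):
    """Keep only the rows of f that agree with the evidence var = value."""
    if var not in f.variables:
        return f
    keep = tuple(v for v in f.variables if v != var)
    table = {}
    for values, p in f.table.items():
        assign = dict(zip(f.variables, values))
        if assign[var] == value:
            table[tuple(assign[v] for v in keep)] = p
    return Factor(keep, table)


def variable_elimination(factors, query, evidence, order):
    """Return P(query | evidence) as {value: probability}."""
    for var, value in evidence.items():
        factors = [restrict(f, var, value) for f in factors]
    for var in order:  # order lists every non-query, non-evidence variable
        related = [f for f in factors if var in f.variables]
        if not related:
            continue
        rest = [f for f in factors if var not in f.variables]
        combined = related[0]
        for f in related[1:]:
            combined = multiply(combined, f)
        factors = rest + [sum_out(combined, var)]
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    assert result.variables == (query,)
    z = sum(result.table.values())
    return {values[0]: p / z for values, p in result.table.items()}


# Hypothetical chain A -> B -> C.
factors = [
    Factor(("A",), {(0,): 0.7, (1,): 0.3}),                                   # P(A)
    Factor(("A", "B"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}),  # P(B | A)
    Factor(("B", "C"), {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.9}),  # P(C | B)
]
posterior_a = variable_elimination(factors, "A", {"C": 1}, ["B"])
print(posterior_a)
```

    Each call starts from scratch, so nothing is shared between queries; that is the limitation the junction-tree algorithm addresses.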

    Information Extraction

    Named Entity Extraction

    Named entity recognition (NER), or tagging, is the task of finding names of organizations, persons, locations, etc. in text. Since whether or not a word is a name, and the entity type of a name, are determined mostly by the context of the word and by the entity types of its neighbors, NER is often posed as a sequence classification problem and solved by methods such as hidden Markov models (HMMs) and conditional random fields (CRFs). Automatically tagging named entities (NEs) with high precision and recall requires a large amount of hand-annotated data, which is expensive to obtain. This problem presents itself time and again because tagging the same NEs in different domains usually requires different labeled data. However, in most domains one often has access to large amounts of unlabeled text. This fact motivates semi-supervised approaches to NER. We present a simple semi-supervised learning algorithm for NER using CRFs. The algorithm exploits evidence that is independent of the features used by the classifier and that provides high-precision labels for unlabeled data. Such independent evidence is used to automatically extract high-accuracy and non-redundant data, leading to a much improved classifier at the next iteration.

    Asian Language Segmentation

    For many text processing applications, such as information retrieval, word processors, spell-checkers, speech recognition, and automatic translation systems, it is necessary to know where the words are in a line of text. For Western languages, this is a relatively easy task because words are separated by white space and punctuation. However, for Asian languages such as Chinese and Japanese, finding word boundaries is difficult because these languages do not delimit words by white space or other word delimiters. The task of segmentation is to find word boundaries in sentences. The existing methods of segmentation fall roughly into two categories: heuristic dictionary-based methods and statistical machine learning methods. Among the statistical approaches, we are especially interested in conditional random fields (CRFs). CRFs are arbitrary undirected graphical models trained to maximize the conditional probability of the desired outputs given the corresponding inputs. We apply CRFs to Chinese and Japanese segmentation and achieve state-of-the-art performance.
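    As a point of comparison for the first category, the sketch below implements forward maximum matching, a common heuristic dictionary-based segmenter: at each position it takes the longest dictionary word that matches and falls back to a single character when nothing matches. It is a baseline illustration only, not the CRF approach used in this work, and the small lexicon is made up for the example.

```python
def forward_max_match(text, dictionary):
    """Greedy left-to-right segmentation by longest dictionary match."""
    max_len = max([len(word) for word in dictionary] + [1])
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always succeeds.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical lexicon for illustration.
lexicon = {"我们", "在", "野生", "动物园", "野生动物园", "玩"}
segments = forward_max_match("我们在野生动物园玩", lexicon)
print(" / ".join(segments))
```

    Because the matcher commits to the longest word at each step and never looks ahead, it cannot resolve ambiguities that depend on context, which is one motivation for statistical sequence models such as CRFs.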

    User Query-Log Analysis

    Ongoing ...

    Active Information Selection and Fusion

    A Unified ID Framework for Information Selection, Information Fusion, and Decision-making

    One common issue in many real-world applications is how to choose and integrate multiple information sources (sensors) to solve a problem efficiently, especially when the information is ambiguous, dynamic, and multimodal. We present a general mathematical framework based on influence diagrams to actively fuse information for timely decision-making. Such a model provides a coherent and fully unified hierarchical probabilistic framework for realizing three main functions: choosing a sensory action set that achieves the optimal trade-off between the cost and benefit of sensors, applying a fusion model to efficiently combine the information from the selected sensor set, and making decisions based on the fusion results. The parameters of the model can be learned automatically with the proposed learning algorithm. This model has been applied to recognizing user affective states and providing user assistance, as well as to battlefield situation assessment.

    Sensor Selection Algorithms

    Two typical sensor selection scenarios appear in many applications. The first is to choose a sensor set with maximum information gain given a budget limit; the second is to choose a sensor set with the optimal trade-off between information gain and cost. Unfortunately, both are computationally intractable due to the exponential search space of sensor subsets. Based on the proposed ID framework, we propose efficient sensor selection algorithms for both scenarios. The algorithms exploit the theory of submodular functions and the probabilistic dependencies among sensors embedded in the ID model. For the budget-limit case, the proposed algorithm is guaranteed to achieve at least a constant factor (1-1/e) of the optimal performance, and its computational efficiency is further improved by a partitioning procedure. For the optimal trade-off case, a submodular-supermodular procedure is embedded in the proposed sensor selection algorithm to choose the optimal sensor set in polynomial time.
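    The flavor of the budget-limit case can be sketched with the classic greedy rule for monotone submodular functions. The example below makes two simplifying assumptions that are not part of the work described above: every sensor has unit cost, so the budget is just a count k, and a set-coverage function stands in for information gain. Under those assumptions, the well-known greedy result guarantees at least (1-1/e) of the optimal value. Sensor names and coverage sets are hypothetical, and the partitioning procedure is not shown.

```python
def coverage(selected, covers):
    """Stand-in objective: number of distinct items covered by the selection.

    Coverage is monotone and submodular, like the information-gain
    objective it replaces in this sketch.
    """
    covered = set()
    for sensor in selected:
        covered |= covers[sensor]
    return len(covered)


def greedy_select(covers, k):
    """Pick up to k sensors, each time adding the largest marginal gain."""
    selected = []
    remaining = set(covers)
    for _ in range(k):
        current = coverage(selected, covers)
        best, best_gain = None, 0
        for sensor in sorted(remaining):  # sorted for deterministic tie-breaking
            gain = coverage(selected + [sensor], covers) - current
            if gain > best_gain:
                best, best_gain = sensor, gain
        if best is None:  # no sensor adds anything new
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical sensors and the (abstract) items each one informs about.
sensor_covers = {
    "camera": {1, 2, 3, 4},
    "mic": {3, 4, 5},
    "ecg": {5, 6},
    "gsr": {1, 6},
}
chosen = greedy_select(sensor_covers, k=2)
print(chosen, coverage(chosen, sensor_covers))
```

    The greedy loop evaluates the objective a polynomial number of times, whereas exhaustive search would have to score every subset of sensors.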

    Computer Vision and Human Computer Interaction

     Affective State Recognition and User Assistance

    Increasingly, HCI researchers are interested in the user's emotional and mental states, since affective states, especially negative ones, directly influence a user's performance. Recognizing negative user affect and providing appropriate interventions is therefore important for various HCI systems. In this study, we apply the ID framework described above to simultaneously model both affective state recognition (stress, frustration, fatigue, etc.) and user assistance in HCI systems. Affective state recognition is achieved through active probabilistic inference from the available sensory data of multiple-modality sensors. User assistance is automatically accomplished through a decision-making process that balances the benefit of keeping the user in productive affective states against the cost of performing user assistance. To validate the model, we build a non-invasive real-time prototype system to recognize different user affective states (stress and fatigue) from four modalities of user measurements: visual appearance features (facial expression, eye gaze, eye movement, head gesture, etc.), physiological measures (heart rate, GSR, temperature, etc.), user performance, and behavioral data. To our knowledge, this integration of evidence from four modalities, together with the probabilistic framework, is unique in user affect research.

     Visual Object Tracking

    Real-time object tracking is essential for video surveillance. One well-known problem is drifting during tracking. We propose a simple but robust framework that automatically maintains and updates the object templates used for tracking, so that drifting is handled well. Compared to existing tracking techniques, the proposed technique makes three significant contributions. First, a case-based reasoning (CBR) method is introduced to track non-rigid objects robustly under significant appearance changes without drifting away. Second, an automatic case-base maintenance algorithm is proposed to dynamically update the case base, keeping it representative and concise. Third, the framework provides an accurate confidence measurement for each tracked object so that tracking failures can be identified. With the proposed framework, we implemented a real-time face tracker that can track human faces robustly at 26 frames per second under various face appearance changes.

     Facial Activity Modeling and Recognition

    Facial activities are the most natural and powerful means of human communication. A spontaneous facial activity is characterized by rigid head movements, non-rigid facial muscular movements, and their interactions. Current research in facial activity analysis is limited to recognizing rigid or non-rigid motion separately, often ignoring their interactions. Hence, these approaches cannot always recognize facial activities reliably. We propose to explicitly exploit prior knowledge about facial activities and systematically combine it with image measurements to achieve an accurate, robust, and consistent understanding of facial activity. Specifically, we propose a unified probabilistic framework based on the dynamic Bayesian network to simultaneously and coherently represent the rigid and non-rigid facial motions, their interactions, and their image observations, as well as to capture the temporal evolution of the facial activities. Robust computer vision methods are employed to obtain measurements of both rigid and non-rigid facial motions. Finally, facial activity recognition is accomplished through probabilistic inference by systematically integrating the visual measurements with the facial activity model.

     Video Content Analysis

    For most multimedia retrieval systems, it is essential to organize the video in a multimedia database by scene. This motivates us to propose an effective algorithm that automatically breaks a video sequence into scenes. The method systematically combines audio and visual features extracted from the video sequence. Specifically, an unsupervised segmentation algorithm, together with object tracking, is used to identify candidate scene boundaries, and the audio features are then used to further refine the candidates. In addition to video segmentation, content-based audio classification is also a valuable step in multimedia content analysis. Most current systems for classifying audio signals either focus on speech recognition or simply classify audio signals into a limited set of groups such as music and speech. We extract multiple audio features and classify audio content into seven categories using support vector machines.


    Book Chapters:

    Yan Tong, Wenhui Liao, and Qiang Ji, "Automatic Facial Action Unit Recognition by Modeling Their Semantic and Dynamic Relationships", in Affective Information Processing, edited by Tieniu Tan and Jianhua Tao, Springer, 2008.

    Journal Papers:

    Wenhui Liao and Qiang Ji, "Learning Bayesian Network Parameters under Incomplete Data with Qualitative Domain Knowledge", to appear in Pattern Recognition, 2008.

    Wenhui Liao, Qiang Ji, and W. A. Wallace, "Approximate Nonmyopic Sensor Selection Via Submodularity and Partitioning", to appear in the IEEE Transactions on Systems, Man, and Cybernetics, 2008.

    Wenhui Liao and Qiang Ji, "Efficient Non-myopic Value-of-information Computation for Influence Diagrams", International Journal of Approximate Reasoning, vol. 49, no. 2, pp. 436-450, 2008.

    Wenhui Liao, Weihong Zhang, Zhiwei Zhu, Qiang Ji, and Wayne Gray, "Toward a Decision-Theoretic Framework for Affect Recognition and User Assistance", International Journal of Human-Computer Studies, vol. 64, no. 9, pp. 847-873, 2006.

      Yan Tong, Wenhui Liao, and Qiang Ji, "Facial Action Unit Recognition by Exploiting their Spatial-temporal Relationships", IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 29, no. 10, pp. 1683-1699, 2007.

    Conference Papers:

      Wenhui Liao and Qiang Ji, "Exploiting Domain Knowledge for Learning Bayesian Network Parameters with Incomplete Data", the 19th International Conference on Pattern Recognition (ICPR), 2008.

      Wenhui Liao, Zhiwei Zhu, Yan Tong, and Qiang Ji, "Robust Object Tracking with a Case-base Updating Strategy", the Twentieth International Joint Conference on Artificial Intelligence (IJCAI-07), January 2007 (acceptance rate: 15.7%).

     Yan Tong, Wenhui Liao, and Qiang Ji, "A Unified Probabilistic Framework for Spontaneous Facial Activity Modeling and Understanding", IEEE International Conference on Computer Vision and Pattern Recognition (CVPR07), 2007 (acceptance rate: 28.2%).

     Wenhui Liao and Qiang Ji, "Efficient Active Fusion for Decision-making via VOI Approximation", the Twenty-First National Conference on Artificial Intelligence (AAAI-06) (acceptance rate: 22.1%).

     Wenhui Liao, "Dynamic and Active Information Fusion for Decision Making under Uncertainty", the Twenty-First National Conference on Artificial Intelligence (AAAI) Doctoral Consortium, July 2006.

     Zhiwei Zhu, Wenhui Liao, and Qiang Ji, "Robust Visual Tracking Using Case-based Reasoning with Confidence", the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR06), pp. 806-813, June 2006 (acceptance rate: 28.1%).

     Yan Tong, Wenhui Liao, and Qiang Ji, "Inferring Facial Action Units with Causal Relations", the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR06), pp. 1623-1630, June 2006 (acceptance rate: 28.1%).

     Wenhui Liao, Weihong Zhang, Zhiwei Zhu, and Qiang Ji, "A Decision Theoretic Model for Stress Recognition and User Assistance", the Twentieth National Conference on Artificial Intelligence (AAAI-05), pp. 529-534, July 2005 (acceptance rate: 18.4%).

     Wenhui Liao, Weihong Zhang, Zhiwei Zhu, and Qiang Ji, "A Real-time Human Stress Monitoring System Using Dynamic Bayesian Networks", IEEE Workshop on Vision for Human Computer Interaction, in conjunction with CVPR, vol. 3, pp. 70-77, June 2005.

     Markus Guhe, Wenhui Liao, Zhiwei Zhu, Qiang Ji, and Wayne Gray, "Non-intrusive Measurement of Workload in Real-time", Human Factors and Ergonomics Society 49th Annual Meeting, 2005.

     Wenhui Liao, Weihong Zhang, and Qiang Ji, "A Factor Tree Inference Algorithm for Bayesian Networks and its Application", Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pp. 652-656, Boca Raton, FL, 2004.

     Shu-Ching Chen, Mei-Ling Shyu, Wenhui Liao, and Chengcui Zhang, "Scene Change Detection by Audio and Video Clues", Proceedings of the IEEE International Conference on Multimedia and Expo (ICME2002), vol.2, pp.365-368, August 2002, Switzerland.