Knowledge Augmented Machine Learning

1. Knowledge augmented visual learning

Substantial progress has been made in computer vision recently as a result of the latest algorithmic advances in deep learning. Despite these successes, current data-driven computer vision algorithms are data inefficient, generalize poorly to unseen samples, and lack interpretability. In parallel with the data, there often exists prior knowledge that governs the behaviors of the target objects and the underlying mechanism that produces the data. Such knowledge, if utilized properly, can not only enhance visual learning performance but also reduce its dependence on data and improve its interpretability. Unfortunately, current data-driven visual learning approaches lack a mechanism to effectively capture and encode this prior knowledge.

To address this challenge, we propose to develop a hybrid computer vision framework that systematically integrates well-established prior knowledge with image data to achieve data-efficient, generalizable, and interpretable visual learning. Specifically, we propose to investigate two issues: knowledge identification and knowledge integration. Knowledge identification systematically identifies prior knowledge from two sources: computer vision models and target knowledge. Computer vision models represent the formal theories and principles that govern the generation of image observations from 3D scenes. Target knowledge represents the theories or studies from different disciplines that govern the behaviors and properties of the target objects we intend to understand. Knowledge integration involves developing methods to systematically encode the identified knowledge into different stages of deep learning models. Specifically, we introduce three integration schemes: decision-level integration, training-level integration, and architecture-level integration. Through the proposed framework, the prior knowledge and data are effectively integrated and work synergistically to yield vision algorithms that are data efficient, robust, and generalizable across datasets and domains. To demonstrate the proposed framework, we apply it to different computer vision tasks, including 3D human body reconstruction from monocular videos, human action recognition, and 3D shape reconstruction from single images.
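As a concrete illustration of the simplest of the three schemes, decision-level integration can be sketched as fusing a network's class probabilities with a knowledge-based prior over the same classes. The following toy example (all numbers hypothetical; this is a minimal product-of-experts style sketch, not the framework's actual formulation) shows how a knowledge-derived prior can override a weakly confident network prediction:

```python
import numpy as np

def decision_level_fusion(nn_probs, prior_probs):
    """Toy decision-level integration: multiply the network's class
    probabilities by a knowledge-based prior over the same classes,
    then renormalize to obtain a fused posterior."""
    fused = nn_probs * prior_probs
    return fused / fused.sum()

nn_probs = np.array([0.5, 0.3, 0.2])     # hypothetical network output
prior_probs = np.array([0.1, 0.6, 0.3])  # hypothetical knowledge-based prior
posterior = decision_level_fusion(nn_probs, prior_probs)
# The prior shifts the decision from class 0 to class 1.
```

Training-level and architecture-level integration instead inject the knowledge into the loss function and the model structure, respectively, so the knowledge shapes what the network learns rather than only post-processing its outputs.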

Publications

  • Knowledge Augmented Deep Learning for Data Efficient, Generalizable, and Interpretable Visual Understanding, IEEE Computer Society Distinguished Lecturer, Nov. 10, 2022

  • Zijun Cui, Tengfei Song, Yuru Wang, and Qiang Ji, Knowledge Augmented Deep Neural Networks for Joint Facial Expression and Action Unit Recognition, 34th Conference on Neural Information Processing Systems (NeurIPS), 2020.

  • Shangfei Wang, Longfei Hao, and Qiang Ji, Knowledge-Augmented Multimodal Deep Regression Bayesian Networks for Emotion Video Tagging, IEEE Transactions on Multimedia, 22(4): 1084-1097, 2019.

  • Qiang Ji, Combining Knowledge with Data for Efficient and Generalizable Visual Learning, Pattern Recognition Letters, 124: 31-38, 2019.

  • Ziheng Wang and Qiang Ji, Knowledge Augmented Visual Learning, Handbook of Pattern Recognition and Computer Vision, pp. 255-274, 2016.

  • Xiaoyang Wang and Qiang Ji, Incorporating Contextual Knowledge to Dynamic Bayesian Networks for Event Recognition, Proceedings of the 21st International Conference on Pattern Recognition (ICPR), pp. 3378-3381, 2012 (Oral Presentation). [Piero Zamperoni Best Student Paper Award]

  • Xiaoyang Wang and Qiang Ji, A Novel Probabilistic Approach Utilizing Clip Attributes as Hidden Knowledge for Event Recognition, Proceedings of the 21st International Conference on Pattern Recognition (ICPR), pp. 3382-3385, 2012 (Oral Presentation).

  • Ziheng Wang, Yongqiang Li, Shangfei Wang, and Qiang Ji, Capturing Global Semantic Relationships for Facial Action Unit Recognition, International Conference on Computer Vision (ICCV), 2013.

  • Ziheng Wang, Shangfei Wang, and Qiang Ji, Capturing Complex Spatio-Temporal Relations among Facial Muscles for Facial Expression Recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.

2. Knowledge Augmented Deep Learning

Despite the impressive performance that current deep learning models have achieved in various fields, they suffer from several serious deficiencies, including high data dependency, poor generalization, low interpretability, and a lack of performance assurance. These deficiencies originate primarily from the data-driven nature of deep models and their inability to effectively exploit readily available prior knowledge. In contrast, traditional symbolic AI models effectively encode different types of prior knowledge, are interpretable, and generalize well. They, however, suffer from the knowledge acquisition bottleneck and do not scale well. To effectively address these deficiencies, we propose to integrate deep learning, probabilistic graphical models (PGMs), and Bayesian learning to yield a hybrid AI model that is data efficient, generalizable, interpretable, and performance assurable.

Major tasks of the proposed research include knowledge identification, knowledge representation, knowledge encoding, and knowledge integration. Knowledge identification involves identifying the domain knowledge for a specific task. The domain knowledge can be divided into two categories: scientific knowledge and experiential knowledge. Scientific knowledge is prescriptive and mainly refers to well-formulated mathematical theories or physical laws. Experiential knowledge is descriptive and mainly refers to well-known facts from daily life, indicating semantic properties of an entity or semantic relationships among multiple entities. Knowledge can be further divided into target knowledge and measurement knowledge. Target knowledge represents the prior knowledge that governs the behaviors and properties of the target entities that we want to infer. Measurement knowledge captures the underlying data generation mechanism that produces the observations about the target entities.

Knowledge representation involves expressing the identified domain knowledge in a well-organized and structured format. The appropriate representation depends on the type of domain knowledge: scientific knowledge is usually expressed using equations, while experiential knowledge can be represented through probabilistic relationships, logic rules, or knowledge graphs.

For knowledge encoding, we propose to employ PGMs as the unified symbolic model to simultaneously capture different types of prior knowledge, including both theoretical and experiential prior knowledge about the target variables and the underlying data generation mechanisms. We introduce methods to accurately and compactly encapsulate multiple pieces of prior knowledge in different representations, including mathematical equations, semantic relationships, graphs, and logic rules, into a PGM.
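To make the encoding step concrete, a semantic relationship can be captured as a conditional probability table in a small Bayesian network. The sketch below (all probabilities hypothetical, chosen only for illustration) encodes an experiential rule of the kind used in facial analysis, "AU12 (lip-corner puller) is very likely under a happy expression," and performs inference by enumeration:

```python
import numpy as np

# Hypothetical two-node Bayesian network: Expression -> AU12.
# States: expression in {happy, neutral}, AU12 in {present, absent}.
p_expr = np.array([0.6, 0.4])            # assumed prior P(happy), P(neutral)
p_au_given_expr = np.array([[0.9, 0.1],  # P(AU12=1|happy), P(AU12=0|happy)
                            [0.2, 0.8]]) # P(AU12=1|neutral), P(AU12=0|neutral)

# Inference by enumeration: P(expression | AU12 present) via Bayes' rule.
joint = p_expr * p_au_given_expr[:, 0]   # P(expression, AU12=1)
posterior = joint / joint.sum()          # P(expression | AU12=1)
# Observing AU12 raises the probability of "happy" above its prior.
```

In the full framework the PGM is of course much richer, jointly encoding equations, relationships, and logic rules, but the principle is the same: knowledge enters as structure and conditional distributions, and inference propagates its consequences.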

For knowledge integration, we propose methods to integrate the PGM with a deep model at four levels: the decision level, the training level, the architecture level, and the data level.
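As one illustration of training-level integration, a logic rule can be relaxed into a differentiable penalty added to the task loss, so that the network is discouraged from violating the knowledge during training. The sketch below (the rule, the function name, and all numbers are hypothetical) softens the implication "AU1 implies AU2" into a hinge penalty on multi-label sigmoid outputs:

```python
import numpy as np

def knowledge_loss(p_au, y, lam=1.0):
    """Training-level integration sketch: binary cross-entropy task loss
    plus a soft-logic penalty for the hypothetical rule 'AU1 => AU2',
    relaxed as max(0, P(AU1) - P(AU2))."""
    bce = -(y * np.log(p_au) + (1 - y) * np.log(1 - p_au)).mean()
    rule_penalty = max(0.0, p_au[0] - p_au[1])  # violation of the soft rule
    return bce + lam * rule_penalty

p_au = np.array([0.9, 0.2])  # hypothetical predictions violating the rule
y = np.array([1, 1])         # hypothetical ground-truth labels
loss = knowledge_loss(p_au, y, lam=1.0)
```

With lam = 0 this reduces to the ordinary data-driven loss; increasing lam trades data fit against consistency with the prior knowledge.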

Finally, for performance assurance, we propose to incorporate Bayesian learning into the integrated model to quantify the model prediction uncertainty in terms of both aleatoric and epistemic uncertainties, based on which we can quantify, assure, and improve model performance under different operating conditions.
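A standard way to realize this decomposition, shown here only as a generic sketch rather than the framework's specific method, is to draw several stochastic forward passes (e.g., via Monte Carlo dropout or sampled network weights) and split the predictive entropy into an expected-entropy (aleatoric) term and a mutual-information (epistemic) term:

```python
import numpy as np

def uncertainty_decomposition(mc_probs):
    """Given T Monte Carlo forward passes of class probabilities
    (shape (T, num_classes)), decompose predictive uncertainty:
      total (predictive entropy) = aleatoric (expected entropy)
                                 + epistemic (mutual information)."""
    eps = 1e-12
    mean_p = mc_probs.mean(axis=0)
    total = -(mean_p * np.log(mean_p + eps)).sum()
    aleatoric = -(mc_probs * np.log(mc_probs + eps)).sum(axis=1).mean()
    epistemic = total - aleatoric
    return total, aleatoric, epistemic

# Hypothetical passes that disagree: high epistemic uncertainty.
total, aleatoric, epistemic = uncertainty_decomposition(
    np.array([[0.9, 0.1], [0.1, 0.9]]))
```

High epistemic uncertainty flags inputs outside the model's competence (a candidate for rejection or human review), while high aleatoric uncertainty reflects irreducible noise in the observations themselves.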

In summary, by simultaneously leveraging deep learning's power in representation learning, PGMs' power in knowledge encoding and uncertainty modeling, and Bayesian learning's power in uncertainty quantification, the proposed hybrid AI model is expected to yield significant improvements in data efficiency, model interpretability, generalization, and performance assurance.

Publications