Probabilistic Reasoning using Graphical Models

1. Bayesian network parameter learning

A key issue in Bayesian networks (BNs) is parameter learning. When training data is incomplete or sparse, or when multiple hidden nodes exist, learning BN parameters becomes extremely difficult: the learning algorithm must operate in a high-dimensional search space and can easily get trapped among copious local maxima. We present a learning algorithm that incorporates domain knowledge to regularize the otherwise ill-posed problem, limit the search space, and avoid local optima. Unlike conventional approaches, which typically exploit quantitative domain knowledge such as prior probability distributions, our method systematically incorporates qualitative constraints on some of the parameters into the learning process. Specifically, the problem is formulated as a constrained optimization problem, where the objective function combines the likelihood function with penalty functions constructed from the qualitative domain knowledge. A gradient-descent procedure is then integrated with the E-step and M-step of the EM algorithm to estimate the parameters iteratively until convergence. Experiments with both synthetic data and real data for facial action recognition show that our algorithm significantly improves the accuracy of the learned BN parameters over the conventional EM algorithm.
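As a minimal illustration of the penalized formulation (a sketch only, not the authors' released code), the snippet below runs projected gradient steps on one CPT column, combining the expected log-likelihood from the E-step with a quadratic penalty for violating the hypothetical qualitative constraint theta[0] >= theta[1]:

```python
import numpy as np

# Sketch of a penalized M-step for a single CPT column.
# `counts` are expected counts from the E-step; the qualitative
# constraint theta[0] >= theta[1] is a hypothetical example.
def penalized_m_step(counts, penalty_weight=10.0, lr=0.01, iters=500):
    theta = counts / counts.sum()          # plain EM estimate as the start
    for _ in range(iters):
        grad = counts / theta              # gradient of expected log-likelihood
        violation = max(0.0, theta[1] - theta[0])
        grad[0] += 2 * penalty_weight * violation   # push theta[0] up
        grad[1] -= 2 * penalty_weight * violation   # push theta[1] down
        step = grad - grad.mean()          # project gradient onto the simplex
        theta = np.clip(theta + lr * step, 1e-6, None)
        theta /= theta.sum()               # renormalize
    return theta

# Soft constraint: the violation shrinks as penalty_weight grows.
print(penalized_m_step(np.array([2.0, 5.0, 3.0])))
```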

Publications

  • Wenhui Liao and Qiang Ji, Learning Bayesian Network Parameters Under Incomplete Data with Qualitative Domain Knowledge, Pattern Recognition, Vol. 42, No. 11, pp. 3046-3056, 2009. [PDF]

  • Cassio de Campos and Qiang Ji, Improving Bayesian Network Parameter Learning using Constraints, International Conference on Pattern Recognition (ICPR), 2008. [PDF]

  • Wenhui Liao and Qiang Ji, Exploiting Qualitative Domain Knowledge for Learning Bayesian Network Parameters with Incomplete Data, International Conference on Pattern Recognition (ICPR), 2008. [PDF]

  • Yan Tong and Qiang Ji, Learning Bayesian Networks with Qualitative Constraints, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008. [PDF]

2. Bayesian network structure learning

1) We address the problem of learning Bayesian network structures from data using decomposable score functions. We describe properties that strongly reduce the time and memory costs of many known methods without losing global optimality guarantees. These properties are derived for score criteria such as the Minimum Description Length (or Bayesian Information Criterion), the Akaike Information Criterion, and the Bayesian Dirichlet criterion. We then present a branch-and-bound algorithm that integrates structural constraints with data in a way that guarantees global optimality. As an example, structural constraints are used to map the problem of structure learning in dynamic Bayesian networks into a corresponding augmented Bayesian network. Finally, we show empirically the benefits of using these properties with state-of-the-art methods and with the new algorithm, which can handle larger data sets than previously possible.
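To make the role of decomposability concrete, here is a minimal sketch (ours, not the paper's code) of the BIC local score for discrete data; because the total score is a sum of such per-variable terms, branch-and-bound methods can cache local scores and prune candidate parent sets independently:

```python
import numpy as np
from collections import Counter

# BIC local score of `child` given a candidate parent set, for a
# discrete dataset `data` of shape (n_samples, n_vars); `card` lists
# the cardinality of each variable.
def bic_local_score(data, card, child, parents):
    n = data.shape[0]
    counts = Counter(map(tuple, data[:, [child] + list(parents)]))
    parent_counts = Counter(map(tuple, data[:, list(parents)]))
    loglik = 0.0
    for key, n_ijk in counts.items():
        n_ij = parent_counts[key[1:]] if parents else n
        loglik += n_ijk * np.log(n_ijk / n_ij)   # maximum-likelihood term
    # penalty: number of free parameters times log(n)/2
    q = np.prod([card[p] for p in parents]) if parents else 1
    penalty = 0.5 * np.log(n) * (card[child] - 1) * q
    return loglik - penalty

rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(200, 3))
print(bic_local_score(data, card=[2, 2, 2], child=0, parents=[1]))
```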

Publications

  • Cassio de Campos and Qiang Ji, Efficient Structure Learning of Bayesian Networks using Constraints, Journal of Machine Learning Research, Vol. 12, pp. 663-689, 2011. [PDF] [Software]

  • Cassio de Campos and Qiang Ji, Properties of Bayesian Dirichlet scores to learn Bayesian network structures, AAAI, 2010. [PDF]

  • Cassio de Campos, Zhi Zeng, Qiang Ji, An Improved Structural EM to Learn Dynamic Bayesian Nets, International Conference on Pattern Recognition (ICPR), 2010. [PDF]

  • Cassio de Campos, Zhi Zeng, and Qiang Ji, Structure Learning of Bayesian Networks using Constraints, International Conference on Machine Learning (ICML), 2009. [PDF]

2) Bayesian Network Structure Learning with Bounded Treewidth

Bounding the tree-width of a Bayesian network reduces the chance of overfitting and allows exact inference to be performed efficiently. Several existing algorithms tackle the problem of learning bounded tree-width Bayesian networks by using k-trees as super-structures, but they do not scale to large domains and/or large tree-width. We propose a guided search algorithm to find k-trees with a maximum Informative Score, a measure of the quality of a k-tree for yielding good Bayesian networks. The algorithm achieves close to optimal performance compared to exact solutions in small domains, and discovers better networks than existing approximate methods in large domains. It also provides an optimal elimination order of the variables that guarantees small complexity for later runs of exact inference. Comparisons with well-known approaches in terms of learning and inference accuracy illustrate its capabilities.
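As a small illustration of why bounding tree-width pays off at inference time (a sketch under our own simplifications, using the networkx library), the induced width of an elimination order upper-bounds the largest clique created during variable elimination, and hence the exponent of the cost of exact inference:

```python
import networkx as nx

# Induced width of an elimination order on an undirected (moralized)
# graph: eliminate each variable in turn, connecting its remaining
# neighbors (fill-in edges) and tracking the largest neighborhood seen.
def induced_width(graph, order):
    g = graph.copy()
    width = 0
    for v in order:
        neighbors = list(g.neighbors(v))
        width = max(width, len(neighbors))
        g.add_edges_from((a, b) for i, a in enumerate(neighbors)
                         for b in neighbors[i + 1:])
        g.remove_node(v)
    return width

g = nx.cycle_graph(5)                          # a 5-cycle has tree-width 2
print(induced_width(g, order=list(g.nodes)))   # this order achieves width 2
```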

Publications

  • Siqi Nie, Cassio P. de Campos, and Qiang Ji, Learning Bayesian Networks with Bounded Tree-width via Guided Search, In AAAI Conference on Artificial Intelligence (AAAI), 2016. Oral Presentation. [PDF] [supp]

  • Siqi Nie, Denis D. Mauá, Cassio P. de Campos, and Qiang Ji, Advances in Learning Bayesian Networks of Bounded Treewidth, In Advances in Neural Information Processing Systems (NIPS), 2014. Spotlight Presentation. [PDF] [supp] [arXiv]

3) Directed acyclic graph (DAG) structure learning has recently been formulated as a constrained continuous optimization problem with a continuous acyclicity constraint, solved iteratively through subproblem optimization. To further improve efficiency, we propose a novel learning framework that models and learns the weighted adjacency matrices in the DAG space directly. Specifically, we first show that the set of weighted adjacency matrices of DAGs is equivalent to the set of weighted gradients of graph potential functions, so structure learning can be performed by searching in this equivalent set. To instantiate this idea, we propose a new algorithm, DAG-NoCurl, which solves the optimization problem efficiently with a two-step procedure: 1) find an initial cyclic solution to the optimization problem, and 2) employ the Hodge decomposition of graphs to learn an acyclic graph by projecting the cyclic graph onto the gradient of a potential function. Experimental studies on benchmark datasets demonstrate that our method achieves comparable accuracy to, but better efficiency than, baseline DAG structure learning methods on both linear and generalized structural equation models, often by more than an order of magnitude.
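The projection step can be illustrated with a small sketch (hypothetical potential values, not the paper's optimization code): any weighted graph whose remaining edges all run uphill along a node potential p is acyclic by construction, since a cycle would have to return to a lower potential:

```python
import numpy as np

# Keep only the edges of W that increase a node potential p; the result
# is acyclic by construction. DAG-NoCurl obtains p from a Hodge
# decomposition of an initial cyclic solution; here p is hand-picked.
def project_to_dag(W, p):
    grad = p[None, :] - p[:, None]          # grad(p)[i, j] = p[j] - p[i]
    return W * (grad > 0)                   # zero out non-uphill edges

W = np.array([[0.0, 0.8, 0.0],
              [0.0, 0.0, 0.5],
              [0.3, 0.0, 0.0]])             # contains the cycle 0->1->2->0
p = np.array([0.0, 1.0, 2.0])               # hypothetical potential
print(project_to_dag(W, p))                 # cycle edge 2->0 is removed
```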

Publications

  • Yue Yu, Tian Gao, Naiyu Yin, and Qiang Ji, DAGs with No Curl: An Efficient DAG Structure Learning Approach, International Conference on Machine Learning (ICML), 2021. [PDF]

3. Interval Temporal Bayesian Network

Complex activities typically consist of multiple primitive events happening in parallel or sequentially over a period of time. Understanding such activities requires recognizing not only each individual event but, more importantly, capturing their spatiotemporal dependencies over different time intervals. Most current graphical model-based approaches have several limitations. First, time-sliced graphical models such as hidden Markov models (HMMs) and dynamic Bayesian networks are typically based on points of time, and hence can only capture three temporal relations: precedes, follows, and equals. Second, HMMs are probabilistic finite-state machines whose state space grows exponentially as the number of parallel events increases. Third, other approaches such as syntactic and description-based methods, while rich in modeling temporal relationships, lack the expressive power to capture uncertainties. To address these issues, we introduce the interval temporal Bayesian network (ITBN), a novel graphical model that combines the Bayesian network with the interval algebra to explicitly model temporal dependencies over time intervals. Advanced machine learning methods are introduced to learn the ITBN model structure and parameters. Experimental results show that by reasoning with spatiotemporal dependencies, the proposed model leads to significantly improved performance when modeling and recognizing complex activities involving both parallel and sequential events.
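For a concrete feel for the interval relations ITBN reasons over, here is a sketch covering a few of Allen's 13 interval relations (the inverses and remaining cases are omitted); each event is an interval given by its (start, end) endpoints:

```python
# Classify the interval-algebra relation between two events a and b,
# each given as a (start, end) pair. Only a subset of Allen's 13
# relations is handled here.
def allen_relation(a, b):
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"        # a ends before b starts
    if e1 == s2:
        return "meets"         # a ends exactly when b starts
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 > s2 and e1 < e2:
        return "during"        # a lies strictly inside b
    if s1 < s2 < e1 < e2:
        return "overlaps"
    return "other"             # remaining relations / inverses omitted

print(allen_relation((0, 2), (2, 5)))   # meets
print(allen_relation((1, 3), (0, 5)))   # during
```

Note that point-based models can only distinguish the first three kinds of relations; the interval representation is what makes relations such as "during" and "overlaps" expressible.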

Publications

  • Ziheng Wang, Shangfei Wang and Qiang Ji. Capturing Complex Spatio-Temporal Relations among Facial Muscles for Facial Expression Recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.

  • Yongmian Zhang, Yifan Zhang, Eran Swears, Natalia Larios, Ziheng Wang, and Qiang Ji. Modeling Temporal Interactions with Interval Temporal Bayesian Networks for Complex Activity Recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 35, No. 10, 2013.

4. Feature Learning using Bayesian Network

Data representation plays a key role in many machine learning tasks. Specific domain knowledge can help design features, but handcrafting them is often time-consuming. Unsupervised learning, on the other hand, can automatically learn a good representation from either labeled or unlabeled data. One of the currently dominant approaches is the restricted Boltzmann machine (RBM). In our work, we investigate an alternative approach to feature learning based on a Bayesian linear regression model. This model can also be viewed as factor analysis, a statistical method for modeling the covariance structure of high-dimensional data that has not previously been used for feature learning. We compare the proposed framework with the RBM on different kinds of computer vision applications and report experimental results on several datasets to demonstrate the effectiveness of the proposed feature learning framework.
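As a minimal sketch of the pipeline (using scikit-learn's FactorAnalysis as a stand-in for the paper's own learning procedure), one fits a factor-analysis model to raw inputs and uses the inferred latent factors as the learned representation for downstream tasks:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))          # stand-in for raw image features

fa = FactorAnalysis(n_components=10)    # 10 latent factors
Z = fa.fit_transform(X)                 # inferred factors = learned features
print(Z.shape)                          # (500, 10)
```

The learned factors Z would then feed a classifier in place of the raw inputs, analogous to how RBM hidden activations are used as features.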

Publications

  • Siqi Nie and Qiang Ji. Feature Learning using Bayesian Linear Regression Model, 22nd International Conference on Pattern Recognition (ICPR), 2014.