Causal Discovery

Introduction

Since the beginning of modern science, revealing causal relations has been a fundamental and widely studied goal across many disciplines. The causal relations underlying a distribution play an increasingly important role in machine learning applications such as out-of-distribution (OOD) generalization, domain adaptation, and transfer learning. Because intrinsic causal relations are stable and interpretable, they are robust to exogenous factors, invariant under distribution shifts, and provide natural handles for explainability and fairness. Causality is thus a powerful tool for addressing key limitations of modern AI, including OOD generalization, explainability, and fairness.

The traditional way to reveal causal relations is through a series of randomized controlled experiments. However, such experiments are expensive, time-consuming, or sometimes impossible to conduct. Much research therefore focuses on learning causal relations from purely observational data, usually in the form of a directed acyclic graph (DAG) over a set of random variables. Many well-established methods have been proposed to learn the causal DAG from observational data. Conventional methods either test for statistical independence between variables (constraint-based) or search the DAG space for the graph with the optimal score (score-based). Functional causal model-based methods learn the DAG by encoding the statistical and causal dependencies via structural equation models (SEMs). Zheng et al. (2018) introduce a continuous acyclicity constraint that converts the original discrete causal discovery problem into a constrained continuous one, solvable with standard continuous optimization techniques.
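
As a concrete illustration of the continuous acyclicity constraint of Zheng et al. (2018), the sketch below evaluates h(W) = tr(exp(W ∘ W)) − d, which equals zero exactly when the weighted adjacency matrix W describes an acyclic graph. The matrices here are made-up toy examples:

```python
import numpy as np
from scipy.linalg import expm

def acyclicity(W: np.ndarray) -> float:
    """NOTEARS constraint h(W) = tr(exp(W ∘ W)) - d; h(W) = 0 iff W is acyclic."""
    d = W.shape[0]
    return float(np.trace(expm(W * W)) - d)

# Chain 0 -> 1 -> 2 (acyclic): W ∘ W is nilpotent, so h is numerically zero.
W_dag = np.array([[0.0, 1.5, 0.0],
                  [0.0, 0.0, -2.0],
                  [0.0, 0.0, 0.0]])

# Adding the back-edge 2 -> 0 closes a cycle: h becomes strictly positive.
W_cyc = W_dag.copy()
W_cyc[2, 0] = 1.0

print(acyclicity(W_dag))   # ≈ 0
print(acyclicity(W_cyc))   # > 0
```

Driving h(W) to zero during gradient-based optimization is what keeps the learned graph acyclic, but evaluating the matrix exponential at every step is expensive, which motivates the constraint-free method described later in this page.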

However, in practice, existing local and global causal discovery methods face several limitations. First, causal discovery/DAG learning is NP-hard, so existing methods usually have high time complexity and fail to scale to graphs with thousands of variables. Second, most state-of-the-art (SOTA) methods assume causal sufficiency, i.e., that all variables of interest are observed. Moreover, the performance of SOTA methods usually degrades on limited and noisy data. Our recent works aim to tackle these limitations and develop efficient, robust causal discovery methods.

Recent Works

Efficient Causal Discovery Methods and Their Applications

There are two general approaches to addressing the efficiency and scalability issues of the causal discovery problem. In one direction, methods first learn the local structure around each variable and then combine the local structures into the global structure. In the other, methods aim to approximate the posterior distribution over structures given the data. We focus on improving efficiency by reducing the time-consuming steps of constraint-based and continuous optimization-based methods. For constraint-based methods, we develop efficient Markov Blanket (MB) discovery and local causal structure discovery methods that reduce the number of independence tests to perform. For continuous optimization-based methods, we propose a method that does not require explicit enforcement of the time-consuming acyclicity constraint. Empirical results show that the proposed methods improve efficiency while maintaining comparable accuracy.

Markov Blanket Discovery

This research develops novel, efficient algorithms to learn local structures in graphical models, namely Markov Blankets (MBs). These local structures provide an efficient route to Bayesian network structure learning and causal discovery. The novelties of this work lie in 1) discovering a new coexistence property between the possible false-positive parent-child sets and the spouses in a Markov Blanket; 2) developing novel, efficient constraint-based and score-based algorithms that improve on SOTA methods. Please refer to the published papers for further details.
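
For intuition, here is a minimal grow-shrink style MB discovery sketch on linear-Gaussian toy data, using a Fisher-z partial-correlation test. It is a naive baseline, not the published algorithms, whose contribution is precisely to cut the number of such conditional independence (CI) tests:

```python
import numpy as np

def ci_test(i, j, cond, data, z_crit=3.29):
    """Fisher-z partial-correlation CI test (Gaussian assumption).
    Returns True if variable i is judged independent of j given cond."""
    n = data.shape[0]
    sub = np.corrcoef(data[:, [i, j] + list(cond)], rowvar=False)
    prec = np.linalg.inv(sub)
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    z = np.arctanh(np.clip(r, -0.9999, 0.9999)) * np.sqrt(n - len(cond) - 3)
    return abs(z) < z_crit

def grow_shrink_mb(target, data):
    """Naive grow-shrink Markov Blanket discovery around one target."""
    d = data.shape[1]
    mb = []
    changed = True
    while changed:                          # grow: add any dependent variable
        changed = False
        for v in range(d):
            if v != target and v not in mb and not ci_test(target, v, mb, data):
                mb.append(v)
                changed = True
    for v in list(mb):                      # shrink: prune false positives
        rest = [u for u in mb if u != v]
        if ci_test(target, v, rest, data):
            mb.remove(v)
    return sorted(mb)

# Toy linear-Gaussian DAG: X0 -> T -> X2 <- X3, with X4 disconnected.
# MB(T) = {X0 (parent), X2 (child), X3 (spouse)}.
rng = np.random.default_rng(0)
n = 2000
x0, x3, x4 = rng.normal(size=(3, n))
t = 0.9 * x0 + 0.5 * rng.normal(size=n)
x2 = 0.8 * t + 0.8 * x3 + 0.5 * rng.normal(size=n)
data = np.column_stack([x0, t, x2, x3, x4])
mb = grow_shrink_mb(1, data)
print(mb)   # parent, child, and spouse of T
```

Note that the spouse X3 only enters the blanket because conditioning on the common child X2 makes it dependent on T; this is the structure the coexistence property mentioned above exploits.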

Local Causal Structure Discovery

This research proposes a novel algorithm to efficiently learn the direct causes and effects of a variable. Compared to Markov Blanket discovery, the proposed method further identifies the causal identities within the Markov Blanket via additional tests, while drastically improving efficiency over SOTA methods. The novelties of this work lie in 1) discovering that variations in independence relations among the neighbors of a variable can assist the detection of its direct causes and effects; 2) proposing a novel local causal structure discovery approach for identifying the direct causes and effects of a variable. Please refer to the published paper for further details.
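
The independence-variation idea can be illustrated with a standard collider (v-structure) example, which is only a textbook special case rather than the paper's exact procedure: two neighbors of a target that are marginally independent but become dependent once the target is conditioned on must both be direct causes of the target.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
# Ground truth: X -> T <- Y.  X and Y are both neighbors of T.
x = rng.normal(size=n)
y = rng.normal(size=n)
t = x + y + 0.5 * rng.normal(size=n)

def abs_corr(a, b):
    return abs(np.corrcoef(a, b)[0, 1])

def abs_partial_corr(a, b, c):
    """|corr(a, b | c)| via residuals of least-squares regression on c."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    return abs(np.corrcoef(ra, rb)[0, 1])

# X and Y are marginally independent but dependent given T: this variation
# marks T as a collider, i.e., X and Y are direct causes of T.
print(abs_corr(x, y))              # close to 0
print(abs_partial_corr(x, y, t))   # clearly nonzero (≈ 0.8 in population)
```

Detecting such variations locally is far cheaper than learning the whole graph, which is the source of the efficiency gain.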

Feature Selection with Efficient Local Causal Discovery

This work applies the Markov Blanket algorithms to feature selection tasks. By considering structured information among variables, the proposed method enjoys several optimality guarantees for feature selection. The novelties of this work lie in 1) proving several optimality properties of structured feature selection using Markov Blankets; 2) developing algorithms for large-scale and hierarchical feature selection. Please refer to the published paper for further details.

An Efficient Differentiable Global Causal Discovery Method

This research addresses the time-consuming computation and inefficiency caused by solving the constrained continuous optimization in functional causal model-based methods. It proposes a novel formulation that models and learns the weighted adjacency matrix of the causal DAG directly. This work theoretically proves that the set of causal DAGs is equivalent to the set of weighted gradients of graph potential functions, so causal DAG learning can be performed by searching this equivalent space without explicitly imposing the acyclicity constraint. Accordingly, a two-step procedure is proposed: first obtain a (possibly cyclic) solution through a few iterations of soft-constrained optimization, then obtain the acyclic graph by projecting that solution onto the equivalent space. The novelties of this work lie in 1) theoretically demonstrating that the weighted adjacency matrix of a DAG can be represented as the product of a skew-symmetric matrix and the gradient of a potential function on graph variables; 2) proposing the novel formulation and algorithm for learning the causal DAG without an explicit constraint; 3) achieving substantial improvement in efficiency with comparable accuracy relative to SOTA functional causal model-based causal discovery methods. Please refer to the published paper for further details.
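
A simplified sketch of the underlying idea (not the paper's exact parameterization): assign every node a scalar potential p and allow an edge i → j only when p_j > p_i. Any graph built this way is acyclic by construction, since a directed cycle would require the potential to strictly increase all the way around a loop, so unconstrained search over (M, p) stays inside DAG space.

```python
import numpy as np

def dag_from_potential(M: np.ndarray, p: np.ndarray) -> np.ndarray:
    """W_ij = M_ij * max(p_j - p_i, 0): edges only run 'uphill' in the node
    potential p, so no directed cycle can close and W is always a DAG."""
    grad_p = p[None, :] - p[:, None]           # (grad p)_ij = p_j - p_i
    return M * np.maximum(grad_p, 0.0)

def is_acyclic(W: np.ndarray) -> bool:
    """True iff the directed graph with adjacency (W != 0) has no cycle."""
    A = (W != 0).astype(float)
    P = A.copy()
    for _ in range(W.shape[0]):
        if np.trace(P) > 0:                    # a closed walk of length k exists
            return False
        P = P @ A
    return True

# Every random (M, p) pair induces an acyclic weighted graph.
rng = np.random.default_rng(0)
checks = [is_acyclic(dag_from_potential(rng.normal(size=(6, 6)),
                                        rng.normal(size=6)))
          for _ in range(200)]
print(all(checks))   # True
```

Because acyclicity holds for any parameter values, no matrix-exponential constraint ever needs to be evaluated during optimization, which is the source of the speed-up over constrained formulations.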

Causal Discovery under Latent Variables

The existence of latent, unobserved variables may induce spurious causal dependencies among the observed variables. In real-world applications, it is hard to guarantee that all variables of interest are accounted for and available for data collection. We propose to perform causal discovery under latent variables within an EM framework.

Latent Markov Blanket Discovery

This research addresses the presence of latent variables in local causal discovery. It proposes a method to identify local latent variables and to determine their structural relations with the observed variables. The local latent variable discovery is formulated as discovering the Markov Blanket (MB) of a target variable, and a constrained structural expectation-maximization algorithm is employed to greedily learn the MB with latent variables. The novelties of this work lie in 1) developing an algorithm to discover the latent variables local to one target variable; 2) evaluating the proposed algorithm on synthetic data and demonstrating its effectiveness in identifying the correct latent variables; 3) applying the proposed algorithm to feature discovery and selection problems; 4) showing how the learned latent variables help improve classification accuracy on benchmark datasets. Please refer to the published paper for further details.

Causal Discovery under Challenging Data

Even in the age of big data, there are domains in which the availability of data is limited. Data can be severely insufficient due to high collection cost or the rarity of the underlying events, and the data that are available may carry different levels of noise due to variation in the location, time, or means of collection. Hence, developing causal discovery methods that are robust under these conditions is crucial. We focus on two directions: 1) addressing the insufficient-data issue by improving the reliability of independence tests via Bayesian methods; 2) relaxing and generalizing the SEM assumption on noise variance and developing a formulation for the general SEM.

Robust Constraint-based Global Causal Discovery under Insufficient Data

This research improves global causal discovery under insufficient data by enhancing constraint-based methods with Bayesian-augmented frequentist independence tests. It applies these tests within the PC algorithm and evaluates them on datasets with limited samples, showing significant improvements in accuracy and efficiency over state-of-the-art methods. Key contributions include introducing a Bayesian approach to estimating mutual information, performing Bayesian estimation of the hypothesis likelihood, and proposing a robust statistical testing-based independence test. Please refer to the published paper for further details.
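
For context, the sketch below implements the plain frequentist building block such methods start from: a G-test of independence based on the plug-in mutual information estimate for discrete data. The paper's Bayesian augmentation replaces this plug-in estimate, which is unreliable at small sample sizes; the data here are synthetic toys.

```python
import numpy as np
from scipy.stats import chi2

def g_test_pvalue(x, y):
    """G-test of marginal independence for two discrete samples:
    G = 2n * MI_plugin, asymptotically chi^2 with (|X|-1)(|Y|-1) dof under H0."""
    xs, xi = np.unique(x, return_inverse=True)
    ys, yi = np.unique(y, return_inverse=True)
    n = len(x)
    joint = np.zeros((len(xs), len(ys)))
    np.add.at(joint, (xi, yi), 1.0)            # contingency counts
    joint /= n
    indep = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / indep[nz]))
    dof = (len(xs) - 1) * (len(ys) - 1)
    return chi2.sf(2 * n * mi, dof)            # small p-value -> dependent

rng = np.random.default_rng(0)
x = rng.integers(0, 3, size=500)
y_dep = (x + (rng.random(500) < 0.1)) % 3      # noisy copy of x
y_ind = rng.integers(0, 3, size=500)           # unrelated variable

p_dep = g_test_pvalue(x, y_dep)
p_ind = g_test_pvalue(x, y_ind)
print(p_dep, p_ind)   # p_dep ≪ p_ind
```

At small n, the plug-in MI is biased upward, inflating G and producing spurious edges; estimating the mutual information and hypothesis likelihood in a Bayesian fashion, as the paper does, mitigates exactly this failure mode.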

Effective and Identifiable Causal Discovery under Heteroscedastic Additive Noise

This research addresses causal discovery under heteroscedastic data, where noise variances vary across variables and observations. Current causal discovery methods often assume homoscedastic noise in SEMs, which is unrealistic for real-world data. We propose an SEM formulation with heteroscedastic additive noise to model this variation explicitly. Our contributions include introducing relaxed identifiability conditions for a class of multivariate SEMs and proposing a novel DAG learning formulation that accounts for noise variance variability. We also present a two-phase DAG learning algorithm that effectively estimates the noise variances and the DAG structure. Empirical results show that our method achieves comparable accuracy on synthetic homoscedastic data and significantly outperforms state-of-the-art methods on synthetic heteroscedastic and real data. Please refer to the published paper for further details.
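
To make the heteroscedasticity assumption concrete, the toy sketch below contrasts a homoscedastic SEM with one whose additive noise scale depends on the parent; the specific functional forms are illustrative assumptions, not the ones from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)

# Homoscedastic SEM: x2 = f(x1) + eps, with a constant noise scale.
x2_homo = 0.8 * x1 + 0.5 * rng.normal(size=n)

# Heteroscedastic additive noise: x2 = f(x1) + sigma(x1) * eps, where the
# noise scale varies with the parent (here sigma(x1) = exp(0.5 * x1)).
x2_het = 0.8 * x1 + np.exp(0.5 * x1) * rng.normal(size=n)

# The residual spread of the heteroscedastic child depends on the parent:
res = x2_het - 0.8 * x1
upper, lower = res[x1 > 0].var(), res[x1 <= 0].var()
print(upper, lower)   # residual variance is markedly larger where x1 > 0
```

Methods that assume a constant residual variance mis-model such data, which is why the two-phase algorithm estimates the noise variances jointly with the DAG structure.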

Publications

  • Naiyu Yin, Yue Yu, Tian Gao and Qiang Ji. Effective and Identifiable Causal Discovery under Heteroscedastic Noise Model. Association for the Advancement of Artificial Intelligence (AAAI), 2024. [PDF]

  • Zijun Cui, Naiyu Yin, Yuru Wang, Qiang Ji. Empirical Bayesian Approaches for Robust Constraint-based Causal Discovery under Insufficient Data. International Joint Conference on Artificial Intelligence (IJCAI), 2022. [PDF]

  • Yue Yu, Tian Gao, Naiyu Yin, Qiang Ji. DAGs with No Curl: An Efficient DAG Structure Learning Approach. International Conference on Machine Learning (ICML), 2021. [PDF]

  • Tian Gao, Qiang Ji. Efficient Markov Blanket Discovery and Its Applications. IEEE Transactions on Cybernetics, 2016. [PDF]

  • Tian Gao, Qiang Ji. Efficient Score-Based Markov Blanket Discovery. International Journal of Approximate Reasoning, 2016. [PDF]

  • Tian Gao, Qiang Ji. Constrained Local Latent Variable Discovery. International Joint Conference on Artificial Intelligence (IJCAI), 2016. [PDF]

  • Tian Gao, Qiang Ji. Structured Feature Selection. International Conference on Computer Vision (ICCV), 2015. [PDF]

  • Tian Gao, Qiang Ji. Local Causal Discovery for Direct Causes and Effects. Neural Information Processing Systems (NIPS), 2015. [PDF]