The causal revolution has spurred interest in understanding complex relationships in various fields. Most existing methods aim to discover causal relationships among all variables in a large-scale complex graph. In practice, however, only a small number of variables in the graph are relevant to the outcomes of interest. As a result, causal estimation with the full causal graph, especially given limited data, can yield many falsely discovered, spurious variables that may be highly correlated with the target outcome but have no causal impact on it. In this paper, we propose to learn a class of necessary and sufficient causal graphs (NSCG) that contains only causally relevant variables for an outcome of interest, which we term causal features. The key idea is to use probabilities of causation to systematically evaluate the importance of features in the causal graph, allowing us to identify a subgraph that is relevant to the outcome of interest. To learn NSCG from data, we develop a score-based necessary and sufficient causal structural learning (NSCSL) algorithm by establishing theoretical relationships between probabilities of causation and causal effects of features. Across empirical studies of simulated and real data, we show that the proposed NSCSL algorithm outperforms existing algorithms and can reveal important yeast genes for target heritable traits of interest.