This paper addresses the problem of learning the probabilities of causation for subpopulations from finite population data. Tian and Pearl derived tight bounds for three basic probabilities of causation: the probability of necessity and sufficiency (PNS), the probability of sufficiency (PS), and the probability of necessity (PN). However, obtaining these bounds for each subpopulation requires the experimental and observational distributions of that subpopulation, which are usually impractical to estimate from finite population data. We propose a machine learning model that learns the bounds of the probabilities of causation for subpopulations from finite population data. We further show in a simulation study that the model is able to learn the bounds of PNS for 32768 subpopulations while knowing only roughly 500 of them from the finite population data.
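To make concrete what the bounds need as input, the following minimal sketch computes the Tian-Pearl bounds on PNS for a single subpopulation with binary treatment and outcome. It is not the paper's learning model. The function name `pns_bounds`, its argument names, and the numbers in the usage example are illustrative assumptions. The inputs are the experimental quantities P(y_x) and P(y_x') and the four observational joint probabilities.

```python
def pns_bounds(p_y_do_x, p_y_do_xp, p_xy, p_xyp, p_xpy, p_xpyp):
    """Tian-Pearl bounds on PNS for binary X and Y (illustrative sketch).

    p_y_do_x  : experimental P(y_x), i.e. P(Y = y | do(X = x))
    p_y_do_xp : experimental P(y_x'), i.e. P(Y = y | do(X = x'))
    p_xy, p_xyp, p_xpy, p_xpyp : observational joint probabilities
        P(x, y), P(x, y'), P(x', y), P(x', y'), which must sum to 1
    """
    p_y = p_xy + p_xpy  # observational marginal P(y)

    lower = max(
        0.0,
        p_y_do_x - p_y_do_xp,
        p_y - p_y_do_xp,
        p_y_do_x - p_y,
    )
    upper = min(
        p_y_do_x,
        1.0 - p_y_do_xp,  # P(y'_x')
        p_xy + p_xpyp,
        p_y_do_x - p_y_do_xp + p_xyp + p_xpy,
    )
    return lower, upper


# Hypothetical numbers for one subpopulation, chosen only for illustration.
lo, hi = pns_bounds(
    p_y_do_x=0.7, p_y_do_xp=0.2,
    p_xy=0.35, p_xyp=0.15, p_xpy=0.10, p_xpyp=0.40,
)
print(f"PNS is bounded within [{lo:.2f}, {hi:.2f}]")
```

Each subpopulation needs its own six input quantities, which is why estimating them separately for tens of thousands of subpopulations from a finite sample is impractical.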