A probabilistic expert system emulates the decision-making ability of a human expert through a directed graphical model. The first step in building such a system is to understand the data-generating mechanism. To this end, one may try to decompose a multivariate distribution into a product of several conditionals, moving from black-box machine learning predictive models towards transparent cause-and-effect discovery. Most causal models assume a single homogeneous population, an assumption that may fail to hold in many applications. We show that when the homogeneity assumption is violated, causal models built on that assumption can fail to identify the correct causal direction. We propose an adjustment to a commonly used causal direction test statistic that uses a $k$-means type clustering algorithm in which both the labels and the number of components are estimated from the collected data. Our simulation results show that the proposed adjustment significantly improves the performance of the causal direction test statistic for heterogeneous data. We study the large-sample behaviour of the proposed test statistic and demonstrate the method on real data.
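
To make the decomposition into conditionals concrete, consider a minimal two-variable illustration. The notation ($X$, $Y$, $p$, $K$, $\pi_k$, $p_k$) is assumed here for illustration and is not taken from the paper. A joint density can be factorized in either direction,

$$
p(x, y) = p(x)\, p(y \mid x) = p(y)\, p(x \mid y),
$$

and a causal direction test asks which factorization is the more plausible description of the data-generating mechanism. One way to picture a heterogeneous population is as a mixture of $K$ latent groups, each with its own mechanism:

$$
p(x, y) = \sum_{k=1}^{K} \pi_k\, p_k(x)\, p_k(y \mid x), \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1.
$$

Under this assumed form, a statistic computed on the pooled data reflects the mixture rather than any single group, which is one way the homogeneity assumption can mislead a direction test.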