The conditional randomization test (CRT) was recently proposed to test whether two random variables X and Y are conditionally independent given a set of random variables Z. The CRT assumes that the conditional distribution of X given Z is known; samples drawn from this distribution, which encode the null hypothesis, are then compared with the observed data. The aim of this paper is to develop an alternative to the CRT that uses nearest-neighbor sampling and does not assume the exact form of the distribution of X given Z. Specifically, we use computationally efficient 1-nearest-neighbor sampling to approximate the conditional distribution that encodes the null hypothesis. We then show theoretically that the distribution of the generated samples is close to the true conditional distribution in total variation distance. Furthermore, we take the classifier-based conditional mutual information estimator as our test statistic. As an empirical version of a fundamental information-theoretic quantity, this statistic captures conditional dependence well. We show that the proposed test is computationally fast while controlling type I and type II errors well. Finally, we demonstrate the efficiency of the proposed test on both synthetic and real data.
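As a rough illustration of the nearest-neighbor sampling idea, the following Python sketch replaces each test-sample X by the X of its nearest neighbor in Z, taken from a held-out pool, and turns the resulting replicates into a randomization p-value. It is a minimal sketch under stated assumptions, not the authors' procedure: the function names, the scalar Z, the half/half data split, and the random sub-pools used to make replicates differ are all hypothetical choices, and the absolute sample correlation is a simple stand-in for the classifier-based conditional mutual information estimator.

```python
import random


def nearest_neighbor_draw(z_query, z_pool, x_pool):
    """Return the X paired with the pool point whose Z is closest to z_query."""
    j = min(range(len(z_pool)), key=lambda k: abs(z_pool[k] - z_query))
    return x_pool[j]


def abs_corr(a, b):
    """Absolute sample correlation (stand-in statistic, an assumption of this sketch)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return abs(cov) / (var_a * var_b) ** 0.5


def nn_crt_pvalue(x, y, z, n_rep=99, seed=0):
    """Randomization p-value for H0: X independent of Y given Z (scalar Z assumed)."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    rng.shuffle(idx)
    # Assumed split: one half is tested, the other half is the neighbor pool.
    test_idx, pool_idx = idx[: len(idx) // 2], idx[len(idx) // 2:]
    y_test = [y[i] for i in test_idx]
    t_obs = abs_corr([x[i] for i in test_idx], y_test)

    exceed = 0
    for _ in range(n_rep):
        # Assumed device: a random half of the pool, so that replicates differ.
        sub = rng.sample(pool_idx, len(pool_idx) // 2)
        z_pool = [z[i] for i in sub]
        x_pool = [x[i] for i in sub]
        # 1-nearest-neighbor sampling: X-tilde approximately follows X | Z.
        x_tilde = [nearest_neighbor_draw(z[i], z_pool, x_pool) for i in test_idx]
        if abs_corr(x_tilde, y_test) >= t_obs:
            exceed += 1
    return (1 + exceed) / (1 + n_rep)


if __name__ == "__main__":
    rng = random.Random(1)
    z = [rng.gauss(0, 1) for _ in range(200)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y_null = [zi + rng.gauss(0, 1) for zi in z]      # depends on Z only
    y_dep = [xi + 0.1 * rng.gauss(0, 1) for xi in x]  # depends on X directly
    print("p-value, conditionally independent:", nn_crt_pvalue(x, y_null, z))
    print("p-value, conditionally dependent:  ", nn_crt_pvalue(x, y_dep, z))
```

Because each replicate keeps the dependence of X on Z but breaks any direct link to Y, the observed statistic should stand out from the replicates only when X and Y are dependent given Z. The brute-force neighbor search here is quadratic in the sample size; a tree-based index would be the natural replacement for larger or multivariate Z.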