In this article, we study the problem of high-dimensional conditional independence testing, a key building block in statistics and machine learning. We propose an inferential procedure based on double generative adversarial networks (GANs). Specifically, we first introduce a double GANs framework to learn two generators of the conditional distributions. We then integrate the two generators to construct a test statistic, which takes the form of the maximum of generalized covariance measures of multiple transformation functions. We also employ data splitting and cross-fitting to minimize the conditions the generators must satisfy for the desired asymptotic properties to hold, and use the multiplier bootstrap to obtain the corresponding $p$-value. We show that the constructed test statistic is doubly robust, and that the resulting test both controls the type-I error and has power approaching one asymptotically. Notably, we establish these theoretical guarantees under much weaker and practically more feasible conditions than those required by existing tests, and our proposal gives a concrete example of how state-of-the-art deep learning tools, such as GANs, can help address a classical but challenging statistical problem. We demonstrate the efficacy of our test through both simulations and an application to an anti-cancer drug dataset. A Python implementation of the proposed procedure is available at https://github.com/tianlinxu312/dgcit.
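The test statistic and its bootstrap calibration described above can be sketched as follows. This is a minimal NumPy illustration under several assumptions, not the dgcit implementation. The conditional draws `x_tilde` and `y_tilde` are taken as given inputs; in the described procedure they would come from the two trained generators, fitted with data splitting and cross-fitting. The transformation functions are a hypothetical family of sines and cosines at frequencies `freqs`. Each generalized covariance measure is standardized in a GCM-style way that may differ from the paper's exact definition. The names `features` and `max_gcm_test` are hypothetical.

```python
import numpy as np


def features(v, freqs):
    """Hypothetical transformation family: cos(w * v) and sin(w * v) for each w."""
    a = v[..., None] * freqs
    return np.concatenate([np.cos(a), np.sin(a)], axis=-1)


def max_gcm_test(x, y, x_tilde, y_tilde, freqs, n_boot=2000, seed=0):
    """Maximum of standardized generalized covariance measures + multiplier bootstrap.

    x, y             : (n,) observed samples of X and Y.
    x_tilde, y_tilde : (n, m) draws approximating X | Z = z_i and Y | Z = z_i
                       (assumed to come from generators fitted on held-out folds).
    """
    n = x.shape[0]
    # Residuals f(X_i) - E_hat[f(X) | Z_i], one column per transformation.
    rx = features(x, freqs) - features(x_tilde, freqs).mean(axis=1)
    ry = features(y, freqs) - features(y_tilde, freqs).mean(axis=1)
    # Residual products for every (f, g) pair of transformations: shape (n, P).
    prod = (rx[:, :, None] * ry[:, None, :]).reshape(n, -1)
    sd = prod.std(axis=0, ddof=1)
    # Standardized covariance for each pair; the test statistic is the max |t|.
    t = np.sqrt(n) * prod.mean(axis=0) / sd
    stat = np.abs(t).max()
    # Multiplier bootstrap: reweight centered, scaled products with N(0, 1) multipliers.
    centered = (prod - prod.mean(axis=0)) / sd
    e = np.random.default_rng(seed).standard_normal((n_boot, n))
    boot = np.abs(e @ centered / np.sqrt(n)).max(axis=1)
    return stat, float((boot >= stat).mean())


# Toy check with oracle conditional samplers standing in for trained generators.
rng = np.random.default_rng(1)
n, m = 500, 200
freqs = np.array([0.5, 1.0])
z = rng.standard_normal(n)
x = z + rng.standard_normal(n)                     # X | Z ~ N(Z, 1)
x_tilde = z[:, None] + rng.standard_normal((n, m))

# Null: Y = Z + noise, so X and Y are independent given Z.
y0 = z + rng.standard_normal(n)
y0_tilde = z[:, None] + rng.standard_normal((n, m))
stat0, p0 = max_gcm_test(x, y0, x_tilde, y0_tilde, freqs)

# Alternative: Y = Z + X + noise, so Y | Z ~ N(2Z, 2) and X, Y are dependent given Z.
y1 = z + x + rng.standard_normal(n)
y1_tilde = 2 * z[:, None] + np.sqrt(2) * rng.standard_normal((n, m))
stat1, p1 = max_gcm_test(x, y1, x_tilde, y1_tilde, freqs)
print(f"null: p = {p0:.3f}; alternative: p = {p1:.3f}")
```

The toy simulation replaces the two generators with oracle samplers from the true conditional distributions, so it only illustrates the mechanics of the maximum statistic and the multiplier bootstrap; it does not exercise the double GANs step or cross-fitting.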