Conditional independence (CI) is central to causal inference, feature selection, and graphical modeling, yet it is untestable in many settings without additional assumptions. Existing CI tests often rely on restrictive structural conditions, limiting their validity on real-world data. Kernel methods using the partial covariance operator offer a more principled approach but suffer from limited adaptivity, slow convergence, and poor scalability. In this work, we explore whether representation learning can help address these limitations. Specifically, we focus on representations derived from the singular value decomposition of the partial covariance operator and use them to construct a simple test statistic, reminiscent of the Hilbert-Schmidt Independence Criterion (HSIC). We also introduce a practical bi-level contrastive algorithm to learn these representations. Our theory links representation learning error to test performance and establishes asymptotic validity and power guarantees. Preliminary experiments suggest that this approach offers a practical and statistically grounded path toward scalable CI testing, bridging kernel-based theory with modern representation learning.
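For intuition, a minimal sketch of the kind of HSIC-style statistic referred to above, computed on learned representations of X and Y (after conditioning information in Z has been accounted for). The feature maps `phi_x`, `psi_y` and the permutation calibration here are illustrative assumptions, not the exact procedure developed in the paper.

```python
import numpy as np

def hsic_style_statistic(phi_x, psi_y):
    """Squared Frobenius norm of the empirical cross-covariance between
    learned representations phi(X) and psi(Y); this plays the role of an
    HSIC-like statistic when the representations approximate the top
    singular functions of the partial covariance operator."""
    n = phi_x.shape[0]
    phi_c = phi_x - phi_x.mean(axis=0)   # center each representation
    psi_c = psi_y - psi_y.mean(axis=0)
    cross_cov = phi_c.T @ psi_c / n      # d_x-by-d_y empirical cross-covariance
    return np.sum(cross_cov ** 2)

def permutation_p_value(phi_x, psi_y, n_perm=500, seed=0):
    """Calibrate the statistic by permuting the Y-representations, a simple
    stand-in for the null distribution under (conditional) independence."""
    rng = np.random.default_rng(seed)
    stat = hsic_style_statistic(phi_x, psi_y)
    null_stats = np.array([
        hsic_style_statistic(phi_x, psi_y[rng.permutation(len(psi_y))])
        for _ in range(n_perm)
    ])
    return (1 + np.sum(null_stats >= stat)) / (1 + n_perm)
```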