Bayesian Optimal Two-sample Tests in High-dimension

We propose optimal Bayesian two-sample tests for testing equality of high-dimensional mean vectors and covariance matrices between two populations. In many applications, including genomics and medical imaging, it is natural to assume that the two mean vectors or covariance matrices differ in only a few entries. Many existing tests aggregate the differences between empirical means or covariance matrices, and they are either not optimal or have low power in such settings. Motivated by this, we develop Bayesian two-sample tests based on a divide-and-conquer idea, which is especially powerful when the difference between the two populations is sparse but large. The proposed tests admit closed-form Bayes factors and allow scalable computation even in high dimensions. We prove that the proposed tests are consistent under relatively mild conditions compared to existing tests in the literature. Furthermore, the testable regions of the proposed tests turn out to be rate-optimal. Simulation studies show clear advantages of the proposed tests over other state-of-the-art methods in various scenarios. We also apply our tests to gene expression data from two cancer data sets.

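To make the divide-and-conquer idea concrete for the mean-vector case, the following is a minimal sketch under assumptions that the abstract does not state. It assumes Gaussian data with a known common variance `sigma2`, a N(0, `tau2`) prior on each coordinate-wise mean difference under the alternative, and aggregation by taking the maximum coordinate-wise log Bayes factor. The function names and default hyperparameters are hypothetical and are not taken from the paper; the paper's actual priors and Bayes factors may differ.

```python
import numpy as np


def coordinate_log_bayes_factors(x, y, sigma2=1.0, tau2=1.0):
    """Log Bayes factor (H1 vs H0) for each coordinate, in closed form.

    Assumptions (illustrative only): known common variance sigma2, and
    under H1 the mean difference of each coordinate is N(0, tau2).
    The Bayes factor is based on the difference of sample means d_j,
    which is N(0, v) under H0 and N(0, v + tau2) under H1.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = x.shape[0], y.shape[0]
    d = x.mean(axis=0) - y.mean(axis=0)      # coordinate-wise mean difference
    v = sigma2 * (1.0 / n1 + 1.0 / n2)       # sampling variance of d under H0
    # log N(d; 0, v + tau2) - log N(d; 0, v)
    return 0.5 * np.log(v / (v + tau2)) + 0.5 * d**2 * tau2 / (v * (v + tau2))


def max_log_bayes_factor(x, y, sigma2=1.0, tau2=1.0):
    """Aggregate by the maximum, so one large sparse difference can dominate."""
    log_bf = coordinate_log_bayes_factors(x, y, sigma2, tau2)
    j = int(np.argmax(log_bf))
    return float(log_bf[j]), j


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = 500
    x = rng.normal(size=(30, p))
    y = rng.normal(size=(30, p))
    y[:, 7] += 2.0                           # a single sparse, large mean shift
    stat, idx = max_log_bayes_factor(x, y)
    print(f"max log Bayes factor = {stat:.2f} at coordinate {idx}")
```

Taking the maximum rather than a sum is what makes a sketch like this sensitive to sparse signals: coordinates with no difference contribute a negative log Bayes factor and do not dilute a single large one. The evidence threshold for rejecting equality is left to the user here.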