Two-sample hypothesis testing for high-dimensional data is ubiquitous nowadays. Rank-based tests are popular nonparametric methods for univariate data. However, they are difficult to extend to high-dimensional data. In this paper, we propose a new nonparametric two-sample testing procedure, the Rank In Similarity graph Edge-count two-sample test (RISE). The new test statistic is constructed on a rank-weighted similarity graph, such as the $k$-nearest neighbor graph. As a result, RISE can also be applied to non-Euclidean data. Theoretically, we prove that, under some mild conditions, the new test statistic converges to the $\chi_2^2$ distribution under the permutation null distribution, enabling fast type-I error control. Extensive simulations show that RISE exhibits good power under a wide range of alternatives compared to existing methods. The new test is illustrated on the New York City taxi data, for comparing travel patterns in consecutive months, and on a brain network dataset, for comparing male and female subjects.
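To make the idea of an edge-count statistic on a rank-weighted $k$-nearest neighbor graph concrete, the following is a minimal sketch and not the authors' implementation. Several choices here are assumptions made only for illustration: the weight `k - r + 1` for the `r`-th nearest neighbor, the symmetrization of the weight matrix, Euclidean distance, and the function names. The abstract refers to an analytic permutation null with a $\chi_2^2$ limit; this sketch instead estimates the permutation mean and covariance of the two within-sample weight sums by Monte Carlo and reports a permutation p-value.

```python
import numpy as np


def rank_knn_weights(Z, k):
    """Symmetric rank-weighted k-NN graph (illustrative weighting, an assumption):
    the r-th nearest neighbor of point i adds weight k - r + 1 to the pair."""
    n = Z.shape[0]
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)                # a point is never its own neighbor
    nearest = np.argsort(D, axis=1)[:, :k]     # indices of the k nearest neighbors
    R = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    R[rows, nearest.ravel()] = np.tile(np.arange(k, 0, -1), n)
    return 0.5 * (R + R.T)


def within_sample_sums(W, labels):
    """Total edge weight inside sample 0 and inside sample 1."""
    a = labels == 0
    b = ~a
    return np.array([W[np.ix_(a, a)].sum(), W[np.ix_(b, b)].sum()])


def permutation_edge_test(X, Y, k=5, n_perm=500, seed=0):
    """Quadratic-form statistic of the two within-sample sums, standardized by
    Monte Carlo permutation moments; returns (statistic, permutation p-value)."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    labels = np.r_[np.zeros(len(X), dtype=int), np.ones(len(Y), dtype=int)]
    W = rank_knn_weights(Z, k)

    observed = within_sample_sums(W, labels)
    perms = np.array(
        [within_sample_sums(W, rng.permutation(labels)) for _ in range(n_perm)]
    )
    mu = perms.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(perms, rowvar=False))

    d_obs = observed - mu
    t_obs = float(d_obs @ cov_inv @ d_obs)
    d_perm = perms - mu
    t_perm = np.einsum("ij,jk,ik->i", d_perm, cov_inv, d_perm)
    p_value = (1 + np.sum(t_perm >= t_obs)) / (n_perm + 1)
    return t_obs, float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(0.0, 1.0, size=(30, 10))
    Y = rng.normal(3.0, 1.0, size=(30, 10))    # strongly shifted second sample
    stat, p = permutation_edge_test(X, Y)
    print(f"statistic = {stat:.2f}, permutation p-value = {p:.4f}")
```

Because the graph is built once from pairwise distances and only the labels are permuted, the same sketch would apply to non-Euclidean data if `D` were replaced by any precomputed dissimilarity matrix.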