Testing whether multiple samples share the same distribution is a common task in many fields. However, this problem has not been well explored for high-dimensional or non-Euclidean data. In this paper, we propose new nonparametric tests based on a similarity graph constructed on the pooled observations from multiple samples. The tests make use of both within-sample edges and between-sample edges, a straightforward yet previously unexplored idea. The new tests exhibit substantial power improvements over existing tests for a wide range of alternatives. We also study the asymptotic distributions of the test statistics, offering easy off-the-shelf tools for large datasets. The new tests are illustrated through an analysis of the age image dataset.
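To make the ingredients concrete, the following is a minimal sketch of a graph-based multi-sample comparison, not the test statistics proposed in the paper. It assumes a k-nearest-neighbour graph under Euclidean distance as the similarity graph, tabulates within-sample and between-sample edge counts, and calibrates a simple illustrative statistic (the total number of between-sample edges) by permuting sample labels. The function names, the choice of `k`, and the statistic itself are assumptions made for illustration; the abstract does not specify how the paper combines the two kinds of edge counts.

```python
import numpy as np


def knn_edges(X, k=5):
    """Undirected k-nearest-neighbour edges on the rows of X (Euclidean)."""
    sq = np.sum(X**2, axis=1)
    d = sq[:, None] + sq[None, :] - 2 * X @ X.T  # squared pairwise distances
    np.fill_diagonal(d, np.inf)                  # a point is not its own neighbour
    nbrs = np.argsort(d, axis=1)[:, :k]
    edges = {(min(i, int(j)), max(i, int(j)))
             for i in range(len(X)) for j in nbrs[i]}
    return np.array(sorted(edges))


def edge_count_matrix(edges, labels, n_groups):
    """Upper-triangular counts: C[a, a] are within-sample edges,
    C[a, b] with a < b are edges between samples a and b."""
    C = np.zeros((n_groups, n_groups), dtype=int)
    a, b = labels[edges[:, 0]], labels[edges[:, 1]]
    np.add.at(C, (np.minimum(a, b), np.maximum(a, b)), 1)
    return C


def permutation_pvalue(X, labels, k=5, n_perm=500, seed=0):
    """Permutation p-value for the total number of between-sample edges.

    Illustrative statistic only (an assumption, not the paper's):
    unusually few between-sample edges suggest the samples differ.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n_groups = int(labels.max()) + 1
    edges = knn_edges(X, k)  # the graph is fixed; only labels are permuted

    def between(lab):
        C = edge_count_matrix(edges, lab, n_groups)
        return C.sum() - np.trace(C)

    observed = between(labels)
    perms = np.array([between(rng.permutation(labels)) for _ in range(n_perm)])
    p = (1 + np.sum(perms <= observed)) / (1 + n_perm)
    return observed, p
```

Here `labels` is expected to hold integer sample indices starting at 0, one per row of `X`. For non-Euclidean data, the distance computation in `knn_edges` would be replaced by whatever dissimilarity suits the data. The permutation loop is the part that asymptotic distributions of the kind mentioned above would make unnecessary for large datasets.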