Kernel two-sample tests have been widely used for multivariate data to test the equality of distributions. However, existing tests based on mapping distributions into a reproducing kernel Hilbert space mainly target specific alternatives and, owing to the curse of dimensionality, do not work well in some scenarios when the dimension of the data is moderate to high. We propose a new test statistic that exploits a common pattern arising under moderate and high dimensions and achieves substantial power improvements over existing kernel two-sample tests for a wide range of alternatives. We also propose alternative testing procedures that maintain high power at low computational cost, offering easy off-the-shelf tools for large datasets. The new approaches are compared with other state-of-the-art tests under various settings and show good performance. They are illustrated on two applications: the comparison of musks and non-musks using the shape of molecules, and the comparison of taxi trips starting from John F. Kennedy airport in consecutive months. All proposed methods are implemented in the R package kerTests.
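For readers new to kernel two-sample tests, the following is a minimal sketch of the kind of existing RKHS-based baseline mentioned above: a Gaussian-kernel maximum mean discrepancy (MMD) statistic calibrated by permutation. It is a generic textbook construction, not the new statistic proposed in this work and not the kerTests API. The Gaussian kernel, the median-heuristic bandwidth, the permutation count and all function names are illustrative assumptions.

```python
import math
import random
import statistics

def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))

def mmd2_biased(sample_x, sample_y, bandwidth):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    m, n = len(sample_x), len(sample_y)
    kxx = sum(gaussian_kernel(a, b, bandwidth) for a in sample_x for b in sample_x) / (m * m)
    kyy = sum(gaussian_kernel(a, b, bandwidth) for a in sample_y for b in sample_y) / (n * n)
    kxy = sum(gaussian_kernel(a, b, bandwidth) for a in sample_x for b in sample_y) / (m * n)
    return kxx + kyy - 2.0 * kxy

def median_heuristic(pooled):
    """Median pairwise Euclidean distance, a common default bandwidth (assumption)."""
    dists = [math.dist(a, b) for i, a in enumerate(pooled) for b in pooled[i + 1:]]
    bandwidth = statistics.median(dists)
    return bandwidth if bandwidth > 0 else 1.0

def mmd_permutation_test(sample_x, sample_y, num_permutations=199, seed=0):
    """Return (observed MMD^2, permutation p-value) for H0: same distribution."""
    rng = random.Random(seed)
    pooled = list(sample_x) + list(sample_y)
    bandwidth = median_heuristic(pooled)  # fixed across permutations
    observed = mmd2_biased(sample_x, sample_y, bandwidth)
    m = len(sample_x)
    exceed = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)  # randomly reassign sample labels
        if mmd2_biased(pooled[:m], pooled[m:], bandwidth) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (num_permutations + 1)

# Usage on simulated 5-dimensional data (hypothetical example data).
data_rng = random.Random(1)
x_null = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(30)]
y_null = [[data_rng.gauss(0, 1) for _ in range(5)] for _ in range(30)]
y_shift = [[data_rng.gauss(1, 1) for _ in range(5)] for _ in range(30)]

stat_null, p_null = mmd_permutation_test(x_null, y_null)
stat_shift, p_shift = mmd_permutation_test(x_null, y_shift)
print(f"same distribution: MMD^2={stat_null:.4f}, p={p_null:.3f}")
print(f"mean shift:        MMD^2={stat_shift:.4f}, p={p_shift:.3f}")
```

Each permutation recomputes the full kernel sums, so this naive version costs on the order of (m + n)^2 kernel evaluations per permutation; that expense is one motivation for the lower-cost procedures described in the abstract.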