Graph-based tests are a class of non-parametric two-sample tests useful for analyzing high-dimensional data. The framework offers both flexibility and power in a wide range of testing scenarios. The test statistics are constructed from similarity graphs (such as $K$-nearest neighbor graphs), and consequently their performance is sensitive to the structure of the graph. When the graph has problematic structures, as is common for high-dimensional data, existing graph-based tests can perform poorly or unstably. We address this challenge by developing graph-based test statistics that are robust to problematic structures of the graph. We derive the limiting null distribution of the robust test statistics and illustrate the new tests via simulation studies and a real-world application to Chicago taxi trip data.
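To make the construction concrete, the following is a minimal sketch of a classical edge-count test on a $K$-nearest neighbor graph, with a permutation null in place of a limiting distribution. It is not the robust statistic developed in this work; it only illustrates how a graph-based two-sample statistic is built from a similarity graph. The function names (`knn_edges`, `between_sample_edges`, `edge_count_test`) and the defaults `k=5` and `n_perm=500` are assumptions chosen for illustration.

```python
import numpy as np


def knn_edges(Z, k):
    """Undirected edge list of the k-nearest-neighbor graph on the rows of Z."""
    diffs = Z[:, None, :] - Z[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diffs, diffs)  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                 # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :k]         # indices of the k closest points
    edges = {
        (min(i, int(j)), max(i, int(j)))         # store each undirected edge once
        for i in range(len(Z))
        for j in nbrs[i]
    }
    return np.array(sorted(edges))


def between_sample_edges(edges, labels):
    """Number of edges whose endpoints carry different sample labels."""
    return int(np.sum(labels[edges[:, 0]] != labels[edges[:, 1]]))


def edge_count_test(X, Y, k=5, n_perm=500, seed=0):
    """Permutation p-value for the between-sample edge count (illustrative only).

    Few between-sample edges suggest the two samples occupy different regions,
    so small counts are treated as evidence against the null hypothesis.
    """
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    labels = np.r_[np.zeros(len(X), dtype=int), np.ones(len(Y), dtype=int)]
    edges = knn_edges(Z, k)                      # graph is built once on pooled data
    observed = between_sample_edges(edges, labels)
    permuted = np.array([
        between_sample_edges(edges, rng.permutation(labels))
        for _ in range(n_perm)
    ])
    p_value = (1 + np.sum(permuted <= observed)) / (n_perm + 1)
    return observed, p_value
```

Because the graph is fixed and only the labels are permuted, the statistic depends entirely on the graph's structure. For example, a few hub nodes with very high degree can dominate the edge count, which is the kind of sensitivity described above.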