Kernel two-sample tests are widely used, and efficient methods for high-dimensional, large-scale data are receiving increasing attention in the big data era. However, existing methods, such as the maximum mean discrepancy (MMD) test and recently proposed kernel-based tests for large-scale data, are computationally intensive, ineffective against some common alternatives in high-dimensional data, or both. In this paper, we propose a new test that has high power across a wide range of alternatives. Furthermore, the new test is more robust to high dimensionality than existing methods and does not require data splitting to optimize the kernel bandwidth or other parameters. Numerical studies demonstrate that the new approach performs well on both synthetic and real-world data.
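The abstract does not define the proposed statistic, so the following is only a minimal sketch of the MMD baseline it is compared against. It is not the new test. The sketch assumes a Gaussian kernel with the median-heuristic bandwidth and calibrates the unbiased MMD² statistic by permutation. All function names are hypothetical. Building the pooled kernel matrix costs O((m+n)² d), and each permutation costs another O((m+n)²). This quadratic scaling is the computational burden the abstract refers to.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth):
    # Pairwise squared Euclidean distances between rows of X and rows of Y.
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    sq = np.maximum(sq, 0.0)  # clip tiny negatives caused by rounding
    return np.exp(-sq / (2.0 * bandwidth**2))

def median_bandwidth(Z):
    # Assumed choice: median heuristic over all pairwise distances in the pooled sample.
    sq = (np.sum(Z**2, axis=1)[:, None]
          + np.sum(Z**2, axis=1)[None, :]
          - 2.0 * Z @ Z.T)
    d = np.sqrt(np.maximum(sq, 0.0))
    return np.median(d[np.triu_indices_from(d, k=1)])

def mmd2_unbiased(K, m):
    # K is the pooled (m+n)x(m+n) kernel matrix; the first m rows/cols belong to X.
    n = K.shape[0] - m
    Kxx, Kyy, Kxy = K[:m, :m], K[m:, m:], K[:m, m:]
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))  # drop diagonal terms
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2.0 * Kxy.mean()

def mmd_permutation_test(X, Y, n_perm=500, seed=0):
    # Baseline MMD test (not the paper's method): permute sample labels
    # by reindexing the precomputed kernel matrix instead of recomputing it.
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    m = X.shape[0]
    K = gaussian_kernel(Z, Z, median_bandwidth(Z))
    observed = mmd2_unbiased(K, m)
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(Z.shape[0])
        if mmd2_unbiased(K[np.ix_(idx, idx)], m) >= observed:
            count += 1
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(0.0, 1.0, size=(100, 20))  # sample from P
    Y = rng.normal(1.0, 1.0, size=(100, 20))  # sample from Q (mean shift)
    stat, p = mmd_permutation_test(X, Y, n_perm=200)
    print(f"MMD^2 = {stat:.4f}, p-value = {p:.4f}")
```

Reindexing the kernel matrix with `np.ix_` reuses the O((m+n)² d) kernel computation across permutations. The permutation loop still scales quadratically in the pooled sample size. The median heuristic is a common default, and the abstract notes that the proposed test avoids tuning this bandwidth through data splitting.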