We consider a data-driven robust hypothesis test in which the optimal test minimizes the worst-case risk over distributions that are close to the empirical distributions in Wasserstein distance. This leads to a new non-parametric hypothesis testing framework based on distributionally robust optimization, which is more robust when samples are limited for one or both hypotheses. Such a scenario often arises in applications such as health care, online change-point detection, and anomaly detection. We study the computational and statistical properties of the proposed test: we present a tractable convex reformulation of the original infinite-dimensional variational problem by exploiting the structure of the Wasserstein distance, and we characterize how to select the radii of the uncertainty sets. We also demonstrate the good performance of our method on both synthetic and real data.
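A minimal sketch of the minimax formulation described above, with notation introduced here purely for illustration (the symbols $\hat{Q}_1, \hat{Q}_2$, $\theta_1, \theta_2$, $\pi$, and $\Phi$ are our own and do not come from the original text):
\[
\inf_{\pi} \; \sup_{P_1 \in \mathcal{P}_1,\, P_2 \in \mathcal{P}_2} \Phi(\pi; P_1, P_2),
\qquad
\mathcal{P}_k = \{\, P : \mathsf{W}(P, \hat{Q}_k) \le \theta_k \,\}, \quad k = 1, 2,
\]
where $\hat{Q}_1$ and $\hat{Q}_2$ are the empirical distributions under the two hypotheses, $\mathsf{W}$ denotes the Wasserstein distance, $\theta_1, \theta_2$ are the radii of the uncertainty sets, $\pi$ ranges over (possibly randomized) tests, and $\Phi$ is a worst-case risk such as the sum of the type-I and type-II error probabilities. The tractable convex reformulation mentioned in the abstract replaces this infinite-dimensional variational problem with a finite-dimensional convex program.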