This paper considers the use of recently proposed optimal-transport-based multivariate test statistics, namely the rank energy and its variant, the soft rank energy, which is derived from entropically regularized optimal transport, for the unsupervised nonparametric change point detection (CPD) problem. We show that the soft rank energy enjoys both fast rates of statistical convergence and robust continuity properties, which lead to strong performance on real datasets. Our theoretical analyses remove the need for the resampling and out-of-sample extensions previously required to obtain such rates. In contrast, the rank energy suffers from the curse of dimensionality in statistical estimation and, moreover, can signal a change point in response to arbitrarily small perturbations, which leads to a high rate of false alarms in CPD. Additionally, under mild regularity conditions, we quantify the discrepancy between the soft rank energy and the rank energy in terms of the regularization parameter. Finally, we show that our approach performs favorably in numerical experiments compared with several other optimal-transport-based methods as well as maximum mean discrepancy.
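To make the statistic concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that the soft rank of a pooled sample point is the barycentric projection of an entropically regularized transport plan onto reference points drawn uniformly from [0,1]^d, and that the soft rank energy is the energy distance between the soft ranks of the two samples, with no sample-size scaling. The function names (sinkhorn_plan, soft_ranks, soft_rank_energy, change_scores), the squared-Euclidean cost, the fixed iteration count, and the sliding-window scoring scheme are all illustrative assumptions and do not come from the abstract.

```python
import numpy as np


def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def sinkhorn_plan(cost, eps, n_iter=200):
    # Log-domain Sinkhorn iterations between two uniform discrete measures.
    n, m = cost.shape
    log_a = np.full(n, -np.log(n))
    log_b = np.full(m, -np.log(m))
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iter):
        f = eps * (log_a - _logsumexp((g[None, :] - cost) / eps, axis=1))
        g = eps * (log_b - _logsumexp((f[:, None] - cost) / eps, axis=0))
    return np.exp((f[:, None] + g[None, :] - cost) / eps)


def soft_ranks(z, eps=1.0, seed=0):
    # Assumed definition: barycentric projection of the entropic plan
    # onto reference points sampled uniformly from the unit cube.
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=z.shape)
    cost = 0.5 * ((z[:, None, :] - u[None, :, :]) ** 2).sum(-1)
    plan = sinkhorn_plan(cost, eps)
    return plan @ u / plan.sum(axis=1, keepdims=True)


def _mean_pairwise_dist(a, b):
    # Mean Euclidean distance over all pairs (a_i, b_j).
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(-1)).mean()


def soft_rank_energy(x, y, eps=1.0, seed=0):
    # Energy distance between the soft ranks of x and y, with the ranks
    # computed jointly on the pooled sample (unscaled; an assumption).
    ranks = soft_ranks(np.vstack([x, y]), eps=eps, seed=seed)
    rx, ry = ranks[: len(x)], ranks[len(x):]
    return (2.0 * _mean_pairwise_dist(rx, ry)
            - _mean_pairwise_dist(rx, rx)
            - _mean_pairwise_dist(ry, ry))


def change_scores(series, window, eps=1.0):
    # Hypothetical sliding-window scheme: score time t by comparing the
    # `window` samples before t with the `window` samples from t onward.
    scores = np.full(len(series), np.nan)
    for t in range(window, len(series) - window + 1):
        scores[t] = soft_rank_energy(series[t - window:t],
                                     series[t:t + window], eps=eps)
    return scores
```

In this sketch a larger eps pulls every soft rank toward the mean of the reference points, which smooths the statistic, while a smaller eps moves the plan toward the unregularized transport plan. Peaks in the output of change_scores would be read as candidate change points; how to threshold them is left open here.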