In this paper, we study statistical inference on the similarity/distance between two time series in an uncertain environment by considering a statistical hypothesis test on the distance obtained from the Dynamic Time Warping (DTW) algorithm. The sampling distribution of the DTW distance is difficult to derive because the distance is itself the output of a complicated algorithm. To circumvent this difficulty, we propose to employ a conditional sampling distribution for the inference, which enables us to derive an exact (non-asymptotic) inference method on the DTW distance. We also develop a novel computational method to compute the conditional sampling distribution. To our knowledge, this is the first method that can provide a valid $p$-value to quantify the statistical significance of the DTW distance, which is helpful for high-stakes decision making. We evaluate the performance of the proposed inference method on both synthetic and real-world datasets.
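
For readers unfamiliar with the test statistic, the following is a minimal sketch of the classic dynamic-programming computation of a DTW distance between two univariate series. It is illustrative only: the function name `dtw_distance` and the squared-difference local cost are assumptions made here, not details taken from the paper, and the sketch computes only the distance itself, not the conditional sampling distribution or the $p$-value.

```python
import math


def dtw_distance(x, y):
    """Return the DTW distance between two numeric sequences.

    Assumption: the local cost is the squared difference (x[i] - y[j]) ** 2,
    and the warping path may step right, down, or diagonally.
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # acc[i][j] holds the minimal accumulated cost of aligning
    # x[:i] with y[:j]; the extra row and column act as a border.
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (x[i - 1] - y[j - 1]) ** 2
            acc[i][j] = local + min(
                acc[i - 1][j],      # advance in x only
                acc[i][j - 1],      # advance in y only
                acc[i - 1][j - 1],  # advance in both
            )
    return acc[n][m]
```

Because the returned value depends on which alignment attains the minimum, and that alignment is itself chosen from the data, the distribution of the distance is hard to characterize directly, which is the difficulty the conditional approach is meant to address.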