Proximity Forest 2.0: A new effective and scalable similarity-based classifier for time series

Time series classification (TSC) is a challenging task because the types of feature that are relevant differ from one classification task to another, and include trends, variance, frequency, magnitude, and various patterns. To address this challenge, several alternative classes of approach have been developed, including similarity-based, feature- and interval-based, shapelet, dictionary, kernel, neural network, and hybrid approaches. While kernel, neural network, and hybrid approaches perform well overall, some specialized approaches are better suited to specific tasks. In this paper, we propose a new similarity-based classifier, Proximity Forest version 2.0 (PF 2.0), which outperforms previous state-of-the-art similarity-based classifiers across the UCR benchmark and outperforms state-of-the-art kernel, neural network, and hybrid methods on the specific datasets in the benchmark that are best addressed by similarity-based methods. PF 2.0 incorporates three recent advances in time series similarity measures: (1) computationally efficient early abandoning and pruning to speed up elastic similarity computations; (2) a new elastic similarity measure, Amerced Dynamic Time Warping (ADTW); and (3) cost function tuning. It rationalizes the set of similarity measures employed, reducing the eight base measures of the original PF to three and using the first derivative transform with all similarity measures, rather than a limited subset. We have implemented both PF 1.0 and PF 2.0 in a single C++ framework, making the PF framework more efficient.
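To make the second and third ingredients more concrete, here is a minimal sketch of an ADTW-style distance in Python. It is not the PF 2.0 implementation (which the abstract says is written in C++) and it omits early abandoning and pruning. It assumes univariate series, an additive penalty `omega` charged on every non-diagonal (warping) step, and a pointwise cost `|a - b| ** exponent`; the function name `adtw` and the parameter names `omega` and `exponent` are illustrative choices, with `exponent` standing in for a tunable cost function.

```python
import math


def adtw(a, b, omega, exponent=2.0):
    """Sketch of Amerced DTW between two univariate series.

    omega:    additive penalty paid for each non-diagonal (warping) step.
    exponent: power applied to the absolute pointwise difference
              (an illustrative stand-in for cost function tuning).
    """
    n, m = len(a), len(b)
    # prev[j] holds the best cost of aligning a[:i-1] with b[:j].
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # aligning two empty prefixes costs nothing
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1]) ** exponent
            curr[j] = cost + min(
                prev[j - 1],          # diagonal step: no penalty
                prev[j] + omega,      # vertical step: penalised
                curr[j - 1] + omega,  # horizontal step: penalised
            )
        prev = curr
    return prev[m]


if __name__ == "__main__":
    x = [0.0, 0.0, 1.0, 2.0]
    y = [0.0, 1.0, 2.0, 2.0]
    for w in (0.0, 0.1, math.inf):
        print(w, adtw(x, y, w))
```

With `omega = 0` the sketch behaves like unconstrained DTW, and with an infinite `omega` on equal-length series only the diagonal path remains, so the result is the sum of pointwise costs. Intermediate values trade off between the two.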



Similarity measures


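A similarity measure is a quantity that summarizes how close two objects are. The closer two objects are, the larger their similarity measure; the farther apart they are, the smaller it is. There are many ways to define a similarity measure, and the choice generally depends on the problem at hand. Commonly used similarity measures include the correlation coefficient (which measures closeness between variables) and the similarity coefficient (which measures closeness between samples). When the samples consist of qualitative data, closeness between samples can be measured with the matching coefficient, the degree of agreement, and similar quantities.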
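As a small illustration of two of the measures named above, the following Python sketch computes a Pearson correlation coefficient between two numeric variables and a simple matching coefficient between two samples of qualitative data. The function names are illustrative, and the matching coefficient is assumed here to be the fraction of positions where the two samples agree.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length variables."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)


def simple_matching_coefficient(a, b):
    """Fraction of positions at which two equal-length samples agree."""
    matches = sum(1 for ai, bi in zip(a, b) if ai == bi)
    return matches / len(a)


if __name__ == "__main__":
    print(pearson_correlation([1, 2, 3], [2, 4, 6]))
    print(simple_matching_coefficient(["r", "g", "b", "r"],
                                      ["r", "g", "r", "r"]))
```

Both functions return larger values for closer inputs, in line with the definition above.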
