Benchmarking optimality of time series classification methods in distinguishing diffusions

Performance benchmarking is a crucial component of time series classification (TSC) algorithm design, and a fast-growing number of datasets have been established for empirical benchmarking. However, the empirical benchmarks are costly and do not guarantee statistical optimality. This study proposes to benchmark the optimality of TSC algorithms in distinguishing diffusion processes by the likelihood ratio test (LRT). The LRT is optimal in the sense of the Neyman-Pearson lemma: it has the smallest false positive rate among classifiers with a controlled level of false negative rate. The LRT requires the likelihood ratio of the time series to be computable. The diffusion processes from stochastic differential equations provide such time series and are flexible in design for generating linear or nonlinear time series. We demonstrate the benchmarking with three scalable state-of-the-art TSC algorithms: random forest, ResNet, and ROCKET. Test results show that they can achieve LRT optimality for univariate time series and multivariate Gaussian processes. However, these model-agnostic algorithms are suboptimal in classifying nonlinear multivariate time series from high-dimensional stochastic interacting particle systems. Additionally, the LRT benchmark provides tools to analyze the dependence of classification accuracy on the time length, dimension, temporal sampling frequency, and randomness of the time series. Thus, the LRT with diffusion processes can systematically and efficiently benchmark the optimality of TSC algorithms and may guide their future improvements.
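
As a rough illustration of the idea, the sketch below computes an LRT classifier for two diffusion processes and reports its balanced accuracy, which is the kind of reference value a TSC algorithm could be compared against. It is a minimal sketch under stated assumptions, not the authors' setup: the two classes are hypothetical Ornstein-Uhlenbeck processes dX = -theta X dt + sigma dW that differ only in theta, the paths are generated with the Euler-Maruyama scheme, and the likelihood uses the matching Gaussian transition density. All function names and parameter values are illustrative choices.

```python
import numpy as np


def simulate_ou(theta, sigma, x0, dt, n_steps, n_paths, rng):
    """Euler-Maruyama paths of dX = -theta * X dt + sigma dW (assumed model)."""
    x = np.empty((n_paths, n_steps + 1))
    x[:, 0] = x0
    for k in range(n_steps):
        noise = rng.standard_normal(n_paths)
        x[:, k + 1] = x[:, k] - theta * x[:, k] * dt + sigma * np.sqrt(dt) * noise
    return x


def log_likelihood(x, theta, sigma, dt):
    """Per-path log-likelihood under the Euler-Maruyama Gaussian transitions."""
    increments = x[:, 1:] - x[:, :-1]
    mean = -theta * x[:, :-1] * dt
    var = sigma**2 * dt
    log_density = -0.5 * np.log(2 * np.pi * var) - (increments - mean) ** 2 / (2 * var)
    return log_density.sum(axis=1)


def lrt_classify(x, theta0, theta1, sigma, dt, threshold=0.0):
    """Label a path 1 when its log-likelihood ratio exceeds the threshold."""
    llr = log_likelihood(x, theta1, sigma, dt) - log_likelihood(x, theta0, sigma, dt)
    return (llr > threshold).astype(int)


def lrt_accuracy(theta0=1.0, theta1=2.0, sigma=1.0, dt=0.01,
                 n_steps=500, n_paths=2000, seed=0):
    """Balanced accuracy of the LRT on simulated paths from both classes."""
    rng = np.random.default_rng(seed)
    x_class0 = simulate_ou(theta0, sigma, 0.0, dt, n_steps, n_paths, rng)
    x_class1 = simulate_ou(theta1, sigma, 0.0, dt, n_steps, n_paths, rng)
    pred0 = lrt_classify(x_class0, theta0, theta1, sigma, dt)
    pred1 = lrt_classify(x_class1, theta0, theta1, sigma, dt)
    return 0.5 * (np.mean(pred0 == 0) + np.mean(pred1 == 1))


if __name__ == "__main__":
    print(f"LRT balanced accuracy: {lrt_accuracy():.3f}")
```

Changing `n_steps`, `dt`, or `sigma` in this sketch shows how the reference accuracy depends on the time length, sampling frequency, and randomness of the series. The threshold of 0 corresponds to equal class priors; a different threshold trades false positives against false negatives.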

