There have recently been significant advances in the accuracy of algorithms proposed for time series classification (TSC). However, a question commonly asked by real-world practitioners and data scientists less familiar with the research topic is whether the complexity of the algorithms considered state of the art is really necessary. Often the first approach they suggest is a simple pipeline of summary statistics or another time series feature extraction approach such as TSFresh. The question is a sensible one: publications on TSC algorithms generalised for multiple problem types rarely consider or compare against these approaches. We experiment with basic feature extractors paired with vector-based classifiers that have been shown to be effective with continuous attributes in current state-of-the-art time series classifiers. We test these approaches on the UCR time series dataset archive to see whether the TSC literature has overlooked their effectiveness. We find that a pipeline of TSFresh followed by a rotation forest classifier, which we name FreshPRINCE, performs best. It is not state of the art, but it is significantly more accurate than nearest neighbour with dynamic time warping, and it represents a reasonable benchmark for future comparison.
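To make the idea of a "feature extraction followed by a vector classifier" pipeline concrete, the following is a minimal sketch using only the Python standard library. It is not FreshPRINCE: the five hand-picked summary statistics stand in for TSFresh's much larger feature set, and a 1-nearest-neighbour rule on the extracted features stands in for rotation forest. The names `summary_features`, `NearestNeighbourOnFeatures` and `make_series`, and the synthetic two-class data, are assumptions made purely for illustration.

```python
import math
import random
import statistics


def summary_features(series):
    """Map one univariate series to a fixed-length feature vector."""
    n = len(series)
    mean = statistics.fmean(series)
    # Least-squares slope of the series against its time index.
    t_mean = (n - 1) / 2
    num = sum((t - t_mean) * (x - mean) for t, x in enumerate(series))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return [mean, statistics.pstdev(series), min(series), max(series), slope]


class NearestNeighbourOnFeatures:
    """1-NN on extracted features (illustrative stand-in, not rotation forest)."""

    def fit(self, X, y):
        self.features_ = [summary_features(s) for s in X]
        self.labels_ = list(y)
        return self

    def predict(self, X):
        preds = []
        for s in X:
            f = summary_features(s)
            # Euclidean distance in feature space, not on the raw series.
            dists = [math.dist(f, g) for g in self.features_]
            preds.append(self.labels_[dists.index(min(dists))])
        return preds


def make_series(label, rng, length=50):
    """Toy generator (assumption): class 0 is flat noise, class 1 trends upward."""
    trend = 0.1 if label == 1 else 0.0
    return [trend * t + rng.gauss(0.0, 1.0) for t in range(length)]


rng = random.Random(0)
train_y = [i % 2 for i in range(40)]
train_X = [make_series(lab, rng) for lab in train_y]
test_y = [i % 2 for i in range(20)]
test_X = [make_series(lab, rng) for lab in test_y]

model = NearestNeighbourOnFeatures().fit(train_X, train_y)
pred = model.predict(test_X)
accuracy = sum(p == t for p, t in zip(pred, test_y)) / len(test_y)
print(f"accuracy: {accuracy:.2f}")
```

The point of the sketch is the shape of the pipeline: each series is reduced to a fixed-length vector, after which any classifier for continuous attributes can be applied. The features here are left unscaled for brevity; a real pipeline would likely need a richer feature set and a stronger classifier, as the abstract describes.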