The variety of complex algorithmic approaches to time-series classification has grown considerably over recent decades, including the development of sophisticated but challenging-to-interpret deep-learning-based methods. But without comparison to simpler methods, it can be difficult to determine when such complexity is required to obtain strong performance on a given problem. Here we evaluate the performance of an extremely simple classification approach: a linear classifier in the space of two simple features that ignore the sequential ordering of the data, the mean and standard deviation of time-series values. Across a large repository of 128 univariate time-series classification problems, this simple distributional-moment-based approach exceeded chance-level accuracy on 69 problems and reached 100% accuracy on two. In a neuroimaging time-series case study, a simple linear model based on the mean and standard deviation classified individuals with schizophrenia more accurately than a model that additionally included features of the time-series dynamics. Comparing against the performance of such simple distributional features provides important context for interpreting the performance of complex time-series classification models, which may not always be required to obtain high accuracy.
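To illustrate how lightweight this baseline is, the sketch below fits a linear classifier on only the mean and standard deviation of each time series. It is a minimal illustration, not the paper's exact pipeline: it assumes a scikit-learn logistic-regression model as the linear classifier and uses synthetic placeholder arrays in place of one of the 128 benchmark problems, so the specific model choice and evaluation protocol here are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def moment_features(X):
    """Reduce each time series (a row of X) to two order-agnostic
    features: its mean and its standard deviation."""
    return np.column_stack([X.mean(axis=1), X.std(axis=1)])


# Hypothetical placeholder data standing in for a real benchmark dataset:
# rows are time series of length 150, labels are binary class assignments.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 150))
y_train = rng.integers(0, 2, 40)
X_test = rng.normal(size=(20, 150))
y_test = rng.integers(0, 2, 20)

# Linear classifier in the two-dimensional (mean, std) feature space;
# standardizing the features first is a common (assumed) preprocessing step.
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(moment_features(X_train), y_train)
accuracy = clf.score(moment_features(X_test), y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Because the feature extraction discards all temporal ordering, any accuracy this baseline achieves on a given problem is attributable to differences in the distribution of values alone, which is what makes it a useful point of comparison for more complex, dynamics-aware models.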