Accuracy is a key focus of current work in time series classification. However, in many applications speed and data reduction are equally important, especially when data scale and storage requirements grow rapidly. Current multivariate time series classification (MTSC) algorithms need hundreds of compute hours to complete training and prediction. This is due to the nature of multivariate time series data, which grows with the number of time series, their length, and the number of channels. In many applications, not all channels are useful for the classification task; hence we require methods that can efficiently select useful channels and thus save computational resources. We propose and evaluate two methods for channel selection. Our techniques represent each class by a prototype time series and select channels based on the distance between class prototypes. The main hypothesis is that useful channels enable better separation between classes; hence, channels with a larger distance between class prototypes are more useful. On the UEA MTSC benchmark, we show that these techniques achieve significant data reduction and classifier speedup at similar levels of classification accuracy. Channel selection is applied as a pre-processing step before training state-of-the-art MTSC algorithms and saves about 70% of computation time and data storage while preserving accuracy. Furthermore, our methods enable even efficient classifiers, such as ROCKET, to achieve better accuracy than with no channel selection or with forward channel selection. To further study the impact of our techniques, we present experiments on classifying synthetic multivariate time series datasets with more than 100 channels, as well as a real-world case study on a dataset with 50 channels. Our channel selection methods lead to significant data reduction with preserved or improved accuracy.
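The idea of ranking channels by the distance between class prototypes can be illustrated with a minimal sketch. The abstract does not specify how prototypes are built, which distance is used, or how the two proposed methods decide which channels to keep, so everything below is an assumption made for illustration: the prototype is the per-class mean series, the distance is Euclidean, pairwise distances are summed per channel, and the top `k` channels are kept. The function name `select_channels` and the parameter `k` are hypothetical.

```python
from itertools import combinations

import numpy as np


def select_channels(X, y, k):
    """Rank channels by summed pairwise distance between class prototypes.

    X: array of shape (n_samples, n_channels, n_timepoints)
    y: array of shape (n_samples,) with class labels
    k: number of channels to keep (hypothetical parameter)
    """
    classes = np.unique(y)
    # Assumption: the class prototype is the mean series of the class,
    # giving one (n_channels, n_timepoints) array per class.
    prototypes = {c: X[y == c].mean(axis=0) for c in classes}

    scores = np.zeros(X.shape[1])
    for a, b in combinations(classes, 2):
        # Assumption: Euclidean distance, computed per channel over time.
        scores += np.linalg.norm(prototypes[a] - prototypes[b], axis=1)

    # Keep the k channels with the largest score, in ascending index order.
    ranked = np.argsort(scores)[::-1]
    return np.sort(ranked[:k]), scores


# Toy data: 5 channels of noise, with channels 2 and 4 shifted for class 1.
rng = np.random.default_rng(0)
n_samples, n_channels, n_timepoints = 40, 5, 30
X = rng.normal(size=(n_samples, n_channels, n_timepoints))
y = np.repeat([0, 1], n_samples // 2)
X[y == 1, 2, :] += 3.0
X[y == 1, 4, :] += 2.0

selected, scores = select_channels(X, y, k=2)
X_reduced = X[:, selected, :]  # pass this to the downstream classifier
```

In this toy setting only the two shifted channels separate the classes, so they receive the largest scores and the reduced array keeps 2 of the 5 channels. The reduced data would then be used to train the classifier in place of the full array.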