Symbolic representations of time series have proven effective for time series classification, and recent approaches include SAX-VSM, BOSS, WEASEL, and MrSEQL. The key idea is to transform numerical time series into symbolic representations in the time or frequency domain, i.e., sequences of symbols, and then extract features from these sequences. While they achieve high accuracy, existing symbolic classifiers are computationally expensive. In this paper we present MrSQM, a new time series classifier that uses multiple symbolic representations and efficient sequence mining to extract important time series features. We study four feature selection approaches on symbolic sequences, ranging from fully supervised to unsupervised and hybrid. We propose a new approach for optimal supervised symbolic feature selection in all-subsequence space by adapting to time series a Chi-squared bound developed for discriminative pattern mining. Our extensive experiments on 112 datasets of the UEA/UCR benchmark demonstrate that MrSQM can quickly extract useful features and learn accurate classifiers with the classic logistic regression algorithm. Interestingly, we find that a very simple and fast feature selection strategy can be highly effective compared with more sophisticated and expensive methods. MrSQM advances the state of the art for symbolic time series classifiers and achieves high accuracy with fast runtime.
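To make the pipeline described above concrete, the following is a minimal sketch of its two ingredients: turning a numeric series into a symbolic word, and scoring a candidate subsequence by how well it separates two classes. It is not the MrSQM implementation. It assumes a SAX-style time-domain transform only (no frequency-domain representation), it computes the plain Chi-squared statistic rather than the pruning bound the paper adapts, and the names `sax_word` and `chi_squared` and their parameters are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax_word(series, n_segments=4, alphabet="abcd"):
    """Convert a numeric series into a SAX-style symbolic word.

    Assumes len(series) >= n_segments so every segment is non-empty.
    """
    mu, sigma = mean(series), pstdev(series)
    # z-normalise; a constant series maps to all zeros
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Piecewise Aggregate Approximation: mean of each near-equal segment
    n = len(z)
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    paa = [mean(z[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]
    # Breakpoints split the standard normal into equiprobable bins
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


def chi_squared(words, labels, pattern):
    """Chi-squared statistic of 'pattern occurs in word' vs. a 0/1 label."""
    n = len(words)
    observed = {(p, y): 0 for p in (True, False) for y in (0, 1)}
    for w, y in zip(words, labels):
        observed[(pattern in w, y)] += 1
    score = 0.0
    for (p, y), o in observed.items():
        row = observed[(p, 0)] + observed[(p, 1)]
        col = observed[(True, y)] + observed[(False, y)]
        expected = row * col / n
        if expected > 0:  # skip empty rows/columns
            score += (o - expected) ** 2 / expected
    return score


# Toy data (illustrative only): two rising and two falling step series
rising = [0, 0, 0, 0, 10, 10, 10, 10]
falling = [10, 10, 10, 10, 0, 0, 0, 0]
words = [sax_word(s) for s in (rising, rising, falling, falling)]
labels = [0, 0, 1, 1]
print(words)
print(chi_squared(words, labels, "ad"), chi_squared(words, labels, "a"))
```

In this toy setting the subsequence `ad` appears only in the rising class and so scores high, while `a` appears in every word and scores zero. A feature selector in this spirit would keep the high-scoring subsequences as binary features and pass them to a linear classifier such as logistic regression.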