In the time series classification domain, shapelets are short time series that are discriminative for a certain class. Classifiers have been shown to achieve state-of-the-art results on a wide range of datasets when they take as input the distances from a time series to a set of discriminative shapelets. Shapelets are also easy to visualize and therefore interpretable. This makes them very appealing in critical domains such as health care, where longitudinal data is ubiquitous. In this study, we propose a new paradigm for shapelet discovery based on evolutionary computation. The proposed approach has five advantages. (i) It is gradient-free, which could make it easier to escape local optima and find suitable candidates, and it supports non-differentiable objectives. (ii) No brute-force search is required, which reduces the computational complexity by several orders of magnitude. (iii) The number of shapelets and the length of each shapelet are evolved jointly with the shapelets themselves, so they do not need to be specified beforehand. (iv) Entire sets are evaluated at once rather than single shapelets. This yields smaller final sets that contain fewer mutually similar shapelets while achieving similar predictive performance. (v) Discovered shapelets do not need to be subsequences of the input time series. We present experimental results that validate these advantages.
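The two ideas the abstract combines, distance-based features and a gradient-free search over whole shapelet *sets*, can be illustrated with a short plain-Python sketch. It is a minimal illustration under stated assumptions and not the authors' algorithm. The distance is assumed to be the minimum Euclidean distance over all sliding windows. Fitness is leave-one-out 1-nearest-neighbour accuracy on the distance features, a stand-in non-differentiable objective chosen for brevity. All helper names (`shapelet_distance`, `transform`, `fitness`, `mutate`, `crossover`, `evolve`) and parameter values are hypothetical. In the sketch, each individual is a set that is scored as a whole. Add and remove mutations let the set size evolve. Gaussian perturbation lets shapelets drift away from any actual subsequence. Shapelet lengths are sampled when a shapelet is created and are then kept or discarded by selection. A dedicated length-changing mutation is omitted.

```python
import math
import random

def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between `shapelet` and any equal-length window of `series` (assumed definition)."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        d = sum((series[start + k] - shapelet[k]) ** 2 for k in range(m))
        best = min(best, d)
    return math.sqrt(best)

def transform(dataset, shapelets):
    """Map every series to its vector of distances to the shapelet set."""
    return [[shapelet_distance(s, sh) for sh in shapelets] for s in dataset]

def fitness(shapelets, X, y):
    """Hypothetical objective: leave-one-out 1-NN accuracy on distance features (non-differentiable)."""
    feats = transform(X, shapelets)
    correct = 0
    for i, fi in enumerate(feats):
        nearest = min((j for j in range(len(feats)) if j != i),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(fi, feats[j])))
        correct += y[nearest] == y[i]
    return correct / len(feats)

def random_shapelet(X, rng, min_len=2):
    """Copy a random-length subsequence of a random training series."""
    s = rng.choice(X)
    length = rng.randint(min_len, len(s))
    start = rng.randint(0, len(s) - length)
    return list(s[start:start + length])

def mutate(individual, X, rng, sigma=0.1):
    """Hypothetical mutation: grow, shrink, or perturb a shapelet set."""
    child = [list(sh) for sh in individual]
    op = rng.random()
    if op < 0.3:                        # add a shapelet: set size evolves
        child.append(random_shapelet(X, rng))
    elif op < 0.5 and len(child) > 1:   # remove a shapelet, keeping at least one
        child.pop(rng.randrange(len(child)))
    else:                               # perturb values: need not remain a subsequence
        sh = rng.choice(child)
        for k in range(len(sh)):
            sh[k] += rng.gauss(0, sigma)
    return child

def crossover(a, b, rng):
    """Hypothetical crossover: draw a random subset of the parents' pooled shapelets."""
    pool = a + b
    k = rng.randint(1, max(len(a), len(b)))
    return [list(sh) for sh in rng.sample(pool, k)]

def score(ind, X, y):
    # Higher fitness first; among ties, prefer smaller sets.
    return (fitness(ind, X, y), -len(ind))

def evolve(X, y, pop_size=10, generations=15, seed=0):
    """Evolve whole shapelet sets with truncation selection and elitism."""
    rng = random.Random(seed)
    population = [[random_shapelet(X, rng) for _ in range(rng.randint(1, 3))]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda ind: score(ind, X, y), reverse=True)
        elite = ranked[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(mutate(crossover(p1, p2, rng), X, rng))
        population = elite + children
    return max(population, key=lambda ind: score(ind, X, y))

if __name__ == "__main__":
    # Toy data: class 1 contains a bump somewhere, class 0 is (almost) flat.
    X = [[0, 0, 0, 0, 0, 0], [0, 0, 0, 0.1, 0, 0], [0, 0.1, 0, 0, 0, 0],
         [0, 1, 2, 1, 0, 0], [0, 0, 0, 1, 2, 1], [1, 2, 1, 0, 0, 0]]
    y = [0, 0, 0, 1, 1, 1]
    best = evolve(X, y)
    print(len(best), fitness(best, X, y))
```

Because whole sets are ranked together and ties are broken toward smaller sets, redundant shapelets that add no accuracy tend to be dropped. This loosely mirrors advantage (iv) of the abstract.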