Time series classification (TSC) aims to predict the class label of a given time series, a task that is critical in many application areas such as economics and medicine. State-of-the-art TSC methods have mostly focused on classification accuracy and efficiency, without considering the interpretability of their classifications. Interpretability is an important property required by modern applications such as appliance modeling and by legislation such as the European General Data Protection Regulation. To address this gap, we propose a novel TSC method, the Randomized-Supervised Time Series Forest (r-STSF). r-STSF is highly efficient, achieves state-of-the-art classification accuracy, and enables interpretability. It takes an efficient interval-based approach, classifying time series according to aggregate values of discriminatory sub-series (intervals). To achieve state-of-the-art accuracy, r-STSF builds an ensemble of randomized trees using the discriminatory sub-series. It uses four time series representations, nine aggregation functions, and a supervised binary-inspired search combined with a feature ranking metric to identify highly discriminatory sub-series. These discriminatory sub-series are what make the classifications interpretable. Experiments on an extensive set of datasets show that r-STSF achieves state-of-the-art accuracy while being orders of magnitude faster than most existing TSC methods. It is the only classifier in the state-of-the-art group that enables interpretability. Our findings also highlight that r-STSF is the best TSC method when classifying complex time series datasets.
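To make the interval-based idea concrete, the following is a minimal sketch of how aggregate values can be computed over sub-series of a single time series. It is not the r-STSF implementation: the intervals here are drawn at random rather than found by the supervised search, only the raw series is used rather than four representations, and the six aggregates in `AGGREGATES` are an illustrative choice, since the abstract does not list the nine functions. All function and variable names are hypothetical.

```python
import random
import statistics


def slope(values):
    """Least-squares slope of values against their index (0.0 for one point)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


# Assumption: an illustrative set of aggregates, not the paper's nine functions.
AGGREGATES = {
    "mean": statistics.fmean,
    "std": statistics.pstdev,
    "min": min,
    "max": max,
    "median": statistics.median,
    "slope": slope,
}


def random_intervals(series_length, n_intervals, min_length=2, seed=0):
    """Draw (start, end) pairs with end exclusive and end - start >= min_length.

    Assumption: plain random sampling stands in for the supervised search.
    """
    rng = random.Random(seed)
    intervals = []
    for _ in range(n_intervals):
        start = rng.randint(0, series_length - min_length)
        end = rng.randint(start + min_length, series_length)
        intervals.append((start, end))
    return intervals


def interval_features(series, intervals):
    """Map each (start, end, aggregate_name) to the aggregate over series[start:end]."""
    features = {}
    for start, end in intervals:
        window = series[start:end]
        for name, fn in AGGREGATES.items():
            features[(start, end, name)] = fn(window)
    return features


# Usage: features for two hand-picked intervals of a short series.
series = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
feats = interval_features(series, [(0, 3), (2, 6)])
for key, value in sorted(feats.items()):
    print(key, value)
```

Each feature key records which interval and which aggregate produced the value, so a tree that splits on a given feature points back to a specific region of the series. In a full pipeline these feature vectors would be computed for every training series and passed to a tree ensemble, which is not shown here.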