Data series classification is an important and challenging problem in data science. Many applications need explanations of classification decisions, obtained by identifying the discriminant parts of the input that led the algorithm to its decision. Convolutional neural networks perform well on the data series classification task; however, the explanations this type of algorithm provides are poor in the specific case of multivariate data series. Addressing this limitation is a significant challenge. In this paper, we propose a novel method that addresses this problem by highlighting both the temporal and the dimensional discriminant information. Our contribution is two-fold: we first describe a convolutional architecture that enables the comparison of dimensions; then, we propose a method that returns dCAM, a Dimension-wise Class Activation Map specifically designed for multivariate time series (and CNN-based models). Experiments with several synthetic and real datasets demonstrate that dCAM is not only more accurate than previous approaches, but also the only viable solution for discriminant feature discovery and classification explanation in multivariate time series. This paper has appeared in SIGMOD'22.