Saliency methods are used extensively to highlight the importance of input features in model predictions. These methods are mostly applied to vision and language tasks, and their application to time series data is relatively unexplored. In this paper, we set out to extensively compare the performance of various saliency-based interpretability methods across diverse neural architectures, including Recurrent Neural Networks, Temporal Convolutional Networks, and Transformers, on a new benchmark of synthetic time series data. We propose and report multiple metrics to empirically evaluate the performance of saliency methods for detecting feature importance over time, using both precision (i.e., whether identified features contain meaningful signals) and recall (i.e., the number of features with signal identified as important). Through several experiments, we show that (i) in general, network architectures and saliency methods fail to reliably and accurately identify feature importance over time in time series data; (ii) this failure is mainly due to the conflation of the time and feature domains; and (iii) the quality of saliency maps can be improved substantially by our proposed two-step temporal saliency rescaling (TSR) approach, which first calculates the importance of each time step and then calculates the importance of each feature at that time step.
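To make the two-step idea concrete, the sketch below illustrates one possible reading of the TSR procedure in NumPy. It is a minimal illustration, not the paper's reference implementation: the function name `temporal_saliency_rescaling`, the `saliency_fn` callable, and the zero `baseline` used for masking are assumptions introduced here, and the full method may recompute feature-level relevance by masking individual features rather than reusing the base saliency map as done below.

```python
import numpy as np

def temporal_saliency_rescaling(saliency_fn, x, baseline=0.0):
    """Hedged TSR sketch (illustrative, not the authors' reference code).

    saliency_fn: callable mapping an input of shape (time, features) to a
                 saliency map of the same shape (e.g., a wrapped gradient
                 or Integrated Gradients attribution).
    x:           input series of shape (time, features).
    Returns a rescaled saliency map of shape (time, features).
    """
    T, F = x.shape
    base_map = saliency_fn(x)

    # Step 1: importance of each time step, measured as the change in the
    # saliency map when all features at that time step are masked.
    time_relevance = np.zeros(T)
    for t in range(T):
        x_masked = x.copy()
        x_masked[t, :] = baseline
        time_relevance[t] = np.abs(base_map - saliency_fn(x_masked)).sum()

    # Step 2: importance of each feature within a time step, taken here from
    # the original saliency map and rescaled by the time-step importance.
    return base_map * time_relevance[:, None]
```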