Attention-based models have been widely used in many areas, such as computer vision and natural language processing. However, relevant applications in time series classification (TSC) have not been explored deeply yet, so a significant number of TSC algorithms still suffer from general problems of the attention mechanism, such as quadratic complexity. In this paper, we improve the efficiency and performance of the attention mechanism by proposing flexible multi-head linear attention (FMLA), which enhances locality awareness through layer-wise interactions with deformable convolutional blocks and online knowledge distillation. Moreover, we propose a simple but effective mask mechanism that reduces the influence of noise in time series and decreases the redundancy of the proposed FMLA by proportionally masking some positions of each given series. To stabilize this mechanism, samples are forwarded through the model with random mask layers several times, and their outputs are aggregated to teach the same model equipped with regular mask layers. We conduct extensive experiments on 85 UCR2018 datasets to compare our algorithm with 11 well-known ones, and the results show that our algorithm achieves comparable performance in terms of top-1 accuracy. We also compare our model with three Transformer-based models with respect to floating-point operations per second and the number of parameters, and find that our algorithm achieves significantly better efficiency with lower complexity.
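To make the masking and stabilization idea from the abstract concrete, the following is a minimal sketch, not the authors' implementation: it masks a proportion of time-series positions at random, runs several random-mask forward passes, aggregates their predictions, and uses the aggregate to teach the same model under a regular (here, unmasked placeholder) pass via a distillation loss. The backbone `TinyClassifier`, the helper names, and the loss weighting are hypothetical stand-ins for the paper's FMLA blocks and regular mask layers.

```python
# Hedged sketch (not the authors' code): proportional position masking plus
# online self-distillation from aggregated random-mask passes.
import torch
import torch.nn as nn
import torch.nn.functional as F


def random_position_mask(x, mask_ratio=0.2):
    """Zero out a random proportion of time steps; x: (batch, length, channels)."""
    batch, length, _ = x.shape
    keep = torch.rand(batch, length, 1, device=x.device) >= mask_ratio
    return x * keep


class TinyClassifier(nn.Module):
    """Stand-in backbone; the paper uses FMLA blocks instead."""

    def __init__(self, in_channels, num_classes, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(in_channels, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):                      # x: (batch, length, channels)
        z = self.encoder(x.transpose(1, 2))    # -> (batch, hidden, 1)
        return self.head(z.squeeze(-1))        # -> (batch, num_classes)


def masked_self_distillation_step(model, x, y, mask_ratio=0.2, n_views=3, alpha=0.5):
    """Aggregated random-mask predictions teach the regular-mask prediction."""
    # Teacher signal: average of several random-mask forward passes (no grad).
    with torch.no_grad():
        teacher_probs = torch.stack(
            [F.softmax(model(random_position_mask(x, mask_ratio)), dim=-1)
             for _ in range(n_views)]
        ).mean(dim=0)

    # Student pass: a deterministic pass stands in for the paper's regular mask layer.
    logits = model(x)
    ce = F.cross_entropy(logits, y)
    kd = F.kl_div(F.log_softmax(logits, dim=-1), teacher_probs, reduction="batchmean")
    return ce + alpha * kd


# Minimal usage example with random data.
model = TinyClassifier(in_channels=1, num_classes=5)
x = torch.randn(8, 128, 1)          # 8 univariate series of length 128
y = torch.randint(0, 5, (8,))
loss = masked_self_distillation_step(model, x, y)
loss.backward()
```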