Unsupervised detection of anomaly points in time series is a challenging problem, which requires the model to derive a distinguishable criterion. Previous methods tackle the problem mainly through learning pointwise representations or pairwise associations; however, neither is sufficient to reason about the intricate dynamics. Recently, Transformers have shown great power in the unified modeling of pointwise representation and pairwise association, and we find that the self-attention weight distribution of each time point can embody rich associations with the whole series. Our key observation is that, due to the rarity of anomalies, it is extremely difficult to build nontrivial associations from abnormal points to the whole series; therefore, the anomalies' associations shall mainly concentrate on their adjacent time points. This adjacent-concentration bias implies an association-based criterion that is inherently distinguishable between normal and abnormal points, which we highlight through the \emph{Association Discrepancy}. Technically, we propose the \emph{Anomaly Transformer} with a new \emph{Anomaly-Attention} mechanism to compute the association discrepancy. A minimax strategy is devised to amplify the normal-abnormal distinguishability of the association discrepancy. The Anomaly Transformer achieves state-of-the-art results on six unsupervised time series anomaly detection benchmarks across three applications: service monitoring, space \& earth exploration, and water treatment.
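To make the association-based criterion concrete, the sketch below illustrates one plausible reading of the Association Discrepancy: a symmetric KL divergence, per time point, between a local Gaussian "prior association" (concentration on adjacent points) and an attention-derived "series association". This is a minimal illustrative sketch in NumPy, not the authors' implementation; the function names and the fixed-sigma prior are assumptions.

```python
import numpy as np

def gaussian_prior(L, sigma):
    """Prior association: each time point i attends to points j with a
    Gaussian kernel on |i - j|, row-normalized to a distribution.
    This encodes the adjacent-concentration bias described in the text."""
    idx = np.arange(L)
    dist = np.abs(idx[:, None] - idx[None, :])
    prior = np.exp(-dist.astype(float) ** 2 / (2.0 * sigma ** 2))
    return prior / prior.sum(axis=1, keepdims=True)

def association_discrepancy(prior, series, eps=1e-12):
    """Per-point symmetric KL divergence between the prior association and
    the series association (e.g., a row-stochastic self-attention map).
    Under the paper's observation, anomalies concentrate on neighbors, so
    their series association stays close to the local prior, yielding a
    smaller discrepancy than normal points with richer global associations."""
    p = np.clip(prior, eps, None)
    s = np.clip(series, eps, None)
    kl_ps = np.sum(p * np.log(p / s), axis=1)  # KL(prior || series) per row
    kl_sp = np.sum(s * np.log(s / p), axis=1)  # KL(series || prior) per row
    return kl_ps + kl_sp

# Usage: a series association matching the local prior has zero discrepancy,
# while a uniform (globally spread) association diverges from it.
L = 8
prior = gaussian_prior(L, sigma=1.0)
uniform = np.full((L, L), 1.0 / L)
print(association_discrepancy(prior, prior).round(6))
print(association_discrepancy(prior, uniform).round(3))
```

The minimax strategy in the paper alternately tightens and loosens this prior-versus-series gap during training to amplify the normal-abnormal gap; the sketch above only shows the static criterion.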