Anomaly detection on time series data is increasingly common across industrial domains that monitor metrics to prevent potential accidents and economic losses. However, a scarcity of labeled data and ambiguous definitions of anomalies can complicate these efforts. Recent unsupervised machine learning methods have made remarkable progress in tackling this problem using either single-timestamp predictions or time series reconstructions. While traditionally considered separately, these methods are not mutually exclusive and can offer complementary perspectives on anomaly detection. This paper first highlights the successes and limitations of prediction-based and reconstruction-based methods with visualized time series signals and anomaly scores. We then propose AER (Auto-encoder with Regression), a joint model that combines a vanilla auto-encoder and an LSTM regressor to incorporate the successes and address the limitations of each method. Our model can produce bi-directional predictions while simultaneously reconstructing the original time series by optimizing a joint objective function. Furthermore, we propose several ways of combining the prediction and reconstruction errors through a series of ablation studies. Finally, we compare the performance of the AER architecture against two prediction-based methods and three reconstruction-based methods on 12 well-known univariate time series datasets from NASA, Yahoo, Numenta, and UCR. The results show that AER has the highest average F1 score across all datasets (a 23.5% improvement compared to ARIMA) while retaining a runtime similar to its vanilla auto-encoder and regressor components. Our model is available in Orion, an open-source benchmarking tool for time series anomaly detection.
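The abstract mentions a joint objective over bi-directional predictions and reconstruction, and several ways of combining the two error signals, without giving formulas. The following is a minimal sketch of how such pieces could look, not the AER implementation or the Orion API. The function names, the window layout (first value is the backward prediction, last value is the forward prediction, the middle is the reconstruction), the weights `gamma` and `beta`, and the two combination modes are all assumptions made for illustration.

```python
import numpy as np


def min_max_scale(x):
    """Rescale an error series to [0, 1]; a constant series maps to zeros."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        return np.zeros_like(x)
    return (x - x.min()) / span


def joint_loss(y_true, y_pred, gamma=0.5):
    """Hypothetical joint objective for one window.

    Assumed layout of both 1-D arrays:
    [backward prediction, reconstructed values..., forward prediction].
    gamma is an assumed weight balancing reconstruction against prediction.
    """
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    sq_err = diff ** 2
    prediction_loss = (sq_err[0] + sq_err[-1]) / 2  # both directions
    reconstruction_loss = sq_err[1:-1].mean()
    return gamma * reconstruction_loss + (1 - gamma) * prediction_loss


def combine_errors(pred_err, rec_err, mode="product", beta=0.5):
    """Merge per-timestamp prediction and reconstruction errors into one score.

    Both modes are illustrative choices, not taken from the paper.
    """
    p, r = min_max_scale(pred_err), min_max_scale(rec_err)
    if mode == "sum":
        return beta * p + (1 - beta) * r  # convex combination
    if mode == "product":
        return p * r  # high only where both errors are high
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    pred_err = [0.1, 0.2, 0.9, 0.1]
    rec_err = [0.2, 0.1, 0.8, 0.3]
    scores = combine_errors(pred_err, rec_err, mode="product")
    print("most anomalous index:", int(np.argmax(scores)))
```

In this toy input both error series peak at the same timestamp, so either mode ranks it highest. The product mode suppresses timestamps where only one of the two signals is elevated, while the weighted sum keeps them visible; which behavior is preferable depends on the dataset.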