Time series data are often corrupted by outliers or other kinds of anomalies. Identifying the anomalous points can be a goal in its own right (anomaly detection) or a means of improving performance on other time series tasks (e.g., forecasting). Recent deep-learning-based approaches to anomaly detection and forecasting commonly assume that the proportion of anomalies in the training data is small enough to ignore, and treat the unlabeled data as coming from the nominal data distribution. We present a simple yet effective technique for augmenting existing time series models so that they explicitly account for anomalies in the training data. The method augments the training data with a latent anomaly indicator variable, whose distribution is inferred with Monte Carlo EM while the underlying model is trained. In this way it infers which points are anomalous and, at the same time, improves model performance on nominal data. We demonstrate the effectiveness of the approach by combining it with a simple feed-forward forecasting model. We investigate how anomalies in the training set affect the training of forecasting models, which are commonly used for time series anomaly detection, and show that our method improves the training of the model.
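The idea of a latent anomaly indicator inferred with Monte Carlo EM can be made concrete with a small sketch. This is not the authors' implementation: a linear autoregressive forecaster stands in for the feed-forward model, the anomalous component is assumed to be a fixed broad Gaussian around the forecast, and all names (`make_windows`, `mcem_fit`, `anomaly_scale`, and so on) are hypothetical. In this toy setting the posterior over the indicators is available in closed form, so sampling is not strictly needed; it is kept only to mirror the Monte Carlo E-step.

```python
import numpy as np


def make_windows(series, lag):
    """Turn a 1-D series into (lagged inputs, next-value targets)."""
    X = np.stack([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return X, y


def gaussian_pdf(r, scale):
    """Density of a zero-mean Gaussian with standard deviation `scale`."""
    return np.exp(-0.5 * (r / scale) ** 2) / (scale * np.sqrt(2.0 * np.pi))


def anomaly_posterior(r, sigma, anomaly_sd, pi):
    """P(z_t = 1 | residual r_t) under the two-component noise model."""
    p_anom = pi * gaussian_pdf(r, anomaly_sd)
    p_nom = (1.0 - pi) * gaussian_pdf(r, sigma)
    return p_anom / (p_anom + p_nom + 1e-300)


def mcem_fit(X, y, n_iter=30, n_samples=20, anomaly_scale=10.0, seed=0):
    """Fit a linear forecaster with latent anomaly indicators z_t.

    Assumed model (illustrative only):
      z_t ~ Bernoulli(pi)
      y_t | z_t = 0 ~ N(w . x_t, sigma^2)       (nominal)
      y_t | z_t = 1 ~ N(w . x_t, anomaly_sd^2)  (fixed, broad)
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])      # append a bias column
    w = np.linalg.lstsq(Xb, y, rcond=None)[0]      # naive initial fit
    sigma = np.std(y - Xb @ w) + 1e-8
    anomaly_sd = anomaly_scale * np.std(y)         # held fixed during training
    pi = 0.05                                      # initial anomaly rate

    for _ in range(n_iter):
        # Monte Carlo E-step: sample indicators from their posterior.
        post = anomaly_posterior(y - Xb @ w, sigma, anomaly_sd, pi)
        z = rng.random((n_samples, len(y))) < post
        weights = 1.0 - z.mean(axis=0)             # share of samples with z_t = 0

        # M-step: weighted least squares on points sampled as nominal.
        sw = np.sqrt(weights)
        w = np.linalg.lstsq(Xb * sw[:, None], y * sw, rcond=None)[0]
        r = y - Xb @ w
        sigma = np.sqrt(np.sum(weights * r ** 2) / max(weights.sum(), 1e-8)) + 1e-8
        pi = float(np.clip(z.mean(), 1e-3, 0.5))

    post = anomaly_posterior(y - Xb @ w, sigma, anomaly_sd, pi)
    return w, sigma, pi, post


def make_demo_series(n=400, seed=1):
    """Noisy sine wave with evenly spaced spikes at known positions."""
    rng = np.random.default_rng(seed)
    series = np.sin(0.2 * np.arange(n)) + 0.1 * rng.standard_normal(n)
    spike_idx = np.arange(30, n, 20)
    signs = np.where(np.arange(len(spike_idx)) % 2 == 0, 1.0, -1.0)
    series[spike_idx] += 5.0 * signs
    return series, spike_idx


if __name__ == "__main__":
    lag = 5
    series, spike_idx = make_demo_series()
    X, y = make_windows(series, lag)
    w, sigma, pi, post = mcem_fit(X, y)
    # Target j corresponds to series index j + lag.
    print("mean posterior at injected spikes:", post[spike_idx - lag].mean())
    print("estimated nominal noise scale:", sigma)
```

Note that a spike also contaminates the input windows of the next `lag` forecasts, so those targets can receive a high anomaly posterior too. The sketch makes no attempt to separate the two cases.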