Are we certain it's anomalous?

Progress in modelling time series and, more generally, sequences of structured data has recently revived research in anomaly detection. The task consists of identifying abnormal behaviors in financial series, IT systems, aerospace measurements, and the medical domain, where anomaly detection may help identify cases of depression and assist in caring for the elderly. Anomaly detection in time series is a complex task because anomalies are rare, temporal correlations are highly non-linear, and the definition of anomalous is sometimes subjective. Here we propose the novel use of Hyperbolic uncertainty for Anomaly Detection (HypAD). HypAD learns to reconstruct the input signal in a self-supervised manner. We adopt best practices from the state-of-the-art to encode the sequence with an LSTM, jointly learned with a decoder that reconstructs the signal, with the aid of GAN critics. Uncertainty is estimated end-to-end by means of a hyperbolic neural network. By using uncertainty, HypAD can distinguish two cases: either the model is certain about the input signal but fails to reconstruct it, because the signal is anomalous; or the model is uncertain, so the reconstruction error does not necessarily imply an anomaly, e.g. for a complex but regular input signal. The novel key idea is that a \emph{detectable anomaly} is one where the model is certain but predicts wrongly. HypAD outperforms the current state-of-the-art for univariate anomaly detection on established benchmarks based on data from NASA, Yahoo, Numenta, Amazon, and Twitter. It also yields state-of-the-art performance on a multivariate dataset of anomalous activities in elderly home residences, and it outperforms the baseline on SWaT. Overall, HypAD yields the fewest false alarms at the best performance rate, thanks to successfully identifying detectable anomalies.
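The idea of a detectable anomaly (a large reconstruction error combined with high certainty) can be illustrated with a small sketch. This is not the HypAD implementation: the abstract does not say how certainty is computed or how it is combined with the error. The sketch assumes, purely for illustration, that certainty is read from the radius of an embedding mapped into the Poincaré ball and that it multiplies the reconstruction error. The names `expmap0`, `certainty_from_embedding`, and `anomaly_score` are hypothetical.

```python
import numpy as np


def expmap0(v, c=1.0, eps=1e-9):
    """Exponential map at the origin of the Poincare ball with curvature -c.

    Maps a Euclidean vector to a point strictly inside the ball of radius 1/sqrt(c).
    """
    v = np.asarray(v, dtype=float)
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)


def certainty_from_embedding(z, c=1.0):
    """ASSUMPTION: certainty is the radius of the hyperbolic point, scaled to [0, 1)."""
    return np.sqrt(c) * np.linalg.norm(z, axis=-1)


def anomaly_score(x, x_hat, certainty):
    """ASSUMPTION: per-window mean squared reconstruction error weighted by certainty."""
    err = np.mean((np.asarray(x) - np.asarray(x_hat)) ** 2, axis=-1)
    return err * certainty


# Three toy windows of length 4 (made-up numbers, not benchmark data).
x = np.zeros((3, 4))
x_hat = np.array([
    [0.1, 0.1, 0.1, 0.1],  # good reconstruction
    [1.0, 1.0, 1.0, 1.0],  # poor reconstruction
    [1.0, 1.0, 1.0, 1.0],  # poor reconstruction
])
# Euclidean embeddings: a long vector maps near the ball boundary (certain),
# a short vector stays near the origin (uncertain).
v = np.array([[3.0, 0.0], [3.0, 0.0], [0.1, 0.0]])

certainty = certainty_from_embedding(expmap0(v))
scores = anomaly_score(x, x_hat, certainty)
most_anomalous = int(np.argmax(scores))
```

Under these assumptions, the second window (certain but badly reconstructed) receives the highest score, while the third window, equally badly reconstructed but uncertain, is down-weighted.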

Related content

Anomaly detection

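In data mining, anomaly detection is the identification of items, events, or observations that do not conform to an expected pattern or to the other items in a dataset. Anomalous items typically translate into problems such as bank fraud, structural defects, medical problems, or errors in text. Anomalies are also referred to as outliers, novelties, noise, deviations, and exceptions. In particular, when detecting abuse and network intrusion, the interesting objects are often not rare objects but unexpected bursts of activity. This pattern does not follow the common statistical definition of an outlier as a rare object, so many anomaly detection methods (in particular unsupervised ones) will fail on such data unless it has been aggregated appropriately. A cluster analysis algorithm, by contrast, may be able to detect the micro-clusters formed by these patterns. There are three broad categories of anomaly detection methods. [1] Unsupervised anomaly detection methods assume that most instances in the dataset are normal and detect anomalies in unlabeled test data by looking for the instances that fit least well with the rest of the data. Supervised anomaly detection methods require a dataset labeled "normal" and "abnormal" and involve training a classifier (the key difference from many other statistical classification problems is the inherently unbalanced nature of anomaly detection). Semi-supervised anomaly detection methods build a model of normal behavior from a given normal training dataset and then test the likelihood that a test instance was generated by the learned model.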
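As a minimal sketch of the unsupervised category, the following flags the values that fit least well with the rest of an unlabeled sample. It uses a robust z-score based on the median and the median absolute deviation (MAD), which is only one simple choice among many. The function name `robust_z_outliers`, the threshold of 3.5, and the sample values are illustrative assumptions.

```python
import numpy as np


def robust_z_outliers(values, threshold=3.5):
    """Flag values whose robust z-score (median/MAD based) exceeds the threshold.

    Assumes most values are normal, so the median and MAD describe normal behavior.
    """
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    if mad == 0:
        # No spread around the median: this sketch flags nothing in that case.
        return np.zeros(values.shape, dtype=bool)
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    # under a normal distribution.
    z = 0.6745 * (values - median) / mad
    return np.abs(z) > threshold


# Made-up unlabeled readings with one obvious outlier at index 5.
data = [10.1, 9.8, 10.0, 10.3, 9.9, 25.0, 10.2]
flags = robust_z_outliers(data)
```

A score like this treats anomalies as rare, isolated values, so, as noted above, it would miss bursts of activity that form their own micro-cluster unless the data is aggregated first.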
