Probabilistic Robust Autoencoders for Anomaly Detection

Anomalies (or outliers) are prevalent in real-world empirical observations and can mask important underlying structures. Accurate identification of anomalous samples is crucial for the success of downstream data analysis tasks. To identify anomalies automatically, we propose the Probabilistic Robust AutoEncoder (PRAE). PRAE is designed to simultaneously remove outliers and identify a low-dimensional representation for the inlier samples. We first present the Robust AutoEncoder (RAE) objective as a minimization problem that splits the data into inlier samples, from which a low-dimensional representation is learned via an AutoEncoder (AE), and anomalous (outlier) samples, which are excluded because they do not fit the low-dimensional representation. RAE minimizes the autoencoder's reconstruction error while incorporating as many samples as possible. This can be formulated via regularization, by subtracting from the reconstruction term an $\ell_0$ norm that counts the number of selected samples. Unfortunately, this leads to an intractable combinatorial problem. We therefore propose two probabilistic relaxations of RAE, which are differentiable and alleviate the need for a combinatorial search. We prove that the solution to the PRAE problem is equivalent to the solution of RAE. We use synthetic data to show that PRAE can accurately remove outliers across a wide range of contamination levels. Finally, we demonstrate that using PRAE for anomaly detection leads to state-of-the-art results on various benchmark datasets.
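
The RAE objective described in the abstract can be sketched in symbols. The notation below is assumed for illustration and is not taken from the source: $x_1, \dots, x_N$ are the samples, $f_\theta$ is the autoencoder with parameters $\theta$, $z_i \in \{0,1\}$ indicates whether sample $i$ is kept as an inlier, and $\lambda > 0$ is a hypothetical regularization weight.

```latex
\min_{\theta,\; z \in \{0,1\}^N} \;
\sum_{i=1}^{N} z_i \,\lVert x_i - f_\theta(x_i) \rVert_2^2
\;-\; \lambda \,\lVert z \rVert_0
```

Under this assumed form, the objective decouples across samples once $\theta$ is fixed: sample $i$ contributes $z_i(\lVert x_i - f_\theta(x_i) \rVert_2^2 - \lambda)$, so it is kept exactly when its reconstruction error is below $\lambda$. The combinatorial difficulty comes from optimizing $\theta$ and the binary vector $z$ jointly.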

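Autoencoder

An autoencoder is an artificial neural network used to learn efficient data encodings in an unsupervised manner. Its aim is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore signal "noise". Alongside the reduction side, a reconstruction side is learned, in which the autoencoder tries to generate from the reduced encoding a representation as close as possible to its original input, hence its name. Several variants of the basic model exist, each aiming to force the learned representation of the input to take on useful properties. Autoencoders are applied effectively to many problems, from facial recognition to capturing the semantic meaning of words.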
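
The encode-then-reconstruct idea can be illustrated with a minimal sketch. It swaps the neural network for a linear autoencoder fitted in closed form with an SVD (under squared error, the optimal linear autoencoder spans the same subspace as PCA), so it shows the reduction and reconstruction sides only and is not a trained deep model. The function names and the synthetic data are hypothetical, chosen for illustration.

```python
import numpy as np

def fit_linear_autoencoder(X, code_dim):
    """Fit a linear autoencoder in closed form via SVD (a PCA stand-in for a neural network)."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:code_dim]  # shape: (code_dim, n_features)
    return mean, components

def encode(X, mean, components):
    """Reduction side: project the centered inputs onto the low-dimensional code."""
    return (X - mean) @ components.T

def decode(Z, mean, components):
    """Reconstruction side: map codes back to the input space."""
    return Z @ components + mean

# Synthetic data (an assumption): 5-D points lying near a 2-D subspace, plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

mean, components = fit_linear_autoencoder(X, code_dim=2)
codes = encode(X, mean, components)
X_hat = decode(codes, mean, components)
mse = float(np.mean((X - X_hat) ** 2))  # mean squared reconstruction error
```

Because the data lie close to a two-dimensional subspace, a two-dimensional code reconstructs them with small error; samples that do not fit the learned subspace would show a larger per-sample reconstruction error.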
