Identifying whether a given sample is an outlier is an important issue in various real-world domains. This study aims to solve the unsupervised outlier detection problem, where the training data contain outliers but no label information about inliers or outliers is given. We propose a powerful and efficient learning framework that identifies outliers in a training data set using deep neural networks. We begin with a new observation, which we call the inlier-memorization (IM) effect: when a deep generative model is trained on data contaminated with outliers, the model memorizes inliers before outliers. Exploiting this finding, we develop a new method called outlier detection via the IM effect (ODIM). The ODIM requires only a few updates, so it is computationally efficient, tens of times faster than other deep-learning-based algorithms. Moreover, the ODIM filters out outliers successfully regardless of the data type, including tabular, image, and sequential data. We empirically demonstrate the superiority and efficiency of the ODIM by analyzing 20 data sets.
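The IM effect can be illustrated with a toy sketch. This is not the authors' ODIM implementation: the paper trains a deep generative model, whereas the code below substitutes a tiny linear autoencoder and a synthetic 2-D data set (both illustrative assumptions) to show the core idea, namely that after only a few gradient updates the per-sample loss is already lower for inliers than for outliers, so ranking samples by loss filters out outliers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic contaminated training set (illustrative, not from the paper):
# 200 inliers near the 1-D subspace spanned by [1, 1], plus 20 outliers in
# random directions with comparable norms, so raw vector length alone does
# not separate the two groups.
n_in, n_out, d = 200, 20, 2
t = 3.0 * rng.normal(size=(n_in, 1))
X_in = t @ np.array([[1.0, 1.0]]) / np.sqrt(2) + 0.1 * rng.normal(size=(n_in, d))
theta = rng.uniform(0.0, 2.0 * np.pi, size=n_out)
r = 3.0 * np.abs(rng.normal(size=n_out)) + 1.0
X_out = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)
X = np.vstack([X_in, X_out])
y = np.concatenate([np.zeros(n_in), np.ones(n_out)])  # 1 = outlier
n = X.shape[0]

# Tiny linear autoencoder x_hat = x W W^T, trained with plain gradient
# descent for a short while -- a stand-in for the "few updates" idea.
W = 0.1 * rng.normal(size=(d, 1))
lr = 0.01
for _ in range(100):
    R = X @ W @ W.T - X  # reconstruction residuals
    grad = (2.0 / n) * ((X.T @ R) @ W + (R.T @ X) @ W)
    W -= lr * grad

# Per-sample reconstruction loss as the outlier score: early training fits
# the majority (inlier) structure first, so outliers keep larger losses.
scores = np.sum((X @ W @ W.T - X) ** 2, axis=1)
s_in, s_out = scores[y == 0], scores[y == 1]
auc = float(np.mean(s_out[None, :] > s_in[:, None]))
print(f"AUC of loss-based ranking after a few updates: {auc:.3f}")
```

The AUC printed at the end compares every outlier score against every inlier score; values near 1 mean the loss ranking alone separates the two groups, without any labels being used during training.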