A Unified Survey on Anomaly, Novelty, Open-Set, and Out-of-Distribution Detection: Solutions and Future Challenges

Machine learning models often encounter samples that diverge from the training distribution. Failing to recognize an out-of-distribution (OOD) sample, and consequently assigning that sample to one of the known in-distribution class labels, significantly compromises the reliability of a model. The problem has gained significant attention because of its importance for safely deploying models in open-world settings. Detecting OOD samples is challenging because it is intractable to model all possible unknown distributions. To date, several research domains have tackled the problem of detecting unfamiliar samples, including anomaly detection, novelty detection, one-class learning, open-set recognition, and out-of-distribution detection. Despite sharing similar concepts, out-of-distribution detection, open-set recognition, and anomaly detection have been investigated independently. As a result, these research avenues have not cross-pollinated, which has created research barriers. While some surveys provide an overview of these approaches, they appear to focus only on a specific domain without examining the relationships between domains. This survey aims to provide a comprehensive, cross-domain review of prominent works in each of these areas while identifying their commonalities. Researchers can benefit from this overview of advances across fields and develop future methodology synergistically. Furthermore, to the best of our knowledge, while there are surveys on anomaly detection and one-class learning, there is no comprehensive, up-to-date survey on out-of-distribution detection, which this survey covers extensively. Finally, from a unified cross-domain perspective, we discuss and shed light on future lines of research, with the aim of bringing these fields closer together.

Related content

Anomaly detection

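In data mining, anomaly detection is the identification of items, events, or observations that do not conform to an expected pattern or to the other items in a dataset. Anomalous items typically correspond to problems such as bank fraud, structural defects, medical problems, or errors in text. Anomalies are also referred to as outliers, novelties, noise, deviations, and exceptions. In abuse and network intrusion detection in particular, the interesting objects are often not rare objects but unexpected bursts of activity. Such patterns do not fit the common statistical definition of an outlier as a rare object, so many anomaly detection methods (especially unsupervised ones) fail on this kind of data unless it has been suitably aggregated. A cluster analysis algorithm, by contrast, may be able to detect the micro-clusters that these patterns form.

There are three broad categories of anomaly detection methods.[1] Unsupervised anomaly detection methods detect anomalies in unlabeled test data by looking for the instances that fit the rest of the data least, under the assumption that most instances in the dataset are normal. Supervised anomaly detection methods require a dataset labeled as "normal" and "abnormal" and involve training a classifier; the key difference from many other statistical classification problems is the inherent class imbalance of anomaly detection. Semi-supervised anomaly detection methods build a model of normal behavior from a given normal training dataset and then test how likely a test instance is to have been generated by the learned model.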
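The unsupervised and semi-supervised categories above can be illustrated with a minimal sketch in standard-library Python, assuming numeric feature vectors. The names `knn_outlier_scores` and `GaussianNormalModel` are hypothetical, and the two techniques are swapped-in examples rather than methods prescribed by the text: the distance to the k-th nearest neighbour stands in for "fitting the rest of the data least", and independent per-feature Gaussians stand in for "a model of normal behavior". The threshold rule in the demo is also an assumption.

```python
import math
from statistics import mean, pstdev


def knn_outlier_scores(points, k=2):
    """Unsupervised: score every unlabeled point by the distance to its k-th
    nearest neighbour. Points that fit the rest of the data least score highest.
    Assumes len(points) > k."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


class GaussianNormalModel:
    """Semi-supervised: model 'normal behaviour' as independent per-feature
    Gaussians fitted only to normal training data (an assumed model choice)."""

    def __init__(self, normal_points):
        features = list(zip(*normal_points))  # transpose rows -> feature columns
        self.means = [mean(f) for f in features]
        # Floor the standard deviation so constant features don't divide by zero.
        self.stds = [max(pstdev(f), 1e-9) for f in features]

    def log_likelihood(self, x):
        """Log-density of x under the fitted model (sum over features)."""
        total = 0.0
        for xi, mu, sigma in zip(x, self.means, self.stds):
            z = (xi - mu) / sigma
            total += -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))
        return total

    def is_anomaly(self, x, threshold):
        """Flag x when it is less likely than the chosen threshold."""
        return self.log_likelihood(x) < threshold


if __name__ == "__main__":
    unlabeled = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(knn_outlier_scores(unlabeled, k=2))  # the last point gets the largest score

    normal = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    model = GaussianNormalModel(normal)
    # Hypothetical threshold rule: anything less likely than the least likely training point.
    threshold = min(model.log_likelihood(p) for p in normal)
    print(model.is_anomaly((0.5, 0.5), threshold), model.is_anomaly((5.0, 5.0), threshold))
    # prints: False True
```

The supervised category is not sketched, since it amounts to ordinary classifier training on labeled data; the main practical difference is the class imbalance mentioned above, which usually calls for reweighting or resampling.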