Anomaly detection (AD) is a crucial machine learning task with applications such as detecting emerging diseases, identifying financial fraud, and detecting fake news. However, obtaining complete, accurate, and precise labels for AD tasks can be expensive and challenging because data annotation is costly and difficult. To address this issue, researchers have developed AD methods that work with incomplete, inexact, and inaccurate supervision, collectively referred to as weakly supervised anomaly detection (WSAD) methods. In this study, we present the first comprehensive survey of WSAD methods, categorizing them into these three weak supervision settings across four data modalities (i.e., tabular, graph, time-series, and image/video data). For each setting, we provide formal definitions, key algorithms, and potential future directions. To support future research, we conduct experiments on a selected setting and release the source code, along with a collection of WSAD methods and data.
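The difference between the three weak supervision settings can be illustrated by simulating each kind of weak label from a fully labeled dataset. The sketch below is a hypothetical helper: `make_weak_labels` and its parameters (`label_ratio`, `bag_size`, `noise_rate`) are illustrative assumptions, not part of the released code. It assumes binary labels with 1 marking an anomaly, treats inexact supervision as bag-level (multiple-instance) labels, and models inaccurate supervision as random label flips.

```python
import random


def make_weak_labels(y_true, setting, *, label_ratio=0.1, bag_size=4,
                     noise_rate=0.1, seed=0):
    """Hypothetical helper: simulate one weak-supervision setting from
    ground-truth binary labels (1 = anomaly, 0 = normal)."""
    rng = random.Random(seed)
    n = len(y_true)
    if setting == "incomplete":
        # Only a small random subset of samples keeps its label;
        # every other sample is unlabeled (None).
        k = max(1, int(label_ratio * n))
        labeled = set(rng.sample(range(n), k))
        return {"labels": [y if i in labeled else None
                           for i, y in enumerate(y_true)]}
    if setting == "inexact":
        # Labels are given per bag of consecutive samples, not per sample:
        # a bag is positive if it contains at least one anomaly.
        bags = [list(range(i, min(i + bag_size, n)))
                for i in range(0, n, bag_size)]
        bag_labels = [int(any(y_true[j] for j in bag)) for bag in bags]
        return {"bags": bags, "bag_labels": bag_labels}
    if setting == "inaccurate":
        # Every sample is labeled, but each label is flipped
        # with probability noise_rate.
        return {"labels": [1 - y if rng.random() < noise_rate else y
                           for y in y_true]}
    raise ValueError(f"unknown setting: {setting}")


# Usage: 20 samples, the last two of which are anomalies.
y = [0] * 18 + [1, 1]
print(make_weak_labels(y, "inexact", bag_size=5)["bag_labels"])  # [0, 0, 0, 1]
```

Under inexact supervision, a detector only learns that the last bag of five samples contains an anomaly, not which sample it is; under incomplete supervision, most samples carry no label at all; under inaccurate supervision, every sample is labeled but some labels are wrong.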