In statistics and unsupervised machine learning, anomaly detection is a fundamental and well-studied problem. Although anomalies are difficult to define, many detection algorithms have been proposed. Underlying these approaches is the loose understanding that anomalies are rare, unusual, or inconsistent with the majority of the data. The present work takes a philosophical approach to defining anomalies clearly and develops an algorithm that detects them efficiently with minimal user intervention. Inspired by the Gestalt school of psychology and the Helmholtz principle of human perception, the idea is to treat anomalies as observations that are unexpected with respect to certain groupings formed by the majority of the data. Under appropriate random-variable modelling, anomalies are then found directly in a data set by assuming that the constituent elements of the observations are uniformly and independently distributed; anomalies correspond to those observations for which the expectation of occurrence of the elements in a given view is $<1$. Starting from fundamental principles of human perception, an unsupervised anomaly detection algorithm is developed that is simple, real-time, and parameter-free. Experiments suggest it is the prime choice for univariate data, and it shows promising performance in detecting global anomalies in multivariate data.
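The expectation-below-one criterion can be illustrated with a minimal sketch for univariate data. This is not the algorithm developed in this work. It is a simplified illustration that assumes each observation's rounded distance from the median is a count of "units", and that all units fall uniformly and independently across the observations. The function names (`binomial_tail`, `expected_occurrences`, `find_anomalies`), the median centering, the integer rounding, and the binomial model are all assumptions made for the example.

```python
import math
from statistics import median


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summing terms in log space."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def expected_occurrences(data):
    """For each observation, the expected number of observations that would
    receive at least as many units under the uniform, independent model."""
    n = len(data)
    med = median(data)
    # Assumption: rounded absolute deviation from the median counts the units.
    units = [round(abs(x - med)) for x in data]
    total_units = sum(units)
    if n < 2 or total_units == 0:
        # Degenerate input: nothing can be unexpected.
        return [float(n)] * n
    p = 1.0 / n  # each unit is equally likely to land on any observation
    return [n * binomial_tail(total_units, k, p) for k in units]


def find_anomalies(data):
    """Flag observations whose expected number of occurrences is below 1."""
    return [x for x, e in zip(data, expected_occurrences(data)) if e < 1]


if __name__ == "__main__":
    print(find_anomalies([10, 11, 9, 10, 12, 10, 11, 9, 10, 50]))
```

The sketch needs no user-set threshold, because the cut-off of 1 comes from the expectation itself. The rounding step does make it sensitive to the scale of the data, which is a limitation of this simplification and not a statement about the method described above.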