In statistics and unsupervised machine learning, anomaly detection is a fundamental and well-studied problem. Anomalies are difficult to define, yet many algorithms have been proposed to detect them. Underlying these approaches is the nebulous understanding that anomalies are rare, unusual or inconsistent with the majority of the data. The present work provides a philosophical treatise that clearly defines anomalies, and develops an algorithm that detects them efficiently with minimal user intervention. Drawing on the Gestalt School of Psychology and the Helmholtz principle of human perception, the work assumes anomalies to be observations that are unexpected with respect to certain groupings formed by the majority of the data. Under appropriate random-variable modelling, anomalies are found directly in a set of data by assuming that the constituent elements of the observations are distributed uniformly and independently at random; anomalies are those observations for which the expected number of occurrences of their elements in a given view is $<1$. Building on these fundamental principles of human perception, the work develops an unsupervised anomaly detection algorithm that is simple, real-time and parameter-free. Experiments suggest that it is a competitive choice for univariate data, with promising results on the detection of global anomalies in multivariate data.
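The Helmholtz principle is commonly formalised as an a contrario test: under a uniform, independent background model, the expected number of occurrences of an event (its number of false alarms, NFA) equals the number of places the event could occur times its probability, and the event is deemed unexpected when this expectation falls below 1. The sketch below illustrates that criterion in a discretised univariate setting that the abstract does not specify. It assumes the observed range is split into a hypothetical number of equal-width bins `n_bins`, treats a bin as a meaningful grouping when NFA = n_bins · P[Binomial(N, 1/n_bins) ≥ count] < 1, and flags observations that fall outside every meaningful grouping. The function names and the binning step are illustrative assumptions, not the algorithm developed in this work; in particular, that algorithm is described as parameter-free, whereas `n_bins` is a user parameter here.

```python
import math
from typing import List

# Hypothetical illustration of the a contrario (Helmholtz) criterion;
# not the algorithm developed in this work.


def binomial_tail(n: int, k: int, p: float) -> float:
    """P[X >= k] for X ~ Binomial(n, p), summed exactly."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def nfa(n_tests: int, n: int, k: int, p: float) -> float:
    """Number of false alarms: expected number of the n_tests events that would
    reach at least k occurrences under the uniform, independent background model."""
    return n_tests * binomial_tail(n, k, p)


def flag_outside_meaningful_bins(x: List[float], n_bins: int) -> List[bool]:
    """Flag observations that do not lie in any meaningful (NFA < 1) dense bin.

    Assumption: equal-width binning with a user-chosen n_bins stands in for
    the unspecified 'view' of the data.
    """
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for constant data
    # Bin index of each observation; the maximum value is clamped into the last bin.
    idx = [min(int((v - lo) / width), n_bins - 1) for v in x]
    counts = [0] * n_bins
    for b in idx:
        counts[b] += 1
    n = len(x)
    p = 1.0 / n_bins  # uniform background: each bin is equally likely
    # A bin is a meaningful grouping if so many points landing in it is unexpected.
    meaningful = [nfa(n_bins, n, c, p) < 1.0 for c in counts]
    # An observation is flagged when its bin is not a meaningful grouping.
    return [not meaningful[b] for b in idx]
```

For example, `flag_outside_meaningful_bins([i / 20 for i in range(20)] + [10.0], n_bins=10)` places the twenty clustered values in a single bin whose NFA is far below 1, while the lone value at 10.0 falls in a bin with an NFA of roughly 8.9, so only that last observation is flagged. The exact binomial tail keeps the sketch dependency-free; with SciPy available, the survival function of `scipy.stats.binom` could replace `binomial_tail`.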