We extend the use of Classification Without Labels for anomaly detection with a hypothesis test designed to exclude the background-only hypothesis. By testing for statistical independence of the two discriminating dataset regions, we are able to exclude the background-only hypothesis without relying on fixed anomaly score cuts or extrapolations of background estimates between regions. The method relies on the assumption that the anomaly score features and the dataset regions are conditionally independent, which can be ensured using existing decorrelation techniques. As a benchmark example, we consider the LHC Olympics dataset, where we show that mutual information is a suitable test for statistical independence and that our method exhibits excellent and robust performance at different signal fractions, even in the presence of realistic feature correlations.
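The independence test described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the quantile binning, the plug-in mutual information estimate, the permutation-based null distribution, and the synthetic `toy_data` generator are all assumptions made for illustration, and the function names are hypothetical. In the paper the scores would come from a Classification Without Labels classifier; here they are random stand-ins.

```python
import numpy as np


def binned_mutual_information(scores, regions, n_bins=10):
    """Plug-in mutual information (in nats) between a score and a 0/1 region label."""
    # Quantile edges so that each score bin holds roughly the same number of events.
    edges = np.quantile(scores, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    bins = np.searchsorted(edges, scores, side="right")
    # Joint histogram: rows are score bins, columns are the two regions.
    joint = np.zeros((n_bins, 2))
    np.add.at(joint, (bins, regions), 1.0)
    joint /= joint.sum()
    product = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    mask = joint > 0  # empty cells contribute nothing to the sum
    return float(np.sum(joint[mask] * np.log(joint[mask] / product[mask])))


def permutation_p_value(scores, regions, n_perm=500, n_bins=10, seed=0):
    """p-value for the null hypothesis that score and region are independent."""
    rng = np.random.default_rng(seed)
    observed = binned_mutual_information(scores, regions, n_bins)
    # Shuffling the region labels destroys any dependence on the score.
    null = np.array([
        binned_mutual_information(scores, rng.permutation(regions), n_bins)
        for _ in range(n_perm)
    ])
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p_value)


def toy_data(n=4000, signal_fraction=0.0, seed=1):
    """Synthetic stand-in: uniform background scores, signal only in region 1."""
    rng = np.random.default_rng(seed)
    regions = rng.integers(0, 2, size=n)
    scores = rng.uniform(size=n)
    # A fraction of region-1 events is replaced by signal-like high scores.
    is_signal = (regions == 1) & (rng.uniform(size=n) < signal_fraction)
    scores[is_signal] = rng.beta(5.0, 1.0, size=int(is_signal.sum()))
    return scores, regions
```

Calling `permutation_p_value(*toy_data(signal_fraction=0.2))` returns the observed mutual information together with a p-value; a small p-value disfavors independence of score and region, which under the conditional-independence assumption stated above corresponds to disfavoring the background-only hypothesis. No fixed cut on the score is involved in this sketch, only the choice of binning.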