We extend the use of Classification Without Labels (CWoLa) for anomaly detection with a hypothesis test designed to exclude the background-only hypothesis. By testing whether the anomaly score features are statistically independent of the two dataset regions being discriminated, we are able to exclude the background-only hypothesis without relying on fixed anomaly score cuts or on extrapolating background estimates between regions. The method relies on the assumption of conditional independence of anomaly score features and dataset regions, which can be ensured using existing decorrelation techniques. As a benchmark example, we consider the LHC Olympics dataset, where we show that mutual information is a suitable test statistic for statistical independence and that our method exhibits excellent and robust performance at different signal fractions, even in the presence of realistic feature correlations.
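The independence test described above can be illustrated with a minimal sketch. It assumes a binned plug-in estimate of the mutual information between a one-dimensional anomaly score and a binary region label, calibrated with a permutation null. The abstract does not say which estimator or calibration the authors use, so the function names `mutual_information` and `permutation_p_value`, the bin count, and the toy data are all hypothetical choices made for illustration.

```python
import numpy as np


def mutual_information(scores, region, n_bins=10):
    """Plug-in estimate (in nats) of I(score; region) from a binned histogram.

    `region` must be an integer array of 0/1 labels (assumed convention:
    0 = control region, 1 = signal region).
    """
    # Quantile bin edges, so each score bin holds roughly equal counts.
    edges = np.quantile(scores, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, scores, side="right") - 1, 0, n_bins - 1)

    # Joint distribution over (score bin, region label).
    joint = np.zeros((n_bins, 2))
    np.add.at(joint, (bins, region), 1.0)
    joint /= joint.sum()

    # Product of marginals, i.e. the joint expected under independence.
    product = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)

    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / product[mask])))


def permutation_p_value(scores, region, n_perm=200, seed=0):
    """p-value for the null hypothesis that score and region are independent."""
    rng = np.random.default_rng(seed)
    observed = mutual_information(scores, region)
    # Shuffling the labels breaks any dependence while keeping both marginals.
    null = np.array(
        [mutual_information(scores, rng.permutation(region)) for _ in range(n_perm)]
    )
    return (1 + np.sum(null >= observed)) / (1 + n_perm)


# Toy data (an assumption, not the LHC Olympics dataset).
rng = np.random.default_rng(1)
n = 2000
region = rng.integers(0, 2, size=n)

# Background-only case: scores are drawn independently of the region.
scores_background = rng.uniform(size=n)

# Strongly dependent case: scores shift with the region label.
scores_dependent = region + 0.1 * rng.normal(size=n)

p_background = permutation_p_value(scores_background, region)
p_dependent = permutation_p_value(scores_dependent, region)
print(f"p-value, independent scores: {p_background:.3f}")
print(f"p-value, dependent scores:   {p_dependent:.3f}")
```

A small p-value rejects independence and hence, under the conditional-independence assumption stated above, the background-only hypothesis. The permutation null is used here because it needs no fixed score cut and no background model. It is one common calibration and is not necessarily the one used in the paper.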