An important class of techniques for resonant anomaly detection in high energy physics builds models that can distinguish between reference and target datasets, where only the latter has appreciable signal. Such techniques, including Classification Without Labels (CWoLa) and Simulation Assisted Likelihood-free Anomaly Detection (SALAD), rely on a single reference dataset. They cannot take advantage of the multiple reference datasets that are commonly available, and thus cannot fully exploit the available information. In this work, we propose generalizations of CWoLa and SALAD for settings where multiple reference datasets are available, building on weak supervision techniques. We demonstrate improved performance in a number of settings with realistic and synthetic data. As an added benefit, our generalizations enable us to provide finite-sample guarantees, improving on existing asymptotic analyses.