We describe the outcome of a data challenge conducted as part of the Dark Machines Initiative and the Les Houches 2019 workshop on Physics at TeV colliders. The challenge aims at detecting signals of new physics at the LHC using unsupervised machine learning algorithms. First, we propose a way to use an anomaly score to define model-independent signal regions in LHC searches. We define and describe a large benchmark dataset, consisting of more than one billion simulated LHC events corresponding to $10~\mathrm{fb}^{-1}$ of proton-proton collisions at a center-of-mass energy of 13 TeV. We then review a wide range of anomaly detection and density estimation algorithms developed in the context of the data challenge, and we measure their performance in a set of realistic analysis environments. We draw a number of useful conclusions that will aid the development of unsupervised new physics searches during the third run of the LHC, and provide our benchmark dataset for future studies at https://www.phenoMLdata.org. Code to reproduce the analysis is provided at https://github.com/bostdiek/DarkMachines-UnsupervisedChallenge.
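The abstract does not spell out how an anomaly score becomes a signal region. As a minimal sketch, assuming the region is defined by a single cut on the score, with the threshold fixed by a target background efficiency measured on background-only simulation, the construction could look like the following. The function names, the working point, and the toy Gaussian scores are hypothetical illustrations, not the challenge's own code (see the linked repository for that).

```python
import math
import random


def threshold_for_background_efficiency(bkg_scores, eff_b):
    """Hypothetical helper: return a score cut such that roughly a fraction
    eff_b of the background-only events have a score at or above it."""
    if not 0.0 < eff_b <= 1.0:
        raise ValueError("eff_b must be in (0, 1]")
    # Sort from most to least anomalous.
    ordered = sorted(bkg_scores, reverse=True)
    # Number of background events allowed into the signal region.
    n_keep = max(1, int(round(eff_b * len(ordered))))
    return ordered[n_keep - 1]


def efficiency(scores, threshold):
    """Fraction of events whose anomaly score passes the cut."""
    return sum(s >= threshold for s in scores) / len(scores)


def signal_region(scores, threshold):
    """Indices of events falling inside the score-defined signal region."""
    return [i for i, s in enumerate(scores) if s >= threshold]


if __name__ == "__main__":
    # Toy scores (assumption): background ~ N(0, 1), a new-physics signal ~ N(2, 1).
    rng = random.Random(0)
    bkg = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
    sig = [rng.gauss(2.0, 1.0) for _ in range(1_000)]
    for eff_b in (1e-1, 1e-2, 1e-3):
        cut = threshold_for_background_efficiency(bkg, eff_b)
        print(f"eff_B={eff_b:g}  cut={cut:.3f}  eff_S={efficiency(sig, cut):.3f}")
    # math is used here only to report a simple S/sqrt(B) figure for eff_B = 1e-2.
    cut = threshold_for_background_efficiency(bkg, 1e-2)
    s, b = len(signal_region(sig, cut)), len(signal_region(bkg, cut))
    print(f"S/sqrt(B) at eff_B=1e-2: {s / math.sqrt(b):.2f}")
```

Fixing the background efficiency rather than the signal efficiency keeps the cut independent of any particular new-physics model, which is what makes the resulting region model-independent; the signal efficiency at that working point then serves as a figure of merit.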