In this paper, we present a method of embedding physics data manifolds with metric structure into lower-dimensional spaces with simpler metrics, such as Euclidean and hyperbolic spaces. We then demonstrate that this embedding can be a powerful step in the data analysis pipeline for many applications. Using progressively more realistic simulated collisions at the Large Hadron Collider, we show that the embedding approach learns the underlying latent structure. Using the notion of volume in Euclidean spaces, we provide, for the first time, a viable way to quantify the true search capability of model-agnostic search algorithms in collider physics (i.e., anomaly detection). Finally, we discuss how the ideas presented in this paper can be employed to solve many practical challenges that require the extraction of physically meaningful representations from complex high-dimensional datasets.