There has been a recent explosion in research into machine-learning-based generative modeling to tackle computational challenges for simulations in high energy physics (HEP). In order to use such alternative simulators in practice, we need well-defined metrics to compare different generative models and evaluate their discrepancy from the true distributions. We present the first systematic review and investigation into evaluation metrics and their sensitivity to failure modes of generative models, using the framework of two-sample goodness-of-fit testing, and assess their relevance and viability for HEP. Inspired by previous work in both physics and computer vision, we propose two new metrics, the Fréchet and kernel physics distances (FPD and KPD), and perform a variety of experiments measuring their performance on simple Gaussian-distributed datasets and simulated high energy jet datasets. We find FPD, in particular, to be the most sensitive metric to all alternative jet distributions tested and recommend its adoption, along with the KPD and Wasserstein distances between individual feature distributions, for evaluating generative models in HEP. Finally, we demonstrate the efficacy of these proposed metrics in evaluating and comparing a novel attention-based generative adversarial particle transformer to the state-of-the-art message-passing generative adversarial network jet simulation model.
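The abstract does not spell out how FPD is computed. As a rough illustration only, the sketch below assumes it follows the pattern of the Fréchet Inception Distance from computer vision: fit a Gaussian to each set of feature vectors and take the Fréchet distance between the two Gaussians. The function names, the toy three-dimensional features, and the plain (uncorrected) estimator are all assumptions made for this example, not the authors' implementation. A one-dimensional Wasserstein distance for equal-size samples is included because the abstract also recommends Wasserstein distances between individual feature distributions.

```python
import numpy as np


def frechet_gaussian_distance(x, y):
    """Squared Frechet distance between Gaussians fitted to x and y.

    x, y: arrays of shape (n_samples, n_features).
    Hypothetical helper for illustration; not taken from the paper.
    """
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.atleast_2d(np.cov(x, rowvar=False))
    cov_y = np.atleast_2d(np.cov(y, rowvar=False))
    # tr((cov_x cov_y)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_x @ cov_y, which are real and non-negative for
    # positive semi-definite inputs (up to numerical noise).
    eigvals = np.linalg.eigvals(cov_x @ cov_y).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    mean_term = np.sum((mu_x - mu_y) ** 2)
    return float(mean_term + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)


def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-size 1D samples."""
    a, b = np.sort(a), np.sort(b)
    return float(np.mean(np.abs(a - b)))


# Toy stand-ins for "real" and "generated" feature sets (assumed data).
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(5000, 3))
good = rng.normal(0.0, 1.0, size=(5000, 3))     # same distribution as real
shifted = rng.normal(0.5, 1.0, size=(5000, 3))  # mean-shifted failure mode

d_good = frechet_gaussian_distance(real, good)
d_shifted = frechet_gaussian_distance(real, shifted)
w_shifted = wasserstein_1d(real[:, 0], shifted[:, 0])
```

In this toy setup the mean-shifted sample should score a clearly larger distance than the sample drawn from the same distribution as the reference, which is the kind of sensitivity to a failure mode that a two-sample metric is meant to expose.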