Learning to discover: expressive Gaussian mixture models for multi-dimensional simulation and parameter inference in the physical sciences

We show that density models describing multiple observables with (i) hard boundaries and (ii) dependence on external parameters may be created using an auto-regressive Gaussian mixture model. The model is designed to capture how observable spectra are deformed by hypothesis variations, and is made more expressive by projecting data onto a configurable latent space. It may be used as a statistical model for scientific discovery in interpreting experimental observations, for example when constraining the parameters of a physical model or tuning simulation parameters according to calibration data. The model may also be sampled for use within a Monte Carlo simulation chain, or used to estimate likelihood ratios for event classification. The method is demonstrated on simulated high-energy particle physics data considering the anomalous electroweak production of a $Z$ boson in association with a dijet system at the Large Hadron Collider, and the accuracy of inference is tested using a realistic toy example. The developed methods are domain agnostic; they may be used within any field to perform simulation or inference where a dataset consisting of many real-valued observables has conditional dependence on external parameters.
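To make the auto-regressive construction concrete, the following is a minimal sketch (not the authors' implementation) of how a joint density over several observables can be factorised as $p(x \mid \theta) = \prod_i p(x_i \mid x_{<i}, \theta)$, with each factor a one-dimensional Gaussian mixture whose parameters depend on the external parameters $\theta$ and the preceding observables. The function `toy_mixture_params` is a hypothetical stand-in for a trained network, and the sketch omits the hard-boundary treatment and latent-space projection described in the abstract.

```python
import numpy as np


def gmm_logpdf(x, weights, means, sigmas):
    """Log-density of a 1D Gaussian mixture evaluated at a scalar x."""
    log_comp = (
        np.log(weights)
        - 0.5 * np.log(2.0 * np.pi * sigmas**2)
        - 0.5 * ((x - means) / sigmas) ** 2
    )
    m = log_comp.max()  # log-sum-exp for numerical stability
    return m + np.log(np.exp(log_comp - m).sum())


def toy_mixture_params(context, n_components=3):
    """Hypothetical stand-in for a learned map: context -> mixture parameters.

    `context` holds the external parameters followed by the observables
    already processed. A real model would use a trained network here.
    """
    s = float(np.sum(context))
    logits = 0.1 * s * np.arange(n_components)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    means = s + np.linspace(-1.0, 1.0, n_components)
    sigmas = np.full(n_components, 0.5 + 0.1 * abs(s))
    return weights, means, sigmas


def autoregressive_logpdf(x, theta, params_fn=toy_mixture_params):
    """log p(x | theta) as a sum of conditional 1D mixture log-densities."""
    x = np.asarray(x, dtype=float)
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    total = 0.0
    for i in range(x.size):
        context = np.concatenate([theta, x[:i]])
        w, mu, sig = params_fn(context)
        total += gmm_logpdf(x[i], w, mu, sig)
    return total


def sample(theta, n_obs, rng, params_fn=toy_mixture_params):
    """Draw one event by sampling each observable in turn."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    x = np.empty(n_obs)
    for i in range(n_obs):
        w, mu, sig = params_fn(np.concatenate([theta, x[:i]]))
        k = rng.choice(len(w), p=w)  # pick a mixture component
        x[i] = rng.normal(mu[k], sig[k])
    return x


rng = np.random.default_rng(0)
event = sample(theta=0.3, n_obs=4, rng=rng)
log_density = autoregressive_logpdf(event, theta=0.3)
```

Because the same conditional mixtures are used for both evaluation and sampling, one model can serve as a likelihood for parameter inference and as a generator within a simulation chain. Evaluating `autoregressive_logpdf` at two values of `theta` and taking the difference gives a log-likelihood ratio of the kind the abstract mentions for event classification.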
