Simulation-Based Inference with Waldo: Confidence Regions by Leveraging Prediction Algorithms or Posterior Estimators for Inverse Problems

Predictive algorithms, such as deep neural networks (DNNs), are used in many domain sciences to directly estimate internal parameters of interest in simulator-based models, especially in settings where the observations include images or other complex high-dimensional data. In parallel, modern neural density estimators, such as normalizing flows, are becoming increasingly popular for uncertainty quantification, especially when both parameters and observations are high-dimensional. However, parameter inference is an inverse problem and not a prediction task; thus, an open challenge is to construct conditionally valid and precise confidence regions, with a guaranteed probability of covering the true parameters of the data-generating process, no matter what the (unknown) parameter values are, and without relying on large-sample theory. Many simulator-based inference (SBI) methods are indeed known to produce biased or overly confident parameter regions, yielding misleading uncertainty estimates. This paper presents WALDO, a novel method for constructing confidence regions with finite-sample conditional validity by leveraging prediction algorithms or posterior estimators that are currently widely adopted in SBI. WALDO reframes the well-known Wald test statistic, and uses a computationally efficient regression-based machinery for classical Neyman inversion of hypothesis tests. We apply our method to a recent high-energy physics problem, where prediction with DNNs has previously led to estimates with prediction bias. We also illustrate how our approach can correct overly confident posterior regions computed with normalizing flows.
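The idea of inverting a Wald-type test built from a biased predictor can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' implementation: the one-dimensional Gaussian simulator, the shrinkage predictor standing in for a trained DNN or posterior estimator, the parameter grid, and all function names are hypothetical. Critical values are estimated here by brute-force Monte Carlo at each grid point, a simple stand-in for the regression-based machinery mentioned in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

SIGMA = 1.0      # noise scale of the toy simulator (assumption)
PRIOR_SD = 1.0   # prior scale behind the shrinkage predictor (assumption)
ALPHA = 0.1      # target miscoverage, i.e. 90% confidence sets

# Posterior mean weight and variance for x ~ N(theta, SIGMA^2), theta ~ N(0, PRIOR_SD^2).
WEIGHT = PRIOR_SD**2 / (PRIOR_SD**2 + SIGMA**2)
POST_VAR = WEIGHT * SIGMA**2


def simulate(theta, size):
    """Toy simulator: draws `size` observations x ~ N(theta, SIGMA^2)."""
    return theta + SIGMA * rng.standard_normal(size)


def predict(x):
    """Stand-in for a trained predictor: shrinks x toward 0, so it is biased for fixed theta."""
    return WEIGHT * x


def wald_stat(x, theta0):
    """Wald-type statistic: squared distance between prediction and theta0, scaled by a variance estimate."""
    return (predict(x) - theta0) ** 2 / POST_VAR


def critical_value(theta0, n_sim=2000):
    """Monte Carlo estimate of the (1 - ALPHA) quantile of the statistic when theta0 is true."""
    return np.quantile(wald_stat(simulate(theta0, n_sim), theta0), 1 - ALPHA)


# Neyman inversion: one test per grid point, each with its own critical value.
grid = np.linspace(-5.0, 5.0, 201)
cutoffs = np.array([critical_value(t) for t in grid])


def confidence_set(x_obs):
    """Grid points whose test is not rejected for the observation x_obs."""
    return grid[wald_stat(x_obs, grid) <= cutoffs]


def coverage(theta_true, n_rep=2000):
    """Empirical coverage of the inverted-test set at the grid point nearest theta_true."""
    idx = np.argmin(np.abs(grid - theta_true))
    x = simulate(grid[idx], n_rep)
    return np.mean(wald_stat(x, grid[idx]) <= cutoffs[idx])


def naive_coverage(theta_true, n_rep=2000):
    """Coverage of the 90% credible interval, i.e. a fixed chi-square(1) cutoff of 2.706."""
    x = simulate(theta_true, n_rep)
    return np.mean(wald_stat(x, theta_true) <= 2.706)


waldo_cov = coverage(3.0)
naive_cov = naive_coverage(3.0)
region = confidence_set(3.0)
print(f"inverted-test coverage: {waldo_cov:.3f}, credible-interval coverage: {naive_cov:.3f}")
print(f"confidence set for x_obs = 3.0: [{region.min():.2f}, {region.max():.2f}]")
```

In this toy setting the credible interval undercovers when the true parameter lies far from the prior mean, because the predictor is pulled toward zero. Calibrating the cutoff separately at each parameter value restores coverage close to the nominal level, up to Monte Carlo error.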
