Many areas of science make extensive use of computer simulators that implicitly encode likelihood functions for complex systems. Outside the asymptotic and low-dimensional regimes, classical statistical methods are poorly suited to these so-called likelihood-free inference (LFI) settings. Although new machine learning methods, such as normalizing flows, have revolutionized the sample efficiency and capacity of LFI methods, it remains an open question whether they produce reliable measures of uncertainty. In this paper, we present a statistical framework for LFI that unifies classical statistics with modern machine learning to: (1) construct frequentist confidence sets and hypothesis tests with finite-sample guarantees of nominal coverage (type I error control) and power, and (2) provide rigorous diagnostics for assessing empirical coverage over the entire parameter space. We refer to our framework as likelihood-free frequentist inference (LF2I). Any method that estimates a test statistic, such as the likelihood ratio, can be plugged into our framework to create powerful tests and confidence sets with correct coverage. In this work, we specifically study two test statistics (ACORE and BFF), which respectively maximize and integrate an odds function over the parameter space. Our theoretical and empirical results offer multifaceted perspectives on error sources and challenges in likelihood-free frequentist inference.