We present extensive empirical evidence showing that current Bayesian simulation-based inference algorithms are inadequate for the falsificationist methodology of scientific inquiry. Our results, collected through months of experimental computations, show that all benchmarked algorithms -- (S)NPE, (S)NRE, SNL and variants of ABC -- may produce overconfident posterior approximations, which makes them demonstrably unreliable and dangerous if one's scientific goal is to constrain parameters of interest. We believe that failing to address this issue will lead to a well-founded trust crisis in simulation-based inference. For this reason, we argue that research efforts should now consider theoretical and methodological developments of conservative approximate inference algorithms, and we present research directions towards this objective. In this regard, we show empirical evidence that ensembles are consistently more reliable.
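To make "overconfident" concrete, the following is a minimal sketch, not the benchmark used in this work. It assumes a toy conjugate Gaussian model (prior θ ~ N(0, 1), simulator x | θ ~ N(θ, 1)), for which the exact posterior is N(x/2, 1/2), and it mimics an overconfident approximation by shrinking the posterior standard deviation. The function name `expected_coverage` and the parameter `std_scale` are hypothetical and introduced only for illustration. The sketch estimates how often the central credible interval contains the parameter that generated the observation.

```python
from statistics import NormalDist

import numpy as np


def expected_coverage(std_scale, level=0.95, n_sims=20000, seed=0):
    """Empirical coverage of central credible intervals in a toy Gaussian model.

    Assumed model (illustrative only): theta ~ N(0, 1), x | theta ~ N(theta, 1).
    The exact posterior is N(x / 2, 1 / 2). `std_scale` rescales the posterior
    standard deviation: values below 1 mimic an overconfident approximation,
    values above 1 mimic a conservative one.
    """
    rng = np.random.default_rng(seed)

    # Draw ground-truth parameters from the prior and simulate observations.
    theta = rng.normal(0.0, 1.0, size=n_sims)
    x = theta + rng.normal(0.0, 1.0, size=n_sims)

    # Approximate posterior: exact mean, possibly mis-scaled spread.
    post_mean = x / 2.0
    post_std = np.sqrt(0.5) * std_scale

    # Half-width of the central credible interval at the requested level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    inside = np.abs(theta - post_mean) <= z * post_std

    # Fraction of simulations whose interval contains the true parameter.
    return float(inside.mean())


if __name__ == "__main__":
    for scale in (1.0, 0.5, 1.5):
        print(f"std_scale={scale}: coverage={expected_coverage(scale):.3f}")
```

Under these assumptions, the exact posterior (`std_scale=1.0`) should yield coverage close to the nominal credibility level, a shrunk posterior should fall well below it, and an inflated posterior should exceed it. Coverage at or above the nominal level is the sense in which an approximation is conservative.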