Computer simulations have proven to be a valuable tool for understanding complex phenomena across the sciences. However, the utility of simulators for modelling and forecasting is often restricted by low data quality, as well as by practical limits to model fidelity. To circumvent these difficulties, we argue that modellers must treat simulators as idealised representations of the true data generating process, and consequently should thoughtfully consider the risk of model misspecification. In this work we revisit neural posterior estimation (NPE), a class of algorithms that enable black-box parameter inference in simulation models, and consider the implications of a simulation-to-reality gap. While recent works have demonstrated reliable performance of these methods, the analyses have been performed using synthetic data generated by the simulator model itself, and have therefore only addressed the well-specified case. In contrast, we find that the presence of misspecification leads to unreliable inference when NPE is used naively. As a remedy, we argue that principled scientific inquiry with simulators should incorporate a model criticism component, to facilitate interpretable identification of misspecification, and a robust inference component, to fit 'wrong but useful' models. We propose robust neural posterior estimation (RNPE), an extension of NPE that achieves both of these aims simultaneously by explicitly modelling the discrepancies between simulations and the observed data. We assess the approach on a range of artificially misspecified examples, and find that RNPE performs well across the tasks, whereas naively using NPE leads to misleading and erratic posteriors.