Recent advances in probabilistic modelling have led to a large number of simulation-based inference algorithms which do not require numerical evaluation of likelihoods. However, a public benchmark with appropriate performance metrics for such 'likelihood-free' algorithms has been lacking. This has made it difficult to compare algorithms and identify their strengths and weaknesses. We set out to fill this gap: We provide a benchmark with inference tasks and suitable performance metrics, with an initial selection of algorithms including recent approaches employing neural networks and classical Approximate Bayesian Computation methods. We found that the choice of performance metric is critical, that even state-of-the-art algorithms have substantial room for improvement, and that sequential estimation improves sample efficiency. Neural network-based approaches generally exhibit better performance, but there is no uniformly best algorithm. We provide practical advice and highlight the potential of the benchmark to diagnose problems and improve algorithms. The results can be explored interactively on a companion website. All code is open source, making it possible to contribute further benchmark tasks and inference algorithms.
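To make "likelihood-free" concrete, here is a minimal sketch of rejection ABC, the simplest classical Approximate Bayesian Computation method. It infers a parameter using only forward simulations: prior draws are kept when their simulated data land close to the observation, and the likelihood is never evaluated. The toy simulator (10 draws from a Gaussian with unknown mean), the uniform prior on [-3, 3], the sample-mean summary statistic and the tolerance `epsilon` are all illustrative assumptions. They are not tasks, settings or code from the benchmark.

```python
import numpy as np


def simulator(theta, rng, n_obs=10):
    """Hypothetical toy simulator: n_obs draws from N(theta, 1)."""
    return rng.normal(loc=theta, scale=1.0, size=n_obs)


def rejection_abc(x_obs, n_sims, epsilon, seed=0):
    """Rejection ABC: keep prior draws whose simulated summary statistic
    lies within epsilon of the observed one. Only the simulator is called;
    the likelihood p(x | theta) is never evaluated."""
    rng = np.random.default_rng(seed)
    s_obs = x_obs.mean()  # assumed summary statistic: the sample mean
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(-3.0, 3.0)  # draw from the (assumed) uniform prior
        x_sim = simulator(theta, rng)   # forward-simulate a dataset
        if abs(x_sim.mean() - s_obs) < epsilon:
            accepted.append(theta)      # accepted draws approximate the posterior
    return np.array(accepted)


# "Observed" data generated with a true parameter of 1.0.
x_obs = simulator(1.0, np.random.default_rng(42))
samples = rejection_abc(x_obs, n_sims=20_000, epsilon=0.1)
print(f"accepted {samples.size} of 20000; posterior mean ~ {samples.mean():.2f}")
```

Shrinking `epsilon` brings the accepted samples closer to the true posterior, but fewer simulations are accepted. This trade-off is one reason sample efficiency, measured as the number of simulator calls, matters when comparing such algorithms.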