Monte Carlo Techniques for Addressing Large Errors and Missing Data in Simulation-based Inference

Upcoming astronomical surveys will observe billions of galaxies across cosmic time, providing a unique opportunity to map the many pathways of galaxy assembly at very high resolution. However, this volume of data poses an immediate computational challenge: current tools for inferring parameters from the light of galaxies take $\gtrsim 10$ hours per fit, which is prohibitively expensive at this scale. Simulation-based inference (SBI) is a promising solution, but it requires simulated data with characteristics identical to those of the observed data, whereas real astronomical surveys are often highly heterogeneous, with missing observations and variable uncertainties determined by sky and telescope conditions. Here we present a Monte Carlo technique for treating out-of-distribution measurement errors and missing data using standard SBI tools. We show that out-of-distribution measurement errors can be approximated using standard SBI evaluations, and that missing data can be marginalized over using SBI evaluations over nearby data realizations in the training set. While these techniques slow the inference process from $\sim 1$ sec to $\sim 1.5$ min per object, this is still significantly faster than standard approaches, and it dramatically expands the applicability of SBI. This expanded regime has broad implications for future applications to astronomical surveys.
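As a rough illustration of the two Monte Carlo ideas described in the abstract, the sketch below uses a toy one-parameter Gaussian model in place of a trained SBI posterior estimator. Everything here is an assumption made for illustration: the function names (`toy_posterior_sample`, `posterior_with_large_errors`, `posterior_with_missing_bands`), the Euclidean nearest-neighbor search, the equal weighting of realizations, and the sample counts are hypothetical and are not taken from the paper, whose actual procedure may differ in its details.

```python
import numpy as np


def toy_posterior_sample(x, n_samples, rng, sigma_train=0.1):
    """Hypothetical stand-in for a trained SBI posterior q(theta | x).

    Toy model (an assumption): theta ~ N(0, 1) and each band is
    x_i = theta + N(0, sigma_train^2), so the posterior is an analytic Gaussian.
    """
    x = np.asarray(x, dtype=float)
    precision = 1.0 + x.size / sigma_train**2
    mean = x.sum() / sigma_train**2 / precision
    return rng.normal(mean, precision**-0.5, size=n_samples)


def posterior_with_large_errors(x_obs, sigma_obs, rng, n_mc=200, n_per_draw=50):
    """Pool posterior samples over noise realizations of the observation.

    Each realization is drawn within the reported uncertainties and passed
    through an ordinary posterior evaluation; realizations are weighted equally.
    """
    x_obs = np.asarray(x_obs, dtype=float)
    draws = rng.normal(x_obs, sigma_obs, size=(n_mc, x_obs.size))
    return np.concatenate(
        [toy_posterior_sample(x, n_per_draw, rng) for x in draws]
    )


def posterior_with_missing_bands(x_obs, missing, x_train, rng, k=20, n_per_draw=50):
    """Marginalize over missing bands using nearby training-set realizations.

    Neighbors are found with a Euclidean distance on the observed bands only;
    each neighbor supplies plausible values for the missing bands.
    """
    x_obs = np.asarray(x_obs, dtype=float)
    missing = np.asarray(missing, dtype=bool)
    dist = np.linalg.norm(x_train[:, ~missing] - x_obs[~missing], axis=1)
    nearest = np.argsort(dist)[:k]
    samples = []
    for idx in nearest:
        x_fill = x_obs.copy()
        x_fill[missing] = x_train[idx, missing]  # borrow the neighbor's values
        samples.append(toy_posterior_sample(x_fill, n_per_draw, rng))
    return np.concatenate(samples)


rng = np.random.default_rng(0)

# Toy training set: 5000 simulated objects observed in 3 bands.
theta_train = rng.normal(size=5000)
x_train = theta_train[:, None] + rng.normal(0.0, 0.1, size=(5000, 3))

# Observation whose errors (0.5) are far larger than the training noise (0.1).
noisy = posterior_with_large_errors([0.5, 0.4, 0.6], [0.5, 0.5, 0.5], rng)

# Observation with the second band missing.
filled = posterior_with_missing_bands(
    [0.5, np.nan, 0.6], [False, True, False], x_train, rng
)
```

In this toy setting the pooled posterior for the noisy object is much broader than a single posterior evaluation, which is the intended effect of propagating the larger measurement errors. The cost scales with the number of posterior evaluations per object (`n_mc` or `k` here), which is consistent in spirit with the slowdown from seconds to minutes per object that the abstract reports.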
