Bayesian coresets approximate a posterior distribution by building a small weighted subset of the data points. Any inference procedure that is too computationally expensive to be run on the full posterior can instead be run inexpensively on the coreset, with results that approximate those on the full data. However, current approaches are limited by either a significant run-time or the need for the user to specify a low-cost approximation to the full posterior. We propose a Bayesian coreset construction algorithm that first selects a uniformly random subset of data, and then optimizes the weights using a novel quasi-Newton method. Our algorithm is a simple-to-implement, black-box method that does not require the user to specify a low-cost posterior approximation. It is the first to come with a general high-probability bound on the KL divergence of the output coreset posterior. Experiments demonstrate that our method provides significant improvements in coreset quality over alternatives with comparable construction times, with far less storage cost and user input required.
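The two-step construction described above (a uniformly random subset, followed by quasi-Newton optimization of the weights against a KL objective) can be illustrated on a toy conjugate model where both posteriors are available in closed form. This is a minimal sketch, not the paper's algorithm: the model, the `posterior` and `kl_gauss` helpers, and the use of SciPy's L-BFGS-B (a generic quasi-Newton method) in place of the paper's scheme are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Toy model (illustration only): x_i ~ N(theta, 1) with prior
# theta ~ N(0, 1), so both the full-data and coreset posteriors
# over theta are Gaussian in closed form.
n, m = 1000, 30                      # full data size, coreset size
x = rng.normal(1.5, 1.0, size=n)

def posterior(data, w):
    """Gaussian posterior (mean, variance) under weighted likelihoods."""
    prec = 1.0 + np.sum(w)           # prior precision + weighted data precision
    mean = np.sum(w * data) / prec
    return mean, 1.0 / prec

def kl_gauss(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for one-dimensional Gaussians."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

# Step 1: select a uniformly random subset of the data.
idx = rng.choice(n, size=m, replace=False)
sub = x[idx]

m_full, v_full = posterior(x, np.ones(n))

def objective(w):
    """KL divergence from the coreset posterior to the full posterior."""
    m_c, v_c = posterior(sub, w)
    return kl_gauss(m_c, v_c, m_full, v_full)

# Step 2: optimize the weights with a quasi-Newton method
# (L-BFGS-B here; the paper uses its own quasi-Newton scheme).
w0 = np.full(m, n / m)               # start at uniform upweighting
res = minimize(objective, w0, method="L-BFGS-B",
               bounds=[(0.0, None)] * m)

print(f"KL before: {objective(w0):.4f}  after: {res.fun:.6f}")
```

In this conjugate setting the weights can drive the KL divergence essentially to zero, since matching the full posterior only requires matching its mean and precision; in realistic models the coreset posterior is intractable and the objective must be estimated, which is where the black-box nature of the proposed method matters.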