Recent advances in coreset methods have shown that a small set of representative datapoints can replace massive volumes of data for Bayesian inference, preserving the relevant statistical information and significantly accelerating subsequent downstream tasks. Existing variational coreset constructions rely on either selecting subsets of the observed datapoints, or jointly performing approximate inference and optimizing pseudodata in the observed space, akin to inducing point methods in Gaussian processes. So far, both approaches are limited by the complexity of evaluating their objectives for general-purpose models, and require generating samples from a typically intractable posterior over the coreset throughout inference and testing. In this work, we present a black-box variational inference framework for coresets that overcomes these constraints and enables principled application of variational coresets to intractable models, such as Bayesian neural networks. We apply our techniques to supervised learning problems, and compare them with existing approaches in the literature for data summarization and inference.
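The framework described above targets intractable models; as a toy illustration of the underlying pseudodata-coreset idea only, the sketch below uses a conjugate Gaussian model where both the full-data posterior and the weighted-coreset posterior are available in closed form, so the quality of the summary can be measured exactly via KL divergence. All names, the model, and the finite-difference optimizer are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate model: x_i ~ N(mu, 1) with prior mu ~ N(0, 1).
# Both the full-data posterior and the weighted-coreset posterior over mu
# are Gaussian in closed form, so coreset quality is measurable exactly.
N = 30
x = rng.normal(loc=2.0, scale=1.0, size=N)

def posterior(points, weights):
    """Gaussian posterior over mu given weighted observations."""
    prec = 1.0 + weights.sum()                 # prior precision + total weight
    mean = (weights * points).sum() / prec
    return mean, 1.0 / np.sqrt(prec)

def kl_gauss(m_p, s_p, m_q, s_q):
    """KL( N(m_p, s_p^2) || N(m_q, s_q^2) ) for univariate Gaussians."""
    return np.log(s_q / s_p) + (s_p**2 + (m_p - m_q)**2) / (2 * s_q**2) - 0.5

m_full, s_full = posterior(x, np.ones(N))

# Coreset: m pseudopoints with free locations u and log-weights lw,
# fitted by minimizing the KL to the full-data posterior.
m = 3
u0 = rng.normal(size=m)
params = np.concatenate([u0, np.zeros(m)])     # [u_1..u_m, log w_1..log w_m]

def loss(params):
    u, lw = params[:m], params[m:]
    m_c, s_c = posterior(u, np.exp(lw))
    return kl_gauss(m_full, s_full, m_c, s_c)

lr, eps = 0.05, 1e-5
for _ in range(2000):                          # plain gradient descent,
    grad = np.zeros_like(params)               # finite-difference gradients
    for j in range(len(params)):
        step = np.zeros_like(params)
        step[j] = eps
        grad[j] = (loss(params + step) - loss(params - step)) / (2 * eps)
    params -= lr * grad

print(f"{m} pseudopoints summarize {N} observations, KL = {loss(params):.2e}")
```

Because the pseudopoint locations and weights are unconstrained, three weighted pseudopoints can match the full posterior's mean and precision exactly, and the KL is driven to (numerically) zero; the point of the black-box framework in the abstract is to make this same optimization possible when neither posterior has a tractable form.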