Loss function based second-order Jensen inequality and its application to particle variational inference

Bayesian model averaging, obtained as the expectation of a likelihood function under a posterior distribution, has been widely used for prediction, uncertainty evaluation, and model selection. Various approaches have been developed to capture the information in the posterior distribution efficiently; one such approach optimizes a set of models simultaneously, with interactions that ensure the diversity of the individual models, in the same way as ensemble learning. A representative example is particle variational inference (PVI), which uses an ensemble of models as an empirical approximation of the posterior distribution. PVI iteratively updates each model with a repulsion force to ensure the diversity of the optimized models. However, despite its promising performance, the theoretical understanding of this repulsion and its association with the generalization ability remains unclear. In this paper, we tackle this problem in light of PAC-Bayesian analysis. First, we provide a new second-order Jensen inequality that has a repulsion term based on the loss function. Thanks to the repulsion term, it is tighter than the standard Jensen inequality. Then, we derive a novel generalization error bound and show that it can be reduced by enhancing the diversity of models. Finally, we derive a new PVI that optimizes the generalization error bound directly. Numerical experiments demonstrate that the performance of the proposed PVI compares favorably with existing methods.
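
As a rough illustration of why model diversity matters here, the following sketch evaluates only the standard (first-order) Jensen inequality for an ensemble: the negative log of the averaged likelihood is never larger than the average of the per-model negative log-likelihoods. It does not implement the second-order inequality or the repulsion term of the paper, whose exact form is not stated in the abstract. The function names and the toy likelihood values are hypothetical and chosen purely for illustration.

```python
import math


def bma_nll(likelihoods):
    """Negative log of the ensemble-averaged likelihood: -log((1/N) * sum_i p_i)."""
    return -math.log(sum(likelihoods) / len(likelihoods))


def mean_nll(likelihoods):
    """Average per-model negative log-likelihood: (1/N) * sum_i (-log p_i)."""
    return sum(-math.log(p) for p in likelihoods) / len(likelihoods)


def jensen_gap(likelihoods):
    """Slack in the standard Jensen inequality; non-negative by concavity of log."""
    return mean_nll(likelihoods) - bma_nll(likelihoods)


# Hypothetical likelihoods that four ensemble members assign to one data point.
# Both ensembles have (approximately) the same averaged likelihood of 0.5.
identical = [0.5, 0.5, 0.5, 0.5]  # no diversity among the models
diverse = [0.2, 0.4, 0.6, 0.8]    # models disagree with each other

print("gap, identical models:", jensen_gap(identical))
print("gap, diverse models:  ", jensen_gap(diverse))
```

In this toy setting the gap vanishes (up to floating-point error) when all models agree and is strictly positive when they differ. A bound that accounts for this slack through a diversity-dependent term can therefore be tighter than the standard one, which is the role the abstract attributes to the repulsion term.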
