This paper studies Variational Inference (VI) for training Bayesian Neural Networks (BNN) in the overparameterized regime, i.e., when the number of neurons tends to infinity. More specifically, we consider overparameterized two-layer BNNs and point out a critical issue in mean-field VI training. This problem arises from the decomposition of the evidence lower bound (ELBO) into two terms: one corresponding to the likelihood function of the model and the other to the Kullback-Leibler (KL) divergence between the prior distribution and the variational posterior. In particular, we show both theoretically and empirically that a trade-off between these two terms exists in the overparameterized regime only when the KL term is appropriately re-scaled with respect to the ratio between the number of observations and the number of neurons. We also illustrate our theoretical results with numerical experiments that highlight the critical choice of this ratio.
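For reference, the standard mean-field ELBO decomposition alluded to above can be sketched as follows; the scaling factor $\lambda$ and the symbols $n$ (number of observations) and $N$ (number of neurons) are illustrative placeholders rather than the paper's exact formulation:
\begin{align*}
\mathrm{ELBO}(q) &= \underbrace{\mathbb{E}_{q}\!\left[\log p(\mathcal{D} \mid w)\right]}_{\text{expected log-likelihood}} \;-\; \underbrace{\mathrm{KL}\!\left(q(w)\,\|\,p(w)\right)}_{\text{divergence to the prior}},\\
\mathrm{ELBO}_{\lambda}(q) &= \mathbb{E}_{q}\!\left[\log p(\mathcal{D} \mid w)\right] \;-\; \lambda \,\mathrm{KL}\!\left(q(w)\,\|\,p(w)\right),
\end{align*}
where, in the re-scaled variant, $\lambda$ is chosen as a function of the ratio $n/N$ so that neither term dominates as $N \to \infty$.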