Aggregated predictors are obtained by making a set of basic predictors vote according to some weights, that is, according to some probability distribution. Randomized predictors are obtained by sampling from a set of basic predictors according to some prescribed probability distribution. Thus, aggregated and randomized predictors have in common that they are defined not by a minimization problem but by a probability distribution on the set of predictors. In statistical learning theory, there is a set of tools designed to understand the generalization ability of such procedures: PAC-Bayesian, or PAC-Bayes, bounds. Since the original PAC-Bayes bounds of D. McAllester, these tools have been considerably improved in many directions (we will, for example, describe a simplified version of the localization technique of O. Catoni that was missed by the community and later rediscovered as "mutual information bounds"). Very recently, PAC-Bayes bounds have received considerable attention: for example, there was a workshop on PAC-Bayes at NIPS 2017, "(Almost) 50 Shades of Bayesian Learning: PAC-Bayesian trends and insights", organized by B. Guedj, F. Bach and P. Germain. One of the reasons for this recent success is the successful application of these bounds to neural networks by G. Dziugaite and D. Roy. An elementary introduction to PAC-Bayes theory is still missing. This is an attempt to provide such an introduction.