Aggregated predictors are obtained by making a set of basic predictors vote according to some weights, that is, according to some probability distribution. Randomized predictors are obtained by sampling from a set of basic predictors, according to some prescribed probability distribution. Thus, aggregated and randomized predictors have in common that they are not defined by a minimization problem, but by a probability distribution on the set of predictors. In statistical learning theory, there is a set of tools designed to understand the generalization ability of such procedures: PAC-Bayesian or PAC-Bayes bounds. Since the original PAC-Bayes bounds of McAllester, these tools have been considerably improved in many directions (we will, for example, describe a simplified version of the localization technique of Catoni that was missed by the community and later rediscovered as "mutual information bounds"). Very recently, PAC-Bayes bounds received considerable attention: for example, there was a workshop on PAC-Bayes at NIPS 2017, "(Almost) 50 Shades of Bayesian Learning: PAC-Bayesian trends and insights", organized by B. Guedj, F. Bach and P. Germain. One of the reasons for this recent success is the successful application of these bounds to neural networks by Dziugaite and Roy. An elementary introduction to PAC-Bayes theory is still missing. This is an attempt to provide such an introduction.
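As a concrete sketch of the distinction (the notation here is ours, not taken from the text): given a family of basic predictors $(f_\theta)_{\theta\in\Theta}$ and a probability distribution $\rho$ on $\Theta$, the two constructions can be written as
\[
\hat f_{\mathrm{agg}}(x) = \int_\Theta f_\theta(x)\,\rho(\mathrm{d}\theta)
\qquad\text{and}\qquad
\hat f_{\mathrm{rand}} = f_{\hat\theta}, \quad \hat\theta \sim \rho ,
\]
so that in both cases the predictor is determined by the distribution $\rho$ rather than by a single minimizer of an empirical criterion.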