We present a general approach, based on an exponential inequality, to derive bounds on the generalization error of randomized learning algorithms. Using this approach, we provide bounds on the average generalization error as well as bounds on its tail probability, for both the PAC-Bayesian and single-draw scenarios. Specifically, for the case of subgaussian loss functions, we obtain novel bounds that depend on the information density between the training data and the output hypothesis. When suitably weakened, these bounds recover many of the available information-theoretic bounds in the literature. We also extend the proposed exponential-inequality approach to the setting recently introduced by Steinke and Zakynthinou (2020), where the learning algorithm depends on a randomly selected subset of the available training data. For this setup, we present bounds for bounded loss functions in terms of the conditional information density between the output hypothesis and the random variable determining the subset choice, given all training data. Through our approach, we recover the average generalization bound presented by Steinke and Zakynthinou (2020) and extend it to the PAC-Bayesian and single-draw scenarios. For the single-draw scenario, we also obtain novel bounds in terms of the conditional $\alpha$-mutual information and the conditional maximal leakage.
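For concreteness, one well-known bound of the kind recovered by weakening the information-density bounds above is the average-generalization bound of Xu and Raginsky (2017). A sketch of its statement, assuming a $\sigma$-subgaussian loss, $n$ training samples $Z^n$, output hypothesis $W$, population risk $L_{\mu}(W)$, and empirical risk $L_{Z^n}(W)$:

```latex
% Average generalization bound for a sigma-subgaussian loss
% (Xu & Raginsky, 2017); I(W; Z^n) denotes the mutual information
% between the output hypothesis and the training data.
\left| \mathbb{E}\left[ L_{\mu}(W) - L_{Z^n}(W) \right] \right|
  \le \sqrt{\frac{2\sigma^2}{n}\, I(W; Z^n)}
```

The information density between $W$ and $Z^n$ averages to the mutual information $I(W; Z^n)$, which is how bounds of this average form arise as weakened special cases.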