We present a general approach, based on exponential inequalities, to derive bounds on the generalization error of randomized learning algorithms. Using this approach, we provide bounds on the average generalization error as well as bounds on its tail probability, for both the PAC-Bayesian and single-draw scenarios. Specifically, for the case of sub-Gaussian loss functions, we obtain novel bounds that depend on the information density between the training data and the output hypothesis. When suitably weakened, these bounds recover many of the information-theoretic bounds available in the literature. We also extend the proposed exponential-inequality approach to the setting recently introduced by Steinke and Zakynthinou (2020), where the learning algorithm depends on a randomly selected subset of the available training data. For this setup, we present bounds for bounded loss functions in terms of the conditional information density between the output hypothesis and the random variable determining the subset choice, given all training data. Through our approach, we recover the average generalization bound presented by Steinke and Zakynthinou (2020) and extend it to the PAC-Bayesian and single-draw scenarios. For the single-draw scenario, we also obtain novel bounds in terms of the conditional $\alpha$-mutual information and the conditional maximal leakage.