We present a new family of information-theoretic generalization bounds, in which the training loss and the population loss are compared through a jointly convex function. This function is upper-bounded in terms of the disintegrated, samplewise, evaluated conditional mutual information (CMI), an information measure that depends on the losses incurred by the selected hypothesis, rather than on the hypothesis itself, as is common in probably approximately correct (PAC)-Bayesian results. We demonstrate the generality of this framework by recovering and extending previously known information-theoretic bounds. Furthermore, using the evaluated CMI, we derive a samplewise, average version of Seeger's PAC-Bayesian bound, where the convex function is the binary KL divergence. In some scenarios, this novel bound results in a tighter characterization of the population loss of deep neural networks than previous bounds. Finally, we derive high-probability versions of some of these average bounds. We demonstrate the unifying nature of the evaluated CMI bounds by using them to recover average and high-probability generalization bounds for multiclass classification with finite Natarajan dimension.
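For orientation, a minimal sketch of the two standard ingredients named above: the binary KL divergence used as the convex comparator, and the classical Seeger–Maurer PAC-Bayesian bound whose samplewise, evaluated-CMI counterpart is derived in the paper. The exact conditioning, constants, and information measure in the paper's bound differ; the display below only fixes notation for the familiar objects (losses in $[0,1]$, posterior $Q$, prior $P$, $n$ training samples).

```latex
% Binary KL divergence between Bernoulli parameters p and q (the convex comparator):
\[
  d(p \,\|\, q) \;=\; p \log\frac{p}{q} \;+\; (1-p)\log\frac{1-p}{1-q}.
\]

% Classical Seeger--Maurer PAC-Bayesian bound: with probability at least 1-\delta
% over the draw of the training sample S, the empirical loss \hat{L}_S(Q) and the
% population loss L(Q) of the posterior Q satisfy
\[
  d\!\left( \hat{L}_S(Q) \,\big\|\, L(Q) \right)
  \;\le\; \frac{\mathrm{KL}(Q \,\|\, P) + \log\frac{2\sqrt{n}}{\delta}}{n}.
\]
```

The bounds in the paper can be read as replacing the hypothesis-level term $\mathrm{KL}(Q\,\|\,P)$ with a disintegrated, samplewise, evaluated CMI term that depends only on the losses induced by the hypothesis; the precise statements are given in the paper itself.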