Since the celebrated works of Russo and Zou (2016, 2019) and Xu and Raginsky (2017), it has been well known that the generalization error of supervised learning algorithms can be bounded in terms of the mutual information between their input and their output, provided that the loss of every fixed hypothesis has a subgaussian tail. In this work, we generalize this result beyond the standard choice of Shannon's mutual information as the measure of dependence between the input and the output. Our main result shows that the mutual information can indeed be replaced by any strongly convex function of the joint input-output distribution, with the subgaussianity condition on the losses replaced by a bound on an appropriately chosen norm that captures the geometry of the dependence measure. This allows us to derive a range of generalization bounds that are either entirely new or strengthen previously known ones. Examples include bounds stated in terms of $p$-norm divergences and the Wasserstein-2 distance, which are applicable to heavy-tailed loss distributions and highly smooth loss functions, respectively. Our analysis relies entirely on elementary tools from convex analysis, tracking the growth of a potential function associated with the dependence measure and the loss function.
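For context, the classical result referenced here is the bound of Xu and Raginsky (2017): if the loss $\ell(w, Z)$ is $\sigma$-subgaussian under the data distribution $\mu$ for every fixed hypothesis $w$, then the expected generalization gap of an algorithm producing output $W$ from an $n$-sample input $S = (Z_1, \dots, Z_n)$ satisfies (notation chosen here for illustration)
\[
\bigl| \mathbb{E}\bigl[ L_\mu(W) - L_S(W) \bigr] \bigr| \;\le\; \sqrt{\frac{2\sigma^2}{n}\, I(W; S)},
\]
where $L_S$ and $L_\mu$ denote the empirical and population risks and $I(W; S)$ is the Shannon mutual information between the output and the input sample. The present work replaces $I(W; S)$ by a general strongly convex dependence measure and the subgaussianity assumption by a norm bound adapted to that measure.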