A recent line of work, initiated by Russo and Xu, has shown that the generalization error of a learning algorithm can be upper bounded by information measures. In most of the relevant works, the expected generalization error converges at a rate of the form $O(\sqrt{\lambda/n})$, where $\lambda$ is an information-theoretic quantity such as the mutual information or conditional mutual information between the data and the learned hypothesis. However, such a learning rate is typically considered ``slow'' compared with the ``fast rate'' of $O(\lambda/n)$ achievable in many learning scenarios. In this work, we first show that the square root does not necessarily imply a slow rate, and that a fast-rate result can still be obtained from this bound under appropriate assumptions. Furthermore, we identify the critical condition needed for a fast-rate generalization error, which we call the $(\eta,c)$-central condition. Under this condition, we give information-theoretic bounds on the generalization error and the excess risk with fast convergence rates for specific learning algorithms such as empirical risk minimization and its regularized version. Finally, several analytical examples are given to demonstrate the effectiveness of the bounds.
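As a point of reference, the classical mutual information bound of Xu and Raginsky is a representative instance of the slow-rate form above (stated here as background, with the standard assumption that the loss $\ell(w,Z)$ is $\sigma$-sub-Gaussian under the data distribution for every hypothesis $w$, and with $L_\mu$ and $L_S$ denoting the population and empirical risks): the hypothesis $W$ learned from an $n$-sample $S$ satisfies
$$\bigl|\mathbb{E}\bigl[L_\mu(W)-L_S(W)\bigr]\bigr| \;\le\; \sqrt{\frac{2\sigma^2\, I(W;S)}{n}},$$
which matches the $O(\sqrt{\lambda/n})$ rate with $\lambda = 2\sigma^2 I(W;S)$.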