A recent line of work, initiated by Russo and Xu, has shown that the generalization error of a learning algorithm can be upper bounded by information measures. In most of the relevant works, the convergence rate of the expected generalization error is of the form $O(\sqrt{\lambda/n})$, where $\lambda$ is some information-theoretic quantity such as the mutual information between the data sample and the learned hypothesis. However, such a learning rate is typically considered "slow" compared with the "fast rate" of $O(1/n)$ achievable in many learning scenarios. In this work, we first show that the square root does not necessarily imply a slow rate, and that a fast-rate $O(1/n)$ result can still be obtained from this bound under appropriate assumptions. Furthermore, we identify the key condition needed for a fast-rate generalization error, which we call the $(\eta, c)$-central condition. Under this condition, we give information-theoretic bounds on the generalization error and excess risk, with a convergence rate of $O(\lambda/n)$ for specific learning algorithms such as empirical risk minimization. Finally, analytical examples are given to demonstrate the effectiveness of the bounds.
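For context, a minimal sketch of the form a central condition typically takes in the fast-rate literature is given below; the notation ($w^*$ for the comparator hypothesis, $\ell$ for the loss, $Z$ for a data sample) is assumed here, and the paper's $(\eta, c)$ variant may differ in its exact statement.

```latex
% Sketch of the standard eta-central condition (assumed notation; the
% (eta, c)-central condition used in this work may differ in detail):
% there exists a comparator hypothesis w^* such that, for every hypothesis w,
% the exponentiated negative excess loss has expectation at most 1.
\[
  \exists\, w^* \;\; \forall\, w:\qquad
  \mathbb{E}_{Z}\!\left[\, e^{-\eta\left(\ell(w,Z) - \ell(w^*,Z)\right)} \,\right] \;\le\; 1 .
\]
```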