We derive information-theoretic lower bounds on the Bayes risk and generalization error of realizable machine learning models. In particular, we employ an analysis in which the rate-distortion function of the model parameters lower-bounds the mutual information between the training samples and the model parameters that is required to learn a model to within a given Bayes risk. For realizable models, we show that both the rate-distortion function and the mutual information admit expressions that are convenient for analysis. For models that are (roughly) lower Lipschitz in their parameters, we bound the rate-distortion function from below, whereas for VC classes the mutual information is bounded above by $d_\mathrm{vc}\log(n)$. When these conditions match, the Bayes risk with respect to the zero-one loss decays no faster than $\Omega(d_\mathrm{vc}/n)$, which matches known outer bounds and minimax lower bounds up to logarithmic factors. We also consider the impact of label noise, providing lower bounds when training and/or test samples are corrupted.
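Schematically, and with notation introduced here only for illustration (not taken verbatim from the analysis): write $W$ for the model parameters, $S$ for the $n$ training samples, and $R_W(\varepsilon)$ for the rate-distortion function of $W$ under a Bayes-risk distortion constraint $\varepsilon$. The argument sandwiches the mutual information,
\[
  R_W(\varepsilon) \;\le\; I(S; W) \;\le\; d_\mathrm{vc}\log(n),
\]
where the left inequality is the rate-distortion lower bound obtained from the (rough) lower Lipschitz condition and the right inequality is the upper bound for realizable VC classes; inverting the left-hand side in $\varepsilon$ then gives the $\Omega(d_\mathrm{vc}/n)$ scaling up to logarithmic factors.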