We study problem-dependent rates, i.e., generalization errors that scale near-optimally with the variance, the effective loss, or the gradient norms evaluated at the "best hypothesis." We introduce a principled framework dubbed "uniform localized convergence," and characterize sharp problem-dependent rates for central statistical learning problems. From a methodological viewpoint, our framework resolves several fundamental limitations of existing uniform convergence and localization analysis approaches. It also provides improvements and some level of unification in the study of localized complexities, one-sided uniform inequalities, and sample-based iterative algorithms. In the so-called "slow rate" regime, we provide the first (moment-penalized) estimator that achieves the optimal variance-dependent rate for general "rich" classes; we also establish an improved loss-dependent rate for standard empirical risk minimization. In the "fast rate" regime, we establish finite-sample problem-dependent bounds that are comparable to precise asymptotics. In addition, we show that efficient algorithms such as gradient descent and first-order Expectation-Maximization can achieve optimal generalization error in several representative problems across the areas of non-convex learning, stochastic optimization, and learning with missing data.
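As a minimal illustrative sketch of what is meant by a variance-dependent ("slow rate") bound, one may keep in mind an excess-risk guarantee of the Bernstein type displayed below; here $\hat{h}$ denotes the learned hypothesis, $h^{*}$ the best hypothesis in the class $\mathcal{H}$, $\mathcal{V}^{*}$ the variance of the loss at $h^{*}$, $n$ the sample size, and $\mathrm{comp}(\mathcal{H})$ a complexity term. These symbols are assumed notation for illustration and are not definitions fixed in this abstract:
\[
  \mathbb{E}\,\ell(\hat{h}) - \mathbb{E}\,\ell(h^{*})
  \;\lesssim\;
  \sqrt{\frac{\mathcal{V}^{*}\,\mathrm{comp}(\mathcal{H})}{n}}
  \;+\;
  \frac{\mathrm{comp}(\mathcal{H})}{n}.
\]
A bound of this form adapts to benign problems: when $\mathcal{V}^{*}$ is small, it improves on the worst-case $\sqrt{\mathrm{comp}(\mathcal{H})/n}$ rate.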