Established approaches to obtaining generalization bounds in data-driven optimization and machine learning mostly build on solutions from empirical risk minimization (ERM), which depend crucially on the functional complexity of the hypothesis class. In this paper, we present an alternate route to obtaining such bounds for the solution of distributionally robust optimization (DRO), a recent data-driven optimization framework based on worst-case analysis and the notion of an ambiguity set that captures statistical uncertainty. In contrast to the hypothesis class complexity in ERM, our DRO bounds depend on the geometry of the ambiguity set and its compatibility with the true loss function. Notably, when the DRO uses statistical distances such as maximum mean discrepancy, Wasserstein distance, or $\phi$-divergence, our analysis implies generalization bounds whose dependence on the hypothesis class appears to be the minimal possible: the bound depends solely on the true loss function, independent of any other candidates in the hypothesis class. To the best of our knowledge, this is the first generalization bound of this type in the literature, and we hope our findings open the door to a better understanding of DRO, in particular its benefits for loss minimization and other machine learning applications.
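As a point of reference, the following is a minimal sketch of the generic DRO formulation alluded to above, contrasted with ERM; the notation here ($\hat{P}_n$ for the empirical distribution, $d$ for the chosen statistical distance, $\epsilon$ for the ambiguity radius, $\ell$ for the loss, and $\mathcal{H}$ for the hypothesis class) is illustrative and not necessarily that used in the body of the paper:
$$
\hat{h}_{\mathrm{DRO}} \in \operatorname*{arg\,min}_{h \in \mathcal{H}} \; \sup_{Q:\, d(Q,\hat{P}_n) \le \epsilon} \; \mathbb{E}_{Z \sim Q}\bigl[\ell(h, Z)\bigr],
\qquad
\hat{h}_{\mathrm{ERM}} \in \operatorname*{arg\,min}_{h \in \mathcal{H}} \; \mathbb{E}_{Z \sim \hat{P}_n}\bigl[\ell(h, Z)\bigr].
$$
Here the ambiguity set $\{Q : d(Q, \hat{P}_n) \le \epsilon\}$ is defined by a statistical distance $d$ such as maximum mean discrepancy, Wasserstein distance, or a $\phi$-divergence, and the worst case is taken over all distributions in that set.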