Established approaches to obtaining generalization bounds in data-driven optimization and machine learning mostly build on solutions from empirical risk minimization (ERM), which depend crucially on the functional complexity of the hypothesis class. In this paper, we present an alternative route to obtain these bounds, on the solution from distributionally robust optimization (DRO), a recent data-driven optimization framework based on worst-case analysis and the notion of an ambiguity set to capture statistical uncertainty. In contrast to the hypothesis-class complexity in ERM, our DRO bounds depend on the geometry of the ambiguity set and its compatibility with the true loss function. Notably, when the maximum mean discrepancy (MMD) is used as the DRO distance metric, our analysis implies, to the best of our knowledge, the first generalization bound in the literature that depends solely on the true loss function, entirely free of any complexity measures or bounds on the hypothesis class.
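As a brief illustration of the framework the abstract refers to (the notation below is introduced here for exposition and is not taken from the paper): given samples $z_1, \dots, z_n$ with empirical distribution $\hat{P}_n$, DRO replaces the ERM objective with a worst case over an ambiguity set, for instance a ball of radius $\varepsilon$ in a probability metric $d$ such as the MMD:
\[
\min_{\theta \in \Theta} \; \sup_{Q \,:\, d(Q, \hat{P}_n) \le \varepsilon} \; \mathbb{E}_{Z \sim Q}\big[ \ell(\theta; Z) \big],
\]
where $\ell$ is the loss function. In this sketch, it is the geometry of the set $\{Q : d(Q, \hat{P}_n) \le \varepsilon\}$ and its interaction with $\ell$, rather than the complexity of the hypothesis class $\Theta$, that governs the resulting generalization bounds.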