Wasserstein distributionally robust optimization (DRO) aims to find robust and generalizable solutions by hedging against data perturbations in Wasserstein distance. Despite the recent empirical success of Wasserstein DRO in operations research and machine learning, existing performance guarantees for generic loss functions are either overly conservative due to the curse of dimensionality or valid only in large-sample asymptotics. In this paper, we develop a non-asymptotic framework for analyzing the out-of-sample performance of Wasserstein robust learning and the generalization bound of the related Lipschitz and gradient regularization problems. To the best of our knowledge, this gives the first finite-sample guarantee for generic Wasserstein DRO problems that does not suffer from the curse of dimensionality. Our results highlight that Wasserstein DRO, with a properly chosen radius, balances the empirical mean of the loss against the variation of the loss, measured by the Lipschitz norm or the gradient norm of the loss. Our analysis is based on two novel methodological developments that are of independent interest: 1) a new concentration inequality controlling the decay rate of large-deviation probabilities by the variation of the loss, and 2) a localized Rademacher complexity theory based on the variation of the loss.
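As a schematic illustration of this trade-off (notation ours, not taken verbatim from the paper): writing $\ell_\theta$ for the loss, $\widehat{P}_n$ for the empirical distribution of $n$ samples, and $\rho$ for the radius of a type-1 Wasserstein ball, the Wasserstein DRO objective and the Lipschitz-regularized objective it is related to can be sketched as

\[
\min_{\theta}\ \sup_{Q:\, W_1(Q,\widehat{P}_n)\le \rho}\ \mathbb{E}_{Q}[\ell_\theta]
\qquad\text{and}\qquad
\min_{\theta}\ \mathbb{E}_{\widehat{P}_n}[\ell_\theta] \;+\; \rho\,\|\ell_\theta\|_{\mathrm{Lip}},
\]

so that a properly chosen $\rho$ trades off the empirical mean of the loss against its variation; in the gradient-regularization variant, the Lipschitz norm $\|\ell_\theta\|_{\mathrm{Lip}}$ is replaced by a gradient-norm penalty.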