Distributionally robust optimization (DRO) is a widely used approach for learning models that are robust to distribution shift. Compared with the standard optimization setting, the DRO objective is more difficult to optimize, and most existing theoretical results make strong assumptions on the loss function. In this work we bridge this gap by studying DRO algorithms for general smooth non-convex losses. By carefully exploiting the specific form of the DRO objective, we provide non-asymptotic convergence guarantees even though the objective function is possibly non-convex, non-smooth, and has unbounded gradient noise. In particular, we prove that a special algorithm, mini-batch normalized gradient descent with momentum, finds an $\epsilon$-first-order stationary point within $O(\epsilon^{-4})$ gradient complexity. We also discuss the conditional value-at-risk (CVaR) setting, where we propose a penalized DRO objective based on a smoothed version of the CVaR that admits a similar convergence guarantee. Finally, we verify our theoretical results on a number of tasks and find that the proposed algorithm consistently achieves notable acceleration.
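For concreteness, below is a minimal sketch of a mini-batch normalized gradient descent step with momentum of the kind referred to above. The step size, momentum parameter, and the helper `dro_grad_fn` are illustrative assumptions, not the paper's exact algorithm or hyperparameters.

```python
import numpy as np

def normalized_gd_momentum(dro_grad_fn, w0, lr=0.01, beta=0.9, n_iters=1000, batch_size=32):
    """Sketch of mini-batch normalized gradient descent with momentum.

    dro_grad_fn(w, batch_size) is assumed to return a stochastic mini-batch
    gradient estimate of the (possibly non-smooth) DRO objective at w.
    """
    w = w0.copy()
    m = np.zeros_like(w)
    for _ in range(n_iters):
        g = dro_grad_fn(w, batch_size)            # stochastic mini-batch gradient
        m = beta * m + (1.0 - beta) * g           # momentum: exponential moving average
        w = w - lr * m / (np.linalg.norm(m) + 1e-12)  # normalized update direction
    return w
```

The normalization of the momentum direction is what makes the step size insensitive to the scale of the (possibly unbounded) gradient noise; the exact analysis conditions are given in the body of the paper.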