We propose and analyze algorithms for distributionally robust optimization of convex losses with conditional value at risk (CVaR) and $\chi^2$ divergence uncertainty sets. We prove that our algorithms require a number of gradient evaluations independent of training set size and number of parameters, making them suitable for large-scale applications. For $\chi^2$ uncertainty sets these are the first such guarantees in the literature, and for CVaR our guarantees scale linearly in the uncertainty level rather than quadratically as in previous work. We also provide lower bounds proving the worst-case optimality of our algorithms for CVaR and a penalized version of the $\chi^2$ problem. Our primary technical contributions are novel bounds on the bias of batch robust risk estimation and the variance of a multilevel Monte Carlo gradient estimator due to [Blanchet & Glynn, 2015]. Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
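For concreteness, the robust objectives the abstract refers to can be written in the following standard form (notation ours; the abstract itself does not fix symbols):
\[
\min_{x\in\mathcal{X}} \ \sup_{q\in\mathcal{U}} \ \sum_{i=1}^{n} q_i\,\ell(x;\xi_i),
\qquad
\mathcal{U}_{\mathrm{CVaR}}(\alpha)=\Big\{q\in\Delta^{n}: q_i\le\tfrac{1}{\alpha n}\Big\},
\qquad
\mathcal{U}_{\chi^2}(\rho)=\Big\{q\in\Delta^{n}: \tfrac{1}{2}\textstyle\sum_{i=1}^{n}(n q_i-1)^2\le\rho\Big\},
\]
where $\Delta^{n}$ is the probability simplex over the $n$ training examples, $\ell(x;\xi_i)$ is the loss on example $\xi_i$, $\alpha$ is the CVaR level, and $\rho$ bounds the $\chi^2$ divergence to the uniform distribution.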
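The following is a minimal, illustrative sketch of a Blanchet \& Glynn-style multilevel Monte Carlo (MLMC) gradient estimator of the kind the abstract analyzes. The helper names \texttt{batch\_robust\_grad} and \texttt{sample\_batch}, the base batch size \texttt{n0}, and the geometric level distribution are assumptions made for illustration, not the paper's exact construction.
\begin{verbatim}
import numpy as np

def mlmc_gradient(batch_robust_grad, sample_batch, n0=2, p=0.5, j_max=20):
    """Single-sample MLMC gradient estimate (Blanchet & Glynn style).

    batch_robust_grad(batch) -> gradient of the batch robust risk
    sample_batch(m)          -> array of m i.i.d. training examples
    """
    # Draw a random level J with P(J = j) = p * (1 - p)**j, truncated at j_max.
    j = min(np.random.geometric(p) - 1, j_max)
    # Probability of the level actually drawn (tail mass if truncated).
    pj = p * (1 - p) ** j if j < j_max else (1 - p) ** j_max

    batch = sample_batch(n0 * 2 ** (j + 1))
    half = len(batch) // 2
    # Coupled correction: full batch vs. average of its two halves; its
    # expectation telescopes across levels.
    delta = batch_robust_grad(batch) - 0.5 * (
        batch_robust_grad(batch[:half]) + batch_robust_grad(batch[half:])
    )
    # Baseline term at the smallest batch plus importance-weighted correction.
    return batch_robust_grad(sample_batch(n0)) + delta / pj
\end{verbatim}
In use, one would feed this estimator to a standard stochastic first-order method; a variance bound of the kind the abstract describes is what makes the resulting gradient-evaluation count independent of the training set size.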