Distributionally robust supervised learning (DRSL) is emerging as a key paradigm for building reliable machine learning systems for real-world applications -- reflecting the need for classifiers and predictive models that are robust to the distribution shifts that arise from phenomena such as selection bias or nonstationarity. Existing algorithms for solving Wasserstein DRSL -- one of the most popular DRSL frameworks, based on robustness to perturbations measured by the Wasserstein distance -- involve solving complex subproblems or fail to make use of stochastic gradients, limiting their use in large-scale machine learning problems. We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable stochastic extra-gradient algorithms that provably achieve faster convergence rates than existing approaches. We demonstrate their effectiveness on synthetic and real data in comparison with existing DRSL approaches. Key to our results is the use of variance reduction and random reshuffling to accelerate stochastic min-max optimization, the analysis of which may be of independent interest.
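To make the core algorithmic idea concrete, the following is a minimal sketch of a stochastic extra-gradient update, shown on a generic smooth min-max problem rather than the paper's specific Wasserstein DRSL formulation: each iteration takes a gradient-based extrapolation ("look-ahead") step and then updates from the original iterate using gradients evaluated at the extrapolated point. The toy bilinear objective, the matrices A, the step size eta, and the use of independent samples for the two steps are illustrative assumptions, not quantities or choices taken from the paper.

```python
# Minimal sketch (not the paper's exact method) of stochastic extra-gradient
# for a smooth min-max problem
#     min_x max_y  f(x, y) = E_i [ f_i(x, y) ],
# illustrated on the toy bilinear objective f_i(x, y) = x^T A_i y.
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10
A = rng.standard_normal((n, d, d)) / np.sqrt(d)  # per-sample payoff matrices (assumed data)

def stoch_grads(x, y, i):
    """Stochastic gradients of f_i(x, y) = x^T A_i y w.r.t. x and y."""
    return A[i] @ y, A[i].T @ x

x, y = rng.standard_normal(d), rng.standard_normal(d)
eta = 0.05  # illustrative constant step size
for t in range(2000):
    # Extrapolation ("look-ahead") step on one stochastic sample.
    i = rng.integers(n)
    gx, gy = stoch_grads(x, y, i)
    x_half, y_half = x - eta * gx, y + eta * gy  # descent in x, ascent in y
    # Update step: gradients at the extrapolated point, applied to the original iterate.
    j = rng.integers(n)
    gx, gy = stoch_grads(x_half, y_half, j)
    x, y = x - eta * gx, y + eta * gy

print("||x|| =", np.linalg.norm(x), " ||y|| =", np.linalg.norm(y))
```

The abstract's variance reduction and random reshuffling refer to how these stochastic gradient estimates are constructed and how samples are ordered across epochs; the sketch above uses plain uniform sampling for simplicity.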