Distributionally robust supervised learning (DRSL) is emerging as a key paradigm for building reliable machine learning systems for real-world applications, reflecting the need for classifiers and predictive models that are robust to the distribution shifts arising from phenomena such as selection bias or nonstationarity. Wasserstein DRSL, one of the most popular DRSL frameworks, seeks robustness to perturbations bounded in the Wasserstein distance. Existing algorithms for solving it have serious limitations that hinder their use in large-scale problems: they involve solving complex subproblems, and they fail to exploit stochastic gradients. We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable stochastic extra-gradient algorithms that provably achieve faster convergence rates than existing approaches. We demonstrate their effectiveness on synthetic and real data in comparison with existing DRSL approaches. Key to our results is the use of variance reduction and random reshuffling to accelerate stochastic min-max optimization, the analysis of which may be of independent interest.
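For intuition, the extra-gradient method updates the min and max variables using a look-ahead (extrapolation) step before the actual update. The following is a minimal deterministic sketch on an illustrative bilinear game min_x max_y x^T A y; the matrix, step size, and iteration count are illustrative assumptions, not from the paper, and the paper's actual algorithms layer stochastic gradients, variance reduction, and random reshuffling on top of this template.

```python
import numpy as np

def extragradient(A, x, y, eta=0.1, steps=200):
    """Deterministic extra-gradient iterations for min_x max_y x^T A y."""
    for _ in range(steps):
        # Extrapolation (look-ahead) step from the current iterate.
        x_half = x - eta * (A @ y)
        y_half = y + eta * (A.T @ x)
        # Update step, using gradients evaluated at the look-ahead point.
        x = x - eta * (A @ y_half)
        y = y + eta * (A.T @ x_half)
    return x, y

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
x0, y0 = rng.standard_normal(3), rng.standard_normal(3)
x, y = extragradient(A, x0, y0)
# The iterates move toward the saddle point (0, 0) of the bilinear game,
# whereas plain simultaneous gradient descent-ascent spirals outward.
```

The look-ahead step is what distinguishes extra-gradient from simultaneous gradient descent-ascent: evaluating gradients at the extrapolated point counteracts the rotational dynamics of min-max problems, which is the property the stochastic variants in the paper build on.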