Many machine learning tasks involve subpopulation shift, where the test distribution is a subpopulation of the training distribution. For such settings, a line of recent work has proposed using a variant of empirical risk minimization (ERM) known as distributionally robust optimization (DRO). In this work, we apply DRO to real, large-scale tasks with subpopulation shift, and observe that DRO performs relatively poorly and, moreover, suffers from severe instability. We identify one direct cause of this phenomenon: the sensitivity of DRO to outliers in the datasets. To resolve this issue, we propose the framework of DORO, for Distributional and Outlier Robust Optimization. At the core of this approach is a refined risk function which prevents DRO from overfitting to potential outliers. We instantiate DORO for the Cressie-Read family of R\'enyi divergence, and delve into two specific instances of this family: CVaR and $\chi^2$-DRO. We theoretically prove the effectiveness of the proposed method, and empirically show that DORO improves the performance and stability of DRO in experiments on large modern datasets, thereby positively addressing the open question raised by Hashimoto et al., 2018.
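To make the "refined risk function" idea concrete, the sketch below illustrates the intuition behind CVaR-DORO on an empirical sample: first discard the $\epsilon$-fraction of samples with the largest losses as suspected outliers, then compute CVaR (the average of the top $\alpha$-fraction of losses) over the remaining samples. This is a minimal illustrative sketch of the idea, not the paper's exact estimator; the function name and parameter choices here are assumptions for illustration.

```python
import numpy as np

def cvar_doro_loss(losses, alpha=0.1, eps=0.01):
    """Illustrative sketch of the CVaR-DORO idea (not the paper's exact estimator).

    alpha: CVaR level (fraction of worst remaining losses to average).
    eps:   assumed fraction of samples treated as potential outliers.
    """
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # sort descending
    n = len(losses)
    n_out = int(eps * n)          # number of suspected outliers to discard
    kept = losses[n_out:]         # drop the largest-loss samples
    k = max(1, int(alpha * len(kept)))
    return kept[:k].mean()        # CVaR over the remaining samples
```

For example, a single corrupted sample with a huge loss would dominate plain CVaR, but is discarded here before the worst-case average is taken, which is the mechanism by which DORO stabilizes DRO training.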