We consider the problem of training a classification model with group-annotated training data. Recent work has established that, when there is distribution shift across groups, models trained with the standard empirical risk minimization (ERM) objective perform poorly on minority groups, and that the group distributionally robust optimization (Group-DRO) objective is a better alternative. The starting point of this paper is the observation that, although Group-DRO performs better than ERM on minority groups for some benchmark datasets, there are several other datasets where it performs much worse than ERM. Inspired by ideas from the closely related problem of domain generalization, this paper proposes a new and simple algorithm that explicitly encourages learning of features that are shared across groups. The key insight behind our proposed algorithm is that, while Group-DRO focuses on the groups with the worst regularized loss, focusing instead on groups that enable better performance on other groups as well could lead to learning of shared/common features, thereby enhancing minority performance beyond what Group-DRO achieves. Empirically, we show that our proposed algorithm matches or outperforms strong contemporary baselines, including ERM and Group-DRO, on standard benchmarks, both on minority groups and across all groups. Theoretically, we show that the proposed algorithm is a descent method and finds first-order stationary points of smooth nonconvex functions.
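The contrast drawn in the key insight can be illustrated with a small sketch. It is not the paper's update rule, which the abstract does not state. It assumes that group weights are updated multiplicatively, and that "enabling better performance on other groups" can be scored by gradient inner products: to first order, a step along the negative gradient of group i changes the loss of group j in proportion to the negative inner product of the two gradients. The function names, step size, and toy numbers are all hypothetical.

```python
import numpy as np

def group_dro_weights(weights, group_losses, step=1.0):
    """Exponentiated update that upweights the groups with the highest loss."""
    w = weights * np.exp(step * group_losses)
    return w / w.sum()

def shared_benefit_weights(weights, group_grads, step=1.0):
    """Illustrative alternative (an assumption, not the paper's rule):
    upweight groups whose gradient also reduces the other groups' losses.

    Row sums of the Gram matrix <g_i, g_j> score how much a step on
    group i is expected to help all groups, to first order.
    """
    gram = group_grads @ group_grads.T   # pairwise gradient inner products
    benefit = gram.sum(axis=1)           # total first-order benefit per group
    w = weights * np.exp(step * benefit)
    return w / w.sum()

# Toy example: three groups, two model parameters (made-up numbers).
losses = np.array([0.2, 0.3, 0.9])
grads = np.array([[1.0, 0.0],
                  [0.8, 0.2],
                  [-0.5, 1.0]])
uniform = np.ones(3) / 3

dro_w = group_dro_weights(uniform, losses)
shared_w = shared_benefit_weights(uniform, grads)
```

In this toy setup the loss-based rule puts the most weight on the third group, which has the highest loss, while the inner-product rule favors the first group, whose gradient is best aligned with the others. Which behavior is preferable on real data is the empirical question the paper addresses.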