We study the problem of learning classifiers that perform well across (known or unknown) groups of data. After observing that common worst-group-accuracy datasets suffer from substantial imbalances, we set out to compare state-of-the-art methods to simple balancing of classes and groups by either subsampling or reweighting data. Our results show that these data balancing baselines achieve state-of-the-art accuracy, while being faster to train and requiring no additional hyper-parameters. In addition, we highlight that access to group information is most critical for model selection, and not so much during training. All in all, our findings call for closer examination of benchmarks and methods for research in worst-group-accuracy optimization.
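A minimal sketch of the two balancing baselines named above (subsampling and reweighting), assuming each training example carries an integer group label in {0, ..., G-1}; the function names, array names, and toy data are illustrative choices, not the paper's released code.

```python
# Hedged sketch of class/group balancing by subsampling or reweighting.
# Assumes group labels are integers 0..G-1 (an illustrative convention,
# not something specified by the paper's abstract).
import numpy as np

def subsample_indices(groups: np.ndarray, seed: int = 0) -> np.ndarray:
    """Return indices that keep an equal number of examples per group,
    downsampling every group to the size of the smallest one."""
    rng = np.random.default_rng(seed)
    group_ids, counts = np.unique(groups, return_counts=True)
    n_min = counts.min()
    kept = [rng.choice(np.where(groups == g)[0], size=n_min, replace=False)
            for g in group_ids]
    return np.concatenate(kept)

def reweighting_weights(groups: np.ndarray) -> np.ndarray:
    """Return per-example weights inversely proportional to group frequency,
    so every group contributes equally to the weighted training loss."""
    _, counts = np.unique(groups, return_counts=True)
    freq = counts[groups] / len(groups)   # frequency of each example's group
    weights = 1.0 / freq
    return weights / weights.mean()       # normalize to mean 1

if __name__ == "__main__":
    groups = np.array([0] * 90 + [1] * 10)   # toy 90/10 group imbalance
    print(len(subsample_indices(groups)))    # 20 examples kept, 10 per group
    print(reweighting_weights(groups)[0],    # small weight for the majority group
          reweighting_weights(groups)[-1])   # large weight for the minority group
```

The same two helpers can be applied to class labels instead of group labels; balancing groups is only possible when group annotations are available, which is why the abstract distinguishes using group information during training from using it for model selection.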