When Does Group Invariant Learning Survive Spurious Correlations?

By inferring latent groups in the training data, recent works extend invariant learning to the case where environment annotations are unavailable. Typically, learning group invariance under a majority/minority split is empirically shown to improve out-of-distribution generalization on many datasets. However, theoretical guarantees that these methods learn invariant mechanisms are lacking. In this paper, we reveal that existing group invariant learning methods are insufficient to prevent classifiers from depending on spurious correlations in the training set. Specifically, we propose two criteria for judging such sufficiency. Theoretically and empirically, we show that existing methods can violate both criteria and thus fail to generalize under spurious correlation shifts. Motivated by this, we design a new group invariant learning method, which constructs groups with statistical independence tests and reweights samples by group label proportion to meet the criteria. Experiments on both synthetic and real data demonstrate that the new method significantly outperforms existing group invariant learning methods in generalizing to spurious correlation shifts.
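The abstract mentions reweighting samples by group label proportion but does not give the formula. As a rough illustration only, the sketch below assumes one plausible reading: each sample is weighted by the inverse of its label's frequency within its group, so that every label present in a group carries the same total weight. The function name `group_label_weights` and this weighting rule are assumptions made for illustration, not the paper's actual method.

```python
from collections import Counter


def group_label_weights(groups, labels):
    """Return one weight per sample equal to 1 / P(label | group).

    Assumption (not taken from the paper): inverse conditional label
    frequency is used as the "group label proportion" reweighting.
    """
    pair_counts = Counter(zip(groups, labels))   # count of each (group, label) pair
    group_counts = Counter(groups)               # count of samples in each group
    weights = []
    for g, y in zip(groups, labels):
        p_y_given_g = pair_counts[(g, y)] / group_counts[g]
        weights.append(1.0 / p_y_given_g)
    return weights


# Hypothetical toy data: two inferred groups with opposite label imbalance.
groups = [0, 0, 0, 0, 1, 1, 1, 1]
labels = [1, 1, 1, 0, 0, 0, 0, 1]
weights = group_label_weights(groups, labels)
```

With these weights, the majority label and the minority label in each group contribute equal total mass to a weighted loss, which is one simple way to remove a label imbalance that differs across groups.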

