Learning invariant representations is an important requirement when training machine learning models that would otherwise be driven by spurious correlations in their datasets. These spurious correlations between input samples and target labels wrongly direct neural network predictions, resulting in poor performance on certain groups, especially minority groups. Robust training against these spurious correlations requires knowledge of group membership for every sample. Such a requirement is impractical in situations where labeling minority or rare groups is significantly laborious, or where the individuals comprising the dataset choose to conceal sensitive information. On the other hand, when such data collection efforts do take place, they typically yield datasets with only partially labeled group information. Recent work has tackled the fully unsupervised scenario, where no group labels are available. We aim to fill the gap in the literature by tackling the more realistic setting in which partially available sensitive or group information can be leveraged during training. First, we construct a constraint set and derive a high-probability bound for the group assignment to belong to this set. Second, we propose an algorithm that optimizes for the worst-off group assignment in the constraint set. Through experiments on image and tabular datasets, we show improvements in minority-group performance while preserving overall aggregate accuracy across groups.
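To make the two-step idea concrete, below is a minimal, self-contained sketch of worst-case optimization over group assignments when group labels are only partially observed. It is not the paper's algorithm: the constraint set here is a naive enumeration of random completions of the unknown labels (rather than the derived high-probability set), the toy data, the candidate count, and names such as `candidate_assignments` are illustrative assumptions, and the finite-difference gradient is used only to keep the example dependency-free.

```python
# Hypothetical sketch: minimize the worst-case (over plausible group assignments)
# worst-group loss when only some samples carry group labels.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: binary labels, 2 groups, linear logistic model.
n, d, n_groups = 200, 5, 2
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.1 * rng.normal(size=n) > 0).astype(float)
true_groups = (X[:, 1] > 0).astype(int)
known_mask = rng.random(n) < 0.3                      # ~30% of samples have group labels
obs_groups = np.where(known_mask, true_groups, -1)    # -1 marks unknown group membership

def loss_per_sample(w):
    """Per-sample logistic loss of a linear model with weights w."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return -(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

def worst_group_loss(w, assignment):
    """Largest average loss over groups under a full group assignment."""
    ls = loss_per_sample(w)
    return max(ls[assignment == g].mean() for g in range(n_groups))

def sample_assignment():
    """Complete the unknown group labels with a draw from a uniform prior."""
    a = obs_groups.copy()
    a[a == -1] = rng.integers(0, n_groups, size=(a == -1).sum())
    return a

# Stand-in for the constraint set: a handful of candidate completions.
candidate_assignments = [sample_assignment() for _ in range(8)]

def objective(w):
    # Worst-off group assignment within the candidate set.
    return max(worst_group_loss(w, a) for a in candidate_assignments)

# Outer minimization over model weights via a simple finite-difference gradient step.
w, lr, eps = np.zeros(d), 0.5, 1e-4
for _ in range(200):
    grad = np.array([(objective(w + eps * e) - objective(w - eps * e)) / (2 * eps)
                     for e in np.eye(d)])
    w -= lr * grad

print("final worst-case worst-group loss:", objective(w))
```

In this sketch the inner maximization is a plain `max` over a finite candidate set; the paper's constraint set and the bound guaranteeing it contains the true assignment with high probability would replace that enumeration.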