In prediction tasks, some features relate to the label in the same way across different settings of that task; these are semantic features, or semantics. Features whose relationship to the label varies across settings are nuisances. For example, in detecting cows in natural images, the shape of the head is a semantic feature, whereas the background is a nuisance because images of cows often, but not always, have grass backgrounds. Because nuisance-label relationships are unstable across settings, models that exploit them suffer performance degradation when these relationships change. Direct knowledge of the nuisance helps build models that are robust to such changes, but it requires annotations beyond labels and covariates. In this paper, we develop an alternative way to produce robust models via data augmentation. These augmentations corrupt semantic information in order to produce models that identify and adjust for where nuisances drive predictions. We study how semantic corruptions can power different methods for avoiding spurious correlations on multiple out-of-distribution (OOD) tasks, such as classifying waterbirds, natural language inference (NLI), and detecting cardiomegaly in chest X-rays.
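The abstract does not specify which corruption is used or how the adjustment is performed, so the following is only a minimal sketch under stated assumptions. It assumes one plausible image corruption, shuffling non-overlapping patches so that global shape (a semantic cue such as the cow's head) is destroyed while local texture and color (nuisance-like cues such as a grass background) survive. It then assumes one plausible adjustment: reweighting training examples by the inverse of the probability that a model trained on corrupted inputs assigns to the true label, so examples the nuisance alone explains well are down-weighted. The function names `patch_shuffle` and `inverse_nuisance_weights` are hypothetical and not taken from the paper.

```python
import numpy as np


def patch_shuffle(image: np.ndarray, patch_size: int, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical semantic corruption: randomly permute non-overlapping patches.

    Assumes `image` has shape (H, W, C) with H and W divisible by `patch_size`.
    Global shape is destroyed; per-patch texture and color are preserved.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # Split the image into a stack of (gh * gw) patches of shape (p, p, C).
    patches = (image.reshape(gh, patch_size, gw, patch_size, c)
                    .transpose(0, 2, 1, 3, 4)
                    .reshape(gh * gw, patch_size, patch_size, c))
    # Shuffle patch positions.
    patches = patches[rng.permutation(gh * gw)]
    # Reassemble the shuffled patches into an (H, W, C) image.
    return (patches.reshape(gh, gw, patch_size, patch_size, c)
                   .transpose(0, 2, 1, 3, 4)
                   .reshape(h, w, c))


def inverse_nuisance_weights(probs_corrupted: np.ndarray, labels: np.ndarray,
                             eps: float = 1e-6) -> np.ndarray:
    """Hypothetical adjustment: weight each example by 1 / p(y | corrupted x).

    `probs_corrupted` is an (N, K) array of class probabilities from a model
    trained only on corrupted inputs; `labels` holds N integer labels.
    Weights are normalized to have mean 1.
    """
    p_true = probs_corrupted[np.arange(len(labels)), labels]
    weights = 1.0 / np.clip(p_true, eps, None)
    return weights / weights.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = np.arange(4 * 4).reshape(4, 4, 1)
    print(patch_shuffle(img, 2, rng)[..., 0])
    probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
    print(inverse_nuisance_weights(probs, np.array([0, 0, 1])))
```

In this sketch, an example whose label the corruption-trained model already predicts confidently (first row above) receives a small weight, while one it gets wrong (second row) receives a large weight; a downstream classifier trained on the original images with these weights is then pushed away from relying on the nuisance alone. Text tasks such as NLI would need a different corruption, for example shuffling word n-grams, which is likewise an assumption here.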