Some features relate to the label in the same way across the different settings of a task; these are semantic features, or semantics. Features whose relationship to the label varies across settings are nuisances. For example, in detecting cows in natural images, the shape of the head is a semantic feature, whereas the background is a nuisance: images of cows often have grass backgrounds, but only in certain settings. Because nuisance-label relationships are unstable across settings, models that exploit them suffer performance degradation when these relationships change. Direct knowledge of a nuisance helps build models that are robust to such changes, but it requires extra annotations beyond the label and the covariates. In this paper, we develop an alternative way to produce robust models through data augmentation. These augmentations corrupt semantic information to produce models that identify, and adjust for, where nuisances drive predictions. We study semantic corruptions in powering different robust-modeling methods for multiple out-of-distribution (OOD) tasks, such as classifying waterbirds, natural language inference, and detecting cardiomegaly in chest X-rays.
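To make the idea of a semantic corruption concrete, the following is a minimal sketch for text inputs such as those in natural language inference. It assumes one particular corruption, shuffling the order of consecutive n-token chunks, which the abstract itself does not name; the function `ngram_randomize` and its parameters are hypothetical and chosen only for illustration. The intuition is that shuffling destroys most of the sentence's meaning while leaving word-level statistics intact, so a model fit on such inputs would presumably have to rely on nuisances.

```python
import random


def ngram_randomize(sentence, n, rng):
    """Illustrative semantic corruption (an assumption, not the paper's stated method).

    Splits the sentence into consecutive chunks of n tokens and shuffles
    the order of the chunks, keeping the tokens inside each chunk in place.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    tokens = sentence.split()
    # Group tokens into consecutive chunks of length n (the last may be shorter).
    chunks = [tokens[i:i + n] for i in range(0, len(tokens), n)]
    # Shuffle chunk order in place using the supplied random generator.
    rng.shuffle(chunks)
    return " ".join(token for chunk in chunks for token in chunk)


original = "the cow stands in a grassy field near the old barn"
corrupted = ngram_randomize(original, 2, random.Random(0))
print(corrupted)
```

The corrupted sentence contains exactly the same words as the original, only in a different chunk order. Smaller values of `n` corrupt the sentence more aggressively; which value is appropriate would depend on the task.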