Models trained on one set of domains often suffer performance drops on unseen domains, e.g., when wildlife monitoring models are deployed in new camera locations. In this work, we study principles for designing data augmentations for out-of-domain (OOD) generalization. In particular, we focus on real-world scenarios in which some domain-dependent features are robust, i.e., some features that vary across domains are nonetheless predictive OOD. For example, in the wildlife monitoring application above, image backgrounds vary across camera locations but indicate habitat type, which helps predict the species of photographed animals. Motivated by theoretical analysis of a linear setting, we propose targeted augmentations, which selectively randomize spurious domain-dependent features while preserving robust ones. We prove that targeted augmentations improve OOD performance, allowing models to generalize better with fewer domains. In contrast, existing approaches such as generic augmentations, which fail to randomize domain-dependent features, and domain-invariant augmentations, which randomize all domain-dependent features, both perform poorly OOD. In experiments on three real-world datasets, we show that targeted augmentations set a new state of the art for OOD performance, improving on prior results by 3.2-15.2%.
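To make the distinction between the three augmentation families concrete, the following is a minimal sketch and not the authors' implementation. It assumes each example has already been decomposed into three scalar features with hypothetical names (`object`, `robust`, `spurious`). In practice that decomposition is the hard, application-specific part, and the function names and noise scales are likewise illustrative assumptions.

```python
import random

# Hypothetical feature layout for one example (an assumption for illustration):
#   "object"   - domain-independent signal (e.g., the animal itself)
#   "robust"   - domain-dependent but predictive OOD (e.g., habitat type)
#   "spurious" - domain-dependent and not predictive OOD (e.g., camera quirks)


def generic_augment(x, rng, noise=0.1):
    """Generic augmentation: jitters the object feature only, so neither
    domain-dependent feature is randomized."""
    out = dict(x)  # copy so the input example is not mutated
    out["object"] = x["object"] + rng.gauss(0.0, noise)
    return out


def targeted_augment(x, rng, scale=1.0):
    """Targeted augmentation: randomizes the spurious feature and
    preserves the robust one."""
    out = dict(x)
    out["spurious"] = rng.gauss(0.0, scale)
    return out


def domain_invariant_augment(x, rng, scale=1.0):
    """Domain-invariant augmentation: randomizes every domain-dependent
    feature, including the robust one."""
    out = dict(x)
    out["robust"] = rng.gauss(0.0, scale)
    out["spurious"] = rng.gauss(0.0, scale)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    example = {"object": 1.0, "robust": 2.0, "spurious": 3.0}
    print("generic:         ", generic_augment(example, rng))
    print("targeted:        ", targeted_augment(example, rng))
    print("domain-invariant:", domain_invariant_augment(example, rng))
```

In this toy layout, only `targeted_augment` both removes the spurious signal and keeps the habitat-like `robust` feature available to the model, which is the property the abstract attributes to targeted augmentations.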