Neural image classifiers are known to suffer severe performance degradation when exposed to inputs that exhibit covariate shift with respect to the training distribution. Successful hand-crafted augmentation pipelines aim either to approximate the expected test-domain conditions or to perturb features that are specific to the training environment. Developing effective pipelines is typically cumbersome, and the resulting transformations have an impact on classifier performance that is hard to understand and control. In this paper, we show that the ability of recent Text-to-Image (T2I) generators to simulate image interventions via natural-language prompts can be leveraged to train more robust models, offering a more interpretable and controllable alternative to traditional augmentation methods. We find that a variety of prompting mechanisms are effective for producing synthetic training data sufficient to achieve state-of-the-art performance on widely adopted domain-generalization benchmarks and to reduce classifiers' reliance on spurious features. Our work suggests that further progress in T2I generation, together with tighter integration with other research fields, may represent a significant step toward the development of more robust machine learning systems.
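As a minimal sketch of the prompting mechanism described above, the following code generates class-conditional training images under varied, prompt-specified domains using a T2I generator. It assumes the Hugging Face diffusers library and a public Stable Diffusion checkpoint; the checkpoint name, class labels, and domain prompts are hypothetical placeholders rather than the paper's exact setup.

```python
# Sketch of prompt-based synthetic augmentation (assumptions: diffusers
# is installed, a CUDA GPU is available, and the checkpoint, class names,
# and domain prompts below are illustrative placeholders).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

class_names = ["dog", "elephant"]  # hypothetical label set
domains = ["a sketch of", "a painting of", "a photo at night of"]

synthetic_images, labels = [], []
for label, name in enumerate(class_names):
    for domain in domains:
        # The prompt intervenes on style/environment while keeping the
        # class-relevant content fixed.
        image = pipe(f"{domain} a {name}", num_inference_steps=30).images[0]
        synthetic_images.append(image)
        labels.append(label)
# synthetic_images can then be mixed into the real training set so the
# classifier sees each class under multiple simulated domains.
```

Because each intervention is named in plain language, the augmentation's intent is directly inspectable, which is the interpretability advantage over opaque hand-crafted transformation pipelines.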