Neural image classifiers are known to suffer severe performance degradation when exposed to inputs that exhibit covariate shift with respect to the training distribution. In this paper, we show that the ability of recent Text-to-Image (T2I) generators to edit images, approximating interventions via natural-language prompts, is a promising technology for training more robust classifiers. Using current open-source models, we find that a variety of prompting strategies are effective for producing augmented training datasets that suffice to achieve state-of-the-art performance (1) on widely adopted Single-Domain Generalization benchmarks, (2) in reducing classifiers' dependence on spurious features, and (3) in facilitating the application of Multi-Domain Generalization techniques when fewer training domains are available.
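To make the augmentation idea concrete, the following is a minimal sketch of prompt-driven image editing for dataset augmentation, assuming the open-source InstructPix2Pix model served through the Hugging Face `diffusers` library; the specific model checkpoint, intervention prompts, and hyperparameters shown here are illustrative assumptions, not the paper's exact setup.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

# Load an instruction-following image-editing pipeline (one example of a
# current open-source T2I editor; the paper may use different models).
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

# Natural-language prompts approximating domain interventions
# (illustrative examples, not the paper's prompt list).
interventions = [
    "make it snowy",
    "turn this into a pencil sketch",
    "change the scene to nighttime",
]

# Hypothetical training image path.
image = Image.open("train_sample.jpg").convert("RGB")

augmented = []
for prompt in interventions:
    # image_guidance_scale trades off fidelity to the source image against
    # the strength of the edit; higher values preserve more original content,
    # which helps keep the class label valid after augmentation.
    edited = pipe(
        prompt,
        image=image,
        num_inference_steps=20,
        image_guidance_scale=1.5,
    ).images[0]
    augmented.append((edited, prompt))
```

Each edited image inherits the label of its source, so the augmented pairs can be appended directly to the training set of a standard classifier.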