Data augmentation is an established and effective way to supply additional information to low-resource datasets. Traditional augmentation techniques such as noise injection and image transformations have been widely used. In addition, generative data augmentation (GDA) has been shown to produce more diverse and flexible data. While generative adversarial networks (GANs) have frequently been used for GDA, they offer less diversity and controllability than text-to-image diffusion models. In this paper, we propose TTIDA (Text-to-Text-to-Image Data Augmentation), which leverages large-scale pre-trained Text-to-Text (T2T) and Text-to-Image (T2I) generative models for data augmentation. By conditioning the T2I model on detailed descriptions produced by the T2T model, we can generate photo-realistic labeled images in a flexible and controllable manner. Experiments on in-domain classification, cross-domain classification, and image captioning tasks show consistent improvements over other data augmentation baselines. Analytical studies in few-shot, long-tail, and adversarial settings further confirm that TTIDA improves both performance and robustness.
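To make the two-stage pipeline concrete, here is a minimal sketch of the T2T-then-T2I flow the abstract describes. The abstract does not name the exact models or prompt templates used, so this sketch assumes GPT-2 as a stand-in for the T2T component, Stable Diffusion v1.5 as a stand-in for the T2I component, and an illustrative prompt template; the paper's actual choices may differ.

```python
# TTIDA-style two-stage sketch (assumptions: GPT-2 and Stable Diffusion
# v1.5 as stand-ins for the paper's T2T and T2I models; the prompt
# template is illustrative, not taken from the paper).
import torch
from transformers import pipeline, set_seed
from diffusers import StableDiffusionPipeline

set_seed(0)

# Stage 1 (T2T): expand a bare class label into a detailed description.
t2t = pipeline("text-generation", model="gpt2")
label = "golden retriever"
prompt = f"A photo of a {label}, which looks like"  # assumed template
description = t2t(prompt, max_new_tokens=30,
                  num_return_sequences=1)[0]["generated_text"]

# Stage 2 (T2I): condition the diffusion model on the expanded
# description; the synthetic image inherits the original class label.
t2i = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = t2i(description).images[0]
image.save(f"{label.replace(' ', '_')}_aug.png")
```

Repeating this loop per class label yields a pool of labeled synthetic images that can be mixed into the low-resource training set; varying the T2T sampling seed is one way to obtain the diversity the abstract attributes to the method.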