The power of Deep Neural Networks (DNNs) depends heavily on the quantity, quality, and diversity of their training data. However, in many real-world scenarios, collecting and annotating large-scale data is costly and time-consuming, which has severely hindered the application of DNNs. To address this challenge, we explore a new task of dataset expansion, which seeks to automatically create new labeled samples to expand a small dataset. To this end, we present a Guided Imagination Framework (GIF) that leverages recently developed large generative models (e.g., DALL-E2) and reconstruction models (e.g., MAE) to "imagine" and create informative new data from seed data to expand small datasets. Specifically, GIF conducts imagination by optimizing the latent features of seed data in a semantically meaningful space, and feeds the optimized features into the generative models to produce photo-realistic images with new content. To guide the imagination towards creating samples useful for model training, we exploit the zero-shot recognition ability of CLIP and introduce three criteria to encourage informative sample generation, i.e., prediction consistency, entropy maximization, and diversity promotion. With these essential criteria as guidance, GIF works well for expanding datasets in different domains, leading to a 29.9% accuracy gain on average over six natural image datasets and a 12.3% accuracy gain on average over three medical image datasets.
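The three guidance criteria can be sketched as a single scoring function over a batch of candidate samples. The sketch below is illustrative only: the function names, the use of class probabilities from a CLIP-style zero-shot classifier, and the cosine-similarity diversity penalty are assumptions for exposition, not the paper's exact implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p):
    return -(p * np.log(np.clip(p, 1e-8, None))).sum(axis=-1)

def guidance_score(seed_logits, new_logits, new_feats):
    """Illustrative score combining GIF's three criteria (higher is better).
    Logits would come from a CLIP-style zero-shot classifier; names and
    weighting here are assumptions, not the paper's implementation."""
    seed_p, new_p = softmax(seed_logits), softmax(new_logits)
    # 1) Prediction consistency: probability the generated sample
    #    assigns to the seed's predicted class.
    consistency = new_p[np.arange(len(new_p)), seed_p.argmax(-1)]
    # 2) Entropy maximization: reward samples whose predictive
    #    entropy exceeds that of the seed (more informative content).
    ent_gain = entropy(new_p) - entropy(seed_p)
    # 3) Diversity promotion: penalize pairwise feature similarity
    #    within the generated batch so samples do not collapse.
    z = new_feats / np.linalg.norm(new_feats, axis=-1, keepdims=True)
    sim = z @ z.T
    np.fill_diagonal(sim, 0.0)
    diversity_pen = np.abs(sim).mean()
    return consistency.mean() + ent_gain.mean() - diversity_pen
```

In GIF, a score of this form would be maximized with respect to the seed's latent features before decoding them back into images with the generative model.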