Many deep learning tasks require annotations that are too time-consuming for human operators to produce at scale, resulting in small datasets. This is especially true for dense regression problems such as crowd counting, which requires the location of every person in the image to be annotated. Techniques such as data augmentation and simulation-based synthetic data generation can help in such cases. In this paper, we introduce PromptMix, a method for artificially boosting the size of existing datasets that can be used to improve the performance of lightweight networks. First, synthetic images are generated in an end-to-end, data-driven manner: text prompts are extracted from existing datasets via a deep image captioning network and then fed to text-to-image diffusion models. The generated images are then annotated using one or more high-performing deep networks and mixed with the real dataset to train the lightweight network. Through extensive experiments on five datasets and two tasks, we show that PromptMix can increase the performance of lightweight networks by up to 26%.
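The pipeline described in the abstract (caption, generate, annotate, mix) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `promptmix_sketch` and its parameters `captioner`, `generator`, `annotators`, `fuse`, and `images_per_prompt` are hypothetical names introduced here, and each model is passed in as a plain callable so that no particular captioning or diffusion library is assumed. How the outputs of several annotating networks are combined is not stated in the abstract, so the `fuse` step is an assumption.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Image = TypeVar("Image")
Label = TypeVar("Label")


def promptmix_sketch(
    real_data: Sequence[Tuple[Image, Label]],
    captioner: Callable[[Image], str],          # hypothetical: image -> text prompt
    generator: Callable[[str], Image],          # hypothetical: text prompt -> synthetic image
    annotators: Sequence[Callable[[Image], Label]],  # hypothetical: high-performing networks
    fuse: Callable[[List[Label]], Label],       # assumption: how pseudo-labels are combined
    images_per_prompt: int = 1,
) -> List[Tuple[Image, Label]]:
    """Return the real dataset extended with pseudo-labeled synthetic samples."""
    synthetic: List[Tuple[Image, Label]] = []
    for image, _ in real_data:
        # 1. Extract a text prompt from a real image.
        prompt = captioner(image)
        for _ in range(images_per_prompt):
            # 2. Generate a synthetic image from that prompt.
            fake = generator(prompt)
            # 3. Annotate it with one or more strong networks and combine the results.
            label = fuse([annotate(fake) for annotate in annotators])
            synthetic.append((fake, label))
    # 4. Mix real and synthetic samples for training the lightweight network.
    return list(real_data) + synthetic
```

In practice, `captioner` and `generator` would wrap an image captioning model and a text-to-image diffusion model, and the returned list would be passed to the lightweight network's training loop. Any filtering of low-quality synthetic images is left out of this sketch.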