Copy-Paste is a simple and effective data augmentation strategy for instance segmentation. By randomly pasting object instances onto new background images, it creates new training data for free and significantly boosts segmentation performance, especially for rare object categories. Although using diverse, high-quality object instances in Copy-Paste yields larger performance gains, previous works obtain object instances either from human-annotated instance segmentation datasets or by rendering 3D object models, and both approaches are too expensive to scale up to achieve good diversity. In this paper, we revisit Copy-Paste at scale with the power of recently emerged zero-shot recognition models (e.g., CLIP) and text2image models (e.g., StableDiffusion). We demonstrate for the first time that using a text2image model to generate images, or a zero-shot recognition model to filter noisily crawled images, for different object categories is a feasible way to make Copy-Paste truly scalable. To this end, we design a data acquisition and processing framework, dubbed "X-Paste", upon which we conduct a systematic study. On the LVIS dataset, X-Paste provides substantial improvements over the strong baseline CenterNet2 with Swin-L as the backbone. Specifically, it achieves gains of +2.6 box AP and +2.1 mask AP on all classes, and even larger gains of +6.8 box AP and +6.5 mask AP on long-tail classes.
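As a rough illustration of the basic Copy-Paste operation described above (pasting a masked object instance at a random location on a background image), the following is a minimal sketch and not the X-Paste implementation. The function names `paste_instance` and `random_copy_paste` are hypothetical, images are plain nested lists to keep the sketch dependency-free, and the instance is assumed to be no larger than the background. Steps a real pipeline would likely need, such as rescaling, blending, and updating the annotations of occluded objects, are omitted.

```python
import random


def paste_instance(background, instance, mask, top, left):
    """Composite `instance` onto a copy of `background` where `mask` is 1.

    All images are 2-D lists of pixel values; `mask` is a 2-D list of 0/1.
    Returns the new image and the full-size binary mask of the pasted object.
    """
    height, width = len(background), len(background[0])
    image = [row[:] for row in background]  # leave the input untouched
    pasted_mask = [[0] * width for _ in range(height)]
    for r, mask_row in enumerate(mask):
        for c, keep in enumerate(mask_row):
            y, x = top + r, left + c
            # Skip pixels outside the object or outside the background.
            if keep and 0 <= y < height and 0 <= x < width:
                image[y][x] = instance[r][c]
                pasted_mask[y][x] = 1
    return image, pasted_mask


def random_copy_paste(background, instance, mask, rng=random):
    """Paste the instance at a uniformly random location where it fits.

    Assumes the instance is no larger than the background.
    """
    height, width = len(background), len(background[0])
    obj_h, obj_w = len(mask), len(mask[0])
    top = rng.randint(0, height - obj_h)   # randint bounds are inclusive
    left = rng.randint(0, width - obj_w)
    return paste_instance(background, instance, mask, top, left)
```

The returned mask can serve directly as the segmentation label for the pasted object, which is why the augmentation yields new annotated training samples without extra labeling.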