We propose EM-PASTE: an Expectation-Maximization (EM) guided Cut-Paste compositional dataset augmentation approach for weakly-supervised instance segmentation using only image-level supervision. The proposed method consists of three main components. The first component generates high-quality foreground object masks. To this end, we propose an EM-like approach that iteratively refines an initial set of object mask proposals generated by a generic region proposal method. Next, the second component generates high-quality context-aware background images using a text-to-image compositional synthesis method such as DALL-E. Finally, the third component creates a large-scale pseudo-labeled instance segmentation training dataset by compositing the foreground object masks onto the original and generated background images. The proposed approach achieves state-of-the-art weakly-supervised instance segmentation results on both the PASCAL VOC 2012 and MS COCO datasets using only image-level weak labels. In particular, it outperforms the best baseline by +7.4 and +2.8 mAP@0.50 on PASCAL and COCO, respectively. Furthermore, by selectively augmenting under-represented classes, the method offers a new solution to the long-tail weakly-supervised instance segmentation problem, where many classes may have only a few training samples.
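To make the compositing step (the third component) concrete, the following is a minimal NumPy sketch of pasting one masked foreground onto a background, assuming a refined soft object mask and a chosen paste location; the function name, blending scheme, and array conventions are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def paste_foreground(background, foreground, mask, top_left):
    """Composite a masked foreground crop onto a background image.

    background: H x W x 3 uint8 array (original or generated scene)
    foreground: h x w x 3 uint8 array (object crop)
    mask:       h x w float array in [0, 1] (refined object mask)
    top_left:   (row, col) paste location in the background
    Returns the composited image and the pasted instance mask,
    which serves as the pseudo-label for instance segmentation.
    """
    out = background.astype(np.float32).copy()
    inst_mask = np.zeros(background.shape[:2], dtype=np.float32)
    r, c = top_left
    h, w = mask.shape
    # Clip the paste region so it stays inside the background bounds.
    h = min(h, out.shape[0] - r)
    w = min(w, out.shape[1] - c)
    # Alpha-blend the foreground over the background using the mask.
    alpha = mask[:h, :w, None]
    out[r:r + h, c:c + w] = (alpha * foreground[:h, :w]
                             + (1 - alpha) * out[r:r + h, c:c + w])
    inst_mask[r:r + h, c:c + w] = mask[:h, :w]
    return out.astype(np.uint8), inst_mask
```

Repeating such a paste over many foreground-background pairs, with each pasted mask recorded as a label, yields the kind of large-scale pseudo-labeled training set the abstract describes.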