Building instance segmentation models that are data-efficient and can handle rare object categories is an important challenge in computer vision. Leveraging data augmentations is a promising direction towards addressing this challenge. Here, we perform a systematic study of the Copy-Paste augmentation ([13, 12]) for instance segmentation where we randomly paste objects onto an image. Prior studies on Copy-Paste relied on modeling the surrounding visual context for pasting the objects. However, we find that the simple mechanism of pasting objects randomly is good enough and can provide solid gains on top of strong baselines. Furthermore, we show Copy-Paste is additive with semi-supervised methods that leverage extra data through pseudo labeling (e.g. self-training). On COCO instance segmentation, we achieve 49.1 mask AP and 57.3 box AP, an improvement of +0.6 mask AP and +1.5 box AP over the previous state-of-the-art. We further demonstrate that Copy-Paste can lead to significant improvements on the LVIS benchmark. Our baseline model outperforms the LVIS 2020 Challenge winning entry by +3.6 mask AP on rare categories.
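The core mechanism described above, randomly pasting masked objects from one image onto another, can be sketched in a few lines. This is a minimal illustration assuming images and masks as NumPy arrays; the `copy_paste` helper name is hypothetical and this is not the authors' implementation (which also applies large-scale jittering and mask-edge smoothing):

```python
import numpy as np

def copy_paste(dst_img, src_img, src_mask):
    """Paste the object selected by src_mask from src_img onto dst_img.

    dst_img, src_img: HxWxC uint8 arrays of the same shape.
    src_mask:         HxW boolean array marking the object's pixels.
    Returns a new image; dst_img is left unmodified.
    """
    out = dst_img.copy()
    # Overwrite destination pixels wherever the source mask is True.
    out[src_mask] = src_img[src_mask]
    return out

# Usage sketch: paste a randomly chosen object instance onto a target image.
rng = np.random.default_rng(0)
target = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
source = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
object_mask = np.zeros((4, 4), dtype=bool)
object_mask[1:3, 1:3] = True  # a toy 2x2 "object"
augmented = copy_paste(target, source, object_mask)
```

In a full pipeline, the pasted object's ground-truth mask and box are added to the target image's annotations, and any occluded pixels are subtracted from the existing instance masks.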