In this paper, we consider instance segmentation on a long-tailed dataset that contains label noise, i.e., some of the annotations are incorrect. Two observations make this setting realistic. First, datasets collected from the real world usually follow a long-tailed distribution. Second, because an image in an instance segmentation dataset often contains many instances, some of them tiny, noise is easily introduced into the annotations. Specifically, we propose a new large-vocabulary, long-tailed dataset with label noise for instance segmentation. Furthermore, we evaluate previously proposed instance segmentation algorithms on this dataset. The results indicate that noise in the training data hampers the model in learning rare categories and decreases overall performance, which motivates us to explore more effective approaches to this practical challenge. The code and dataset are available at https://github.com/GuanlinLee/Noisy-LVIS.