Empowered by large-scale datasets such as ImageNet, unsupervised learning has enabled significant advances in classification tasks. However, whether large-scale unsupervised semantic segmentation can be achieved remains unknown. There are two major challenges: i) we need a large-scale benchmark for assessing algorithms; ii) we need to develop methods that simultaneously learn category and shape representations in an unsupervised manner. In this work, we propose the new problem of large-scale unsupervised semantic segmentation (LUSS), together with a newly created benchmark dataset to facilitate research progress. Building on the ImageNet dataset, we propose the ImageNet-S dataset with 1.2 million training images and 50k high-quality semantic segmentation annotations for evaluation. Our benchmark has high data diversity and a clear task objective. We also present a simple yet effective method that works surprisingly well for LUSS. In addition, we benchmark related un/weakly/fully supervised methods accordingly, identifying the challenges and possible directions of LUSS. The benchmark and source code are publicly available at https://github.com/LUSSeg.