State-of-the-art semantic segmentation methods rely on large, carefully labeled datasets, which require expensive annotation budgets. In this work, we show that coarse annotation is a low-cost but highly effective alternative for training semantic segmentation models. In the urban scene segmentation setting, we train our model on cheap coarse annotations of real-world data together with synthetic data, and achieve performance competitive with training on finely annotated real-world data. Specifically, we propose a coarse-to-fine self-training framework that generates pseudo labels for the unlabeled regions of the coarsely annotated data, leveraging synthetic data to improve predictions around the boundaries between semantic classes, and applying cross-domain data augmentation to increase diversity. Our extensive experiments on the Cityscapes and BDD100k datasets demonstrate that our method achieves a significantly better performance-versus-annotation-cost tradeoff, reaching performance comparable to fully annotated data with only a small fraction of the annotation budget. Moreover, when used for pretraining, our framework outperforms the standard fully supervised setting.
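To make the pseudo-labeling step of the coarse-to-fine self-training concrete, the sketch below fills only the unlabeled regions of a coarse annotation with confident predictions from a teacher network. This is a minimal illustration, not the authors' released code: the `teacher` model, the `IGNORE_INDEX` convention for unlabeled pixels, and the `CONF_THRESH` value are all assumptions.

```python
# Minimal sketch of coarse-to-fine pseudo labeling (illustrative, not the
# authors' implementation). Unlabeled pixels in the coarse annotation are
# assumed to carry IGNORE_INDEX, as is common for Cityscapes-style labels.
import torch
import torch.nn.functional as F

IGNORE_INDEX = 255   # assumed marker for unlabeled pixels
CONF_THRESH = 0.9    # assumed confidence cutoff for accepting pseudo labels

@torch.no_grad()
def coarse_to_fine_labels(teacher, image, coarse_label):
    """image: (1, 3, H, W) float tensor; coarse_label: (1, H, W) long tensor."""
    logits = teacher(image)                  # (1, C, H, W) class scores
    probs = F.softmax(logits, dim=1)
    conf, pseudo = probs.max(dim=1)          # per-pixel confidence and class
    # Fill only pixels the annotator skipped, and only where the teacher
    # is confident; human-provided coarse labels are kept untouched.
    fill = (coarse_label == IGNORE_INDEX) & (conf >= CONF_THRESH)
    refined = coarse_label.clone()
    refined[fill] = pseudo[fill]
    return refined                           # still IGNORE_INDEX where unsure
```

In a self-training loop, the refined label map would then supervise a student model with a standard cross-entropy loss that ignores the remaining `IGNORE_INDEX` pixels.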