In this paper, we address the semantic segmentation problem with a focus on the context aggregation strategy. Our motivation is that the label of a pixel is the category of the object that the pixel belongs to. We present a simple yet effective approach, object-contextual representations, characterizing a pixel by exploiting the representation of the corresponding object class. First, we learn object regions under the supervision of the ground-truth segmentation. Second, we compute the object region representation by aggregating the representations of the pixels lying in the object region. Last, we compute the relation between each pixel and each object region, and augment the representation of each pixel with the object-contextual representation, which is a weighted aggregation of all the object region representations according to their relations with the pixel. We empirically demonstrate that the proposed approach achieves competitive performance on various challenging semantic segmentation benchmarks: Cityscapes, ADE20K, LIP, PASCAL-Context, and COCO-Stuff. Our submission "HRNet + OCR + SegFix" achieved 1st place on the Cityscapes leaderboard at the time of submission. Code is available at: https://git.io/openseg and https://git.io/HRNet.OCR.
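The three-step aggregation described above can be sketched in a few lines of NumPy. This is a minimal illustrative implementation, not the paper's actual model: the function name is hypothetical, the soft region scores stand in for the supervised coarse segmentation head, and the pixel-region relation is computed as a plain scaled dot-product softmax, whereas the paper applies learned transforms before the similarity.

```python
import numpy as np

def object_contextual_representations(pixel_feats, soft_regions):
    """Illustrative sketch of the OCR aggregation (hypothetical helper).

    pixel_feats:  (N, C) per-pixel representations (N pixels, C channels).
    soft_regions: (N, K) soft assignments of pixels to K object regions,
                  e.g. softmax scores from a coarse segmentation head.
    Returns an (N, C) object-contextual representation for each pixel.
    """
    # Step 2: object region representation = weighted mean of the
    # representations of pixels lying in that region.
    weights = soft_regions / (soft_regions.sum(axis=0, keepdims=True) + 1e-12)  # (N, K)
    region_feats = weights.T @ pixel_feats                                      # (K, C)

    # Step 3a: relation between each pixel and each object region
    # (here: scaled dot-product followed by a softmax over regions).
    logits = pixel_feats @ region_feats.T / np.sqrt(pixel_feats.shape[1])       # (N, K)
    relation = np.exp(logits - logits.max(axis=1, keepdims=True))
    relation /= relation.sum(axis=1, keepdims=True)

    # Step 3b: object-contextual representation = weighted aggregation of
    # all object region representations according to their relations.
    return relation @ region_feats                                              # (N, C)

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))    # 6 pixels, 4 channels
regions = rng.random(size=(6, 3))  # soft scores for 3 object regions
ocr = object_contextual_representations(feats, regions)
print(ocr.shape)
```

In the full method, this object-contextual representation is concatenated with the original pixel representation before the final classifier; the sketch only covers the aggregation itself.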