In this paper, we address the semantic segmentation problem with a focus on the context aggregation strategy. Our motivation is that the label of a pixel is the category of the object that the pixel belongs to. We present a simple yet effective approach, object-contextual representations, characterizing a pixel by exploiting the representation of the corresponding object class. First, we learn object regions under the supervision of the ground-truth segmentation. Second, we compute the object region representation by aggregating the representations of the pixels lying in the object region. Last, we compute the relation between each pixel and each object region, and augment the representation of each pixel with the object-contextual representation, which is a weighted aggregation of all the object region representations according to their relations with the pixel. We empirically demonstrate that the proposed approach achieves competitive performance on various challenging semantic segmentation benchmarks: Cityscapes, ADE20K, LIP, PASCAL-Context, and COCO-Stuff.
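The three steps above can be sketched in NumPy. This is a minimal illustration with toy shapes and random features, not the paper's implementation: the actual method uses learned transforms and supervised soft object-region estimation, whereas here the region scores and all dimensions are made-up placeholders.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Toy shapes (hypothetical): N pixels, C channels, K object classes.
N, C, K = 6, 4, 3
rng = np.random.default_rng(0)
pixels = rng.standard_normal((N, C))         # per-pixel representations
region_logits = rng.standard_normal((N, K))  # coarse soft object-region scores

# Steps 1-2: soft object regions -> object region representations,
# i.e. a weighted average of the pixel features within each region.
weights = softmax(region_logits, axis=0)  # normalize over pixels, (N, K)
regions = weights.T @ pixels              # (K, C)

# Step 3: pixel-region relations -> object-contextual representation
# as a relation-weighted aggregation of the region representations.
relation = softmax(pixels @ regions.T, axis=1)  # (N, K)
ocr = relation @ regions                        # (N, C)

# Augment each pixel representation with its object-contextual one.
augmented = np.concatenate([pixels, ocr], axis=1)  # (N, 2C)
print(augmented.shape)
```

The final concatenation mirrors the augmentation step: each pixel keeps its own features and gains a context vector summarizing the object regions it relates to.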