Semantic segmentation is challenging because context is hard to model in complex scenes and classes are easily confused along object boundaries. Most of the literature focuses on either context modeling or boundary refinement alone, which generalizes less well to open-world scenarios. In this work, we advocate a unified framework (UN-EPT) that segments objects by considering both context information and boundary artifacts. We first adapt a sparse sampling strategy to incorporate the transformer-based attention mechanism for efficient context modeling. In addition, we introduce a separate spatial branch to capture image details for boundary refinement. The whole model can be trained end to end. We demonstrate promising performance on three popular semantic segmentation benchmarks with a low memory footprint. Code will be released soon.