Lightweight MLP-based decoders have become increasingly promising for semantic segmentation. However, a channel-wise MLP cannot enlarge the receptive field and therefore lacks the context-modeling capacity that is critical to semantic segmentation. In this paper, we propose a parameter-free patch rotate operation that reorganizes pixels spatially. It first divides the feature map into multiple groups and then rotates the patches within each group. Based on this operation, we design a novel segmentation network, named PRSeg, which consists of an off-the-shelf backbone and a lightweight Patch Rotate MLP decoder containing multiple Dynamic Patch Rotate Blocks (DPR-Blocks). In each DPR-Block, a fully connected layer is applied after a Patch Rotate Module (PRM), so that spatial information is exchanged between pixels. Specifically, in the PRM, the feature map is first split along the channel dimension into a reserved part and a rotated part according to the probabilities predicted by the Dynamic Channel Selection Module (DCSM), and the patch rotate operation is performed only on the rotated part. Extensive experiments on the ADE20K, Cityscapes and COCO-Stuff 10K datasets demonstrate the effectiveness of our approach. We expect that PRSeg can promote the development of MLP-based decoders in semantic segmentation.
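To make the patch rotate idea above concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The abstract does not specify how groups are formed or what "rotate" means exactly, so several choices here are assumptions: patches are square with side `patch`, a group is a `group` x `group` block of adjacent patches, rotation is a cyclic shift of patches inside each group, and channels are assigned to the rotated part when their predicted probability exceeds a hypothetical `threshold`. The names `patch_rotate` and `patch_rotate_module` are illustrative only.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]  # one channel of a feature map: H rows x W columns


def patch_rotate(channel: Grid, patch: int, group: int,
                 shift: Tuple[int, int] = (0, 1)) -> Grid:
    """Cyclically shift patches inside each group of patches.

    Assumed semantics (not taken from the paper): each group covers
    (patch * group) x (patch * group) pixels, and `shift` gives the
    cyclic offset in patch units along (rows, columns).
    The operation has no learnable parameters; it only moves pixels.
    """
    height, width = len(channel), len(channel[0])
    span = patch * group  # side length of one group, in pixels
    if height % span or width % span:
        raise ValueError("height and width must be multiples of patch * group")
    dy, dx = shift[0] * patch, shift[1] * patch
    out: Grid = []
    for y in range(height):
        top = y - y % span                      # first row of this pixel's group
        src_y = top + (y - top - dy) % span     # wrap around inside the group
        row = []
        for x in range(width):
            left = x - x % span                 # first column of this pixel's group
            src_x = left + (x - left - dx) % span
            row.append(channel[src_y][src_x])
        out.append(row)
    return out


def patch_rotate_module(features: Sequence[Grid], probs: Sequence[float],
                        patch: int, group: int,
                        threshold: float = 0.5) -> List[Grid]:
    """Rotate only the channels selected by per-channel probabilities.

    `probs` stands in for the output of a channel selection module;
    thresholding it is an assumption made for this sketch.
    """
    result = []
    for channel, prob in zip(features, probs):
        if prob > threshold:
            result.append(patch_rotate(channel, patch, group))  # rotated part
        else:
            result.append([row[:] for row in channel])          # reserved part
    return result


if __name__ == "__main__":
    channel = [[float(r * 8 + c) for c in range(8)] for r in range(4)]
    rotated, reserved = patch_rotate_module([channel, channel], [0.9, 0.1],
                                            patch=2, group=2)
    print(rotated[0])   # patches swapped within each group of the first row
    print(reserved[0])  # unchanged first row
```

Because the shift only permutes pixels, a fully connected layer applied per pixel afterwards sees values that originated at other spatial positions, which is how a channel-wise MLP could mix spatial context under these assumptions.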