Two-branch network architectures have shown their efficiency and effectiveness in real-time semantic segmentation tasks. However, direct fusion of low-level details and high-level semantics leads to a phenomenon in which detailed features are easily overwhelmed by surrounding contextual information, termed overshoot in this paper, which limits the accuracy improvement of existing two-branch models. In this paper, we draw a connection between Convolutional Neural Networks (CNNs) and Proportional-Integral-Derivative (PID) controllers and reveal that a two-branch network is nothing but a Proportional-Integral (PI) controller, which inherently suffers from a similar overshoot issue. To alleviate this issue, we propose a novel three-branch network architecture, PIDNet, which possesses three branches to parse detailed, context, and boundary information (the derivative of semantics), respectively, and employs boundary attention to guide the fusion of the detailed and context branches in the final stage. The PIDNet family achieves the best trade-off between inference speed and accuracy, and its test accuracy surpasses that of all existing models with similar inference speed on the Cityscapes, CamVid, and COCO-Stuff datasets. In particular, PIDNet-S achieves 78.6% mIOU with an inference speed of 93.2 FPS on the Cityscapes test set and 81.6% mIOU with a speed of 153.7 FPS on the CamVid test set.
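
To make the idea of boundary attention guiding the fusion more concrete, the following is a minimal sketch, not the fusion module defined in the paper. It assumes, purely for illustration, that a per-pixel boundary response is squashed by a sigmoid into a gate, and that the gate forms a convex combination of the detail and context features. The function name `boundary_guided_fusion` and the flat-list feature layout are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def boundary_guided_fusion(detail, context, boundary_logits):
    """Blend per-pixel detail and context features with a boundary gate.

    Assumption (not taken from the abstract): the gate is sigmoid(boundary),
    so pixels with a strong boundary response lean on the detail branch and
    pixels far from boundaries lean on the context branch.
    """
    if not (len(detail) == len(context) == len(boundary_logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for d, c, b in zip(detail, context, boundary_logits):
        gate = sigmoid(b)  # close to 1 near a boundary, close to 0 elsewhere
        fused.append(gate * d + (1.0 - gate) * c)
    return fused


# Toy features: the detail branch outputs 1.0 and the context branch 0.0,
# so each fused value equals the gate applied at that pixel.
detail = [1.0, 1.0, 1.0]
context = [0.0, 0.0, 0.0]
boundary_logits = [-10.0, 0.0, 10.0]  # interior, ambiguous, boundary
fused = boundary_guided_fusion(detail, context, boundary_logits)
```

In this toy setting the interior pixel follows the context branch almost entirely, the boundary pixel follows the detail branch, and the ambiguous pixel averages the two. A gate of this kind is one way to keep fine details from being overwhelmed by context exactly where they matter.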