RGB-D semantic segmentation can be advanced with convolutional neural networks thanks to the availability of Depth data. Although objects are often hard to discriminate from their 2D appearance alone, the local pixel differences and geometric patterns in Depth allow them to be well separated in some cases. Owing to their fixed grid kernel structure, however, CNNs lack the ability to capture detailed, fine-grained information and thus cannot achieve accurate pixel-level semantic segmentation. To address this problem, we propose a Pixel Difference Convolutional Network (PDCNet) that captures detailed intrinsic patterns by aggregating both intensity and gradient information, over a local range for Depth data and a global range for RGB data. Specifically, PDCNet consists of a Depth branch and an RGB branch. For the Depth branch, we propose a Pixel Difference Convolution (PDC) that considers local, detailed geometric information in Depth data by aggregating both intensity and gradient information. For the RGB branch, we contribute a lightweight Cascade Large Kernel (CLK) that extends PDC, namely CPDC, to exploit global contexts in RGB data and further boost performance. Consequently, the local and global pixel differences of both modalities are seamlessly incorporated into PDCNet during information propagation. Experiments on two challenging benchmark datasets, i.e., NYUDv2 and SUN RGB-D, show that our PDCNet achieves state-of-the-art performance on the semantic segmentation task.
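The abstract describes PDC as aggregating both intensity (vanilla convolution) and gradient (pixel difference) information within a local window. As a rough illustration of that idea only, the following NumPy sketch blends a vanilla convolution response with a central-difference response via a hypothetical mixing weight `theta`; the function name, the `theta` parameter, and the exact blending are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def pixel_difference_conv(x, w, theta=0.7):
    """Illustrative sketch of a pixel-difference convolution.

    Blends an intensity term (vanilla convolution over the patch) with a
    gradient term (convolution over differences to the patch center):
        y = theta * conv(x - x_center) + (1 - theta) * conv(x)
    `x` is a 2D array, `w` a k-by-k kernel; 'valid' padding for brevity.
    `theta` is a hypothetical mixing weight, not from the paper.
    """
    k = w.shape[0]
    H, W = x.shape
    out = np.zeros((H - k + 1, W - k + 1))
    c = k // 2  # index of the patch center
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + k, j:j + k]
            vanilla = np.sum(w * patch)               # intensity term
            diff = np.sum(w * (patch - patch[c, c]))  # gradient term
            out[i, j] = theta * diff + (1 - theta) * vanilla
    return out
```

With `theta=1` the response on a constant region is exactly zero (pure gradient behavior), while `theta=0` recovers a plain convolution; intermediate values mix the two, which is the intuition behind aggregating intensity and gradient cues.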