3D spatial information is known to be beneficial to the semantic segmentation task. Most existing methods take 3D spatial data as an additional input, leading to a two-stream segmentation network that processes RGB and 3D spatial information separately. This solution greatly increases inference time and severely limits its applicability to real-time scenarios. To solve this problem, we propose Spatial information guided Convolution (S-Conv), which allows efficient integration of RGB features and 3D spatial information. S-Conv infers the sampling offsets of the convolution kernel under the guidance of the 3D spatial information, helping the convolutional layer adjust its receptive field and adapt to geometric transformations. S-Conv also incorporates geometric information into the feature learning process by generating spatially adaptive convolutional weights. The capability of perceiving geometry is thus largely enhanced, with only a marginal increase in parameters and computational cost. We further embed S-Conv into a semantic segmentation network, called the Spatial information Guided convolutional Network (SGNet), achieving real-time inference and state-of-the-art performance on the NYUDv2 and SUNRGBD datasets.
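The two mechanisms described above (offsets for the sampling grid and spatially adaptive kernel weights, both predicted from 3D spatial features) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: all layer shapes, the 1x1 projections, and the sigmoid modulation are assumptions made purely for illustration.

```python
# Illustrative sketch of the S-Conv idea (NOT the authors' implementation):
# 3D spatial features predict (1) per-position sampling offsets for a 3x3
# kernel and (2) per-position kernel modulation weights, which then steer
# an ordinary convolution over the RGB features. All sizes are assumptions.
import numpy as np

rng = np.random.default_rng(0)

H, W = 8, 8          # spatial resolution
C_in, C_out = 4, 6   # input / output channels
K = 3                # kernel size

rgb_feat = rng.standard_normal((C_in, H, W))
spatial_feat = rng.standard_normal((3, H, W))  # e.g. an XYZ/HHA encoding of depth

# Offset branch: a 1x1 projection of spatial features -> 2*K*K offsets per pixel.
W_off = rng.standard_normal((2 * K * K, 3)) * 0.01
offsets = np.einsum('oc,chw->ohw', W_off, spatial_feat)          # (2*K*K, H, W)

# Adaptive-weight branch: spatial features -> per-pixel kernel modulation in (0, 1).
W_ad = rng.standard_normal((K * K, 3)) * 0.1
adapt = 1.0 / (1.0 + np.exp(-np.einsum('oc,chw->ohw', W_ad, spatial_feat)))  # (K*K, H, W)

# Base convolution weights, shared across all positions.
W_conv = rng.standard_normal((C_out, C_in, K, K)) * 0.1

def bilinear(fmap, y, x):
    """Bilinearly sample fmap of shape (C, h, w) at a fractional (y, x)."""
    _, h, w = fmap.shape
    y = np.clip(y, 0, h - 1); x = np.clip(x, 0, w - 1)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * fmap[:, y0, x0] + (1 - wy) * wx * fmap[:, y0, x1]
            + wy * (1 - wx) * fmap[:, y1, x0] + wy * wx * fmap[:, y1, x1])

# Deformable-style convolution: each of the K*K taps is shifted by its
# predicted offset and scaled by its predicted adaptive weight.
out = np.zeros((C_out, H, W))
taps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
for i in range(H):
    for j in range(W):
        acc = np.zeros(C_out)
        for k, (dy, dx) in enumerate(taps):
            oy, ox = offsets[2 * k, i, j], offsets[2 * k + 1, i, j]
            sample = bilinear(rgb_feat, i + dy + oy, j + dx + ox)  # (C_in,)
            acc += adapt[k, i, j] * (W_conv[:, :, dy + 1, dx + 1] @ sample)
        out[:, i, j] = acc

print(out.shape)  # (6, 8, 8)
```

Because both branches are lightweight 1x1 projections of the spatial features, this design adds geometry awareness at only a small parameter and compute overhead, which is the property the abstract highlights.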