Perception is crucial for robots that act in real-world environments, as autonomous systems need to see and understand the world around them to act appropriately. Panoptic segmentation provides an interpretation of the scene by assigning each pixel a semantic label together with an instance ID. In this paper, we address panoptic segmentation using RGB-D data of indoor scenes. We propose a novel encoder-decoder neural network that processes RGB and depth separately through two encoders. The features of the individual encoders are progressively merged at different resolutions, such that the RGB features are enhanced using complementary depth information. We propose a novel merging approach called ResidualExcite, which reweights each entry of the feature map according to its importance. Thanks to the double-encoder architecture, our approach is robust to missing cues. In particular, the same model can be trained and run at inference time on RGB-D, RGB-only, and depth-only input data, without the need to train specialized models. We evaluate our method on publicly available datasets and show that our approach achieves superior results compared to other common approaches for panoptic segmentation.
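To make the merging idea concrete, the following is a minimal sketch of an entry-wise, residual reweighting of depth features into RGB features. It is not the exact ResidualExcite formulation, which the abstract does not specify. Everything here is an assumption for illustration: the function name `residual_excite_sketch`, the per-channel gate parameters `scale` and `bias` (standing in for learned layers), the residual weight `lam`, and the choice to pass RGB features through unchanged when depth is missing.

```python
import math
from typing import List, Optional

# A feature map is stored as nested lists indexed [channel][row][col].
FeatureMap = List[List[List[float]]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def residual_excite_sketch(
    rgb: FeatureMap,
    depth: Optional[FeatureMap],
    scale: List[float],
    bias: List[float],
    lam: float = 1.0,
) -> FeatureMap:
    """Add gated depth features to RGB features, entry by entry.

    Assumed form (hypothetical, for illustration only):
        fused = rgb + lam * sigmoid(scale[c] * depth + bias[c]) * depth
    The gate yields one weight in (0, 1) per entry of the depth feature map.
    """
    if depth is None:
        # Assumed handling of a missing cue: return a copy of the RGB features.
        return [[row[:] for row in channel] for channel in rgb]

    fused: FeatureMap = []
    for c, (rgb_channel, depth_channel) in enumerate(zip(rgb, depth)):
        fused_channel = []
        for rgb_row, depth_row in zip(rgb_channel, depth_channel):
            fused_row = [
                r + lam * sigmoid(scale[c] * d + bias[c]) * d
                for r, d in zip(rgb_row, depth_row)
            ]
            fused_channel.append(fused_row)
        fused.append(fused_channel)
    return fused


# Usage: one channel, a 1x2 feature map, and a gate fixed at sigmoid(0) = 0.5.
rgb_features = [[[1.0, 2.0]]]
depth_features = [[[0.0, 2.0]]]
fused_rgbd = residual_excite_sketch(rgb_features, depth_features, [0.0], [0.0])
fused_rgb_only = residual_excite_sketch(rgb_features, None, [0.0], [0.0])
```

In a network with two encoders, a step of this kind would be applied at each resolution where features are merged, so the RGB branch is progressively enriched by depth. The residual form means that a gate close to zero leaves the RGB features essentially untouched, which is one plausible way to remain usable when one modality is absent.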