We present a novel real-time learning method that jointly perceives a 3D scene's geometric structure and semantic labels. Recent approaches to real-time 3D scene reconstruction mostly adopt a volumetric scheme in which a truncated signed distance function (TSDF) is regressed directly. However, these volumetric approaches tend to focus on the global coherence of their reconstructions, at the cost of local geometric detail. To address this issue, we propose to exploit the latent geometric priors in 2D image features, via explicit depth prediction and anchored feature generation, to refine occupancy learning in the TSDF volume. Moreover, we find that this cross-dimensional feature refinement also benefits the semantic segmentation task. We therefore propose an end-to-end cross-dimensional refinement neural network (CDRNet) that extracts both a 3D mesh and 3D semantic labels in real time. Experimental results show that the proposed method achieves state-of-the-art 3D perception efficiency on multiple datasets, indicating its strong potential for industrial applications.
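To illustrate the idea of anchoring 2D image features to a 3D volume, the sketch below back-projects voxel centers into an image and samples per-pixel features for each voxel. This is a minimal NumPy sketch under our own simplifying assumptions (nearest-neighbour sampling, a single view, a pinhole camera with known intrinsics `K` and camera-to-world pose); the function name and interface are hypothetical and the actual CDRNet feature generation may differ.

```python
import numpy as np

def anchor_2d_features_to_volume(feat_2d, voxel_centers, K, cam_pose):
    """Sample 2D image features at the projections of 3D voxel centers.

    feat_2d:       (H, W, C) per-pixel feature map from a 2D backbone
    voxel_centers: (N, 3) voxel center coordinates in world space
    K:             (3, 3) pinhole camera intrinsics
    cam_pose:      (4, 4) camera-to-world pose

    A hypothetical sketch of "anchored feature generation"; not the
    paper's exact formulation.
    """
    # World -> camera: p_cam = R^T (p_world - t).
    R, t = cam_pose[:3, :3], cam_pose[:3, 3]
    pts_cam = (voxel_centers - t) @ R
    # Perspective projection with intrinsics K.
    uvz = pts_cam @ K.T
    z = uvz[:, 2]
    valid = z > 1e-6  # keep only voxels in front of the camera
    u = np.zeros_like(z)
    v = np.zeros_like(z)
    u[valid] = uvz[valid, 0] / z[valid]
    v[valid] = uvz[valid, 1] / z[valid]
    # Nearest-neighbour sampling, clamped to the image bounds.
    H, W, _ = feat_2d.shape
    ui = np.clip(np.round(u).astype(int), 0, W - 1)
    vi = np.clip(np.round(v).astype(int), 0, H - 1)
    # Voxels that project outside the frustum receive zero features.
    anchored = np.where(valid[:, None], feat_2d[vi, ui], 0.0)
    return anchored  # (N, C) image features anchored to voxels
```

In a full pipeline, such anchored features would be fused with the volumetric features (e.g. by concatenation) before the occupancy and TSDF heads, which is one way to inject 2D geometric priors into the 3D branch.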