Depth prediction is a critical problem in robotics applications, especially autonomous driving. The two mainstream approaches are depth prediction based on binocular stereo matching, and depth completion based on the fusion of a monocular image with a LiDAR point cloud. However, the former tends to overfit while building the cost volume, and the latter generalizes poorly due to the lack of geometric constraints. To address these problems, we propose a novel multimodal neural network, UAMD-Net, for dense depth completion based on the fusion of binocular stereo matching and the weak constraint provided by sparse point clouds. Specifically, the sparse point cloud is converted to a sparse depth map and fed, together with the binocular images, into the multimodal feature encoder (MFE) to construct a cross-modal cost volume, which is then further processed by the multimodal feature aggregator (MFA) and the depth regression layer. Furthermore, existing multimodal methods ignore the problem of modal dependence: the network fails when one of its modal inputs is unavailable or corrupted. We therefore propose a new training strategy, Modal-dropout, which enables the network to be trained adaptively with multiple modal inputs and to run inference with only a specific subset of them. Benefiting from the flexible network structure and this adaptive training method, the proposed network supports unified training under various modal input conditions. Comprehensive experiments on the KITTI depth completion benchmark demonstrate that our method produces robust results and outperforms other state-of-the-art methods.
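The abstract describes the dataflow (sparse depth map plus stereo images → MFE → cross-modal cost volume → MFA → depth regression) but gives no implementation details. The following PyTorch skeleton is only a minimal sketch of that dataflow under stated assumptions: the module widths, the concatenation-based cost volume, the soft-argmin regression, and names such as `UAMDNetSketch`, `feat_ch`, and `max_disp` are all illustrative choices, not the authors' actual design.

```python
import torch
import torch.nn as nn

class UAMDNetSketch(nn.Module):
    """Hypothetical skeleton of the described pipeline; every submodule
    here is a placeholder standing in for the paper's MFE/MFA designs."""

    def __init__(self, feat_ch=32, max_disp=48):
        super().__init__()
        self.max_disp = max_disp
        # Multimodal feature encoder (MFE): 2D conv towers for the stereo
        # images and the sparse depth map (depth/width are assumptions).
        self.img_encoder = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.depth_encoder = nn.Sequential(
            nn.Conv2d(1, feat_ch, 3, padding=1), nn.ReLU())
        # Multimodal feature aggregator (MFA): 3D convs over the cost volume.
        self.mfa = nn.Sequential(
            nn.Conv3d(3 * feat_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(feat_ch, 1, 3, padding=1))

    def forward(self, left, right, sparse_depth):
        fl, fr = self.img_encoder(left), self.img_encoder(right)
        fd = self.depth_encoder(sparse_depth)
        # Cross-modal cost volume: concatenate left features, shifted right
        # features, and depth features at each candidate disparity.
        b, c, h, w = fl.shape
        vol = fl.new_zeros(b, 3 * c, self.max_disp, h, w)
        for d in range(self.max_disp):
            vol[:, :c, d, :, d:] = fl[..., d:]
            vol[:, c:2 * c, d, :, d:] = fr[..., : w - d]
            vol[:, 2 * c:, d, :, d:] = fd[..., d:]
        cost = self.mfa(vol).squeeze(1)                # (B, D, H, W)
        # Soft-argmin regression over disparity candidates.
        prob = torch.softmax(-cost, dim=1)
        disps = torch.arange(self.max_disp, device=cost.device,
                             dtype=prob.dtype).view(1, -1, 1, 1)
        return (prob * disps).sum(dim=1)               # (B, H, W)
```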
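Similarly, Modal-dropout is only named in the abstract. A minimal sketch of what per-batch modality masking could look like, assuming a randomly chosen modality is zeroed out during training so the network learns to operate with any input subset; the function name, the drop probability, and the set of maskable modalities are all hypothetical:

```python
import random
import torch

def modal_dropout(left_img, right_img, sparse_depth, p_drop=0.3):
    """Hypothetical Modal-dropout: with probability p_drop, suppress one
    modality for this batch so inference can later run without it.

    left_img, right_img: stereo tensors of shape (B, 3, H, W)
    sparse_depth:        LiDAR depth map of shape (B, 1, H, W)
    """
    if random.random() < p_drop:
        victim = random.choice(["stereo_right", "lidar"])
        if victim == "stereo_right":
            # Monocular + LiDAR mode: drop the right image.
            right_img = torch.zeros_like(right_img)
        else:
            # Stereo-only mode: drop the sparse depth input.
            sparse_depth = torch.zeros_like(sparse_depth)
    return left_img, right_img, sparse_depth
```

Under this reading, the network sees every input configuration during training, which is what would allow a single set of weights to serve stereo-only, monocular-plus-LiDAR, and full multimodal inference.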