3D object detection and dense depth estimation are among the most vital tasks in autonomous driving. Multiple sensor modalities can jointly contribute to better robot perception, and to that end, we introduce a method for jointly training 3D object detection and monocular dense depth reconstruction neural networks. At inference time, the method takes a LiDAR point cloud and a single RGB image as inputs and produces object pose predictions as well as a densely reconstructed depth map. The LiDAR point cloud is converted into a set of voxels, and its features are extracted using 3D convolution layers, from which we regress object pose parameters. Features of the corresponding RGB image are extracted using a separate 2D convolutional neural network. We further use these combined features to predict a dense depth map. While our object detection network is trained in a supervised manner, the depth prediction network is trained with both self-supervised and supervised loss functions. We also introduce a loss function, the edge-preserving smooth loss, and show that it results in better depth estimation than the edge-aware smooth loss frequently used in prior depth prediction work.
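For context, the sketch below shows the edge-aware smooth (smoothness) loss commonly used in self-supervised depth estimation, i.e. the baseline the abstract compares against; it is not the edge-preserving smooth loss introduced in this work, which is defined later. The tensor shapes (depth as B×1×H×W, image as B×3×H×W) and the mean-normalization step are assumptions, and the implementation is only a minimal PyTorch illustration.

```python
import torch

def edge_aware_smoothness_loss(depth: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
    """Edge-aware smoothness loss: penalize depth gradients, down-weighted
    where the image itself has strong gradients (object edges).

    Assumed shapes: depth is (B, 1, H, W), image is (B, 3, H, W).
    """
    # Mean-normalize depth so the loss is invariant to the overall depth scale
    # (a common convention in self-supervised monocular depth work).
    depth = depth / (depth.mean(dim=(2, 3), keepdim=True) + 1e-7)

    # First-order gradients of the predicted depth map.
    d_dx = torch.abs(depth[:, :, :, :-1] - depth[:, :, :, 1:])
    d_dy = torch.abs(depth[:, :, :-1, :] - depth[:, :, 1:, :])

    # Image gradients, averaged over the color channels.
    i_dx = torch.mean(torch.abs(image[:, :, :, :-1] - image[:, :, :, 1:]), dim=1, keepdim=True)
    i_dy = torch.mean(torch.abs(image[:, :, :-1, :] - image[:, :, 1:, :]), dim=1, keepdim=True)

    # Weight depth gradients by exp(-|image gradient|): smoothness is enforced
    # in textureless regions and relaxed at image edges.
    return (d_dx * torch.exp(-i_dx)).mean() + (d_dy * torch.exp(-i_dy)).mean()
```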