Stereoscopy provides a natural perception of distance in a scene, and its role in 3D world understanding is intuitive. However, accurate depth estimation with binocular vision sensors depends on a rigid calibration. A monocular camera removes this constraint at the expense of depth accuracy, and the difficulty is exacerbated in harsh environmental conditions. Moreover, optical sensors often fail to acquire vital signals in harsh environments, so radar is used instead, which gives coarse but more accurate signals. This work explores the utility of coarse radar signals when fused with fine-grained data from a monocular camera for depth estimation in harsh environmental conditions. A variant of the feature pyramid network (FPN) operates on fine-grained image features at multiple scales with fewer parameters. The FPN feature maps are fused with sparse radar features extracted by a convolutional neural network. The concatenated hierarchical features are used to predict depth with ordinal regression. We performed experiments on the nuScenes dataset, where the proposed architecture ranks highest in quantitative evaluations while using fewer parameters and offering faster inference. The depth estimation results suggest that the proposed techniques can be used as an alternative to stereo depth estimation in critical applications in robotics and self-driving cars. The source code will be available at \url{https://github.com/MI-Hussain/RVMDE}.
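To make the ordinal regression step concrete, the following is a minimal sketch of how a depth range can be split into ordered bins and how per-pixel ordinal outputs can be decoded back to a depth value. The abstract does not specify the discretization or the decoding rule, so the log-spaced bins, the 0.5 threshold, and the function names `log_spaced_edges`, `encode_ordinal`, and `decode_ordinal` are assumptions made for illustration only, not the authors' implementation.

```python
import math


def log_spaced_edges(d_min, d_max, num_bins):
    """Return num_bins + 1 bin edges spaced uniformly in log-depth.

    Assumption: log spacing is one common choice for ordinal depth
    regression; the abstract does not state which spacing is used.
    """
    step = math.log(d_max / d_min) / num_bins
    return [math.exp(math.log(d_min) + step * i) for i in range(num_bins + 1)]


def encode_ordinal(depth, edges):
    """Encode a depth as binary targets: target[k] = 1 if depth > edges[k + 1].

    With K bins there are K - 1 interior edges, hence K - 1 targets.
    """
    return [1 if depth > edges[k + 1] else 0 for k in range(len(edges) - 2)]


def decode_ordinal(probs, edges):
    """Decode a depth from K - 1 ordinal probabilities.

    The bin index is the number of probabilities above 0.5, and the
    returned depth is the arithmetic midpoint of that bin (an assumed rule).
    """
    label = sum(1 for p in probs if p > 0.5)
    return 0.5 * (edges[label] + edges[label + 1])


# Hypothetical example: 4 bins between 1 m and 81 m.
edges = log_spaced_edges(1.0, 81.0, 4)
targets = encode_ordinal(10.0, edges)
depth = decode_ordinal([float(t) for t in targets], edges)
```

In this example a ground-truth depth of 10 m falls in the third bin, so the decoded value is that bin's midpoint rather than 10 m exactly. Finer bins reduce this quantization error at the cost of more output channels.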