Estimating the distance between objects in a scene and the camera sensor is feasible when depth images are obtained from stereo or 3D cameras. Depth estimation yields relative distances, which can then be converted into absolute distances for real-world use. With a 2D monocular camera, however, distance estimation is very challenging. This paper presents a deep learning framework consisting of two deep networks, one for depth estimation and one for object detection, that operates on a single image. First, objects in the scene are detected and localized using the You Only Look Once (YOLOv5) network. In parallel, a depth image is estimated using a deep autoencoder network to obtain the relative distances. The YOLO-based object detector was trained with supervised learning, whereas the depth estimation network was trained in a self-supervised manner. The proposed distance estimation framework was evaluated on real images of outdoor scenes. The results show that the framework is promising: it achieves an accuracy of 96% and an RMSE of 0.203 with respect to the correct absolute distance.
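The abstract does not say how the detector output and the depth image are combined, so the following is only a minimal sketch of one plausible fusion step, not the authors' method. It assumes boxes in `(x1, y1, x2, y2)` pixel format, takes the median relative depth inside each box, and converts it to an absolute distance with a linear mapping. The function names, the `scale` and `offset` parameters, and the synthetic data are all hypothetical.

```python
import numpy as np


def object_relative_depths(depth_map, boxes):
    """Median relative depth inside each (x1, y1, x2, y2) box (assumed format)."""
    height, width = depth_map.shape
    depths = []
    for x1, y1, x2, y2 in boxes:
        # Clip the box to the image bounds before slicing.
        x1, x2 = max(0, int(x1)), min(width, int(x2))
        y1, y2 = max(0, int(y1)), min(height, int(y2))
        if x2 <= x1 or y2 <= y1:
            depths.append(float("nan"))  # empty box after clipping
            continue
        depths.append(float(np.median(depth_map[y1:y2, x1:x2])))
    return np.array(depths)


def to_absolute(relative, scale, offset=0.0):
    """Hypothetical linear mapping from relative depth to absolute distance."""
    return scale * np.asarray(relative, dtype=float) + offset


def rmse(predicted, truth):
    """Root-mean-square error between predicted and reference distances."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    # Synthetic stand-ins for the two network outputs (not real data).
    depth_map = np.full((4, 6), 0.2)
    depth_map[:, 3:] = 0.8
    boxes = [(0, 0, 3, 4), (3, 0, 6, 4)]

    relative = object_relative_depths(depth_map, boxes)
    absolute = to_absolute(relative, scale=10.0)  # scale is an assumed calibration
    print(relative, absolute, rmse(absolute, [2.0, 8.5]))
```

The median is used here because it is less sensitive than the mean to background pixels that fall inside a bounding box. In practice the scale and offset would have to come from some calibration against known distances.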