Estimating the distance to objects is crucial for autonomous vehicles. When depth sensors cannot be used, this distance has to be estimated from RGB cameras. Unlike for cars, the task of estimating depth from on-board cameras is made complex on drones by the lack of constraints on motion during flight. In this paper, we present a method to estimate the distance of objects seen by an on-board camera by using its RGB video stream and drone motion information. Our method is built upon a pyramidal convolutional neural network architecture and uses time recurrence together with the geometric constraints imposed by motion to produce pixel-wise depth maps. In our architecture, each level of the pyramid is designed to produce its own depth estimate based on past observations and on information provided by the previous level of the pyramid. We introduce a spatial reprojection layer to maintain the spatio-temporal consistency of the data between the levels. We analyse the performance of our approach on Mid-Air, a public drone dataset featuring synthetic drone trajectories recorded in a wide variety of unstructured outdoor environments. Our experiments show that our network outperforms state-of-the-art depth estimation methods and that the use of motion information is the main contributing factor for this improvement. The code of our method is publicly available on GitHub; see https://github.com/michael-fonder/M4Depth
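To make the role of the spatial reprojection layer more concrete, below is a minimal sketch of motion-based feature warping in PyTorch. This is not the authors' implementation (that lives in the linked repository); the function name `reproject_features`, the tensor conventions, and the PyTorch framing are all illustrative assumptions. The underlying idea is standard: back-project the pixels of the current frame using a depth estimate, displace them by the known camera motion, and bilinearly sample the previous frame's feature map at the reprojected locations.

```python
# Hypothetical sketch of a spatial reprojection (feature warping) layer.
# Not the M4Depth implementation; conventions are assumptions for illustration.
import torch
import torch.nn.functional as F

def reproject_features(feat_prev, depth, K, R, t):
    """Warp features from frame t-1 into the current view.

    feat_prev: [B, C, H, W] feature map from the previous frame
    depth:     [B, 1, H, W] depth estimate for the current frame
    K:         [3, 3]       camera intrinsics
    R, t:      [3, 3] / [3] relative motion from current to previous frame
    """
    B, C, H, W = feat_prev.shape
    device = feat_prev.device

    # Build the pixel grid of the current frame in homogeneous coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)

    # Back-project each pixel to 3D using the current depth estimate.
    rays = torch.inverse(K) @ pix                       # [3, H*W]
    cam = rays.unsqueeze(0) * depth.reshape(B, 1, -1)   # [B, 3, H*W]

    # Apply the known camera motion, then project into the previous frame.
    cam_prev = R @ cam + t.reshape(1, 3, 1)
    proj = K @ cam_prev
    uv = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)     # pixel coords in t-1

    # Normalise coordinates to [-1, 1] as required by grid_sample.
    u = 2.0 * uv[:, 0] / (W - 1) - 1.0
    v = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).reshape(B, H, W, 2)

    # Bilinearly sample the previous features at the reprojected locations.
    return F.grid_sample(feat_prev, grid, align_corners=True)
```

In a pyramidal, recurrent setting such as the one described in the abstract, a warp of this kind would be applied at each pyramid level so that the features compared across time stay spatially aligned despite the drone's unconstrained motion.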