Estimating the distance to objects is crucial for autonomous vehicles. When depth sensors cannot be used, this distance has to be estimated from RGB cameras. Unlike cars, drones impose few constraints on camera motion during flight, which makes depth estimation from on-board cameras more complex. In this paper, we present a method to estimate the distance of objects seen by an on-board camera by using its RGB video stream and drone motion information. Our method is built upon a pyramidal convolutional neural network architecture and combines time recurrence with the geometric constraints imposed by motion to produce pixel-wise depth maps. In our architecture, each level of the pyramid is designed to produce its own depth estimate based on past observations and on information provided by the previous level of the pyramid. We introduce a spatial reprojection layer to maintain the spatio-temporal consistency of the data between levels. We analyse the performance of our approach on Mid-Air, a public drone dataset featuring synthetic drone trajectories recorded in a wide variety of unstructured outdoor environments. Our experiments show that our network outperforms state-of-the-art depth estimation methods and that the use of motion information is the main contributing factor for this improvement. The code of our method is publicly available at $\href{https://github.com/michael-fonder/M4Depth}{\text{https://github.com/michael-fonder/M4Depth}}$.
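The spatial reprojection mentioned above relies on standard multi-view geometry: pixels of a past frame are warped into the current frame using the estimated depth and the known camera motion. The sketch below illustrates this principle with a pinhole camera model; the function name, its exact inputs, and the dense per-pixel formulation are assumptions for illustration, not the paper's actual layer.

```python
import numpy as np

def reproject_coords(depth, K, R, t):
    """Warp pixel coordinates from a reference frame into a target frame.

    depth : (h, w) depth map of the reference frame (assumed nonzero)
    K     : (3, 3) camera intrinsics matrix
    R, t  : relative camera rotation (3, 3) and translation (3,)

    Returns the (2, h, w) reprojected pixel coordinates. Hypothetical
    sketch of the geometric warping; M4Depth's layer may differ.
    """
    h, w = depth.shape
    # Homogeneous pixel grid, shape (3, h*w)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])
    # Back-project to 3D camera coordinates using depth
    pts = np.linalg.inv(K) @ pix * depth.ravel()
    # Apply the relative camera motion
    pts = R @ pts + t.reshape(3, 1)
    # Project back to the image plane and dehomogenize
    proj = K @ pts
    proj = proj[:2] / proj[2]
    return proj.reshape(2, h, w)
```

With identity rotation, zero translation, and uniform depth, the warped coordinates coincide with the original pixel grid, which is a quick sanity check for such a layer.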