[Paopao One Minute] Real-Time Unsupervised Monocular Depth Estimation on CPU


May 10, 2019 · Paopao Robot SLAM (泡泡机器人SLAM)

Paopao One Minute: a close reading of papers from top robotics conferences

Title: Towards real-time unsupervised monocular depth estimation on CPU

Authors: Matteo Poggi, Filippo Aleotti, Fabio Tosi, Stefano Mattoccia

Source: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Translator: 张宁

Reviewers: 颜青松, 陈世浪

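Individuals are welcome to share this post on WeChat Moments; other organizations or media accounts that wish to republish it should leave a message on the account backend to request permission.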

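Summary

Unsupervised depth estimation from a single image is an attractive technique with applications in robotics, autonomous navigation, augmented reality, and other fields. The task is very challenging, and the advent of deep learning has made it possible to tackle it with excellent results. However, the networks involved are extremely deep and complex, so real-time performance can be achieved only on power-hungry GPUs, which rules out depth inference in applications with low-power constraints. To address this, in this paper we propose a novel architecture that quickly infers an accurate depth map on a CPU, even that of an embedded system, using a pyramid of features extracted from a single input image.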

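Figure 1. Top: an input image from the KITTI dataset [1]. Qualitative comparison between the state-of-the-art unsupervised monocular depth estimation method [2] (middle) and the proposed PyD-Net architecture (bottom). Our model runs in real time on a standard CPU and, in the most accurate configuration reported in the figure, takes 1.7 s on the low-power ARM CPU of a Raspberry Pi 3, with an overall power consumption, webcam included, of about 3.5 W.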
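To put the Raspberry Pi 3 figures from the caption in perspective, the short sketch below converts the 1.7 s latency into a frame rate and estimates the energy spent per depth map. The function names are illustrative, and the energy figure assumes the quoted 3.5 W is drawn constantly for the whole inference, which the caption does not state.

```python
def frame_rate_hz(seconds_per_frame: float) -> float:
    """Frames per second for a given per-frame latency."""
    return 1.0 / seconds_per_frame


def energy_per_frame_joules(power_watts: float, seconds_per_frame: float) -> float:
    """Energy per frame, ASSUMING constant power draw during inference."""
    return power_watts * seconds_per_frame


if __name__ == "__main__":
    latency_s = 1.7   # Raspberry Pi 3 latency quoted in the caption
    power_w = 3.5     # overall power, webcam included, quoted in the caption
    print(f"frame rate: {frame_rate_hz(latency_s):.2f} Hz")
    print(f"energy per depth map: {energy_per_frame_joules(power_w, latency_s):.2f} J")
```

Under that assumption, the most accurate configuration runs at a little under 0.6 Hz on the Raspberry Pi 3 and costs roughly 6 J per depth map.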

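Figure 2. The PyD-Net architecture. A pyramid of features is extracted from the input image and, at each level, a shallow network infers depth at that resolution. The processed features are then upsampled to the level above to refine the estimate, up to the highest level.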
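The coarse-to-fine flow described in the Figure 2 caption can be sketched in plain Python. This is only a structural illustration, not the authors' network: average pooling stands in for the learned feature extractor, nearest-neighbour repetition stands in for learned upsampling, and `toy_estimator` is a hypothetical placeholder for the shallow per-level network. All function names are assumptions made for this example.

```python
from typing import Callable, List, Optional

Grid = List[List[float]]


def downsample2(x: Grid) -> Grid:
    """Halve resolution with 2x2 average pooling (stand-in for a learned encoder stage)."""
    h, w = len(x) // 2, len(x[0]) // 2
    return [[(x[2 * i][2 * j] + x[2 * i][2 * j + 1]
              + x[2 * i + 1][2 * j] + x[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(w)] for i in range(h)]


def upsample2(x: Grid) -> Grid:
    """Double resolution by nearest-neighbour repetition (stand-in for learned upsampling)."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def build_pyramid(image: Grid, levels: int) -> List[Grid]:
    """Return the pyramid finest level first: [full, half, quarter, ...]."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample2(pyramid[-1]))
    return pyramid


def toy_estimator(features: Grid, carried: Optional[Grid]) -> Grid:
    """Hypothetical shallow estimator: blend this level's features with the
    upsampled result carried from the coarser level (if there is one)."""
    if carried is None:
        return [list(row) for row in features]
    return [[0.5 * (f + c) for f, c in zip(frow, crow)]
            for frow, crow in zip(features, carried)]


def coarse_to_fine(image: Grid, levels: int = 3,
                   estimator: Callable[[Grid, Optional[Grid]], Grid] = toy_estimator
                   ) -> List[Grid]:
    """Estimate at the coarsest level first, then upsample and refine level by level.
    Image height and width must be divisible by 2 ** (levels - 1)."""
    pyramid = build_pyramid(image, levels)
    estimates: List[Grid] = [[] for _ in range(levels)]
    carried: Optional[Grid] = None
    for lvl in range(levels - 1, -1, -1):   # coarsest -> finest
        estimates[lvl] = estimator(pyramid[lvl], carried)
        if lvl > 0:
            carried = upsample2(estimates[lvl])
    return estimates


if __name__ == "__main__":
    image = [[1.0] * 8 for _ in range(8)]
    for lvl, est in enumerate(coarse_to_fine(image, levels=3)):
        print(f"level {lvl}: {len(est)}x{len(est[0])}")
```

Every level yields its own estimate at its own resolution, and only the finest one has passed through all refinement steps.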

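As in the state of the art, we train our network in an unsupervised manner, casting depth estimation as an image reconstruction problem. Extensive experiments on the KITTI dataset show that, compared with the top-performing approach, our network achieves similar accuracy at much lower complexity (about 6% of the parameters), inferring a depth map for a KITTI image in about 1.7 s on the Raspberry Pi 3 and at more than 8 Hz on a standard CPU. Moreover, by trading accuracy for efficiency, our network can infer maps at about 2 Hz and 40 Hz on these two platforms respectively, while still being more accurate than most slower state-of-the-art methods. To the best of our knowledge, this is the first method to achieve such performance on CPUs, paving the way for effective deployment of unsupervised monocular depth estimation even on embedded systems.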
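The idea of treating depth estimation as image reconstruction can be illustrated on a single scanline. The sketch below assumes a rectified stereo pair is available at training time (this post does not say what supervision signal is used) and uses a plain L1 photometric error. It is a toy illustration of the principle, not the loss used in the paper, and the function names are made up for this example.

```python
from typing import List


def warp_scanline(right: List[float], disparity: List[float]) -> List[float]:
    """Reconstruct the left scanline by sampling the right one at x - d(x),
    using linear interpolation and clamping at the borders."""
    n = len(right)
    out = []
    for x, d in enumerate(disparity):
        src = min(max(x - d, 0.0), n - 1.0)
        x0 = int(src)
        x1 = min(x0 + 1, n - 1)
        t = src - x0
        out.append((1.0 - t) * right[x0] + t * right[x1])
    return out


def photometric_l1(target: List[float], reconstructed: List[float]) -> float:
    """Mean absolute difference between the real and the reconstructed scanline."""
    return sum(abs(a - b) for a, b in zip(target, reconstructed)) / len(target)


if __name__ == "__main__":
    right = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    # Synthetic left view: the same ramp shifted by a true disparity of 2 pixels.
    left = warp_scanline(right, [2.0] * len(right))
    for guess in (0.0, 1.0, 2.0):
        recon = warp_scanline(right, [guess] * len(right))
        print(f"disparity {guess}: L1 error {photometric_l1(left, recon):.3f}")
```

The error falls to zero only when the predicted disparity matches the true one, which is why a network can be trained on it without ground-truth depth. Practical systems usually combine such a term with further regularization.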

Abstract

Unsupervised depth estimation from a single image is a very attractive technique with several implications in robotic, autonomous navigation, augmented reality and so on. This topic represents a very challenging task and the advent of deep learning enabled to tackle this problem with excellent results. However, these architectures are extremely deep and complex. Thus, real-time performance can be achieved only by leveraging power-hungry GPUs that do not allow to infer depth maps in application fields characterized by low-power constraints. To tackle this issue, in this paper we propose a novel architecture capable to quickly infer an accurate depth map on a CPU, even of an embedded system, using a pyramid of features extracted from a single input image. Similarly to state-of-the-art, we train our network in an unsupervised manner casting depth estimation as an image reconstruction problem. Extensive experimental results on the KITTI dataset show that compared to the top performing approach our network has similar accuracy but a much lower complexity (about 6% of parameters) enabling to infer a depth map for a KITTI image in about 1.7 s on the Raspberry Pi 3 and at more than 8 Hz on a standard CPU. Moreover, by trading accuracy for efficiency, our network allows to infer maps at about 2 Hz and 40 Hz respectively, still being more accurate than most state-of-the-art slower methods. To the best of our knowledge, it is the first method enabling such performance on CPUs paving the way for effective deployment of unsupervised monocular depth estimation even on embedded systems.
