[PaoPao One Minute] DeMoN: A Neural Network That Estimates Depth Maps and Camera Motion (CVPR-27)



December 24, 2017 · 泡泡机器人SLAM (PaoPao Robot SLAM) · PaoPao One Minute

One minute a day, taking you through papers from the top robotics conferences.

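Title: DeMoN: Depth and Motion Network for Learning Monocular Stereo

Authors: B. Ummenhofer, H. Zhou, J. Uhrig, N. Mayer, E. Ilg, A. Dosovitskiy, and T. Brox

Source: CVPR 2017

Narrator: 火箭姚小麦

Compiled by: 鲁涛

Individuals are welcome to share this post on WeChat Moments; other organizations or media outlets that wish to republish it should leave a message through the account to request permission.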

Summary

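Hello everyone. Today we introduce an exploratory application of deep learning to SfM, DeMoN: Depth and Motion Network for Learning Monocular Stereo, which uses a neural network to learn depth maps and camera motion from pairs of images taken by a single camera. The paper was published at CVPR 2017.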

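This paper approaches the SfM (structure from motion) problem through learning. The authors train a convolutional neural network end-to-end to compute a depth map and the camera motion from successive, unconstrained image pairs. The architecture consists of several stacked encoder-decoder networks; its core is an iterative network that can improve its own predictions. Besides depth and camera motion, the network also estimates surface normals, the optical flow between the two images, and the confidence of the matching. A key ingredient is a training loss based on spatial relative differences. Compared with traditional two-frame SfM methods, the results are more accurate and more robust. Compared with the currently popular networks that predict depth from a single image, DeMoN learns to match between images and therefore generalizes better to scenes not seen during training.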
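To make the idea of a loss based on spatial relative differences more concrete, the sketch below shows one way such a loss can be built so that it ignores the global scale of a depth map. It is an illustration only: the normalization, the pixel spacings, the small `eps` constant, and the function names `relative_gradient` and `relative_difference_loss` are assumptions made here, and the exact formulation in the paper may differ.

```python
import math


def relative_gradient(depth, h=1):
    """Normalized finite differences of a 2D map (list of rows) at spacing h.

    Each difference is divided by the sum of the absolute values of the two
    samples, so multiplying the whole map by a positive constant leaves the
    result (almost exactly) unchanged.
    """
    rows, cols = len(depth), len(depth[0])
    eps = 1e-12  # illustrative guard against division by zero
    grad = []
    for i in range(rows - h):
        row = []
        for j in range(cols - h):
            center = depth[i][j]
            right = depth[i][j + h]
            down = depth[i + h][j]
            gx = (right - center) / (abs(right) + abs(center) + eps)
            gy = (down - center) / (abs(down) + abs(center) + eps)
            row.append((gx, gy))
        grad.append(row)
    return grad


def relative_difference_loss(pred, target, spacings=(1, 2)):
    """Sum of Euclidean distances between relative gradients of two maps."""
    total = 0.0
    for h in spacings:
        grad_pred = relative_gradient(pred, h)
        grad_target = relative_gradient(target, h)
        for row_pred, row_target in zip(grad_pred, grad_target):
            for (px, py), (tx, ty) in zip(row_pred, row_target):
                total += math.hypot(px - tx, py - ty)
    return total
```

Because only ratios of neighboring values enter the loss, a prediction that is a scaled copy of the target gets a loss close to zero, while a prediction with the wrong local structure (for example a flat map) is penalized.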

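Figure 1. Example results of DeMoN. The input is two successive images captured by a monocular camera; the network estimates the depth map of the first image and the camera motion of the second.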

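Figure 2. System overview. The network is a chain of encoder-decoder networks that iterates over optical flow, depth, and camera motion. A final refinement network improves the accuracy of the depth map.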
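The chained structure in Figure 2 can be sketched as plain control flow. The snippet below is a schematic under stated assumptions, not the authors' implementation: the three stages are passed in as ordinary callables, the name `run_chain`, the dictionary keys, the separate initial stage, and the default of three iterations are all hypothetical choices made for illustration.

```python
def run_chain(image_pair, initial_stage, iterative_stage, refine_stage, n_iters=3):
    """Run an initial estimate, repeated self-improvement, then refinement.

    initial_stage(image_pair) -> dict with at least a "depth" entry
    iterative_stage(image_pair, estimate) -> improved estimate (same keys)
    refine_stage(image_pair, depth) -> refined depth map
    """
    # First pass: produce a rough estimate (flow, depth, motion, ...).
    estimate = initial_stage(image_pair)

    # The iterative network consumes its own previous output and improves it.
    for _ in range(n_iters):
        estimate = iterative_stage(image_pair, estimate)

    # Final stage: only the depth map is refined; other outputs pass through.
    result = dict(estimate)
    result["depth"] = refine_stage(image_pair, estimate["depth"])
    return result
```

With real networks, each callable would wrap an encoder-decoder; here they can be replaced by simple stubs to check the order in which the stages run.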

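Figure 3. Schematic of the encoder-decoder networks. The first encoder-decoder estimates optical flow and its confidence. The second estimates the depth map and surface normals; a fully connected network attached to its encoder estimates the camera motion and a scale factor for the depth.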
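The quantities named in Figure 3 are geometrically linked: for a static scene, the depth of a pixel and the relative camera motion together determine where that pixel appears in the second image, and therefore its optical flow. The sketch below illustrates this relation only; it is not the network's computation. It assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, and assumes that the rotation `R` and translation `t` map points from the first camera frame into the second. The function name is likewise made up for this example.

```python
def flow_from_depth_and_motion(u, v, depth, R, t, fx, fy, cx, cy):
    """Optical flow of pixel (u, v) implied by its depth and the camera motion.

    R is a 3x3 rotation (list of rows) and t a 3-vector; together they map a
    point from the first camera frame to the second (an assumed convention).
    """
    # Back-project the pixel to a 3D point in the first camera frame.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    z = depth

    # Move the point into the second camera frame.
    x2 = R[0][0] * x + R[0][1] * y + R[0][2] * z + t[0]
    y2 = R[1][0] * x + R[1][1] * y + R[1][2] * z + t[1]
    z2 = R[2][0] * x + R[2][1] * y + R[2][2] * z + t[2]

    # Project into the second image and take the pixel displacement.
    u2 = fx * x2 / z2 + cx
    v2 = fy * y2 / z2 + cy
    return (u2 - u, v2 - v)
```

For a pure sideways translation with no rotation, the horizontal flow reduces to `fx * t[0] / depth`, so nearer points move more between the two images than distant ones.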

Abstract

In this paper we formulate structure from motion as a learning problem. We train a convolutional network end-to-end to compute depth and camera motion from successive, unconstrained image pairs. The architecture is composed of multiple stacked encoder-decoder networks, the core part being an iterative network that is able to improve its own predictions. The network estimates not only depth and motion, but additionally surface normals, optical flow between the images and confidence of the matching. A crucial component of the approach is a training loss based on spatial relative differences. Compared to traditional two-frame structure from motion methods, results are more accurate and more robust. In contrast to the popular depth-from-single-image networks, DeMoN learns the concept of matching and, thus, better generalizes to structures not seen during training.

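If you are interested in this paper and would like to download the full text, follow the 泡泡机器人SLAM (PaoPao Robot SLAM) public account.

Reply with the keyword "DeMoN" to receive the download link.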