[Paopao One Minute] SfM-Net: Learning Structure and Motion from Video


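May 29, 2018 · Paopao Robot SLAM

One minute a day to take you through papers from the top robotics conferences.

Title: SfM-Net: Learning of Structure and Motion from Video

Authors: Sudheendra Vijayanarasimhan, Susanna Ricco, Cordelia Schmid, Rahul Sukthankar, and Katerina Fragkiadaki

Source: arXiv:1704.07804 (arXiv 2017)

Narrator: Wang Su (王肃)

Compiled by: Chen Jianhua (陈建华)

Individuals are welcome to share this post on WeChat Moments. Other organizations or media accounts that wish to repost it should leave a message to request permission.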

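Summary

Hello everyone. Today's paper is SfM-Net: Learning of Structure and Motion from Video, posted on arXiv in 2017.

In this paper, the authors propose SfM-Net, a geometry-aware neural network for motion estimation in videos. It decomposes frame-to-frame pixel motion in terms of scene and object depth, camera motion, and 3D object rotations and translations. Given a sequence of frames, SfM-Net predicts depth, segmentation, camera motion, and rigid object motions, and converts these into a dense frame-to-frame motion field (optical flow). It then differentiably warps frames in time to match pixels, so that the matching error can be back-propagated.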
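The decomposition described above can be illustrated for a single pixel. The following is a minimal Python sketch, not the authors' implementation. It assumes a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, restricts the camera rotation to a yaw about the y axis through the camera origin, and models object motion as a translation weighted by a soft mask value. All function and parameter names are illustrative.

```python
import math


def backproject(x, y, depth, fx, fy, cx, cy):
    """Lift pixel (x, y) with the given depth to a 3D point in the camera frame."""
    return ((x - cx) / fx * depth, (y - cy) / fy * depth, depth)


def rotate_y(point, angle):
    """Rotate a 3D point about the y axis by `angle` radians (assumed simplification)."""
    px, py, pz = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * px + s * pz, py, -s * px + c * pz)


def project(point, fx, fy, cx, cy):
    """Project a 3D camera-frame point back onto the image plane."""
    px, py, pz = point
    return (fx * px / pz + cx, fy * py / pz + cy)


def pixel_flow(x, y, depth, mask, obj_translation,
               cam_yaw, cam_translation, fx, fy, cx, cy):
    """Compose object motion and camera motion into a 2D flow vector for one pixel."""
    point = backproject(x, y, depth, fx, fy, cx, cy)
    # Object motion: translation scaled by the soft mask value in [0, 1].
    point = tuple(p + mask * t for p, t in zip(point, obj_translation))
    # Camera motion: rotation followed by translation.
    point = rotate_y(point, cam_yaw)
    point = tuple(p + t for p, t in zip(point, cam_translation))
    x2, y2 = project(point, fx, fy, cx, cy)
    return (x2 - x, y2 - y)


if __name__ == "__main__":
    # A static pixel at the principal point, 2 units away, seen by a camera
    # whose motion shifts scene points by 0.1 units along x.
    flow = pixel_flow(64.0, 48.0, 2.0, 0.0, (0.0, 0.0, 0.0),
                      0.0, (0.1, 0.0, 0.0), 100.0, 100.0, 64.0, 48.0)
    print(flow)
```

Applying the same computation to every pixel yields a dense motion field. Because each step is a smooth function of depth, mask, and motion parameters, a framework with automatic differentiation could propagate gradients through it.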

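Figure 1. The SfM-Net system pipeline

The model can be trained with varying degrees of supervision: 1) self-supervised by the reprojection photometric error (completely unsupervised), 2) supervised by ego-motion (camera motion), or 3) supervised by depth (for example, depth provided by RGB-D sensors).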
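To make the first, self-supervised option more concrete, here is a minimal Python sketch of a photometric error computed by warping the next frame back with a flow field. It is an illustration under assumptions rather than the paper's code: images are small grayscale lists of rows, sampling is bilinear with border clamping, and the error is a mean absolute difference. The names `bilinear_sample` and `photometric_loss` are hypothetical.

```python
def bilinear_sample(image, x, y):
    """Sample a grayscale image (list of rows) at real-valued (x, y), clamped to the border."""
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * image[y0][x0] + ax * image[y0][x1]
    bottom = (1 - ax) * image[y1][x0] + ax * image[y1][x1]
    return (1 - ay) * top + ay * bottom


def photometric_loss(frame_t, frame_t1, flow):
    """Mean absolute difference between frame_t and frame_t1 sampled at the flowed positions."""
    h, w = len(frame_t), len(frame_t[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            u, v = flow[y][x]
            warped = bilinear_sample(frame_t1, x + u, y + v)
            total += abs(frame_t[y][x] - warped)
    return total / (h * w)


if __name__ == "__main__":
    # frame_t1 is frame_t shifted one pixel to the right.
    frame_t = [[0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0]]
    frame_t1 = [[-1.0, 0.0, 1.0, 2.0], [-1.0, 0.0, 1.0, 2.0]]
    zero_flow = [[(0.0, 0.0)] * 4 for _ in range(2)]
    true_flow = [[(1.0, 0.0)] * 4 for _ in range(2)]
    print(photometric_loss(frame_t, frame_t1, zero_flow))
    print(photometric_loss(frame_t, frame_t1, true_flow))
```

A flow field that matches the true motion gives a lower error than a zero flow, which is the signal a self-supervised setup would minimize. The residual error at the image border comes from the clamping choice made in this sketch.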

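Figure 2. The SfM-Net architecture

Figure 3. Motion segmentation results from SfM-Net trained without supervision

In addition, SfM-Net extracts meaningful depth estimates and successfully estimates frame-to-frame camera rotations and translations. It also often succeeds in segmenting the moving objects in the scene, even though such supervision is never provided during training.

Figure 4. Comparison of camera motion estimation results on different datasets

Figure 5. Object segmentation and optical flow results from SfM-Net trained without supervision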

Abstract

We propose SfM-Net, a geometry-aware neural network for motion estimation in videos that decomposes frame-to-frame pixel motion in terms of scene and object depth, camera motion and 3D object rotations and translations. Given a sequence of frames, SfM-Net predicts depth, segmentation, camera and rigid object motions, converts those into a dense frame-to-frame motion field (optical flow), differentiably warps frames in time to match pixels and back-propagates. The model can be trained with various degrees of supervision: 1) self-supervised by the reprojection photometric error (completely unsupervised), 2) supervised by ego-motion (camera motion), or 3) supervised by depth (e.g., as provided by RGBD sensors). SfM-Net extracts meaningful depth estimates and successfully estimates frame-to-frame camera rotations and translations. It often successfully segments the moving objects in the scene, even though such supervision is never provided.


Paopao website: www.paopaorobot.org

Paopao forum: http://paopaorobot.org/forums/