Today's Focus
From April 20 to 22, 2018, the 8th Vision And Learning SEminar (VALSE) was successfully held in Dalian. Hosted by Dalian University of Technology, this year's seminar drew more than 3,000 attendees.
The conference venue (image from the web)
At the seminar, researcher Yihong Wu of the Institute of Automation gave a talk titled "2D to 3D since 2017", surveying recent progress in image matching, visual localization, 3D reconstruction, and related areas. Excerpts from the talk follow (a download link for the full slides is given at the end).
Hand-crafted descriptors are gradually being replaced by learned descriptors; deep learning has become the mainstream trend
Deep learning is starting to shine in feature detection
In practice, hand-crafted methods still dominate
CovDet: learning covariant features with a CNN. Zhang, Yu, Kumar, Chang. CVPR 2017
AffNet: learning affine-covariant parameters with a CNN. Mishkin, Radenovic, Matas. arXiv 2017
L2Net: a new sampling pattern and loss. Tian, Fan, Wu. CVPR 2017
DeepCD: complementary floating-point and binary descriptors. Yang, Hsu, Lin, Chuang. ICCV 2017
Spread-out: learning the spatial distribution of descriptors. Zhang, Yu, Kumar, Chang. ICCV 2017
HardNet: an improved loss built on L2Net. Mishchuk, Mishkin, Radenovic, Matas. NIPS 2017
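To make the last two entries concrete, here is a minimal sketch of a HardNet-style hardest-in-batch triplet margin loss in PyTorch. The function and tensor names and the margin value are illustrative assumptions, not the authors' code:

```python
import torch

def hardnet_style_loss(anchors, positives, margin=1.0):
    """Hardest-in-batch triplet margin loss (HardNet-style sketch).

    anchors, positives: (N, D) L2-normalized descriptors; row i of both
    tensors comes from the same physical patch, every other row is a negative.
    """
    dist = torch.cdist(anchors, positives)               # (N, N) pairwise L2
    n = dist.size(0)
    pos = dist.diag()                                    # matching pairs d(a_i, p_i)
    off = dist + 1e6 * torch.eye(n, device=dist.device)  # mask the diagonal
    # Hardest negative per pair, searched over both rows and columns.
    neg = torch.min(off.min(dim=1).values, off.min(dim=0).values)
    return torch.clamp(margin + pos - neg, min=0).mean()
```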
J. Bian, W. Lin, Y. Matsushita, S. Yeung, T. Nguyen, M. Cheng. GMS: Grid-based Motion Statistics for Fast, Ultra-robust Feature Correspondence. CVPR 2017. (exploits motion-smoothness statistics for fast, robust feature matching; see the usage sketch after this list)
A. Seki, M. Pollefeys. SGM-Nets: Semi-global Matching with Neural Networks. CVPR 2017. (rank 10 on KITTI Stereo 2012)
SGM: H. Hirschmuller. Stereo Processing by Semiglobal Matching and Mutual Information, PAMI 2008.
Johannes L. Schonberger, Hans Hardmeier, Torsten Sattler, Marc Pollefeys. Comparative Evaluation of Hand-Crafted and Learned Local Features. CVPR 2017.
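As a concrete illustration of the GMS entry above, a hedged usage sketch, assuming an opencv-contrib-python build that ships cv2.xfeatures2d.matchGMS; the file names are placeholders:

```python
import cv2

img1 = cv2.imread("a.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("b.jpg", cv2.IMREAD_GRAYSCALE)

# GMS is designed to start from a large pool of putative matches,
# so detect many ORB features.
orb = cv2.ORB_create(10000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
putative = cv2.BFMatcher(cv2.NORM_HAMMING).match(des1, des2)

# Grid-based motion statistics keeps matches whose grid neighborhoods
# move consistently and rejects the rest.
good = cv2.xfeatures2d.matchGMS(
    img1.shape[:2][::-1], img2.shape[:2][::-1],   # (width, height) sizes
    kp1, kp2, putative, withRotation=False, withScale=False)
print(len(putative), "putative ->", len(good), "GMS-filtered matches")
```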
Few datasets are available for descriptor learning, and performance on the earlier Brown dataset has saturated. HPatches improves on Brown in data quality, and its evaluation protocols are more effective and more diverse.
V. Balntas, K. Lenc, A. Vedaldi, K. Mikolajczyk. HPatches: A Benchmark and Evaluation of Handcrafted and Learned Local Descriptors. CVPR 2017.
1. 3D points known: large scenes, heterogeneous data
Nathan Piasco, Désiré Sidibé, Cédric Demonceaux, Valérie Gouet-Brunet. A survey on Visual-Based Localization: On the benefit of heterogeneous data. Pattern Recognition, 2018.
Liu Liu, Hongdong Li, and Yuchao Dai. Efficient Global 2D-3D Matching for Camera Localization in a Large-Scale 3D Map, ICCV 2017.
Dylan Campbell, Lars Petersson, Laurent Kneip and Hongdong Li. Globally-Optimal Inlier Set Maximisation for Simultaneous Camera Pose and Feature Correspondence, ICCV 2017.
Youji Feng, Yihong Wu, and Lixin Fan. Real-time SLAM Relocalization with On-line Learning of Binary Feature Indexing. Machine Vision and Applications, 2017.
Jian Wu, Liwei Ma and Xiaolin Hu. Delving Deeper into Convolutional Neural Networks for Camera Relocalization. ICRA 2017.
T. Qin, P. Li and S. Shen. Relocalization, Global Optimization and Map Merging for Monocular Visual-Inertial SLAM. ICRA 2018.
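A core geometric step in this known-map setting is 2D-3D matching followed by PnP with RANSAC. Below is a minimal OpenCV sketch; the synthetic points, intrinsics, and thresholds are all illustrative assumptions:

```python
import cv2
import numpy as np

# Map points (from an offline reconstruction) and camera intrinsics.
pts3d = np.random.rand(100, 3).astype(np.float32)
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], np.float32)

# Synthesize a ground-truth pose and project the map to get 2D observations;
# in a real system these come from descriptor matching against the map.
rvec_gt = np.array([0.1, -0.2, 0.05], np.float32)
tvec_gt = np.array([0.3, 0.1, 4.0], np.float32)
pts2d, _ = cv2.projectPoints(pts3d, rvec_gt, tvec_gt, K, None)

ok, rvec, tvec, inliers = cv2.solvePnPRansac(
    pts3d, pts2d, K, None, reprojectionError=2.0, iterationsCount=100)
print("pose found:", ok, "| inliers:", 0 if inliers is None else len(inliers))
print("rvec:", rvec.ravel(), "| tvec:", tvec.ravel())
```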
2. 3D points unknown: SLAM
C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. Reid, and J.J. Leonard. Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age. IEEE TRANSACTIONS ON ROBOTICS, 32(6), 2016.
G. Younes, D. Asmar, E. Shammas, J. Zelek. Keyframe-based monocular SLAM: design, survey, and future directions. Robotics and Autonomous Systems 98 (2017) 67–88
Fusing several kinds of geometric primitives (points, lines, planes) addresses visual localization in low-texture, harsh-lighting, and long-corridor environments; fusing multiple cameras and sensors improves the accuracy and robustness of visual localization in complex environments.
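As one concrete reading of "fusing points and lines", here is a minimal sketch of the two residuals a point-line SLAM back end can minimize. The endpoint-to-line parameterization is a common generic choice, not any specific paper's formulation:

```python
import numpy as np

def project(K, R, t, X):
    """Pinhole projection of 3D points X (N, 3) to pixel coordinates (N, 2)."""
    x = (K @ (R @ X.T + t.reshape(3, 1))).T
    return x[:, :2] / x[:, 2:3]

def point_residual(K, R, t, X, uv):
    """Standard point reprojection error against observations uv (N, 2)."""
    return project(K, R, t, X) - uv

def line_residual(K, R, t, P, Q, lines2d):
    """Distance of the projected 3D line endpoints P, Q (N, 3) to the
    detected 2D lines, each given as l = (a, b, c) with a^2 + b^2 = 1,
    so that l . (u, v, 1) is a signed point-to-line distance."""
    res = []
    for X in (P, Q):
        x = np.hstack([project(K, R, t, X), np.ones((len(X), 1))])
        res.append((x * lines2d).sum(axis=1))
    return np.concatenate(res)
```

A back end then stacks both residual types into one least-squares problem over poses and landmarks, which is how line features help where too few point features survive in low-texture scenes.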
A. Pumarola, A. Vakhitov, et al. PL-SLAM: Real-Time Monocular Visual SLAM with Points and Lines. ICRA 2017. (points + lines)
S. C. Yang and S. Scherer. Direct Monocular Odometry Using Points and Lines. ICRA 2017. (points + lines)
S. C. Yang, Y. Song, et al. Pop-up SLAM: Semantic Monocular Plane SLAM for Low-texture Environments. IROS 2016. (planes)
P. F. Proenca and Y. Gao. Probabilistic RGB-D Odometry Based on Points, Lines and Planes Under Depth Uncertainty. arXiv, 2017. (lines + planes)
K. Sun, K. Mohta, et al. Robust Stereo Visual Inertial Odometry for Fast Autonomous Flight. IEEE Robotics and Automation Letters, 2018. (stereo + IMU)
R. Wang, M. Schworer and D. Cremers. Stereo DSO: Large-Scale Direct Sparse Visual Odometry with Stereo Cameras. ICCV 2017. (stereo)
K. Qiu, T. Liu and S. Shen. Model-based Global Localization for Aerial Robots Using Edge Alignment. IEEE Robotics and Automation Letters, 2(3):1256-1263, 2017. (edges)
Y. Ling, M. Kuse and S. Shen. Edge Alignment-based Visual-Inertial Fusion for Tracking of Aggressive Motions. Autonomous Robots, pages 1-16, 2017. (edges)
Y. Ling and S. Shen. Building maps for autonomous navigation using sparse visual SLAM features. IROS 2017.
K. Tateno, F. Tombari, et al. CNN-SLAM: Real-time Dense Monocular SLAM with Learned Depth Prediction. CVPR 2017.
B. Ummenhofer, H. Z. Zhou, et al. DeMoN: Depth and Motion Network for Learning Monocular Stereo. CVPR 2017.
T. H. Zhou, M. Brown, N. Snavely, D.G. Lowe. Unsupervised Learning of Depth and Ego-Motion from Video. CVPR 2017. (see the warping-loss sketch after this list)
S. Vijayanarasimhan, S. Ricco, et al. Sfm-Net: Learning of Structure and Motion from Video. arXiv:1704.07804, 2017.
R. Li, S. Wang, et al. UnDeepVO: Monocular Visual Odometry through Unsupervised Deep Learning. arXiv: 1709.06841, 2017.
D. Detone, T. Malisiewicz, A. Rabinovich. Toward Geometric Deep SLAM. arXiv:1707.07410, 2017.
R. Clark, S. Wang, et al. VINet: Visual-Inertial Odometry as a Sequence-to-Sequence Learning Problem. AAAI 2017.
Xiang Gao, Tao Zhang. Unsupervised learning to detect loops using deep neural networks for visual SLAM system. Auton Robot (2017) 41:1–18.
Helder J. Araujo et al. Deep EndoVO: A Recurrent Convolutional Neural Network (RCNN) based Visual Odometry Approach for Endoscopic Capsule Robots, Neurocomputing, 2017.
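Several of the learning-based entries above (DeMoN, SfM-Net, UnDeepVO, and Zhou et al.'s unsupervised depth and ego-motion) share one self-supervised core: predict depth and relative pose, warp a neighboring frame into the current view, and penalize the photometric difference. A minimal PyTorch sketch of that warping loss; the names, shapes, and the bare L1 penalty are illustrative simplifications:

```python
import torch
import torch.nn.functional as F

def view_synthesis_loss(I_tgt, I_src, depth, K, R, t):
    """Photometric loss by warping I_src into the target view.

    I_tgt, I_src: (1, 3, H, W) images; depth: (1, 1, H, W) predicted depth
    for the target view; K: (3, 3) intrinsics; R, t: relative pose src<-tgt.
    """
    _, _, H, W = I_tgt.shape
    # Pixel grid in homogeneous coordinates, shape (3, H*W).
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([u.flatten(), v.flatten(), torch.ones(H * W)])
    # Back-project with predicted depth, transform, and re-project.
    rays = torch.linalg.inv(K) @ pix                 # (3, H*W)
    X = rays * depth.flatten()                       # 3D points, target frame
    x = K @ (R @ X + t.reshape(3, 1))                # into the source frame
    u2, v2 = x[0] / x[2], x[1] / x[2]
    # Normalize to [-1, 1] and bilinearly sample the source image.
    grid = torch.stack([2 * u2 / (W - 1) - 1, 2 * v2 / (H - 1) - 1], dim=-1)
    warped = F.grid_sample(I_src, grid.view(1, H, W, 2), align_corners=True)
    return (warped - I_tgt).abs().mean()
```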
Perceive the world at two levels, geometry and content, and build an abstract understanding of the map's content. Semantic understanding can help SLAM improve mapping and localization accuracy; SLAM, in turn, helps extend semantic understanding to new scenes.
S. L. Bowman, N. Atanasov, et al. Probabilistic Data Association for Semantic SLAM. ICRA 2017. (Best Paper, one of the five best papers)
J. McCormac, A. Handa, et al. Semantic Fusion: Dense 3D Semantic Mapping with Convolutional Neural Networks. ICRA 2017.
Joseph DeGol, Timothy Bretl and Derek Hoiem. ChromaTag : A Colored Marker and Fast Detection Algorithm. ICCV, 2017.
R. Munoz-Salinas, M.J. Marin-Jimenez, E. Yeguas-Bolivar, R. Medina-Carnicer. Mapping and localization from planar markers. Pattern Recognition, Vol. 73, pp. 158-171, 2018.
Y. Wu. Lightweight traceless-marker SLAM. Patent, 2017. (requires neither matching nor PnP)
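The marker-based entries above all start from fiducial detection and single-marker pose. A minimal sketch with OpenCV's ArUco module as a stand-in, assuming an opencv-contrib-python build with the pre-4.7 ArUco API; the intrinsics and marker size are illustrative:

```python
import cv2
import numpy as np

img = cv2.imread("frame.jpg", cv2.IMREAD_GRAYSCALE)
K = np.array([[800., 0, 320], [0, 800, 240], [0, 0, 1]])
dist = np.zeros(5)

# Detect markers from a predefined dictionary, then estimate each
# marker's pose from its four corners (a tiny planar PnP problem).
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
corners, ids, _ = cv2.aruco.detectMarkers(img, dictionary)
if ids is not None:
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, 0.05, K, dist)               # 5 cm physical marker side
    print("marker", ids.ravel()[0], "tvec:", tvecs[0].ravel())
```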
Event-camera SLAM: every pixel of an event camera senses changes in light intensity independently and asynchronously. Visual SLAM systems built on such cameras exploit their low power consumption, low bandwidth, and high sensitivity to brightness changes.
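As a minimal illustration of the data these systems consume, the sketch below accumulates an event stream into a signed brightness-change image, a common preprocessing step before frame-based tracking. The (x, y, t, polarity) tuple layout is the usual convention; the sensor size and sample values are assumptions:

```python
import numpy as np

H, W = 180, 240                      # e.g. a DAVIS240-class sensor
# One row per event: the camera emits (x, y, timestamp, polarity)
# tuples on brightness changes instead of fixed-rate frames.
events = np.array([
    [120, 90, 0.0012, +1],
    [121, 90, 0.0013, +1],
    [ 60, 40, 0.0015, -1],
])

def accumulate(events, t0, t1):
    """Sum event polarities over the slice [t0, t1) into an (H, W) image."""
    img = np.zeros((H, W), np.float32)
    sel = events[(events[:, 2] >= t0) & (events[:, 2] < t1)]
    np.add.at(img, (sel[:, 1].astype(int), sel[:, 0].astype(int)), sel[:, 3])
    return img

frame = accumulate(events, 0.0, 0.002)
```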
G. Gallego, Jon E. A. Lund, et al. Event-based, 6-DOF Camera Tracking from Photometric Depth Maps. PAMI, 2017.
T. Rosinol Vidal, H. Rebecq, et al. Ultimate SLAM? Combining Events, Images, and IMU for Robust Visual SLAM in HDR and High Speed Scenarios. IEEE Robotics and Automation Letters, 2018.
H. Rebecq, T. Horstschaefer and D. Scaramuzza. Real-time Visual-Inertial Odometry for Event Cameras using Keyframe-based Nonlinear Optimization. BMVC 2017.
Deep learning methods are on the rise
Enthusiasm for classical geometric methods has not cooled
Practical applications are still dominated by classical multi-view geometry
1. Structure from Motion (SfM)
Hainan Cui, Shuhan Shen, Xiang Gao, Zhanyi Hu. Batched Incremental Structure-from-Motion. 3DV 2017
Jianwei Li, Wei Gao, Yihong Wu. Elaborate Scene Reconstruction with a Consumer Depth Camera. International Journal of Automation and Computing, 2017.
Hainan Cui, Shuhan Shen, Zhanyi Hu. Global Fusion of Generalized Camera Model for Efficient Large-Scale Structure from Motion. Science China: Information Sciences, 60: 038101:1–038101:3, 2017.
The first approach: decouple the estimation of camera positions from the estimation of camera orientations (a rotation-averaging sketch follows the reference below).
Hainan Cui, Xiang Gao, Shuhan Shen, Zhanyi Hu. HSfM: Hybrid Structure-from-Motion. CVPR 2017.
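A minimal sketch of what the first stage of such a decoupled pipeline can look like: global ("chordal") rotation averaging by linear least squares over relative rotations, followed by projection back onto SO(3). This is a generic textbook variant, not the HSfM implementation:

```python
import numpy as np

def chordal_rotation_averaging(n, rel, ref=0):
    """Solve R_j ~= R_ij @ R_i for absolute rotations in the least-squares
    sense. n: number of cameras; rel: dict {(i, j): R_ij (3x3)}.
    The gauge is fixed by pinning R_ref to the identity."""
    R = np.zeros((n, 3, 3))
    for c in range(3):                         # the 3 columns decouple
        A, b = [], []
        for (i, j), Rij in rel.items():        # R_ij r_i - r_j = 0
            row = np.zeros((3, 3 * n))
            row[:, 3 * i:3 * i + 3] = Rij
            row[:, 3 * j:3 * j + 3] = -np.eye(3)
            A.append(row)
            b.append(np.zeros(3))
        gauge = np.zeros((3, 3 * n))           # pin column c of R_ref
        gauge[:, 3 * ref:3 * ref + 3] = np.eye(3)
        A.append(gauge)
        b.append(np.eye(3)[:, c])
        x = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)[0]
        R[:, :, c] = x.reshape(n, 3)
    for i in range(n):                         # project blocks onto SO(3)
        U, _, Vt = np.linalg.svd(R[i])
        R[i] = U @ np.diag([1, 1, np.linalg.det(U @ Vt)]) @ Vt
    return R
```

With rotations fixed, camera positions can then be recovered in a separate, often linear, translation step, which is the sense in which position and orientation are split apart.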
The second approach: partition the cameras into groups, reconstruct each group incrementally, and align the per-group models with a global method.
Hainan Cui, Shuhan Shen, Xiang Gao, Zhanyi Hu. CSfM: Community-based Structure from Motion, ICIP 2017.
Siyu Zhu, Tianwei Shen, Lei Zhou, Runze Zhang, Jinglu Wang, Tian Fang, Long Quan. Parallel Structure from Motion from Local Increment to Global Averaging. ICCV 2017.
Runze Zhang, Siyu Zhu, Tian Fang, Long Quan. Distributed Very Large Scale Bundle Adjustment by Global Camera Consensus. ICCV 2017, pp. 29-38.
Hainan Cui, Shuhan Shen, Zhanyi Hu. Tracks Selection for Robust, Efficient and Scalable Large-Scale Structure from Motion. PR 2017.
Lei Zhou, Siyu Zhu, Tianwei Shen, Jinglu Wang, Tian Fang, Long Quan. Progressive Large Scale-Invariant Image Matching in Scale Space. ICCV 2017.
Xiang Gao, Lihua Hu, Hainan Cui, Shuhan Shen, Zhanyi Hu. Accurate and Efficient Ground-to-Aerial Model Alignment. Pattern Recognition, 76(4): 288-302, 2018.
Yang Zhou, Shuhan Shen, Xiang Gao, Zhanyi Hu. Accurate Mesh-based Alignment for Ground and Aerial Multi-view Stereo Models. ICIP 2017.
Nan et al. PolyFit: Polygonal Surface Reconstruction from Point Clouds. ICCV 2017
Kelly et al. BigSUR: Large-scale Structured Urban Reconstruction. TOG 2017
Zhu et al. Variational Building Modeling from Urban MVS Meshes. 3DV 2017
2. Learning depth
Clement Godard, Oisin Mac Aodha, Gabriel J. Brostow. Unsupervised Monocular Depth Estimation With Left-Right Consistency. CVPR 2017. (enforces consistency between left- and right-view disparities)
T. H. Zhou, M. Brown, N. Snavely, D.G. Lowe. Unsupervised Learning of Depth and Ego-Motion from Video. CVPR 2017. (jointly estimates the current frame's depth and the relative camera poses of neighboring frames)
Lei He, Guanghui Wang and Zhanyi Hu. Learning Depth from Single Images with Deep Neural Network Embedding Focal Length, IEEE Transactions on Image Processing, 2018.
Converts a fixed-focal-length dataset into a multi-focal-length one, and embeds the focal length into the global feature network through a fully connected layer.
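A minimal sketch of what "embedding the focal length through a fully connected layer" can look like; the layer sizes and the concatenation point are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class FocalAwareDepthHead(nn.Module):
    """Toy depth head: a learned focal-length embedding is concatenated
    with a pooled global image feature before regressing depth."""

    def __init__(self, feat_dim=512, focal_dim=32):
        super().__init__()
        self.focal_fc = nn.Sequential(nn.Linear(1, focal_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim + focal_dim, 1)

    def forward(self, global_feat, focal):
        # global_feat: (B, feat_dim) CNN features; focal: (B, 1) in pixels.
        f = self.focal_fc(focal)
        return self.head(torch.cat([global_feat, f], dim=1))

head = FocalAwareDepthHead()
depth_scale = head(torch.randn(4, 512), torch.full((4, 1), 800.0))
```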
Suryansh Kumar, Yuchao Dai, Hongdong Li. First-place winners of the "Non-Rigid Structure from Motion Challenge 2017" at CVPR 2017.
Suryansh Kumar, Yuchao Dai, Hongdong Li. Spatial-temporal Union of Subspaces for Multi-body Non-rigid Structure-from-Motion. Pattern Recognition, 2017. (a unified framework to jointly segment and reconstruct multiple non-rigid objects, along both the temporal and spatial directions)
Suryansh Kumar, Yuchao Dai, Hongdong Li. Monocular Dense 3D Reconstruction of a Complex Dynamic Scene from Two Perspective Frames. ICCV 2017.
Kangkan Wang, Guofeng Zhang, Shihong Xia. Templateless Non-Rigid Reconstruction and Motion Tracking With a Single RGB-D Camera. IEEE Transactions on Image Processing, 26(12): 5966 – 5979, 2017.
T. Schoeps, T. Sattler, C. Haene, M. Pollefeys. Large-scale Outdoor 3D Reconstruction on a Mobile Device. CVIU 2017. (filter-based method)
C. Haene, C. Zach, A. Cohen, M. Pollefeys, Dense Semantic 3D Reconstruction, PAMI, 2017.(volumetric)
Zhaopeng Cui, Jinwei Gu, Boxin Shi, Ping Tan and Jan Kautz. Polarimetric Multi-View Stereo. CVPR 2017. (reconstructs texture-less surfaces from polarization, photometric, and surface-normal cues)
Fusing geometry and learning
Deep learning is now used throughout computer vision, but in 3D vision its performance has not yet surpassed classical geometric methods. Classical methods do degenerate and lose robustness in some cases, such as 3D reconstruction under pure camera rotation, and deep learning can compensate there. Deep learning generalizes poorly and often yields low accuracy when structure and motion are learned directly, but it offers powerful feature representations. Letting classical multi-view geometry lead 3D vision, assisted by deep learning, is therefore one trend for making 3D vision more robust.
When the visual environment is so complex and changeable that even deep learning cannot compensate, adding other sensors is an effective remedy. Compared with other sensors, vision sensors are flexible, ubiquitous, and cheap. Where low cost is required, a vision-led system supplemented by inexpensive LiDAR, IMUs, and the like can balance performance against cost.
Many depth cameras and 3D camera modules already exist, but some of their interfaces are inconvenient to use; coupling 3D-vision algorithms with hardware and embedding them into devices or chips is a development trend.
M. Abouzahir, A. Elouardi, R. Latif, S. Bouaziz, A. Tajer. Embedding SLAM algorithms: Has it come of age? Robotics and Autonomous Systems 100 (2018) 14–26.
This paper implements several representative SLAM algorithms on embedded hardware and analyzes and compares their performance.
3D vision has broad application value in AGVs, autonomous driving, service robots, AR education, AR media, and beyond.
Founded in 2011 in a spirit of free and equal academic exchange, VALSE provides a high-level, highly interactive forum for young scholars and students in computer vision, image processing, pattern recognition, and machine learning in China.
Click "Read the original" for the download link to the full slides.
Source: Robot Vision Group, Institute of Automation
Editors: Lu Ning, Ou Licheng
Layout: Zhi Hui