变换功能: 使用变换器重建单向 RGB 场景 (TransformerFusion: Monocular RGB Scene Reconstruction using Transformers) - 专知论文

会员服务 ·

0

变换 · 3D · Networking · Microsoft Surface · INTERACT ·

2021 年 7 月 5 日

TransformerFusion: Monocular RGB Scene Reconstruction using Transformers

翻译：变换功能: 使用变换器重建单向 RGB 场景

Aljaž Božič,Pablo Palafox,Justus Thies,Angela Dai,Matthias Nießner

from arxiv, Video: https://youtu.be/LIpTKYfKSqw

We introduce TransformerFusion, a transformer-based 3D scene reconstruction approach. From an input monocular RGB video, the video frames are processed by a transformer network that fuses the observations into a volumetric feature grid representing the scene; this feature grid is then decoded into an implicit 3D scene representation. Key to our approach is the transformer architecture that enables the network to learn to attend to the most relevant image frames for each 3D location in the scene, supervised only by the scene reconstruction task. Features are fused in a coarse-to-fine fashion, storing fine-level features only where needed, requiring lower memory storage and enabling fusion at interactive rates. The feature grid is then decoded to a higher-resolution scene reconstruction, using an MLP-based surface occupancy prediction from interpolated coarse-to-fine 3D features. Our approach results in an accurate surface reconstruction, outperforming state-of-the-art multi-view stereo depth estimation methods, fully-convolutional 3D reconstruction approaches, and approaches using LSTM- or GRU-based recurrent networks for video sequence fusion.

翻译：我们引入了基于变压器的3D场景重建方法,即变压器Fusion。从输入单镜 RGB 视频中,视频框由一个变压器网络处理,该变压器网络将观测结果结合到代表场景的体积特征网格中;然后,这一地格网被解码成隐含的3D场景代表。我们的方法的关键是变压器结构,使网络能够学习对场景中每个3D地点最相关的图像框架,仅受现场重建任务监督。特征以粗略到平面的方式结合,只在需要的地方储存精细水平的特征,需要更低的内存存储,并能够以互动速度进行融合。然后,对地格网进行解码,以便进行更清晰的场景重建,使用基于多光速到线的3D特征的 MLP 地面占用预测。我们的方法在精确的地表重建、优于状态的多视图立声波深度估计方法、全面革命3D重建方法,以及使用基于LSTM或GRU的经常性网络进行视频波波波波波波波波波波波的频率的频率组合连接的处理。

0

相关内容

对比学习简述

专知会员服务

90+阅读 · 2021年6月29日

MonoGRNet：单目3D目标检测的通用框架（TPAMI2021）

MonoGRNet：单目3D目标检测的通用框架（TPAMI2021）

专知会员服务

18+阅读 · 2021年5月3日

【三维物体和手部姿态估计】综述论文最新进展，Recent Advances in 3D Object and Hand Pose Estimation

【三维物体和手部姿态估计】综述论文最新进展，Recent Advances in 3D Object and Hand Pose Estimation

专知会员服务

21+阅读 · 2020年6月13日

CVPR2020 | 商汤-港中文等提出PV-RCNN：3D目标检测新网络

CVPR2020 | 商汤-港中文等提出PV-RCNN：3D目标检测新网络

专知会员服务

45+阅读 · 2020年4月17日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

CVPR 2020 论文开源项目合集

专知会员服务

110+阅读 · 2020年3月12日

基于动态时空图CNNs的交通流预测，Dynamic Spatio-temporal Graph-based CNNs for Traffic Flow Prediction

基于动态时空图CNNs的交通流预测，Dynamic Spatio-temporal Graph-based CNNs for Traffic Flow Prediction

专知会员服务

136+阅读 · 2020年3月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

三维重建 3D reconstruction 有哪些实用算法？

三维重建 3D reconstruction 有哪些实用算法？

极市平台

13+阅读 · 2020年2月23日

跟踪SLAM前沿动态系列之ICCV2019

跟踪SLAM前沿动态系列之ICCV2019

泡泡机器人SLAM

7+阅读 · 2019年11月23日

CVPR 2019 | 34篇 CVPR 2019 论文实现代码

CVPR 2019 | 34篇 CVPR 2019 论文实现代码

AI科技评论

21+阅读 · 2019年6月23日

【泡泡汇总】CVPR2019 SLAM Paperlist

【泡泡汇总】CVPR2019 SLAM Paperlist

泡泡机器人SLAM

14+阅读 · 2019年6月12日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

【泡泡点云时空】3DMV:联合三维多视图预测的三维语义场景分割(ECCV2018-7)

【泡泡点云时空】3DMV:联合三维多视图预测的三维语义场景分割(ECCV2018-7)

泡泡机器人SLAM

9+阅读 · 2018年10月16日

【泡泡点云时空】PointFusion：深度传感器融合估计3D包围盒(CVPR2018-16)

【泡泡点云时空】PointFusion：深度传感器融合估计3D包围盒(CVPR2018-16)

泡泡机器人SLAM

7+阅读 · 2018年9月26日

【泡泡一分钟】Matterport3D: 从室内RGBD数据集中训练 (3dv-22)

【泡泡一分钟】Matterport3D: 从室内RGBD数据集中训练 (3dv-22)

泡泡机器人SLAM

16+阅读 · 2017年12月31日

【泡泡一分钟】基于视觉传感器的三维空间几何重建（3dv-16）

【泡泡一分钟】基于视觉传感器的三维空间几何重建（3dv-16）

泡泡机器人SLAM

4+阅读 · 2017年12月18日

【ICCV 2017论文集】计算机视觉顶级会议ICCV2017 Open Access Repository

【ICCV 2017论文集】计算机视觉顶级会议ICCV2017 Open Access Repository

专知

6+阅读 · 2017年10月14日

Lidar Point Cloud Guided Monocular 3D Object Detection

Lidar Point Cloud Guided Monocular 3D Object Detection

Arxiv

1+阅读 · 2021年9月8日

Animatable Neural Radiance Fields from Monocular RGB Videos

Animatable Neural Radiance Fields from Monocular RGB Videos

Arxiv

0+阅读 · 2021年9月7日

Self-Attentive 3D Human Pose and Shape Estimation from Videos

Arxiv

0+阅读 · 2021年9月7日

Toward Realistic Single-View 3D Object Reconstructionwith Unsupervised Learning from Multiple Images

Arxiv

0+阅读 · 2021年9月6日

Underwater 3D Reconstruction Using Light Fields

Arxiv

0+阅读 · 2021年9月5日

Towards High-Fidelity 3D Face Reconstruction from In-the-Wild Images Using Graph Convolutional Networks

Towards High-Fidelity 3D Face Reconstruction from In-the-Wild Images Using Graph Convolutional Networks

Arxiv

8+阅读 · 2020年3月12日

Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image

Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image

Arxiv

12+阅读 · 2020年2月27日

Sparse2Dense: From direct sparse odometry to dense 3D reconstruction

Sparse2Dense: From direct sparse odometry to dense 3D reconstruction

Arxiv

9+阅读 · 2019年3月21日

3D Hand Shape and Pose Estimation from a Single RGB Image

3D Hand Shape and Pose Estimation from a Single RGB Image

Arxiv

17+阅读 · 2019年3月3日

Single-Image Piece-wise Planar 3D Reconstruction via Associative Embedding

Single-Image Piece-wise Planar 3D Reconstruction via Associative Embedding

Arxiv

7+阅读 · 2019年2月26日

VIP会员

文章信息

相关主题

Microsoft Surface

相关VIP内容

对比学习简述

专知会员服务

90+阅读 · 2021年6月29日

MonoGRNet：单目3D目标检测的通用框架（TPAMI2021）

MonoGRNet：单目3D目标检测的通用框架（TPAMI2021）

专知会员服务

18+阅读 · 2021年5月3日

【三维物体和手部姿态估计】综述论文最新进展，Recent Advances in 3D Object and Hand Pose Estimation

【三维物体和手部姿态估计】综述论文最新进展，Recent Advances in 3D Object and Hand Pose Estimation

专知会员服务

21+阅读 · 2020年6月13日

CVPR2020 | 商汤-港中文等提出PV-RCNN：3D目标检测新网络

CVPR2020 | 商汤-港中文等提出PV-RCNN：3D目标检测新网络

专知会员服务

45+阅读 · 2020年4月17日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

CVPR 2020 论文开源项目合集

专知会员服务

110+阅读 · 2020年3月12日

基于动态时空图CNNs的交通流预测，Dynamic Spatio-temporal Graph-based CNNs for Traffic Flow Prediction

基于动态时空图CNNs的交通流预测，Dynamic Spatio-temporal Graph-based CNNs for Traffic Flow Prediction

专知会员服务

136+阅读 · 2020年3月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

操作系统智能体：基于多模态大模型（MLLM）的通用计算设备智能体综述

《美国太空军系统全生命周期建模、仿真与分析效能提升方案》最新84页报告

【博士论文】推进数据高效的深度学习：非参数 Transformer、主动测试与上下文学习

自主人工智能：未来战争是否将是自主化的？

相关资讯

三维重建 3D reconstruction 有哪些实用算法？

三维重建 3D reconstruction 有哪些实用算法？

极市平台

13+阅读 · 2020年2月23日

跟踪SLAM前沿动态系列之ICCV2019

跟踪SLAM前沿动态系列之ICCV2019

泡泡机器人SLAM

7+阅读 · 2019年11月23日

CVPR 2019 | 34篇 CVPR 2019 论文实现代码

CVPR 2019 | 34篇 CVPR 2019 论文实现代码

AI科技评论

21+阅读 · 2019年6月23日

【泡泡汇总】CVPR2019 SLAM Paperlist

【泡泡汇总】CVPR2019 SLAM Paperlist

泡泡机器人SLAM

14+阅读 · 2019年6月12日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

【泡泡点云时空】3DMV:联合三维多视图预测的三维语义场景分割(ECCV2018-7)

【泡泡点云时空】3DMV:联合三维多视图预测的三维语义场景分割(ECCV2018-7)

泡泡机器人SLAM

9+阅读 · 2018年10月16日

【泡泡点云时空】PointFusion：深度传感器融合估计3D包围盒(CVPR2018-16)

【泡泡点云时空】PointFusion：深度传感器融合估计3D包围盒(CVPR2018-16)

泡泡机器人SLAM

7+阅读 · 2018年9月26日

【泡泡一分钟】Matterport3D: 从室内RGBD数据集中训练 (3dv-22)

【泡泡一分钟】Matterport3D: 从室内RGBD数据集中训练 (3dv-22)

泡泡机器人SLAM

16+阅读 · 2017年12月31日

【泡泡一分钟】基于视觉传感器的三维空间几何重建（3dv-16）

【泡泡一分钟】基于视觉传感器的三维空间几何重建（3dv-16）

泡泡机器人SLAM

4+阅读 · 2017年12月18日

【ICCV 2017论文集】计算机视觉顶级会议ICCV2017 Open Access Repository

【ICCV 2017论文集】计算机视觉顶级会议ICCV2017 Open Access Repository

专知

6+阅读 · 2017年10月14日

相关论文

Lidar Point Cloud Guided Monocular 3D Object Detection

Lidar Point Cloud Guided Monocular 3D Object Detection

Arxiv

1+阅读 · 2021年9月8日

Animatable Neural Radiance Fields from Monocular RGB Videos

Animatable Neural Radiance Fields from Monocular RGB Videos

Arxiv

0+阅读 · 2021年9月7日

Self-Attentive 3D Human Pose and Shape Estimation from Videos

Arxiv

0+阅读 · 2021年9月7日

Toward Realistic Single-View 3D Object Reconstructionwith Unsupervised Learning from Multiple Images

Arxiv

0+阅读 · 2021年9月6日

Underwater 3D Reconstruction Using Light Fields

Arxiv

0+阅读 · 2021年9月5日

Towards High-Fidelity 3D Face Reconstruction from In-the-Wild Images Using Graph Convolutional Networks

Towards High-Fidelity 3D Face Reconstruction from In-the-Wild Images Using Graph Convolutional Networks

Arxiv

8+阅读 · 2020年3月12日

Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image

Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image

Arxiv

12+阅读 · 2020年2月27日

Sparse2Dense: From direct sparse odometry to dense 3D reconstruction

Sparse2Dense: From direct sparse odometry to dense 3D reconstruction

Arxiv

9+阅读 · 2019年3月21日

3D Hand Shape and Pose Estimation from a Single RGB Image

3D Hand Shape and Pose Estimation from a Single RGB Image

Arxiv

17+阅读 · 2019年3月3日

Single-Image Piece-wise Planar 3D Reconstruction via Associative Embedding

Single-Image Piece-wise Planar 3D Reconstruction via Associative Embedding

Arxiv

7+阅读 · 2019年2月26日

微信扫码咨询专知VIP会员