Humans can easily imagine the complete 3D geometry of occluded objects and scenes. This appealing ability is vital for recognition and understanding. To enable this capability in AI systems, we propose VoxFormer, a Transformer-based semantic scene completion framework that can output complete 3D volumetric semantics from only 2D images. Our framework adopts a two-stage design: we start from a sparse set of visible and occupied voxel queries obtained from depth estimation, followed by a densification stage that generates dense 3D voxels from the sparse ones. A key idea of this design is that the visual features in 2D images correspond only to the visible scene structures rather than the occluded or empty spaces. Therefore, starting with the featurization and prediction of the visible structures is more reliable. Once we obtain the set of sparse queries, we apply a masked autoencoder design to propagate the information to all the voxels by self-attention. Experiments on SemanticKITTI show that VoxFormer outperforms the state of the art with a relative improvement of 20.0% in geometry and 18.1% in semantics, while reducing GPU memory during training by ~45% to less than 16GB. Our code is available at https://github.com/NVlabs/VoxFormer.
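To make the two-stage design concrete, below is a minimal PyTorch sketch (not the authors' implementation) of the densification idea described above: features of the sparse visible-and-occupied voxel queries are scattered into a dense grid, unobserved voxels are filled with a learnable mask token in the masked-autoencoder style, and self-attention propagates information to every voxel before per-voxel semantic prediction. All names (`DensificationStage`), shapes, and hyperparameters here are illustrative assumptions, not taken from the released code.

```python
import torch
import torch.nn as nn

class DensificationStage(nn.Module):
    """MAE-style densification sketch: sparse visible-voxel features in,
    dense per-voxel semantic logits out. Hypothetical, for illustration."""

    def __init__(self, embed_dim=128, num_classes=20, num_layers=2, num_heads=8):
        super().__init__()
        # Learnable token standing in for unobserved (occluded/empty) voxels.
        self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(embed_dim, num_classes)  # per-voxel semantics

    def forward(self, sparse_feats, sparse_idx, num_voxels):
        # sparse_feats: (B, M, C) features of the M visible, occupied voxels
        # sparse_idx:   (B, M) flattened grid indices of those voxels
        # num_voxels:   total number N of voxels in the dense grid
        B, M, C = sparse_feats.shape
        # Start every voxel from the mask token, then overwrite the
        # visible ones with their query features.
        dense = self.mask_token.expand(B, num_voxels, C).clone()
        dense.scatter_(1, sparse_idx.unsqueeze(-1).expand(-1, -1, C), sparse_feats)
        dense = self.encoder(dense)   # self-attention over all voxels
        return self.head(dense)       # (B, N, num_classes) logits
```

Note that at SemanticKITTI scale (hundreds of thousands of voxels), full self-attention over the dense grid would be prohibitively expensive; it is used here only to make the information flow explicit, and a practical implementation would rely on an efficient attention variant.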