我们可以解决三维愿景任务吗? 从二维愿景转换器开始? (Can We Solve 3D Vision Tasks Starting from A 2D Vision Transformer?) - 专知论文

会员服务 ·

0

3D · Vision · Performer · 可理解性 · 变换 ·

2022 年 9 月 15 日

Can We Solve 3D Vision Tasks Starting from A 2D Vision Transformer?

翻译：我们可以解决三维愿景任务吗? 从二维愿景转换器开始?

Yi Wang,Zhiwen Fan,Tianlong Chen,Hehe Fan,Zhangyang Wang

Vision Transformers (ViTs) have proven to be effective, in solving 2D image understanding tasks by training over large-scale image datasets; and meanwhile as a somehow separate track, in modeling the 3D visual world too such as voxels or point clouds. However, with the growing hope that transformers can become the "universal" modeling tool for heterogeneous data, ViTs for 2D and 3D tasks have so far adopted vastly different architecture designs that are hardly transferable. That invites an (over-)ambitious question: can we close the gap between the 2D and 3D ViT architectures? As a piloting study, this paper demonstrates the appealing promise to understand the 3D visual world, using a standard 2D ViT architecture, with only minimal customization at the input and output levels without redesigning the pipeline. To build a 3D ViT from its 2D sibling, we "inflate" the patch embedding and token sequence, accompanied with new positional encoding mechanisms designed to match the 3D data geometry. The resultant "minimalist" 3D ViT, named Simple3D-Former, performs surprisingly robustly on popular 3D tasks such as object classification, point cloud segmentation and indoor scene detection, compared to highly customized 3D-specific designs. It can hence act as a strong baseline for new 3D ViTs. Moreover, we note that pursing a unified 2D-3D ViT design has practical relevance besides just scientific curiosity. Specifically, we demonstrate that Simple3D-Former naturally enables to exploit the wealth of pre-trained weights from large-scale realistic 2D images (e.g., ImageNet), which can be plugged in to enhancing the 3D task performance "for free".

翻译：视觉变异器( ViTs ) 已证明是有效的, 通过大规模图像数据集的培训解决 2D 图像理解任务( 2D - 3D 图像网络) 。同时, 作为一种以某种方式分开的轨道, 建模 3D 视觉世界, 也像 voxels 或点云一样。然而, 随着变异器成为“ 通用” 数据模型工具的希望日益增强, 2D 和 3D 任务VT 迄今已采用了极不易转让的巨型结构设计。这就引出了一个( 超) 雄心勃勃的问题: 我们能否通过 2D 和 3D ViT 结构来缩小 2D 图像之间的距离? 作为实验性研究, 本文展示了理解 3D 3D 视觉世界的吸引力承诺, 使用标准 2D ViT 结构, 只需在输入最小化的输入的输入量和输出值上最小化的 3D 。为了从 2D 之前建立3D 直观的新的定位, 我们的直观分析, 可以将3D 直观的直观分析。

0

相关内容

3D是英文“Three Dimensions”的简称，中文是指三维、三个维度、三个坐标，即有长、有宽、有高，换句话说，就是立体的，是相对于只有长和宽的平面（2D）而言。

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

CVPR 2020 论文开源项目合集

专知会员服务

110+阅读 · 2020年3月12日

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

专知会员服务

50+阅读 · 2020年2月26日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

MARVELD1基因调控肝细胞癌介入治疗的机制研究

国家自然科学基金

0+阅读 · 2016年12月31日

3D多孔结构LiMnPO4•LiVPO4F@石墨烯气凝胶复合物材料的构筑及电化学性能研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于Aurivillius-Sillenite结构光催化材料的性能调控研究

国家自然科学基金

0+阅读 · 2014年12月31日

微纳结构钽基异质复合阵列的构筑、界面调控及光电化学性能研究

国家自然科学基金

0+阅读 · 2014年12月31日

PPAR β/δ基因在结直肠癌血管生成调控中的作用及分子机理

国家自然科学基金

2+阅读 · 2014年12月31日

iTRAQ技术分析钝叶草抗旱的分子机理

国家自然科学基金

0+阅读 · 2013年12月31日

氧化应激相关的LncRNA和miRNA相互作用在糖尿病视网膜病变发病中的作用

国家自然科学基金

0+阅读 · 2013年12月31日

雄激素经AR/PI3K/AKT通路调控CA916798参与肺腺癌发生的作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Intraflagellar Transport运输纤毛蛋白的分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

SARI基因在肺癌侵袭转移中的作用及分子机制

国家自然科学基金

0+阅读 · 2009年12月31日

Point Cloud Learning with Transformer

Arxiv

0+阅读 · 2022年10月25日

One Transformer Can Understand Both 2D & 3D Molecular Data

Arxiv

0+阅读 · 2022年10月24日

A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective

Arxiv

0+阅读 · 2022年10月23日

Boosting vision transformers for image retrieval

Arxiv

0+阅读 · 2022年10月21日

Generative Language Models for Paragraph-Level Question Generation

Arxiv

0+阅读 · 2022年10月21日

A Survey on Vision Transformer

Arxiv

17+阅读 · 2022年2月23日

A Survey of Visual Transformers

Arxiv

39+阅读 · 2021年11月11日

Deep Learning for Weakly-Supervised Object Detection and Object Localization: A Survey

Deep Learning for Weakly-Supervised Object Detection and Object Localization: A Survey

Arxiv

16+阅读 · 2021年5月26日

Unifying Vision-and-Language Tasks via Text Generation

Arxiv

10+阅读 · 2021年2月4日

Pretrained Transformers for Text Ranking: BERT and Beyond

Arxiv

28+阅读 · 2020年10月13日

VIP会员

文章信息

相关主题

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

CVPR 2020 论文开源项目合集

专知会员服务

110+阅读 · 2020年3月12日

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

专知会员服务

50+阅读 · 2020年2月26日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《乌克兰无人机产业：志愿者与政策在构建新兴无人机产业中的协同作用》最新报告

《人工智能辅助决策中的数据可视化：系统性综述》

人工智能驱动弹药制造现代化：美国陆军转型之路

《敏捷作战部署中枢纽-辐条基地选址优化研究》80页

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Point Cloud Learning with Transformer

Arxiv

0+阅读 · 2022年10月25日

One Transformer Can Understand Both 2D & 3D Molecular Data

Arxiv

0+阅读 · 2022年10月24日

A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective

Arxiv

0+阅读 · 2022年10月23日

Boosting vision transformers for image retrieval

Arxiv

0+阅读 · 2022年10月21日

Generative Language Models for Paragraph-Level Question Generation

Arxiv

0+阅读 · 2022年10月21日

A Survey on Vision Transformer

Arxiv

17+阅读 · 2022年2月23日

A Survey of Visual Transformers

Arxiv

39+阅读 · 2021年11月11日

Deep Learning for Weakly-Supervised Object Detection and Object Localization: A Survey

Deep Learning for Weakly-Supervised Object Detection and Object Localization: A Survey

Arxiv

16+阅读 · 2021年5月26日

Unifying Vision-and-Language Tasks via Text Generation

Arxiv

10+阅读 · 2021年2月4日

Pretrained Transformers for Text Ranking: BERT and Beyond

Arxiv

28+阅读 · 2020年10月13日

相关基金

MARVELD1基因调控肝细胞癌介入治疗的机制研究

国家自然科学基金

0+阅读 · 2016年12月31日

3D多孔结构LiMnPO4•LiVPO4F@石墨烯气凝胶复合物材料的构筑及电化学性能研究

国家自然科学基金

0+阅读 · 2015年12月31日

基于Aurivillius-Sillenite结构光催化材料的性能调控研究

国家自然科学基金

0+阅读 · 2014年12月31日

微纳结构钽基异质复合阵列的构筑、界面调控及光电化学性能研究

国家自然科学基金

0+阅读 · 2014年12月31日

PPAR β/δ基因在结直肠癌血管生成调控中的作用及分子机理

国家自然科学基金

2+阅读 · 2014年12月31日

iTRAQ技术分析钝叶草抗旱的分子机理

国家自然科学基金

0+阅读 · 2013年12月31日

氧化应激相关的LncRNA和miRNA相互作用在糖尿病视网膜病变发病中的作用

国家自然科学基金

0+阅读 · 2013年12月31日

雄激素经AR/PI3K/AKT通路调控CA916798参与肺腺癌发生的作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Intraflagellar Transport运输纤毛蛋白的分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

SARI基因在肺癌侵袭转移中的作用及分子机制

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员