Multi-view camera-based 3D detection is a challenging problem in computer vision. Recent works leverage a pretrained LiDAR detection model to transfer knowledge to a camera-based student network. However, we argue that there is a major domain gap between LiDAR BEV features and camera-based BEV features, as they have different characteristics and are derived from different sources. In this paper, we propose Geometry Enhanced Masked Image Modeling (GeoMIM) to transfer the knowledge of the LiDAR model in a pretrain-finetune paradigm for improving multi-view camera-based 3D detection. GeoMIM is a multi-camera vision transformer with Cross-View Attention (CVA) blocks that uses LiDAR BEV features encoded by the pretrained BEV model as learning targets. During pretraining, GeoMIM's decoder has a semantic branch that completes dense perspective-view features and a geometry branch that reconstructs dense perspective-view depth maps. The depth branch is designed to be camera-aware by taking the camera parameters as input for better transfer capability. Extensive experiments demonstrate that GeoMIM outperforms existing methods on the nuScenes benchmark, achieving state-of-the-art performance for camera-based 3D object detection and 3D segmentation.
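To make the decoder design concrete, below is a minimal sketch (not the authors' implementation) of the dual-branch idea described above: a shared token decoder followed by a semantic head that reconstructs dense perspective-view features and a camera-aware geometry head that predicts dense depth maps. Module names, dimensions, and the way camera parameters are injected are illustrative assumptions.

```python
import torch
import torch.nn as nn


class CameraAwareDepthHead(nn.Module):
    """Predicts per-patch depth, conditioned on flattened camera parameters (assumed scheme)."""

    def __init__(self, embed_dim: int, cam_dim: int, patch_pixels: int):
        super().__init__()
        # Embed flattened camera intrinsics/extrinsics and add them to the tokens;
        # the paper only states the head is camera-aware, this is one simple way to do it.
        self.cam_embed = nn.Linear(cam_dim, embed_dim)
        self.depth_pred = nn.Linear(embed_dim, patch_pixels)  # depth values per patch

    def forward(self, tokens: torch.Tensor, cam_params: torch.Tensor) -> torch.Tensor:
        # tokens: (B*V, N, C); cam_params: (B*V, cam_dim)
        cond = self.cam_embed(cam_params).unsqueeze(1)       # (B*V, 1, C)
        return self.depth_pred(tokens + cond)                # (B*V, N, patch_pixels)


class GeoMIMDecoderSketch(nn.Module):
    """Shared decoder with a semantic branch and a camera-aware geometry branch."""

    def __init__(self, embed_dim: int = 256, depth: int = 2, cam_dim: int = 16,
                 feat_dim: int = 256, patch_pixels: int = 16 * 16):
        super().__init__()
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True)
        self.shared_decoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.semantic_head = nn.Linear(embed_dim, feat_dim)   # completes perspective-view features
        self.geometry_head = CameraAwareDepthHead(embed_dim, cam_dim, patch_pixels)

    def forward(self, tokens: torch.Tensor, cam_params: torch.Tensor):
        x = self.shared_decoder(tokens)
        return self.semantic_head(x), self.geometry_head(x, cam_params)


# Usage: 6 camera views, 196 patch tokens each, 256-dim embeddings.
tokens = torch.randn(6, 196, 256)
cams = torch.randn(6, 16)         # e.g. flattened intrinsics + extrinsics per view
sem, depth = GeoMIMDecoderSketch()(tokens, cams)
print(sem.shape, depth.shape)     # torch.Size([6, 196, 256]) torch.Size([6, 196, 256])
```

In this sketch the two heads share the same decoded tokens, so the semantic reconstruction and the camera-conditioned depth prediction regularize a common representation; the cross-view attention blocks that mix tokens across the six cameras would sit inside the shared decoder and are omitted here.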