April 17, 2023

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Wenhai Wang,Jifeng Dai,Zhe Chen,Zhenhang Huang,Zhiqi Li,Xizhou Zhu,Xiaowei Hu,Tong Lu,Lewei Lu,Hongsheng Li,Xiaogang Wang,Yu Qiao

From arXiv; accepted to CVPR 2023

Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state. This work presents a new large-scale CNN-based foundation model, termed InternImage, which can obtain the gain from increasing parameters and training data like ViTs. Different from the recent CNNs that focus on large dense kernels, InternImage takes deformable convolution as the core operator, so that our model not only has the large effective receptive field required for downstream tasks such as detection and segmentation, but also has the adaptive spatial aggregation conditioned by input and task information. As a result, the proposed InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns with large-scale parameters from massive data like ViTs. The effectiveness of our model is proven on challenging benchmarks including ImageNet, COCO, and ADE20K. It is worth mentioning that InternImage-H achieved a new record 65.4 mAP on COCO test-dev and 62.9 mIoU on ADE20K, outperforming current leading CNNs and ViTs. The code will be released at https://github.com/OpenGVLab/InternImage.
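To make the idea of "adaptive spatial aggregation" concrete, here is a minimal sketch of generic deformable sampling on a single-channel image. It is an illustration only and not the operator used in InternImage, whose details are not given in this abstract. The names `bilinear_sample`, `deformable_conv2d`, and `constant_offsets` are hypothetical, and the offsets are passed in by hand, whereas a real deformable convolution would predict them from the input with a learned layer.

```python
import math

# 3x3 sampling grid of a regular convolution, relative to the output location.
GRID = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def bilinear_sample(x, r, c):
    """Read x at a fractional position (r, c); points outside x count as 0."""
    h, w = len(x), len(x[0])
    r0, c0 = math.floor(r), math.floor(c)
    value = 0.0
    for rr in (r0, r0 + 1):
        for cc in (c0, c0 + 1):
            if 0 <= rr < h and 0 <= cc < w:
                value += (1 - abs(r - rr)) * (1 - abs(c - cc)) * x[rr][cc]
    return value


def deformable_conv2d(x, weight, offsets):
    """Illustrative 3x3 deformable convolution (assumed layout, not a library API).

    x:       H x W nested list of floats
    weight:  3 x 3 kernel
    offsets: offsets[i][j][k] = (row_shift, col_shift) for grid point k at (i, j)
    """
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for k, (dr, dc) in enumerate(GRID):
                off_r, off_c = offsets[i][j][k]
                # Regular grid position plus a per-location, per-point shift.
                sample = bilinear_sample(x, i + dr + off_r, j + dc + off_c)
                out[i][j] += weight[dr + 1][dc + 1] * sample
    return out


def constant_offsets(h, w, off):
    """Helper for the demo: the same offset for every location and grid point."""
    return [[[off] * 9 for _ in range(w)] for _ in range(h)]


x = [[float(4 * i + j) for j in range(4)] for i in range(4)]
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

# Zero offsets reduce to an ordinary convolution (here: the identity kernel).
same = deformable_conv2d(x, identity, constant_offsets(4, 4, (0.0, 0.0)))
# A half-pixel column offset makes each output blend two horizontal neighbours.
shifted = deformable_conv2d(x, identity, constant_offsets(4, 4, (0.0, 0.5)))
```

With all offsets at zero the function behaves like a fixed 3x3 convolution; non-zero offsets move each sampling point, so the region an output aggregates over is no longer a rigid grid.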

