革命:扭转革命对视觉认知的固有影响 (Involution: Inverting the Inherence of Convolution for Visual Recognition) - 专知论文

会员服务 ·

0

求逆 · 卷积 · Neural Networks · Principle · Vision ·

2021 年 3 月 10 日

Involution: Inverting the Inherence of Convolution for Visual Recognition

翻译：革命:扭转革命对视觉认知的固有影响

Duo Li,Jie Hu,Changhu Wang,Xiangtai Li,Qi She,Lei Zhu,Tong Zhang,Qifeng Chen

from arxiv, Accepted to CVPR 2021. Code and models are available at https://github.com/d-li14/involution

Convolution has been the core ingredient of modern neural networks, triggering the surge of deep learning in vision. In this work, we rethink the inherent principles of standard convolution for vision tasks, specifically spatial-agnostic and channel-specific. Instead, we present a novel atomic operation for deep neural networks by inverting the aforementioned design principles of convolution, coined as involution. We additionally demystify the recent popular self-attention operator and subsume it into our involution family as an over-complicated instantiation. The proposed involution operator could be leveraged as fundamental bricks to build the new generation of neural networks for visual recognition, powering different deep learning models on several prevalent benchmarks, including ImageNet classification, COCO detection and segmentation, together with Cityscapes segmentation. Our involution-based models improve the performance of convolutional baselines using ResNet-50 by up to 1.6% top-1 accuracy, 2.5% and 2.4% bounding box AP, and 4.7% mean IoU absolutely while compressing the computational cost to 66%, 65%, 72%, and 57% on the above benchmarks, respectively. Code and pre-trained models for all the tasks are available at https://github.com/d-li14/involution.

翻译：进化是现代神经网络的核心要素,引发了深层次视觉学习的激增。在这项工作中,我们重新思考了视觉任务,特别是空间-不可知性和频道特定任务的标准进化的内在原则。相反,我们为深神经网络展示了一种新的原子操作,通过颠倒上述的演化设计原则,以进化的形式出现。我们进一步解开了最近流行的自我关注操作器的神秘性,并将其作为过度复杂的瞬间反应纳入我们的进化家庭。拟议的进化操作器可以被作为基本砖块,用于建造新一代神经网络,用于视觉识别,在包括图像网络分类、COCOCO探测和分解在内的一些流行基准上,赋予不同的深层学习模型以动力。我们的进化模型利用ResNet-50改进进化基线的性能,最高精确度达到1.6 %-1,最高精确度为2.5%和2.4%绑定的邮箱AP,以及4.7%的IoU绝对值,同时将计算成本压缩到66%、65%、72%和57%的深入学习模型,分别用于以上/进化模式。

6

相关内容

哈工大SCIR 14篇长文被ACL 2021主会/Findings和IJCAI 2021录用

专知会员服务

56+阅读 · 2021年5月10日

“内卷“算子超越卷积、自注意力机制：CVPR2021强大的神经网络新算子involution

专知会员服务

28+阅读 · 2021年3月27日

【ICML2020-伯克利-马毅老师组】深度等距学习的视觉识别，Deep Isometric Learning for Visual Recognition

【ICML2020-伯克利-马毅老师组】深度等距学习的视觉识别，Deep Isometric Learning for Visual Recognition

专知会员服务

25+阅读 · 2020年7月1日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【Google无监督大规模视觉表示迁移】Large Scale Learning of General Visual Representations for Transfer

【Google无监督大规模视觉表示迁移】Large Scale Learning of General Visual Representations for Transfer

专知会员服务

12+阅读 · 2020年1月7日

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

专知会员服务

28+阅读 · 2019年11月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

已删除

将门创投

5+阅读 · 2017年8月15日

TDN: Temporal Difference Networks for Efficient Action Recognition

TDN: Temporal Difference Networks for Efficient Action Recognition

Arxiv

4+阅读 · 2020年12月18日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Gated Channel Transformation for Visual Recognition

Arxiv

4+阅读 · 2020年3月27日

Scene Text Detection and Recognition: The Deep Learning Era

Scene Text Detection and Recognition: The Deep Learning Era

Arxiv

27+阅读 · 2019年9月5日

iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images

iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images

Arxiv

9+阅读 · 2019年8月28日

Local Relation Networks for Image Recognition

Local Relation Networks for Image Recognition

Arxiv

4+阅读 · 2019年4月25日

SlowFast Networks for Video Recognition

SlowFast Networks for Video Recognition

Arxiv

4+阅读 · 2019年4月18日

Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning

Arxiv

6+阅读 · 2019年2月27日

Global-and-local attention networks for visual recognition

Global-and-local attention networks for visual recognition

Arxiv

5+阅读 · 2018年9月6日

Fully Convolutional Networks for Semantic Segmentation

Arxiv

3+阅读 · 2015年3月8日

VIP会员

文章信息

相关主题

Neural Networks

相关VIP内容

哈工大SCIR 14篇长文被ACL 2021主会/Findings和IJCAI 2021录用

专知会员服务

56+阅读 · 2021年5月10日

“内卷“算子超越卷积、自注意力机制：CVPR2021强大的神经网络新算子involution

专知会员服务

28+阅读 · 2021年3月27日

【ICML2020-伯克利-马毅老师组】深度等距学习的视觉识别，Deep Isometric Learning for Visual Recognition

【ICML2020-伯克利-马毅老师组】深度等距学习的视觉识别，Deep Isometric Learning for Visual Recognition

专知会员服务

25+阅读 · 2020年7月1日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【Google无监督大规模视觉表示迁移】Large Scale Learning of General Visual Representations for Transfer

【Google无监督大规模视觉表示迁移】Large Scale Learning of General Visual Representations for Transfer

专知会员服务

12+阅读 · 2020年1月7日

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

专知会员服务

28+阅读 · 2019年11月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

NeurIPS 2025 | 自动化所新作速览（一）

大型语言模型（LLM）赋能的知识图谱构建：综述

NeurIPS 2025 | 自动化所新作速览（二）

领域特定文本分类中的预训练语言模型新进展：系统综述

相关资讯

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

已删除

将门创投

5+阅读 · 2017年8月15日

相关论文

TDN: Temporal Difference Networks for Efficient Action Recognition

TDN: Temporal Difference Networks for Efficient Action Recognition

Arxiv

4+阅读 · 2020年12月18日

Text Detection and Recognition in the Wild: A Review

Arxiv

20+阅读 · 2020年6月8日

Gated Channel Transformation for Visual Recognition

Arxiv

4+阅读 · 2020年3月27日

Scene Text Detection and Recognition: The Deep Learning Era

Scene Text Detection and Recognition: The Deep Learning Era

Arxiv

27+阅读 · 2019年9月5日

iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images

iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images

Arxiv

9+阅读 · 2019年8月28日

Local Relation Networks for Image Recognition

Local Relation Networks for Image Recognition

Arxiv

4+阅读 · 2019年4月25日

SlowFast Networks for Video Recognition

SlowFast Networks for Video Recognition

Arxiv

4+阅读 · 2019年4月18日

Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning

Arxiv

6+阅读 · 2019年2月27日

Global-and-local attention networks for visual recognition

Global-and-local attention networks for visual recognition

Arxiv

5+阅读 · 2018年9月6日

Fully Convolutional Networks for Semantic Segmentation

Arxiv

3+阅读 · 2015年3月8日

微信扫码咨询专知VIP会员