Feature Fusion Vision Transformer for Fine-Grained Visual Categorization - 专知 Papers


July 6, 2021

Feature Fusion Vision Transformer for Fine-Grained Visual Categorization


Jun Wang, Xiaohan Yu, Yongsheng Gao

From arXiv: 9 pages, 2 figures, 3 tables

The core for tackling the fine-grained visual categorization (FGVC) is to learn subtle yet discriminative features. Most previous works achieve this by explicitly selecting the discriminative parts or integrating the attention mechanism via CNN-based approaches. However, these methods enhance the computational complexity and make the model dominated by the regions containing the most of the objects. Recently, vision transformer (ViT) has achieved SOTA performance on general image recognition tasks. The self-attention mechanism aggregates and weights the information from all patches to the classification token, making it perfectly suitable for FGVC. Nonetheless, the classification token in the deep layer pays more attention to the global information, lacking the local and low-level features that are essential for FGVC. In this work, we propose a novel pure transformer-based framework Feature Fusion Vision Transformer (FFVT) where we aggregate the important tokens from each transformer layer to compensate the local, low-level and middle-level information. We design a novel token selection module called mutual attention weight selection (MAWS) to guide the network effectively and efficiently towards selecting discriminative tokens without introducing extra parameters. We verify the effectiveness of FFVT on three benchmarks where FFVT achieves the state-of-the-art performance.
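
The abstract names two ideas without giving formulas: picking discriminative tokens from attention weights without adding parameters, and gathering the picked tokens from every layer. The NumPy sketch below illustrates one way this could look. It is not the authors' implementation. The scoring rule (class-token-to-patch attention multiplied by the normalized patch-to-class-token attention), the function names `select_tokens` and `fuse_tokens`, and all shapes are assumptions made for illustration.

```python
import numpy as np

def select_tokens(attn, k):
    """Pick k patch-token indices from one layer's attention map.

    attn: array of shape (heads, n_tokens, n_tokens) whose rows are already
          softmax-normalized; index 0 is the classification token.
    Scoring rule (an assumption, not stated in the abstract): multiply how
    strongly the class token attends to a patch by that patch's share of
    the attention directed at the class token. No learned parameters.
    """
    mean_attn = attn.mean(axis=0)                      # average over heads
    cls_to_patch = mean_attn[0, 1:]                    # class token -> each patch
    patch_to_cls = mean_attn[1:, 0]                    # each patch -> class token
    patch_to_cls = patch_to_cls / patch_to_cls.sum()   # normalize over patches
    score = cls_to_patch * patch_to_cls
    top = np.argsort(-score)[:k]                       # best-scoring patches first
    return top + 1                                     # shift past the class token

def fuse_tokens(hidden_states, attn_maps, k):
    """Concatenate the final class token with k selected tokens per layer."""
    picked = [h[select_tokens(a, k)] for h, a in zip(hidden_states, attn_maps)]
    cls_token = hidden_states[-1][:1]                  # shape (1, dim)
    return np.concatenate([cls_token] + picked, axis=0)

# Toy data: 3 layers, 2 heads, 1 class token + 4 patches, 4-dim embeddings.
rng = np.random.default_rng(0)
layers, heads, n_tokens, dim, k = 3, 2, 5, 4, 2
logits = rng.normal(size=(layers, heads, n_tokens, n_tokens))
attn_maps = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
hidden_states = rng.normal(size=(layers, n_tokens, dim))

fused = fuse_tokens(hidden_states, attn_maps, k)
print(fused.shape)  # (1 + layers * k, dim) = (7, 4)
```

In this sketch the fused sequence holds one class token plus `k` tokens from each layer, so local and low-level tokens from early layers sit next to the deep class token. How the fused sequence is consumed afterwards is outside the scope of the sketch.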

