OAMIXer: 视觉变形器对象意识混合层 (OAMixer: Object-aware Mixing Layer for Vision Transformers) - 专知论文

会员服务 ·

0

层 · 混合 · INTERACT · MoDELS · 变换 ·

2022 年 12 月 13 日

OAMixer: Object-aware Mixing Layer for Vision Transformers

翻译：OAMIXer: 视觉变形器对象意识混合层

Hyunwoo Kang,Sangwoo Mo,Jinwoo Shin

from arxiv, CVPR Transformers for Vision Workshop 2022. First two authors contributed equally

Patch-based models, e.g., Vision Transformers (ViTs) and Mixers, have shown impressive results on various visual recognition tasks, alternating classic convolutional networks. While the initial patch-based models (ViTs) treated all patches equally, recent studies reveal that incorporating inductive bias like spatiality benefits the representations. However, most prior works solely focused on the location of patches, overlooking the scene structure of images. Thus, we aim to further guide the interaction of patches using the object information. Specifically, we propose OAMixer (object-aware mixing layer), which calibrates the patch mixing layers of patch-based models based on the object labels. Here, we obtain the object labels in unsupervised or weakly-supervised manners, i.e., no additional human-annotating cost is necessary. Using the object labels, OAMixer computes a reweighting mask with a learnable scale parameter that intensifies the interaction of patches containing similar objects and applies the mask to the patch mixing layers. By learning an object-centric representation, we demonstrate that OAMixer improves the classification accuracy and background robustness of various patch-based models, including ViTs, MLP-Mixers, and ConvMixers. Moreover, we show that OAMixer enhances various downstream tasks, including large-scale classification, self-supervised learning, and multi-object recognition, verifying the generic applicability of OAMixer

翻译：光学变换器(View Trangers)和混集器等基于补丁模型(ViTs)在各种视觉识别任务上展示了令人印象深刻的结果,交替了经典革命网络。虽然最初的基于补丁模型(ViTs)对所有补丁都一视同仁,但最近的研究表明,包含空间性等感知偏差的隐含性偏差使表达方式受益。然而,大多数先前的工作都仅仅侧重于补丁点的位置,忽略图像的场景结构。因此,我们的目标是利用对象信息进一步指导补丁的相互作用。具体地说,我们提议OAMxer(目标识识识识识的混合层)来校准基于对象标签的基于补丁的通用模型的补丁混合层。在这里,我们以不受监督或弱度超强的超强超强超强超强的超链接方式获得对象标签。我们通过学习对象-核心的精确度(OAmblix)的精确度和精确度(OAmblix)分析模型,我们展示了OA-Clix)的自我分解。

0

相关内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

【ICCV 2021 】Vision Transformer中的相对位置编码

专知会员服务

30+阅读 · 2021年7月30日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

一类稳态Schödinger-Poisson-Slater方程标准化解的研究

国家自然科学基金

1+阅读 · 2015年12月31日

可延展柔性无机电子互连结构及其机-电综合特性的研究

国家自然科学基金

0+阅读 · 2014年12月31日

CD4+CD25+Foxp3+调节T细胞介导的急性髓系白血病免疫逃逸机制研究

国家自然科学基金

1+阅读 · 2013年12月31日

基于Exemplar-Classifier思想的高分辨率光学遥感影像目标识别研究

国家自然科学基金

2+阅读 · 2013年12月31日

双转子中介轴承振动故障监测预警原理和方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

可压缩Navier-Stokes方程和Boltzmann方程解的渐近行为

国家自然科学基金

0+阅读 · 2013年12月31日

ErbB4通路激活介导非小细胞肺癌EGFR-TKIs获得性耐药的分子机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Cyclin G1对肝癌干细胞的调控及其在肝癌复发耐药中的作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

Vitamin E脂质体纳米颗粒携带siRNA靶向抑制HCV的实验研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于HHT的超光谱图像高精度分类算法研究

国家自然科学基金

0+阅读 · 2009年12月31日

Vision Transformer Adapter for Dense Predictions

Arxiv

1+阅读 · 2023年2月13日

Video Transformers: A Survey

Arxiv

0+阅读 · 2023年2月13日

Transformers in Time Series: A Survey

Arxiv

0+阅读 · 2023年2月10日

Cross-Layer Retrospective Retrieving via Layer Attention

Arxiv

0+阅读 · 2023年2月10日

Masked Autoencoders Are Scalable Vision Learners

Arxiv

27+阅读 · 2021年11月11日

Cross-Modal Object Tracking: Modality-Aware Representations and A Unified Benchmark

Arxiv

14+阅读 · 2021年11月11日

Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

Arxiv

10+阅读 · 2021年2月11日

Temporal Relational Modeling with Self-Supervision for Action Segmentation

Arxiv

13+阅读 · 2020年12月14日

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

Arxiv

19+阅读 · 2020年11月18日

Label-aware Double Transfer Learning for Cross-Specialty Medical Named Entity Recognition

Arxiv

10+阅读 · 2018年4月28日

VIP会员

文章信息

相关主题

相关VIP内容

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

不可错过！《机器学习100讲》课程，UBC Mark Schmidt讲授

专知会员服务

75+阅读 · 2022年6月28日

【ICCV 2021 】Vision Transformer中的相对位置编码

专知会员服务

30+阅读 · 2021年7月30日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】扩展可扩展会话推荐的边界

别想太多：高效 R1 风格大型推理模型综述

【ACMMM2025】EvoVLMA: 进化式视觉-语言模型自适应

智能体网络：用AI智能体编织下一代网络

相关资讯

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Vision Transformer Adapter for Dense Predictions

Arxiv

1+阅读 · 2023年2月13日

Video Transformers: A Survey

Arxiv

0+阅读 · 2023年2月13日

Transformers in Time Series: A Survey

Arxiv

0+阅读 · 2023年2月10日

Cross-Layer Retrospective Retrieving via Layer Attention

Arxiv

0+阅读 · 2023年2月10日

Masked Autoencoders Are Scalable Vision Learners

Arxiv

27+阅读 · 2021年11月11日

Cross-Modal Object Tracking: Modality-Aware Representations and A Unified Benchmark

Arxiv

14+阅读 · 2021年11月11日

Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

Arxiv

10+阅读 · 2021年2月11日

Temporal Relational Modeling with Self-Supervision for Action Segmentation

Arxiv

13+阅读 · 2020年12月14日

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

Arxiv

19+阅读 · 2020年11月18日

Label-aware Double Transfer Learning for Cross-Specialty Medical Named Entity Recognition

Arxiv

10+阅读 · 2018年4月28日

相关基金

一类稳态Schödinger-Poisson-Slater方程标准化解的研究

国家自然科学基金

1+阅读 · 2015年12月31日

可延展柔性无机电子互连结构及其机-电综合特性的研究

国家自然科学基金

0+阅读 · 2014年12月31日

CD4+CD25+Foxp3+调节T细胞介导的急性髓系白血病免疫逃逸机制研究

国家自然科学基金

1+阅读 · 2013年12月31日

基于Exemplar-Classifier思想的高分辨率光学遥感影像目标识别研究

国家自然科学基金

2+阅读 · 2013年12月31日

双转子中介轴承振动故障监测预警原理和方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

可压缩Navier-Stokes方程和Boltzmann方程解的渐近行为

国家自然科学基金

0+阅读 · 2013年12月31日

ErbB4通路激活介导非小细胞肺癌EGFR-TKIs获得性耐药的分子机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Cyclin G1对肝癌干细胞的调控及其在肝癌复发耐药中的作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

Vitamin E脂质体纳米颗粒携带siRNA靶向抑制HCV的实验研究

国家自然科学基金

0+阅读 · 2011年12月31日

基于HHT的超光谱图像高精度分类算法研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员