General Multi-label Image Classification with Transformers - 专知 Papers


November 27, 2020

General Multi-label Image Classification with Transformers


Jack Lanchantin, Tianlu Wang, Vicente Ordonez, Yanjun Qi

From arXiv, 13 pages, 7 figures

Multi-label image classification is the task of predicting a set of labels corresponding to objects, attributes or other entities present in an image. In this work we propose the Classification Transformer (C-Tran), a general framework for multi-label image classification that leverages Transformers to exploit the complex dependencies among visual features and labels. Our approach consists of a Transformer encoder trained to predict a set of target labels given an input set of masked labels, and visual features from a convolutional neural network. A key ingredient of our method is a label mask training objective that uses a ternary encoding scheme to represent the state of the labels as positive, negative, or unknown during training. Our model shows state-of-the-art performance on challenging datasets such as COCO and Visual Genome. Moreover, because our model explicitly represents the uncertainty of labels during training, it is more general by allowing us to produce improved results for images with partial or extra label annotations during inference. We demonstrate this additional capability in the COCO, Visual Genome, News500, and CUB image datasets.
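
The ternary label encoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mask_labels`, the integer codes chosen for the three states, and the way hidden labels are sampled are all assumptions made here for illustration, since the abstract does not specify them. The sketch only shows the input side of the idea: some labels are hidden as "unknown" and become prediction targets, while the rest are given to the model as known positive or negative states.

```python
import random

# Hypothetical integer codes for the three label states (assumption:
# the abstract names the states but not how they are encoded).
UNKNOWN, NEGATIVE, POSITIVE = 0, -1, 1


def mask_labels(labels, num_unknown, rng):
    """Build a ternary label-state vector from binary ground-truth labels.

    labels      -- list of 0/1 ground-truth values, one per label
    num_unknown -- how many labels to hide (assumed to be chosen by the caller)
    rng         -- a random.Random instance, passed in for reproducibility

    Returns (states, hidden), where states[i] is UNKNOWN, NEGATIVE or
    POSITIVE, and hidden is the sorted list of indices that were masked
    and would serve as prediction targets.
    """
    n = len(labels)
    if not 0 <= num_unknown <= n:
        raise ValueError("num_unknown must be between 0 and len(labels)")

    # Pick which label positions are hidden from the model.
    hidden = set(rng.sample(range(n), num_unknown))

    states = []
    for i, y in enumerate(labels):
        if i in hidden:
            states.append(UNKNOWN)
        else:
            states.append(POSITIVE if y == 1 else NEGATIVE)
    return states, sorted(hidden)


if __name__ == "__main__":
    ground_truth = [1, 0, 0, 1, 0]  # made-up example labels
    states, hidden = mask_labels(ground_truth, 2, random.Random(0))
    print("label states:", states)
    print("indices to predict:", hidden)
```

Under this sketch, plain inference with no annotations corresponds to hiding every label (`num_unknown == len(labels)`), and inference with partial annotations corresponds to hiding only the labels that were not provided.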

