LAVTERR: 学习通过图像和生成说明来辅助的视觉和文字统一表达方式 (LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation) - 专知论文

会员服务 ·

0

学成 · 图像字幕 · 相似度 · 表示 · 注意力机制 ·

2021 年 10 月 19 日

LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation

翻译：LAVTERR: 学习通过图像和生成说明来辅助的视觉和文字统一表达方式

Mohammad Abuzar Shaikh,Zhanghexuan Ji,Dana Moukheiber,Yan Shen,Sargur Srihari,Mingchen Gao

from arxiv, 14 pages, 10 Figures, 5 Tables

Pre-training visual and textual representations from large-scale image-text pairs is becoming a standard approach for many downstream vision-language tasks. The transformer-based models learn inter and intra-modal attention through a list of self-supervised learning tasks. This paper proposes LAViTeR, a novel architecture for visual and textual representation learning. The main module, Visual Textual Alignment (VTA) will be assisted by two auxiliary tasks, GAN-based image synthesis and Image Captioning. We also propose a new evaluation metric measuring the similarity between the learnt visual and textual embedding. The experimental results on two public datasets, CUB and MS-COCO, demonstrate superior visual and textual representation alignment in the joint feature embedding space

翻译：以变压器为基础的模型通过一份自我监督的学习任务清单,学习了不同模式和不同模式内部的注意。本文提议LAViTER,这是视觉和文字表述学习的新结构。主要模块,即视觉文本调整(VTA)将辅助两项辅助任务,即基于GAN的图像合成和图像描述。我们还提议一个新的评价指标,衡量所学视觉和文字嵌入之间的相似性。两个公共数据集(CUB和MS-COCO)的实验结果显示在联合功能嵌入空间的视觉和文字代表的高度一致。

0

相关内容

【DeepMind】多模态预训练模型概述，37页ppt

【DeepMind】多模态预训练模型概述，37页ppt

专知会员服务

95+阅读 · 2021年7月2日

最新《自监督表示学习》报告，70页ppt

最新《自监督表示学习》报告，70页ppt

专知会员服务

86+阅读 · 2020年12月22日

最新《模仿学习(Imitation Learning》进展报告, 加州理工Yisong Yue教授，附下载

最新《模仿学习(Imitation Learning》进展报告, 加州理工Yisong Yue教授，附下载

专知会员服务

41+阅读 · 2020年12月6日

Time2Vec：学习时间的向量表示，Time2Vec: Learning a Vector Representation of Time

Time2Vec：学习时间的向量表示，Time2Vec: Learning a Vector Representation of Time

专知会员服务

36+阅读 · 2020年5月10日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【DeepMind】基于变换的大规模数据对抗视频预测，Transformation-based Adversarial Video Prediction on Large-Scale Data

【DeepMind】基于变换的大规模数据对抗视频预测，Transformation-based Adversarial Video Prediction on Large-Scale Data

专知会员服务

17+阅读 · 2020年3月9日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

【图机器学习论文】综述：网络表示学习（Network Representation Learning: A Survey）

【图机器学习论文】综述：网络表示学习（Network Representation Learning: A Survey）

专知会员服务

91+阅读 · 2019年12月16日

【报告推荐】模仿学习前沿进展，62页ppt，New Frontiers in Imitation Learning

【报告推荐】模仿学习前沿进展，62页ppt，New Frontiers in Imitation Learning

专知会员服务

39+阅读 · 2019年11月13日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文推荐】最新6篇图像描述生成相关论文—语言为枢纽、细粒度、生成器、注意力机制、策略梯度优化、判别性目标

【论文推荐】最新6篇图像描述生成相关论文—语言为枢纽、细粒度、生成器、注意力机制、策略梯度优化、判别性目标

专知

11+阅读 · 2018年3月20日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

CLIP-Lite: Information Efficient Visual Representation Learning from Textual Annotations

Arxiv

0+阅读 · 2021年12月14日

LAFITE: Towards Language-Free Training for Text-to-Image Generation

Arxiv

0+阅读 · 2021年12月13日

MST: Masked Self-Supervised Transformer for Visual Representation

Arxiv

4+阅读 · 2021年6月10日

Evaluating Multimodal Representations on Visual Semantic Textual Similarity

Evaluating Multimodal Representations on Visual Semantic Textual Similarity

Arxiv

6+阅读 · 2020年4月4日

Evolving Losses for Unsupervised Video Representation Learning

Arxiv

23+阅读 · 2020年2月26日

A Simple Framework for Contrastive Learning of Visual Representations

Arxiv

21+阅读 · 2020年2月13日

Unified Vision-Language Pre-Training for Image Captioning and VQA

Unified Vision-Language Pre-Training for Image Captioning and VQA

Arxiv

8+阅读 · 2019年10月3日

Jointly Learning Entity and Relation Representations for Entity Alignment

Arxiv

3+阅读 · 2019年9月20日

Adversarial Representation Learning for Text-to-Image Matching

Adversarial Representation Learning for Text-to-Image Matching

Arxiv

6+阅读 · 2019年8月28日

Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention

Arxiv

7+阅读 · 2018年5月21日

VIP会员

文章信息

相关主题

注意力机制

相关VIP内容

【DeepMind】多模态预训练模型概述，37页ppt

【DeepMind】多模态预训练模型概述，37页ppt

专知会员服务

95+阅读 · 2021年7月2日

最新《自监督表示学习》报告，70页ppt

最新《自监督表示学习》报告，70页ppt

专知会员服务

86+阅读 · 2020年12月22日

最新《模仿学习(Imitation Learning》进展报告, 加州理工Yisong Yue教授，附下载

最新《模仿学习(Imitation Learning》进展报告, 加州理工Yisong Yue教授，附下载

专知会员服务

41+阅读 · 2020年12月6日

Time2Vec：学习时间的向量表示，Time2Vec: Learning a Vector Representation of Time

Time2Vec：学习时间的向量表示，Time2Vec: Learning a Vector Representation of Time

专知会员服务

36+阅读 · 2020年5月10日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【DeepMind】基于变换的大规模数据对抗视频预测，Transformation-based Adversarial Video Prediction on Large-Scale Data

【DeepMind】基于变换的大规模数据对抗视频预测，Transformation-based Adversarial Video Prediction on Large-Scale Data

专知会员服务

17+阅读 · 2020年3月9日

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

【牛津大学】深度残差强化学习，Deep Residual Reinforcement Learning

专知会员服务

84+阅读 · 2020年2月18日

【图机器学习论文】综述：网络表示学习（Network Representation Learning: A Survey）

【图机器学习论文】综述：网络表示学习（Network Representation Learning: A Survey）

专知会员服务

91+阅读 · 2019年12月16日

【报告推荐】模仿学习前沿进展，62页ppt，New Frontiers in Imitation Learning

【报告推荐】模仿学习前沿进展，62页ppt，New Frontiers in Imitation Learning

专知会员服务

39+阅读 · 2019年11月13日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

《步兵小单元山地严寒作战指南》美军最新条令200页

《联合作战概念的发展》最新报告

俄制无人机弹药

《复杂场景下自主着陆的模型预测控制技术》92页

相关资讯

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【论文推荐】最新6篇图像描述生成相关论文—语言为枢纽、细粒度、生成器、注意力机制、策略梯度优化、判别性目标

【论文推荐】最新6篇图像描述生成相关论文—语言为枢纽、细粒度、生成器、注意力机制、策略梯度优化、判别性目标

专知

11+阅读 · 2018年3月20日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

CLIP-Lite: Information Efficient Visual Representation Learning from Textual Annotations

Arxiv

0+阅读 · 2021年12月14日

LAFITE: Towards Language-Free Training for Text-to-Image Generation

Arxiv

0+阅读 · 2021年12月13日

MST: Masked Self-Supervised Transformer for Visual Representation

Arxiv

4+阅读 · 2021年6月10日

Evaluating Multimodal Representations on Visual Semantic Textual Similarity

Evaluating Multimodal Representations on Visual Semantic Textual Similarity

Arxiv

6+阅读 · 2020年4月4日

Evolving Losses for Unsupervised Video Representation Learning

Arxiv

23+阅读 · 2020年2月26日

A Simple Framework for Contrastive Learning of Visual Representations

Arxiv

21+阅读 · 2020年2月13日

Unified Vision-Language Pre-Training for Image Captioning and VQA

Unified Vision-Language Pre-Training for Image Captioning and VQA

Arxiv

8+阅读 · 2019年10月3日

Jointly Learning Entity and Relation Representations for Entity Alignment

Arxiv

3+阅读 · 2019年9月20日

Adversarial Representation Learning for Text-to-Image Matching

Adversarial Representation Learning for Text-to-Image Matching

Arxiv

6+阅读 · 2019年8月28日

Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention

Arxiv

7+阅读 · 2018年5月21日

微信扫码咨询专知VIP会员