[Peking University] Exploring and Distilling Cross-Modal Information for Image Captioning - 专知

March 5, 2020 · 专知

Title: Exploring and Distilling Cross-Modal Information for Image Captioning

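Abstract: In recent years, attention-based encoder-decoder models have been widely used in image captioning. Current methods, however, still have great difficulty achieving deep image understanding. In this work, we argue that such understanding requires visual attention to correlated image regions and semantic attention to relevant attributes. To attend effectively, we study image captioning from a cross-modal perspective and propose a global-and-local information exploring-and-distilling approach that explores and distills the source information in vision and language. Globally, it extracts salient region groupings and attribute collocations to provide the aspect vector, a spatial and relational representation of the image based on the caption context. Locally, it extracts fine-grained regions and attributes with reference to the aspect vector for word selection. Our fully attentive model achieves a CIDEr score of 129.3 in offline COCO evaluation on the COCO test set, with remarkable efficiency in terms of accuracy, speed, and parameter budget.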

https://arxiv.org/abs/2002.12585
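
The abstract's idea of attending to image regions and to attributes with reference to a shared guiding vector can be illustrated with a minimal sketch. This is not the paper's architecture: it uses plain scaled dot-product attention as a stand-in, and the names `attend`, `query`, `regions`, and `attributes`, along with the toy numbers, are assumptions made only for illustration. The `query` vector plays the role of a guiding representation such as the aspect vector.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, items):
    """Scaled dot-product attention of one query over a list of vectors.

    Returns the weighted sum of the items and the attention weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, item)) / scale for item in items]
    weights = softmax(scores)
    dim = len(items[0])
    context = [sum(w * item[d] for w, item in zip(weights, items)) for d in range(dim)]
    return context, weights

# Toy inputs (hypothetical): three region features and two attribute embeddings.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attributes = [[0.5, 0.5], [2.0, 0.0]]
query = [1.0, 0.0]  # stand-in for a guiding vector such as the aspect vector

visual_ctx, visual_w = attend(query, regions)         # visual attention
semantic_ctx, semantic_w = attend(query, attributes)  # semantic attention

# One simple way to combine both modalities; the paper may fuse them differently.
fused = [v + s for v, s in zip(visual_ctx, semantic_ctx)]
```

In this toy setup the regions and attributes most aligned with `query` receive the largest weights, which is the sense in which a guiding vector steers which fine-grained inputs influence word selection.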
