视频字幕多式语义关注网络 (Multimodal Semantic Attention Network for Video Captioning) - 专知论文

会员服务 ·

0

多峰值 · 视频描述生成（Video Caption） · 注意力机制 · Weight · Networking ·

2019 年 5 月 8 日

Multimodal Semantic Attention Network for Video Captioning

翻译：视频字幕多式语义关注网络

Liang Sun,Bing Li,Chunfeng Yuan,Zhengjun Zha,Weiming Hu

from arxiv, 6 pages, 4 figures, accepted by IEEE International Conference on Multimedia and Expo (ICME) 2019

Inspired by the fact that different modalities in videos carry complementary information, we propose a Multimodal Semantic Attention Network(MSAN), which is a new encoder-decoder framework incorporating multimodal semantic attributes for video captioning. In the encoding phase, we detect and generate multimodal semantic attributes by formulating it as a multi-label classification problem. Moreover, we add auxiliary classification loss to our model that can obtain more effective visual features and high-level multimodal semantic attribute distributions for sufficient video encoding. In the decoding phase, we extend each weight matrix of the conventional LSTM to an ensemble of attribute-dependent weight matrices, and employ attention mechanism to pay attention to different attributes at each time of the captioning process. We evaluate algorithm on two popular public benchmarks: MSVD and MSR-VTT, achieving competitive results with current state-of-the-art across six evaluation metrics.

翻译：由于视频中的不同模式含有补充信息,我们提议建立一个多式语义注意网(MSAN),这是一个包含多式语义说明的新的编码器解码器框架,其中包含了用于视频字幕的多式语义特性。在编码阶段,我们通过将它作为一个多标签分类问题来检测和生成多式语义特性。此外,我们还将辅助分类损失添加到我们的模型中,以获得更有效的视觉特征和高级多式联运语义属性分布,从而实现足够的视频编码。在解码阶段,我们将常规LSTM的每个重量矩阵扩展至一个依赖属性的重量矩阵组合,并采用关注机制,在每次说明过程的每个时间关注不同属性。我们对两种受欢迎的公共基准:MSVD和MSR-VTT进行算法评估,在六个评价指标中与当前的最新技术取得竞争性结果。

4

相关内容

多峰值

【AAAI 2020】双曲图注意力网络，Hyperbolic Graph Attention Network

【AAAI 2020】双曲图注意力网络，Hyperbolic Graph Attention Network

专知会员服务

94+阅读 · 2020年6月15日

【WWW 2019】异质图注意力网络，Heterogeneous Graph Attention Network

【WWW 2019】异质图注意力网络，Heterogeneous Graph Attention Network

专知会员服务

75+阅读 · 2020年6月14日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

专知会员服务

54+阅读 · 2020年3月3日

【字节跳动&Adobe】图割多模态风格迁移，Multimodal Style Transfer via Graph Cuts

【字节跳动&Adobe】图割多模态风格迁移，Multimodal Style Transfer via Graph Cuts

专知会员服务

15+阅读 · 2020年1月9日

【AAAI2020】多模态注意力语义图嵌入多标签分类（Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification）

【AAAI2020】多模态注意力语义图嵌入多标签分类（Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification）

专知会员服务

92+阅读 · 2019年12月22日

【ACM MM 2019 】MMGCN：用于微视频个性化推荐的多模图卷积网络（MMGCN：Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video）

【ACM MM 2019 】MMGCN：用于微视频个性化推荐的多模图卷积网络（MMGCN：Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video）

专知会员服务

57+阅读 · 2019年11月20日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

CVPR 2019视频描述（video caption）相关论文总结

CVPR 2019视频描述（video caption）相关论文总结

极市平台

8+阅读 · 2019年10月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

一文带你读懂 SegNet（语义分割）

一文带你读懂 SegNet（语义分割）

AI研习社

19+阅读 · 2019年3月9日

《pyramid Attention Network for Semantic Segmentation》

《pyramid Attention Network for Semantic Segmentation》

统计学习与视觉计算组

44+阅读 · 2018年8月30日

【论文推荐】最新八篇情感分析相关论文—注意力网络、多模态情感分析、情感分析局限性、跨语言情感分类、多语言情感分析

【论文推荐】最新八篇情感分析相关论文—注意力网络、多模态情感分析、情感分析局限性、跨语言情感分类、多语言情感分析

专知

52+阅读 · 2018年6月28日

【论文推荐】最新六篇图像描述生成相关论文—字符级推断、视觉解释、语义对齐、实体感知、确定性非自回归

【论文推荐】最新六篇图像描述生成相关论文—字符级推断、视觉解释、语义对齐、实体感知、确定性非自回归

专知

15+阅读 · 2018年5月28日

胶囊网络（Capsule Network）在文本分类中的探索

胶囊网络（Capsule Network）在文本分类中的探索

PaperWeekly

13+阅读 · 2018年4月5日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

已删除

将门创投

12+阅读 · 2017年10月13日

MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding

MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding

Arxiv

4+阅读 · 2020年1月11日

Memory-Attended Recurrent Network for Video Captioning

Arxiv

7+阅读 · 2019年5月10日

An End-to-End Baseline for Video Captioning

Arxiv

6+阅读 · 2019年4月4日

Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning

Arxiv

6+阅读 · 2019年2月27日

A sequential guiding network with attention for image captioning

A sequential guiding network with attention for image captioning

Arxiv

5+阅读 · 2019年2月8日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

Recurrent Fusion Network for Image Captioning

Recurrent Fusion Network for Image Captioning

Arxiv

3+阅读 · 2018年7月31日

Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning

Arxiv

6+阅读 · 2018年4月15日

Reconstruction Network for Video Captioning

Arxiv

5+阅读 · 2018年3月30日

Deep Learning for Video Classification and Captioning

Arxiv

9+阅读 · 2018年2月22日

VIP会员

文章信息

相关主题

视频描述生成（Video Caption）

注意力机制

相关VIP内容

【AAAI 2020】双曲图注意力网络，Hyperbolic Graph Attention Network

【AAAI 2020】双曲图注意力网络，Hyperbolic Graph Attention Network

专知会员服务

94+阅读 · 2020年6月15日

【WWW 2019】异质图注意力网络，Heterogeneous Graph Attention Network

【WWW 2019】异质图注意力网络，Heterogeneous Graph Attention Network

专知会员服务

75+阅读 · 2020年6月14日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

【北京大学】探索提取跨模态信息进行图像caption，Exploring and Distilling Cross-Modal Information for Image Captioning

专知会员服务

54+阅读 · 2020年3月3日

【字节跳动&Adobe】图割多模态风格迁移，Multimodal Style Transfer via Graph Cuts

【字节跳动&Adobe】图割多模态风格迁移，Multimodal Style Transfer via Graph Cuts

专知会员服务

15+阅读 · 2020年1月9日

【AAAI2020】多模态注意力语义图嵌入多标签分类（Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification）

【AAAI2020】多模态注意力语义图嵌入多标签分类（Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification）

专知会员服务

92+阅读 · 2019年12月22日

【ACM MM 2019 】MMGCN：用于微视频个性化推荐的多模图卷积网络（MMGCN：Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video）

【ACM MM 2019 】MMGCN：用于微视频个性化推荐的多模图卷积网络（MMGCN：Multi-modal Graph Convolution Network for Personalized Recommendation of Micro-video）

专知会员服务

57+阅读 · 2019年11月20日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

MIT新书《强化学习与最优控制》

MIT新书《强化学习与最优控制》

专知会员服务

280+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《多智能体不确定环境追逃博弈研究》216页

美智库最新发布《解放军"人机编组协同作战"发展路径：理论与实践》53页

现代战争"杀伤区"理论：空间尺度与结构特征、控制手段与毁伤机制、生存策略与战线转移

《俄军无人机创新技术或已在乌克兰达成"战场空中封锁"作战效果》最新18页报告

相关资讯

CVPR 2019视频描述（video caption）相关论文总结

CVPR 2019视频描述（video caption）相关论文总结

极市平台

8+阅读 · 2019年10月16日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

一文带你读懂 SegNet（语义分割）

一文带你读懂 SegNet（语义分割）

AI研习社

19+阅读 · 2019年3月9日

《pyramid Attention Network for Semantic Segmentation》

《pyramid Attention Network for Semantic Segmentation》

统计学习与视觉计算组

44+阅读 · 2018年8月30日

【论文推荐】最新八篇情感分析相关论文—注意力网络、多模态情感分析、情感分析局限性、跨语言情感分类、多语言情感分析

【论文推荐】最新八篇情感分析相关论文—注意力网络、多模态情感分析、情感分析局限性、跨语言情感分类、多语言情感分析

专知

52+阅读 · 2018年6月28日

【论文推荐】最新六篇图像描述生成相关论文—字符级推断、视觉解释、语义对齐、实体感知、确定性非自回归

【论文推荐】最新六篇图像描述生成相关论文—字符级推断、视觉解释、语义对齐、实体感知、确定性非自回归

专知

15+阅读 · 2018年5月28日

胶囊网络（Capsule Network）在文本分类中的探索

胶囊网络（Capsule Network）在文本分类中的探索

PaperWeekly

13+阅读 · 2018年4月5日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

已删除

将门创投

12+阅读 · 2017年10月13日

相关论文

MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding

MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding

Arxiv

4+阅读 · 2020年1月11日

Memory-Attended Recurrent Network for Video Captioning

Arxiv

7+阅读 · 2019年5月10日

An End-to-End Baseline for Video Captioning

Arxiv

6+阅读 · 2019年4月4日

Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning

Arxiv

6+阅读 · 2019年2月27日

A sequential guiding network with attention for image captioning

A sequential guiding network with attention for image captioning

Arxiv

5+阅读 · 2019年2月8日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

Recurrent Fusion Network for Image Captioning

Recurrent Fusion Network for Image Captioning

Arxiv

3+阅读 · 2018年7月31日

Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning

Arxiv

6+阅读 · 2018年4月15日

Reconstruction Network for Video Captioning

Arxiv

5+阅读 · 2018年3月30日

Deep Learning for Video Classification and Captioning

Arxiv

9+阅读 · 2018年2月22日

微信扫码咨询专知VIP会员