This paper considers a video captioning network, referred to as the Semantic Grouping Network (SGN), that attempts (1) to group video frames with the discriminating word phrases of a partially decoded caption and then (2) to decode those semantically aligned groups when predicting the next word. As consecutive frames are unlikely to provide unique information, prior methods have focused on discarding or merging repetitive information based only on the input video. The SGN learns to capture the most discriminating word phrases of the partially decoded caption and a mapping that associates each phrase with the relevant video frames. Establishing this mapping allows semantically related frames to be clustered, which reduces redundancy. In contrast to the prior methods, the continuous feedback from decoded words enables the SGN to dynamically update a video representation that adapts to the partially decoded caption. Furthermore, a contrastive attention loss is proposed to facilitate accurate alignment between a word phrase and video frames without manual annotations. The SGN achieves state-of-the-art performance, outperforming runner-up methods by margins of 2.1%p and 2.4%p in CIDEr-D score on the MSVD and MSR-VTT datasets, respectively. Extensive experiments demonstrate the effectiveness and interpretability of the SGN.
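To make the phrase-to-frame grouping and the contrastive attention loss concrete, the sketch below shows one plausible realization in PyTorch. It is a minimal illustration, not the paper's implementation: the dot-product attention, the use of the maximum attention weight as an alignment score, the margin value, and all function names are assumptions introduced here for clarity.

```python
import torch
import torch.nn.functional as F

def semantic_grouping(phrase_emb, frame_emb):
    """Align each decoded word phrase with relevant video frames via
    scaled dot-product attention, yielding one grouped visual feature
    per phrase (an illustrative stand-in for the paper's grouping step).

    phrase_emb: (num_phrases, dim) embeddings of partially decoded phrases
    frame_emb:  (num_frames, dim)  per-frame visual features
    """
    scores = phrase_emb @ frame_emb.t() / phrase_emb.size(-1) ** 0.5  # (P, F)
    attn = scores.softmax(dim=-1)      # phrase-to-frame alignment weights
    grouped = attn @ frame_emb         # semantically grouped representation per phrase
    return grouped, attn

def contrastive_attention_loss(attn_pos, attn_neg, margin=0.2):
    """Encourage a phrase's attention over its own (positive) video to be
    stronger than over a mismatched (negative) video, without frame-level
    annotations. The max attention weight per phrase is used as an
    alignment score; this surrogate is an assumption of this sketch.
    """
    pos_score = attn_pos.max(dim=-1).values  # (P,)
    neg_score = attn_neg.max(dim=-1).values  # (P,)
    return F.relu(margin - pos_score + neg_score).mean()

if __name__ == "__main__":
    torch.manual_seed(0)
    phrases = torch.randn(3, 512)      # e.g. 3 phrases from the partial caption
    frames_pos = torch.randn(20, 512)  # frames of the matching video
    frames_neg = torch.randn(20, 512)  # frames of an unrelated video

    grouped, attn_pos = semantic_grouping(phrases, frames_pos)
    _, attn_neg = semantic_grouping(phrases, frames_neg)
    loss = contrastive_attention_loss(attn_pos, attn_neg)
    print(grouped.shape, loss.item())
```

Because the grouping is recomputed each time new words are decoded, the grouped features in this sketch would naturally adapt to the growing partial caption, mirroring the continuous-feedback behavior described above.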