VLG-Net: Video-Language Graph Matching Network for Video Grounding


November 19, 2020


Sisi Qu, Mattia Soldan, Mengmeng Xu, Jesper Tegner, Bernard Ghanem

From arXiv, 12 pages, 5 figures

Grounding language queries in videos aims at identifying the time interval (or moment) semantically relevant to a language query. The solution to this challenging task demands the understanding of videos' and queries' semantic content and the fine-grained reasoning about their multi-modal interactions. Our key idea is to recast this challenge into an algorithmic graph matching problem. Fueled by recent advances in Graph Neural Networks, we propose to leverage Graph Convolutional Networks to model video and textual information as well as their semantic alignment. To enable the mutual exchange of information across the domains, we design a novel Video-Language Graph Matching Network (VLG-Net) to match video and query graphs. Core ingredients include representation graphs, built on top of video snippets and query tokens separately, which are used for modeling the intra-modality relationships. A Graph Matching layer is adopted for cross-modal context modeling and multi-modal fusion. Finally, moment candidates are created using masked moment attention pooling by fusing the moment's enriched snippet features. We demonstrate superior performance over state-of-the-art grounding methods on three widely used datasets for temporal localization of moments in videos with natural language queries: ActivityNet-Captions, TACoS, and DiDeMo.
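The final step named in the abstract, masked moment attention pooling, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how attention scores are computed, so the sketch assumes one unnormalised score per snippet is supplied as input (in a trained model these would presumably come from a learned layer). The function name, argument names, and the inclusive `(start, end)` moment convention are all hypothetical. For each candidate moment, snippets outside the interval are masked out, a softmax is taken over the remaining scores, and the snippet features are averaged with those weights.

```python
import math


def masked_moment_attention_pooling(snippet_features, scores, moments):
    """Pool snippet features into one vector per candidate moment.

    snippet_features: list of T feature vectors (lists of floats).
    scores: list of T unnormalised attention scores (assumed given here).
    moments: list of (start, end) snippet indices, end inclusive (assumption).
    """
    dim = len(snippet_features[0])
    pooled = []
    for start, end in moments:
        # Mask: only snippets inside [start, end] take part in the softmax.
        inside = range(start, end + 1)
        # Subtract the maximum score for numerical stability.
        max_score = max(scores[t] for t in inside)
        weights = [math.exp(scores[t] - max_score) for t in inside]
        total = sum(weights)
        # Attention-weighted sum of the snippet features in the moment.
        vec = [0.0] * dim
        for w, t in zip(weights, inside):
            for d in range(dim):
                vec[d] += (w / total) * snippet_features[t][d]
        pooled.append(vec)
    return pooled


# Toy input: four snippets with 2-dimensional features and equal scores,
# so each moment's pooled vector is the plain mean of its snippets.
features = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [4.0, 0.0]]
scores = [0.0, 0.0, 0.0, 0.0]
moments = [(0, 1), (2, 2), (0, 3)]
pooled = masked_moment_attention_pooling(features, scores, moments)
```

With equal scores the softmax is uniform inside each moment, so the pooling reduces to a masked mean; unequal scores would shift weight toward the higher-scoring snippets of that moment only.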

