视频中时间语言本地化多模式互动图集变异网络 (Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos) - 专知论文

会员服务 ·

0

INTERACT · 图卷积神经网络/图卷积网络 · 矩 · 图卷积 · 可理解性 ·

2021 年 10 月 12 日

Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos

翻译：视频中时间语言本地化多模式互动图集变异网络

Zongmeng Zhang,Xianjing Han,Xuemeng Song,Yan Yan,Liqiang Nie

from arxiv, Accepted by IEEE Transactions on Image Processing

This paper focuses on tackling the problem of temporal language localization in videos, which aims to identify the start and end points of a moment described by a natural language sentence in an untrimmed video. However, it is non-trivial since it requires not only the comprehensive understanding of the video and sentence query, but also the accurate semantic correspondence capture between them. Existing efforts are mainly centered on exploring the sequential relation among video clips and query words to reason the video and sentence query, neglecting the other intra-modal relations (e.g., semantic similarity among video clips and syntactic dependency among the query words). Towards this end, in this work, we propose a Multi-modal Interaction Graph Convolutional Network (MIGCN), which jointly explores the complex intra-modal relations and inter-modal interactions residing in the video and sentence query to facilitate the understanding and semantic correspondence capture of the video and sentence query. In addition, we devise an adaptive context-aware localization method, where the context information is taken into the candidate moments and the multi-scale fully connected layers are designed to rank and adjust the boundary of the generated coarse candidate moments with different lengths. Extensive experiments on Charades-STA and ActivityNet datasets demonstrate the promising performance and superior efficiency of our model.

翻译：本文侧重于解决视频中时间语言本地化问题,目的是确定一个自然语言句在未经剪辑的视频中描述的一个时刻的起始点和终点点,然而,这是非三边的,因为它不仅需要全面理解视频和句子查询,而且需要准确的语义对应捕捉;现有努力主要侧重于探讨视频片段和查询词之间的相继关系,以说明视频和句子查询的理由,忽视其他模式内部关系(例如视频片段之间的语义相似性和查询词词的共通依赖性);为此,我们提议建立一个多模式互动图动画网络(MIGCN),共同探讨视频和句子查询中存在的复杂模式内部关系和模式互动,以便利理解和语义对应捕捉视频和句子查询。此外,我们设计了适应性的背景认识本地化方法,将背景信息带入候选时段和多尺度的完全关联层。为此,我们提议为此建立一个多模式的多模式,即互动动动动动画网络(MIGCN)网络,共同探索视频和句中复杂的内部关系和模式互动关系,调整了我们高层次和高额候选人演进度演算。

0

相关内容

INTERACT

IFIP TC13 Conference on Human-Computer Interaction是人机交互领域的研究者和实践者展示其工作的重要平台。多年来，这些会议吸引了来自几个国家和文化的研究人员。官网链接：http://interact2019.org/

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

专知会员服务

65+阅读 · 2020年5月12日

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

专知会员服务

102+阅读 · 2020年4月25日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【南洋理工大学课程】注意力神经网络，Attention Neural Networks，附78页PPT

【南洋理工大学课程】注意力神经网络，Attention Neural Networks，附78页PPT

专知会员服务

157+阅读 · 2019年11月9日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

【ICDAR2019教程】计算机视觉中的文本形式，Vision and Language: the text modality in computer vision

【ICDAR2019教程】计算机视觉中的文本形式，Vision and Language: the text modality in computer vision

专知会员服务

25+阅读 · 2019年9月21日

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

ICCV 2019 行为识别/视频理解论文汇总

ICCV 2019 行为识别/视频理解论文汇总

极市平台

15+阅读 · 2019年9月26日

CVPR2019| 05-16更新10篇论文及代码合集（含一篇oral，全景分割/文本检测/目标检测等）

CVPR2019| 05-16更新10篇论文及代码合集（含一篇oral，全景分割/文本检测/目标检测等）

极市平台

14+阅读 · 2019年5月16日

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

AI研习社

32+阅读 · 2019年4月5日

【荟萃】知识图谱论文与笔记

【荟萃】知识图谱论文与笔记

专知

71+阅读 · 2019年3月25日

人工智能 | AAAI 2019等国际会议信息7条

人工智能 | AAAI 2019等国际会议信息7条

Call4Papers

5+阅读 · 2018年9月3日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

计算机类 | 国际会议信息7条

计算机类 | 国际会议信息7条

Call4Papers

3+阅读 · 2017年11月17日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Lightweight Attentional Feature Fusion for Video Retrieval by Text

Arxiv

1+阅读 · 2021年12月3日

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

Arxiv

10+阅读 · 2020年3月31日

VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions

VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions

Arxiv

7+阅读 · 2020年3月11日

MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding

MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding

Arxiv

4+阅读 · 2020年1月11日

Graph Convolutional Networks for Temporal Action Localization

Arxiv

5+阅读 · 2019年9月7日

Multimodal Semantic Attention Network for Video Captioning

Arxiv

4+阅读 · 2019年5月8日

Convolutional Self-Attention Network

Arxiv

6+阅读 · 2019年4月8日

Stacked Spatio-Temporal Graph Convolutional Networks for Action Segmentation

Stacked Spatio-Temporal Graph Convolutional Networks for Action Segmentation

Arxiv

5+阅读 · 2018年12月6日

Jointly Multiple Events Extraction via Attention-based Graph Information Aggregation

Jointly Multiple Events Extraction via Attention-based Graph Information Aggregation

Arxiv

5+阅读 · 2018年9月24日

Modeling Text with Graph Convolutional Network for Cross-Modal Information Retrieval

Arxiv

3+阅读 · 2018年2月13日

VIP会员

文章信息

相关主题

图卷积神经网络/图卷积网络

相关VIP内容

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

【视频描述综述论文】Video Description: A Survey of Methods, Datasets, and Evaluation Metrics

专知会员服务

65+阅读 · 2020年5月12日

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

【知识图谱嵌入补全综述论文】embedding models for knowledge base completion

专知会员服务

102+阅读 · 2020年4月25日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【南洋理工大学课程】注意力神经网络，Attention Neural Networks，附78页PPT

【南洋理工大学课程】注意力神经网络，Attention Neural Networks，附78页PPT

专知会员服务

157+阅读 · 2019年11月9日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

【ICDAR2019教程】计算机视觉中的文本形式，Vision and Language: the text modality in computer vision

【ICDAR2019教程】计算机视觉中的文本形式，Vision and Language: the text modality in computer vision

专知会员服务

25+阅读 · 2019年9月21日

热门VIP内容

开通专知VIP会员享更多权益服务

人工智能治理的未来

模态感知的特征匹配：单一模态与跨模态技术的全面综述

无监督行人重识别研究综述

【牛津博士论文】面向神经影像应用的可扩展且可解释的空间模型

相关资讯

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

ICCV 2019 行为识别/视频理解论文汇总

ICCV 2019 行为识别/视频理解论文汇总

极市平台

15+阅读 · 2019年9月26日

CVPR2019| 05-16更新10篇论文及代码合集（含一篇oral，全景分割/文本检测/目标检测等）

CVPR2019| 05-16更新10篇论文及代码合集（含一篇oral，全景分割/文本检测/目标检测等）

极市平台

14+阅读 · 2019年5月16日

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

AI研习社

32+阅读 · 2019年4月5日

【荟萃】知识图谱论文与笔记

【荟萃】知识图谱论文与笔记

专知

71+阅读 · 2019年3月25日

人工智能 | AAAI 2019等国际会议信息7条

人工智能 | AAAI 2019等国际会议信息7条

Call4Papers

5+阅读 · 2018年9月3日

计算机视觉近一年进展综述

计算机视觉近一年进展综述

机器学习研究会

9+阅读 · 2017年11月25日

计算机类 | 国际会议信息7条

计算机类 | 国际会议信息7条

Call4Papers

3+阅读 · 2017年11月17日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

【学习】Hierarchical Softmax

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

相关论文

Lightweight Attentional Feature Fusion for Video Retrieval by Text

Arxiv

1+阅读 · 2021年12月3日

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text

Arxiv

10+阅读 · 2020年3月31日

VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions

VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions

Arxiv

7+阅读 · 2020年3月11日

MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding

MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding

Arxiv

4+阅读 · 2020年1月11日

Graph Convolutional Networks for Temporal Action Localization

Arxiv

5+阅读 · 2019年9月7日

Multimodal Semantic Attention Network for Video Captioning

Arxiv

4+阅读 · 2019年5月8日

Convolutional Self-Attention Network

Arxiv

6+阅读 · 2019年4月8日

Stacked Spatio-Temporal Graph Convolutional Networks for Action Segmentation

Stacked Spatio-Temporal Graph Convolutional Networks for Action Segmentation

Arxiv

5+阅读 · 2018年12月6日

Jointly Multiple Events Extraction via Attention-based Graph Information Aggregation

Jointly Multiple Events Extraction via Attention-based Graph Information Aggregation

Arxiv

5+阅读 · 2018年9月24日

Modeling Text with Graph Convolutional Network for Cross-Modal Information Retrieval

Arxiv

3+阅读 · 2018年2月13日

微信扫码咨询专知VIP会员