In this paper, we address the problem of referring expression comprehension in videos, which is challenging due to complex expressions and scene dynamics. Unlike previous methods that solve the problem in multiple stages (i.e., tracking and proposal-based matching), we tackle it from a novel perspective, \textbf{co-grounding}, with an elegant one-stage framework. We enhance single-frame grounding accuracy through semantic attention learning and improve cross-frame grounding consistency through co-grounding feature learning. Semantic attention learning explicitly parses referring cues into different attributes to reduce the ambiguity of complex expressions. Co-grounding feature learning boosts visual feature representations by integrating temporal correlation to reduce the ambiguity caused by scene dynamics. Experimental results demonstrate the superiority of our framework on the video grounding datasets VID and LiOTB, generating accurate and stable results across frames. Our model is also applicable to referring expression comprehension in images, as illustrated by the improved performance on the RefCOCO dataset. Our project is available at https://sijiesong.github.io/co-grounding.