Video question answering (Video QA) presents a powerful testbed for human-like intelligent behaviors. The task demands new capabilities: integrating video processing with language understanding, binding abstract linguistic concepts to concrete visual artifacts, and reasoning deliberatively over spacetime. Neural networks offer a promising approach to reaching this potential by learning from examples rather than relying on handcrafted features and rules. However, neural networks are predominantly feature-based: they map data to unstructured vectorial representations and thus can fall into the trap of exploiting shortcuts through surface statistics instead of the systematic reasoning seen in symbolic systems. To tackle this issue, we advocate object-centric representation as a basis for constructing spatio-temporal structures from videos, essentially bridging the semantic gap between low-level pattern recognition and high-level symbolic algebra. To this end, we propose a new query-guided representation framework that turns a video into an evolving relational graph of objects, whose features and interactions are dynamically and conditionally inferred. The object lives are then summarized into resumes, lending themselves naturally to deliberative relational reasoning that produces an answer to the query. The framework is evaluated on major Video QA datasets, demonstrating the clear benefits of the object-centric approach to video reasoning.
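To make the described pipeline concrete, below is a minimal PyTorch sketch of the kind of object-centric Video QA model the abstract outlines: per-frame object features form a query-conditioned relational graph, each object's life is summarized into a resume vector, and a reasoning head maps the resumes and query to answer logits. All module names, dimensions, and the choices of a bilinear edge scorer and GRU summarizer are illustrative assumptions, not the paper's actual architecture.

```python
# A minimal sketch (not the authors' implementation) of an object-centric
# Video QA pipeline: object features per frame are refined via a
# query-conditioned relational graph, summarized over time into "resumes",
# then pooled and classified into an answer. Dimensions are assumptions.

import torch
import torch.nn as nn

class ObjectCentricVideoQA(nn.Module):
    def __init__(self, obj_dim=256, query_dim=256, n_answers=1000):
        super().__init__()
        # Scores pairwise object interactions (edges of the relational graph).
        self.edge_scorer = nn.Bilinear(obj_dim, obj_dim, 1)
        # Projects the query so it can modulate object features.
        self.query_proj = nn.Linear(query_dim, obj_dim)
        # GRU summarizes each object's life across frames into a resume.
        self.summarizer = nn.GRU(obj_dim, obj_dim, batch_first=True)
        self.classifier = nn.Linear(obj_dim * 2, n_answers)

    def forward(self, obj_feats, query):
        # obj_feats: (T, N, D) features of N objects over T frames
        # query: (D,) encoded question vector
        T, N, D = obj_feats.shape
        q = self.query_proj(query)
        refined = []
        for t in range(T):
            x = obj_feats[t] * q  # query-modulated object features
            # Dense relational graph over the N objects at frame t.
            a = self.edge_scorer(
                x.repeat_interleave(N, 0), x.repeat(N, 1)
            ).view(N, N).softmax(-1)
            refined.append(a @ x)  # one message-passing step
        lives = torch.stack(refined, dim=1)   # (N, T, D) object lives
        _, resumes = self.summarizer(lives)   # (1, N, D) object resumes
        pooled = resumes.squeeze(0).mean(0)   # aggregate across objects
        return self.classifier(torch.cat([pooled, q]))  # answer logits

# Usage: 8 frames, 5 detected objects, 256-d features and query.
model = ObjectCentricVideoQA()
logits = model(torch.randn(8, 5, 256), torch.randn(256))
```

The sketch conditions both the node features and the resulting edge weights on the query, matching the abstract's claim that object features and interactions are "dynamically and conditionally inferred"; a real system would add an object detector front end and a more elaborate reasoning module.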