Modern approaches to visual question answering require large annotated datasets for training. Manual annotation of questions and answers for videos, however, is tedious and expensive, and prevents scalability. In this work, we propose to avoid manual annotation and to learn video question answering (VideoQA) from millions of readily-available narrated videos. We propose to automatically generate question-answer pairs from transcribed video narrations leveraging a state-of-the-art text transformer pipeline and obtain a new large-scale VideoQA training dataset. To handle the open vocabulary of diverse answers in this dataset, we propose a training procedure based on a contrastive loss between a video-question multi-modal transformer and an answer embedding. We evaluate our model on the zero-shot VideoQA task and show excellent results, in particular for rare answers. Furthermore, we demonstrate that finetuning our model on target datasets significantly outperforms the state of the art on MSRVTT-QA, MSVD-QA and ActivityNet-QA. Finally, for a detailed evaluation we introduce a new manually annotated VideoQA dataset with reduced language biases and high-quality annotations. Our code and datasets will be made publicly available at https://www.di.ens.fr/willow/research/just-ask/ .
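A minimal sketch of the kind of contrastive objective described above, not the authors' released implementation: video-question embeddings produced by a multi-modal transformer are scored against answer embeddings, and the matching pair is encouraged over the other answers in the batch. Function names, dimensions and the use of in-batch negatives are illustrative assumptions.

import torch
import torch.nn.functional as F

def contrastive_videoqa_loss(vq_emb: torch.Tensor, ans_emb: torch.Tensor) -> torch.Tensor:
    # vq_emb: (B, D) video-question embeddings; ans_emb: (B, D) answer embeddings.
    # The i-th answer is assumed to be the ground truth for the i-th video-question pair.
    scores = vq_emb @ ans_emb.t()                          # (B, B) similarity matrix
    targets = torch.arange(scores.size(0), device=scores.device)
    # Cross-entropy over in-batch answers pushes each matching pair's score
    # above the scores of all other (negative) answers in the batch.
    return F.cross_entropy(scores, targets)

# Example usage with random features standing in for transformer outputs.
if __name__ == "__main__":
    B, D = 8, 512
    vq = F.normalize(torch.randn(B, D), dim=-1)
    ans = F.normalize(torch.randn(B, D), dim=-1)
    print(contrastive_videoqa_loss(vq, ans).item())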