为视频和语言学习推出单一框架比亚 (Revealing Single Frame Bias for Video-and-Language Learning) - 专知论文

会员服务 ·

0

Learning · Performer · 有偏 · MoDELS · INFORMS ·

2022 年 6 月 7 日

Revealing Single Frame Bias for Video-and-Language Learning

翻译：为视频和语言学习推出单一框架比亚

Jie Lei,Tamara L. Berg,Mohit Bansal

from arxiv, 19 pages, 8 figures

Training an effective video-and-language model intuitively requires multiple frames as model inputs. However, it is unclear whether using multiple frames is beneficial to downstream tasks, and if yes, whether the performance gain is worth the drastically-increased computation and memory costs resulting from using more frames. In this work, we explore single-frame models for video-and-language learning. On a diverse set of video-and-language tasks (including text-to-video retrieval and video question answering), we show the surprising result that, with large-scale pre-training and a proper frame ensemble strategy at inference time, a single-frame trained model that does not consider temporal information can achieve better performance than existing methods that use multiple frames for training. This result reveals the existence of a strong "static appearance bias" in popular video-and-language datasets. Therefore, to allow for a more comprehensive evaluation of video-and-language models, we propose two new retrieval tasks based on existing fine-grained action recognition datasets that encourage temporal modeling. Our code is available at https://github.com/jayleicn/singularity

翻译：培训一个有效的视频和语言模型,直观地要求多个框架作为模型投入。然而,尚不清楚使用多个框架是否有利于下游任务,如果是的话,则尚不清楚使用多个框架是否值得大幅提高的计算和记忆成本。在这项工作中,我们探索了视频和语言学习的单一框架模式。在一套不同的视频和语言任务(包括文本到视频检索和视频问答)中,我们展示出一个令人惊讶的结果,即:如果在推论时间采用大规模培训前和适当的框架组合战略,一个单一框架培训的模型不考虑时间信息能够比使用多个框架进行培训的现有方法取得更好的业绩。这一结果显示,在流行的视频和语言数据集中存在着强烈的“静态外观偏差 ” 。因此,为了能够更全面地评价视频和语言模型,我们提议了两项新的检索任务,其依据是现有的精细的识别动作数据集,鼓励时间模型。我们的代码可在 https://github.com/jayeleicn/singalityality。

0

相关内容

Learning

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

专知会员服务

50+阅读 · 2020年2月26日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

视频超分辨 Detail-revealing Deep Video Super-resolution 论文笔记

视频超分辨 Detail-revealing Deep Video Super-resolution 论文笔记

统计学习与视觉计算组

17+阅读 · 2018年3月16日

基于聚合物-石英复合光纤结构的轨道角动量产生和传输的研究

国家自然科学基金

0+阅读 · 2015年12月31日

网络编码在MIMO多向中继信道中的优化设计与性能分析

国家自然科学基金

0+阅读 · 2014年12月31日

面向物理层网络编码通信的多进制LDPC码的编码调制设计

国家自然科学基金

0+阅读 · 2013年12月31日

采用pinball loss的MEE算法研究

国家自然科学基金

1+阅读 · 2013年12月31日

环形自激振荡射流涡激空化效应及对卷吸性能的影响机理

国家自然科学基金

0+阅读 · 2012年12月31日

准循环LDPC码校验矩阵的秩和冗余行分析及码的性能优化

国家自然科学基金

0+阅读 · 2012年12月31日

函数域中的Vinogradov中值定理

国家自然科学基金

0+阅读 · 2012年12月31日

XML个性化协作搜索及其在社会网络服务中的应用

国家自然科学基金

0+阅读 · 2011年12月31日

针刺对脑缺血损伤和保护机制相关信号蛋白磷酸化作用研究

国家自然科学基金

0+阅读 · 2009年12月31日

三维片上网络（3D NoC）关键技术研究

国家自然科学基金

1+阅读 · 2008年12月31日

Self-supervised Video-centralised Transformer for Video Face Clustering

Self-supervised Video-centralised Transformer for Video Face Clustering

Arxiv

0+阅读 · 2022年7月22日

SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos

Arxiv

0+阅读 · 2022年7月21日

NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition

Arxiv

0+阅读 · 2022年7月21日

VideoDG: Generalizing Temporal Relations in Videos to Novel Domains

Arxiv

14+阅读 · 2021年9月17日

Graph-Based Deep Learning for Medical Diagnosis and Analysis: Past, Present and Future

Graph-Based Deep Learning for Medical Diagnosis and Analysis: Past, Present and Future

Arxiv

36+阅读 · 2021年5月27日

Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

Arxiv

10+阅读 · 2021年2月11日

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks

Arxiv

14+阅读 · 2021年1月31日

Pretrained Transformers for Text Ranking: BERT and Beyond

Arxiv

28+阅读 · 2020年10月13日

A Simple Framework for Contrastive Learning of Visual Representations

Arxiv

21+阅读 · 2020年2月13日

From Superpixel to Human Shape Modelling for Carried Object Detection

Arxiv

10+阅读 · 2018年1月10日

VIP会员

文章信息

相关主题

相关VIP内容

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

专知会员服务

50+阅读 · 2020年2月26日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【新书】基于物理的模拟

流匹配在生物学与生命科学中的应用综述

高质量数据集实践指南（1.0）

ICML 2025 关于语言模型机械可解释性的教程

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

ACM TOMM Call for Papers

ACM TOMM Call for Papers

CCF多媒体专委会

2+阅读 · 2022年3月23日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

深度自进化聚类：Deep Self-Evolution Clustering

深度自进化聚类：Deep Self-Evolution Clustering

我爱读PAMI

15+阅读 · 2019年4月13日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

视频超分辨 Detail-revealing Deep Video Super-resolution 论文笔记

视频超分辨 Detail-revealing Deep Video Super-resolution 论文笔记

统计学习与视觉计算组

17+阅读 · 2018年3月16日

相关论文

Self-supervised Video-centralised Transformer for Video Face Clustering

Self-supervised Video-centralised Transformer for Video Face Clustering

Arxiv

0+阅读 · 2022年7月22日

SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos

Arxiv

0+阅读 · 2022年7月21日

NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition

Arxiv

0+阅读 · 2022年7月21日

VideoDG: Generalizing Temporal Relations in Videos to Novel Domains

Arxiv

14+阅读 · 2021年9月17日

Graph-Based Deep Learning for Medical Diagnosis and Analysis: Past, Present and Future

Graph-Based Deep Learning for Medical Diagnosis and Analysis: Past, Present and Future

Arxiv

36+阅读 · 2021年5月27日

Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

Arxiv

10+阅读 · 2021年2月11日

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks

Arxiv

14+阅读 · 2021年1月31日

Pretrained Transformers for Text Ranking: BERT and Beyond

Arxiv

28+阅读 · 2020年10月13日

A Simple Framework for Contrastive Learning of Visual Representations

Arxiv

21+阅读 · 2020年2月13日

From Superpixel to Human Shape Modelling for Carried Object Detection

Arxiv

10+阅读 · 2018年1月10日

相关基金

基于聚合物-石英复合光纤结构的轨道角动量产生和传输的研究

国家自然科学基金

0+阅读 · 2015年12月31日

网络编码在MIMO多向中继信道中的优化设计与性能分析

国家自然科学基金

0+阅读 · 2014年12月31日

面向物理层网络编码通信的多进制LDPC码的编码调制设计

国家自然科学基金

0+阅读 · 2013年12月31日

采用pinball loss的MEE算法研究

国家自然科学基金

1+阅读 · 2013年12月31日

环形自激振荡射流涡激空化效应及对卷吸性能的影响机理

国家自然科学基金

0+阅读 · 2012年12月31日

准循环LDPC码校验矩阵的秩和冗余行分析及码的性能优化

国家自然科学基金

0+阅读 · 2012年12月31日

函数域中的Vinogradov中值定理

国家自然科学基金

0+阅读 · 2012年12月31日

XML个性化协作搜索及其在社会网络服务中的应用

国家自然科学基金

0+阅读 · 2011年12月31日

针刺对脑缺血损伤和保护机制相关信号蛋白磷酸化作用研究

国家自然科学基金

0+阅读 · 2009年12月31日

三维片上网络（3D NoC）关键技术研究

国家自然科学基金

1+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员