LocVTP: 视频-文字时间定位培训前培训 (LocVTP: Video-Text Pre-training for Temporal Localization) - 专知论文

会员服务 ·

0

Performer · Learning · contrastive · Extensibility · Projection ·

2022 年 7 月 21 日

LocVTP: Video-Text Pre-training for Temporal Localization

翻译：LocVTP: 视频-文字时间定位培训前培训

Meng Cao,Tianyu Yang,Junwu Weng,Can Zhang,Jue Wang,Yuexian Zou

from arxiv, Accepted by ECCV2022

Video-Text Pre-training (VTP) aims to learn transferable representations for various downstream tasks from large-scale web videos. To date, almost all existing VTP methods are limited to retrieval-based downstream tasks, e.g., video retrieval, whereas their transfer potentials on localization-based tasks, e.g., temporal grounding, are under-explored. In this paper, we experimentally analyze and demonstrate the incompatibility of current VTP methods with localization tasks, and propose a novel Localization-oriented Video-Text Pre-training framework, dubbed as LocVTP. Specifically, we perform the fine-grained contrastive alignment as a complement to the coarse-grained one by a clip-word correspondence discovery scheme. To further enhance the temporal reasoning ability of the learned feature, we propose a context projection head and a temporal aware contrastive loss to perceive the contextual relationships. Extensive experiments on four downstream tasks across six datasets demonstrate that our LocVTP achieves state-of-the-art performance on both retrieval-based and localization-based tasks. Furthermore, we conduct comprehensive ablation studies and thorough analyses to explore the optimum model designs and training strategies.

翻译：视频-视频预科培训(VTP)的目的是从大型网络视频中为各种下游任务学习可转移的代表性。迄今为止,几乎所有现有的VTP方法都局限于基于检索的下游任务,例如视频检索,而其在基于本地化的任务(例如时间定位)上的转移潜力则未得到充分探讨。在本文中,我们实验性地分析和展示目前VTP方法与本地化任务不相容,并提议一个新的本地化面向视频-版本预培训框架,称为LocVTP。具体地说,我们进行细微的对比性调整,以补充剪刀式通信发现计划粗略的下游任务。为了进一步提高所学特征的时间推理能力,我们提出了背景预测头和时间意识对比性损失,以了解背景关系。对六个数据集的四项下游任务进行的广泛实验表明,我们的LocVTP在基于本地化和基于本地化的任务中都取得了最新水平的绩效。此外,我们进行了全面的研究和深入分析,以探讨最佳的策略。

0

相关内容

Performer

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

无界区域椭圆型和抛物型偏微分方程的人工边界条件数值方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

Insulicolide A的全合成和结构优化

国家自然科学基金

0+阅读 · 2014年12月31日

Akt-mTOR-Snail信号通路诱导肺动脉高压内皮细胞向间充质细胞转化的分子机理研究

国家自然科学基金

0+阅读 · 2014年12月31日

LPA-LPAR1信号通路调控放射性肺损伤的机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于Nonlocal的MRI脑肿瘤图像分割方法的研究

国家自然科学基金

0+阅读 · 2014年12月31日

无界区域最优控制问题的无限元方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

EAST装置上主动控制电阻壁模的数值研究

国家自然科学基金

0+阅读 · 2013年12月31日

多功能5’三磷酸化siRNA靶向治疗恶性肿瘤：联合谷氨酰胺酶沉默与RIG-I信号通路激活

国家自然科学基金

0+阅读 · 2012年12月31日

退化k-Hessian方程解的正则性研究

国家自然科学基金

0+阅读 · 2011年12月31日

调节性T细胞对脓毒症免疫抑制的影响—#8212;膜结合TGFβ#20381;赖促凋亡机制

国家自然科学基金

0+阅读 · 2009年12月31日

OmniVL:One Foundation Model for Image-Language and Video-Language Tasks

OmniVL:One Foundation Model for Image-Language and Video-Language Tasks

Arxiv

0+阅读 · 2022年9月15日

Adaptive Perception Transformer for Temporal Action Localization

Arxiv

0+阅读 · 2022年9月15日

Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training

Arxiv

0+阅读 · 2022年9月15日

Exploring Visual Interpretability for Contrastive Language-Image Pre-training

Arxiv

0+阅读 · 2022年9月15日

Pre-training Methods in Information Retrieval

Arxiv

16+阅读 · 2021年11月27日

A Survey of Visual Transformers

Arxiv

39+阅读 · 2021年11月11日

Deep Learning for Weakly-Supervised Object Detection and Object Localization: A Survey

Deep Learning for Weakly-Supervised Object Detection and Object Localization: A Survey

Arxiv

16+阅读 · 2021年5月26日

Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

Arxiv

10+阅读 · 2021年2月11日

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

Arxiv

19+阅读 · 2020年11月18日

Learning to Count Objects in Natural Images for Visual Question Answering

Arxiv

12+阅读 · 2018年2月15日

VIP会员

文章信息

相关主题

相关VIP内容

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

最新BERT相关论文清单，BERT-related Papers

最新BERT相关论文清单，BERT-related Papers

专知会员服务

53+阅读 · 2019年9月29日

热门VIP内容

开通专知VIP会员享更多权益服务

《物联网（IoT）中的无人机通信高效控制》135页

《在GNSS信号降级环境中利用共识实现无人机集群稳健协调》

中程单向攻击无人机的战略意义：俄乌战争启示

《面向无人机集群的避障动态传感器覆盖算法》最新38页

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

ICLR2019最佳论文出炉

ICLR2019最佳论文出炉

专知

12+阅读 · 2019年5月6日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

相关论文

OmniVL:One Foundation Model for Image-Language and Video-Language Tasks

OmniVL:One Foundation Model for Image-Language and Video-Language Tasks

Arxiv

0+阅读 · 2022年9月15日

Adaptive Perception Transformer for Temporal Action Localization

Arxiv

0+阅读 · 2022年9月15日

Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training

Arxiv

0+阅读 · 2022年9月15日

Exploring Visual Interpretability for Contrastive Language-Image Pre-training

Arxiv

0+阅读 · 2022年9月15日

Pre-training Methods in Information Retrieval

Arxiv

16+阅读 · 2021年11月27日

A Survey of Visual Transformers

Arxiv

39+阅读 · 2021年11月11日

Deep Learning for Weakly-Supervised Object Detection and Object Localization: A Survey

Deep Learning for Weakly-Supervised Object Detection and Object Localization: A Survey

Arxiv

16+阅读 · 2021年5月26日

Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling

Arxiv

10+阅读 · 2021年2月11日

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

UP-DETR: Unsupervised Pre-training for Object Detection with Transformers

Arxiv

19+阅读 · 2020年11月18日

Learning to Count Objects in Natural Images for Visual Question Answering

Arxiv

12+阅读 · 2018年2月15日

相关基金

无界区域椭圆型和抛物型偏微分方程的人工边界条件数值方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

Insulicolide A的全合成和结构优化

国家自然科学基金

0+阅读 · 2014年12月31日

Akt-mTOR-Snail信号通路诱导肺动脉高压内皮细胞向间充质细胞转化的分子机理研究

国家自然科学基金

0+阅读 · 2014年12月31日

LPA-LPAR1信号通路调控放射性肺损伤的机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于Nonlocal的MRI脑肿瘤图像分割方法的研究

国家自然科学基金

0+阅读 · 2014年12月31日

无界区域最优控制问题的无限元方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

EAST装置上主动控制电阻壁模的数值研究

国家自然科学基金

0+阅读 · 2013年12月31日

多功能5’三磷酸化siRNA靶向治疗恶性肿瘤：联合谷氨酰胺酶沉默与RIG-I信号通路激活

国家自然科学基金

0+阅读 · 2012年12月31日

退化k-Hessian方程解的正则性研究

国家自然科学基金

0+阅读 · 2011年12月31日

调节性T细胞对脓毒症免疫抑制的影响—#8212;膜结合TGFβ#20381;赖促凋亡机制

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员