Most existing text-video retrieval methods focus on cross-modal matching between the visual content of offline videos and textual query sentences. However, in real scenarios, online videos are frequently accompanied by relevant text information such as titles, tags, and even subtitles, which can be utilized to match textual queries. This inspires us to generate associated captions from offline videos to aid existing text-video retrieval methods. To do so, we propose to use a zero-shot video captioner that leverages knowledge from web-scale pre-trained models (e.g., CLIP and GPT-2) to generate captions for offline videos without any training. Given the captions, one question naturally arises: what can auxiliary captions do for text-video retrieval? In this paper, we present Cap4Video, a novel framework that exploits the generated captions in three ways: i) Input data: The video and captions can form new video-caption pairs as data augmentation for training. ii) Feature interaction: We perform feature interaction between video and caption to yield enhanced video representations. iii) Output score: The Query-Caption matching branch can complement the original Query-Video matching branch for text-video retrieval (see the sketch below). We conduct thorough ablation studies to demonstrate the effectiveness of our method. Without any post-processing, our Cap4Video achieves state-of-the-art performance on MSR-VTT (51.4%), VATEX (66.6%), MSVD (51.8%), and DiDeMo (52.0%).
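To illustrate the third aspect, here is a minimal sketch (not the authors' released code) of how a Query-Caption matching score can be fused with the Query-Video score at retrieval time, assuming all inputs live in a shared CLIP-style embedding space. The encoders producing the embeddings, the tensor shapes, and the fusion weight `alpha` are hypothetical stand-ins.

```python
# Minimal sketch of output-score fusion: the final retrieval score combines
# cosine similarities from the Query-Video and Query-Caption branches.
# `alpha` is an assumed fusion weight, not a value from the paper.
import torch
import torch.nn.functional as F

def retrieval_scores(query_emb, video_emb, caption_emb, alpha=0.5):
    """Fuse similarities from the two matching branches.

    query_emb:   (B_q, D) text-query embeddings
    video_emb:   (B_v, D) video embeddings (e.g., pooled frame features)
    caption_emb: (B_v, D) embeddings of the auto-generated caption per video
    Returns a (B_q, B_v) matrix of fused query-to-video scores.
    """
    q = F.normalize(query_emb, dim=-1)
    v = F.normalize(video_emb, dim=-1)
    c = F.normalize(caption_emb, dim=-1)
    sim_qv = q @ v.t()  # Query-Video branch
    sim_qc = q @ c.t()  # Query-Caption branch
    return alpha * sim_qv + (1 - alpha) * sim_qc

# Toy usage: rank 100 videos for 4 queries in a 512-d embedding space.
scores = retrieval_scores(torch.randn(4, 512), torch.randn(100, 512),
                          torch.randn(100, 512))
ranking = scores.argsort(dim=-1, descending=True)  # per-query video ranking
```

Because both branches produce plain similarity matrices, the fusion adds no trainable parameters at inference; the caption branch simply contributes a second, text-to-text view of each video.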