HitT: 视频文本检索具有动态对比力的等级变异器 (HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval) - 专知论文

会员服务 ·

0

contrastive · 哈尔滨工业大学（HIT） · 变换 · INTERACT · 负例 ·

2021 年 8 月 18 日

HiT: Hierarchical Transformer with Momentum Contrast for Video-Text Retrieval

翻译：HitT: 视频文本检索具有动态对比力的等级变异器

Song Liu,Haoqi Fan,Shengsheng Qian,Yiru Chen,Wenkui Ding,Zhongyuan Wang

Video-Text Retrieval has been a hot research topic with the growth of multimedia data on the internet. Transformer for video-text learning has attracted increasing attention due to its promising performance. However, existing cross-modal transformer approaches typically suffer from two major limitations: 1) Exploitation of the transformer architecture where different layers have different feature characteristics is limited; 2) End-to-end training mechanism limits negative sample interactions in a mini-batch. In this paper, we propose a novel approach named Hierarchical Transformer (HiT) for video-text retrieval. HiT performs Hierarchical Cross-modal Contrastive Matching in both feature-level and semantic-level, achieving multi-view and comprehensive retrieval results. Moreover, inspired by MoCo, we propose Momentum Cross-modal Contrast for cross-modal learning to enable large-scale negative sample interactions on-the-fly, which contributes to the generation of more precise and discriminative representations. Experimental results on the three major Video-Text Retrieval benchmark datasets demonstrate the advantages of our method.

翻译：随着互联网多媒体数据的增长,视频-文字检索是一个热门的研究主题。视频-文字学习变异器因其有希望的性能而引起越来越多的关注。但是,现有的跨模式变异器方法通常受到两大限制:(1) 利用不同层次具有不同特征的变异器结构有限;(2) 端到端培训机制限制小型批量的负面抽样互动。在本文中,我们提议了一种名为“高层次变异器(HiT)”的新颖方法,用于视频-文字检索。HiT在地貌和语义层面都进行等级跨模式的交叉对比,实现多视图和综合检索结果。此外,在Moco的启发下,我们提议采用超模式跨模式学习,以促成大规模在飞地上进行负面的抽样互动,这有助于产生更精确、更具有歧视性的描述。三种主要视频-图像检索基准数据集的实验结果展示了我们的方法的优点。

7

相关内容

contrastive

【AAAI2021最佳论文】基于高效 Transformer 的长时间序列预测

【AAAI2021最佳论文】基于高效 Transformer 的长时间序列预测

专知会员服务

62+阅读 · 2021年2月6日

最新《自监督表示学习》报告，70页ppt

最新《自监督表示学习》报告，70页ppt

专知会员服务

86+阅读 · 2020年12月22日

【UIUC】最新《自监督学习》教程，51页ppt，Self-supervised learning

【UIUC】最新《自监督学习》教程，51页ppt，Self-supervised learning

专知会员服务

84+阅读 · 2020年11月25日

【Google】监督对比学习，Supervised Contrastive Learning

【Google】监督对比学习，Supervised Contrastive Learning

专知会员服务

75+阅读 · 2020年4月24日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【ICDAR2019教程】用于文档分析、文本识别和语言建模的深度学习（Deep Learning for Document Analysis, Text Recognition, and Language Modeling）

【ICDAR2019教程】用于文档分析、文本识别和语言建模的深度学习（Deep Learning for Document Analysis, Text Recognition, and Language Modeling）

专知会员服务

22+阅读 · 2019年12月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Transformer中的相对位置编码

Transformer中的相对位置编码

AINLP

5+阅读 · 2020年11月28日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知

133+阅读 · 2020年3月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

TCN v2 + 3Dconv 运动信息

TCN v2 + 3Dconv 运动信息

CreateAMind

4+阅读 · 2019年1月8日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Jointly Improving Summarization and Sentiment Classification

Jointly Improving Summarization and Sentiment Classification

黑龙江大学自然语言处理实验室

3+阅读 · 2018年6月12日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

Decoupled Contrastive Learning

Arxiv

0+阅读 · 2021年10月13日

CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations

Arxiv

10+阅读 · 2021年9月30日

Transformer Tracking

Arxiv

17+阅读 · 2021年3月29日

Contrastive Triple Extraction with Generative Transformer

Arxiv

4+阅读 · 2020年12月15日

Contrastive Learning with Adversarial Examples

Arxiv

5+阅读 · 2020年10月22日

Hard Negative Mixing for Contrastive Learning

Arxiv

5+阅读 · 2020年10月2日

Self-supervised Learning: Generative or Contrastive

Arxiv

19+阅读 · 2020年7月21日

Text Summarization with Pretrained Encoders

Arxiv

5+阅读 · 2019年8月22日

Contrastive Bidirectional Transformer for Temporal Representation Learning

Contrastive Bidirectional Transformer for Temporal Representation Learning

Arxiv

3+阅读 · 2019年6月13日

Star-Transformer

Star-Transformer

Arxiv

5+阅读 · 2019年2月28日

VIP会员

文章信息

相关主题

哈尔滨工业大学（HIT）

相关VIP内容

【AAAI2021最佳论文】基于高效 Transformer 的长时间序列预测

【AAAI2021最佳论文】基于高效 Transformer 的长时间序列预测

专知会员服务

62+阅读 · 2021年2月6日

最新《自监督表示学习》报告，70页ppt

最新《自监督表示学习》报告，70页ppt

专知会员服务

86+阅读 · 2020年12月22日

【UIUC】最新《自监督学习》教程，51页ppt，Self-supervised learning

【UIUC】最新《自监督学习》教程，51页ppt，Self-supervised learning

专知会员服务

84+阅读 · 2020年11月25日

【Google】监督对比学习，Supervised Contrastive Learning

【Google】监督对比学习，Supervised Contrastive Learning

专知会员服务

75+阅读 · 2020年4月24日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【ICDAR2019教程】用于文档分析、文本识别和语言建模的深度学习（Deep Learning for Document Analysis, Text Recognition, and Language Modeling）

【ICDAR2019教程】用于文档分析、文本识别和语言建模的深度学习（Deep Learning for Document Analysis, Text Recognition, and Language Modeling）

专知会员服务

22+阅读 · 2019年12月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

新书册《几何深度学习的数学基础》

中程单向攻击无人机的战略意义：俄乌战争启示

在无标注条件下适配视觉—语言模型：全面综述

面向视觉语言模型的持续学习：遗忘之外的综述与分类体系

相关资讯

Transformer中的相对位置编码

Transformer中的相对位置编码

AINLP

5+阅读 · 2020年11月28日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知

133+阅读 · 2020年3月18日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

TCN v2 + 3Dconv 运动信息

TCN v2 + 3Dconv 运动信息

CreateAMind

4+阅读 · 2019年1月8日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

Jointly Improving Summarization and Sentiment Classification

Jointly Improving Summarization and Sentiment Classification

黑龙江大学自然语言处理实验室

3+阅读 · 2018年6月12日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

相关论文

Decoupled Contrastive Learning

Arxiv

0+阅读 · 2021年10月13日

CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations

Arxiv

10+阅读 · 2021年9月30日

Transformer Tracking

Arxiv

17+阅读 · 2021年3月29日

Contrastive Triple Extraction with Generative Transformer

Arxiv

4+阅读 · 2020年12月15日

Contrastive Learning with Adversarial Examples

Arxiv

5+阅读 · 2020年10月22日

Hard Negative Mixing for Contrastive Learning

Arxiv

5+阅读 · 2020年10月2日

Self-supervised Learning: Generative or Contrastive

Arxiv

19+阅读 · 2020年7月21日

Text Summarization with Pretrained Encoders

Arxiv

5+阅读 · 2019年8月22日

Contrastive Bidirectional Transformer for Temporal Representation Learning

Contrastive Bidirectional Transformer for Temporal Representation Learning

Arxiv

3+阅读 · 2019年6月13日

Star-Transformer

Star-Transformer

Arxiv

5+阅读 · 2019年2月28日

微信扫码咨询专知VIP会员