We present a method for matching a text sentence from a given corpus to a given video clip, and vice versa. Traditionally, video and text matching is done by learning a shared embedding space, where the encoding of one modality is independent of the other. In this work, we encode the dataset data in a way that takes into account the query's relevant information. The power of the method is shown to arise from pooling the interaction data between words and frames. Since the encoding of the video clip depends on the sentence it is compared to, the representation needs to be recomputed for each potential match. To this end, we propose an efficient shallow neural network. Its training employs a hierarchical triplet loss that is extendable to paragraph/video matching. The method is simple, provides explainability, and achieves state-of-the-art results by a sizable margin for both sentence-clip and video-text matching across five different datasets: ActivityNet, DiDeMo, YouCook2, MSR-VTT, and LSMDC. We also show that our conditioned representation can be transferred to video-guided machine translation, where we improve the current results on VATEX. Source code is available at https://github.com/AmeenAli/VideoMatch.
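To make the conditioning idea concrete, the following is a minimal PyTorch sketch of a query-conditioned match score obtained by pooling word-frame interactions and training with a margin-based triplet loss. All module names, dimensions, the attention-style pooling, and the single-level loss are illustrative assumptions; they are not the authors' exact architecture or their hierarchical loss.

```python
# Minimal sketch (assumed design): score a (sentence, clip) pair by pooling
# word-frame interaction features, so the clip representation depends on the query.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionedMatcher(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        # A shallow network that maps pooled interaction features to a match score.
        self.scorer = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, words, frames):
        # words:  (num_words, dim)  sentence token embeddings
        # frames: (num_frames, dim) clip frame embeddings
        sim = words @ frames.t() / words.shape[-1] ** 0.5  # pairwise word-frame interactions
        attn = sim.softmax(dim=-1)                         # attend over frames per word
        conditioned = attn @ frames                        # frames re-encoded per query word
        pooled = (conditioned * words).mean(dim=0)         # pool the interaction features
        return self.scorer(pooled).squeeze(-1)             # scalar match score

def triplet_loss(pos_score, neg_score, margin=0.2):
    # Standard margin-based triplet loss on match scores; the paper's hierarchical
    # sentence/clip and paragraph/video variant is omitted in this sketch.
    return F.relu(margin - pos_score + neg_score)

if __name__ == "__main__":
    model = ConditionedMatcher()
    words = torch.randn(12, 512)       # toy sentence of 12 tokens
    pos_frames = torch.randn(30, 512)  # matching clip of 30 frames
    neg_frames = torch.randn(30, 512)  # non-matching clip
    loss = triplet_loss(model(words, pos_frames), model(words, neg_frames))
    print(loss.item())
```

Because the clip encoding here is a function of the query words, the score must be recomputed per candidate pair, which is why a shallow, inexpensive scoring network is important in practice.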