Most existing methods in vision-language retrieval match the two modalities by either comparing their global feature vectors, which loses fine-grained information and lacks interpretability; detecting objects in images or videos and aligning the text with fine-grained features, which relies on complicated model designs; or modeling fine-grained interaction via cross-attention over visual and textual tokens, which suffers from inferior efficiency. To address these limitations, some recent works simply aggregate token-wise similarities to achieve fine-grained alignment, but they lack intuitive explanations and neglect the relationships between token-level features and global representations with high-level semantics. In this work, we rethink fine-grained cross-modal alignment and devise a new model-agnostic formulation for it. We further demystify the recent popular works and subsume them into our scheme. Inspired by optimal transport theory, we then introduce TokenFlow, an instantiation of the proposed scheme. By modifying only the similarity function, our method performs comparably to state-of-the-art algorithms with heavy model designs on major video-text retrieval benchmarks. Visualizations further show that TokenFlow successfully leverages fine-grained information and achieves better interpretability.
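To make the "modify only the similarity function" idea concrete, the sketch below shows one way a token-level, optimal-transport-style similarity could replace a global dot-product score. This is a minimal illustration under assumptions of our own (entropic Sinkhorn iterations with uniform marginals over tokens); the helper names `sinkhorn` and `token_ot_similarity` are hypothetical and this is not the paper's exact TokenFlow formulation.

```python
import torch

def sinkhorn(cost, eps=0.05, n_iters=50):
    """Entropic-regularized OT plan for one cost matrix (n_v x n_t),
    assuming uniform marginals over visual and textual tokens."""
    n_v, n_t = cost.shape
    mu = torch.full((n_v,), 1.0 / n_v)
    nu = torch.full((n_t,), 1.0 / n_t)
    K = torch.exp(-cost / eps)            # Gibbs kernel
    u = torch.ones_like(mu)
    for _ in range(n_iters):
        # alternate the two Sinkhorn scaling updates
        u = mu / (K @ (nu / (K.t() @ u)))
    v = nu / (K.t() @ u)
    return u.unsqueeze(1) * K * v.unsqueeze(0)   # transport plan

def token_ot_similarity(video_tokens, text_tokens):
    """Aggregate token-wise cosine similarities with an OT plan:
    video_tokens (n_v x d), text_tokens (n_t x d) -> scalar score."""
    v = torch.nn.functional.normalize(video_tokens, dim=-1)
    t = torch.nn.functional.normalize(text_tokens, dim=-1)
    sim = v @ t.t()                        # token-wise cosine similarities
    plan = sinkhorn(1.0 - sim)             # cost = 1 - similarity
    return (plan * sim).sum()              # flow-weighted aggregation

# Usage sketch: plug this token-level score into a standard contrastive
# retrieval loss in place of the global similarity; the encoders stay unchanged.
score = token_ot_similarity(torch.randn(12, 512), torch.randn(7, 512))
```

The design point this illustrates is model-agnosticism: only the pairwise scoring function changes, so any off-the-shelf visual and textual encoders can be reused without object detectors or cross-attention layers.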