Multimodal pre-training has propelled great advancement in vision-and-language research. These large-scale pre-trained models, although successful, suffer from slow inference speed due to the enormous computation cost, mainly from cross-modal attention in the Transformer architecture. When applied to real-life applications, such latency and computation demand severely deter the practical use of pre-trained models. In this paper, we study image-text retrieval (ITR), the most mature scenario of V+L applications, which had been widely studied even prior to the emergence of recent pre-trained models. We propose a simple yet highly effective approach, LightningDOT, that accelerates the inference time of ITR by thousands of times without sacrificing accuracy. LightningDOT removes the time-consuming cross-modal attention by pre-training on three novel learning objectives, extracting feature indexes offline, and employing instant dot-product matching with further re-ranking, which significantly speeds up the retrieval process. LightningDOT achieves new state of the art across multiple ITR benchmarks such as Flickr30k, COCO and Multi30K, outperforming existing pre-trained models that consume 1000× more computation time. Code and pre-training checkpoints are available at https://github.com/intersun/LightningDOT.
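The two-stage retrieval scheme described above (offline feature indexing, instant dot-product matching, then re-ranking a small candidate set) can be illustrated with a minimal sketch. This is not the authors' implementation: the random unit vectors stand in for embeddings produced by the pre-trained image and text encoders, and the dimensions and top-k value are arbitrary.

```python
import numpy as np

# Minimal sketch of dot-product retrieval over a precomputed index.
# Assumes embeddings are L2-normalized, so dot product = cosine similarity.
rng = np.random.default_rng(0)
d, num_images = 256, 10_000

# Offline stage: encode every image once and cache the embeddings.
# Random unit vectors stand in for the pre-trained image encoder here.
image_index = rng.standard_normal((num_images, d)).astype(np.float32)
image_index /= np.linalg.norm(image_index, axis=1, keepdims=True)

def retrieve(query_emb: np.ndarray, top_k: int = 20) -> np.ndarray:
    """First-stage retrieval: a single matrix-vector dot product scores
    the query against all cached image embeddings, with no cross-modal
    attention at query time."""
    scores = image_index @ query_emb                 # (num_images,)
    # Partial sort keeps the top-k candidates without ordering the rest.
    cand = np.argpartition(-scores, top_k)[:top_k]
    return cand[np.argsort(-scores[cand])]           # order the candidates

# Online stage: embed the text query (random stand-in here) and retrieve.
query = rng.standard_normal(d).astype(np.float32)
query /= np.linalg.norm(query)
candidates = retrieve(query)
# A heavier cross-attention model could now re-rank just these top-k
# candidates, recovering accuracy at a fraction of the full cost.
print(candidates[:5])
```

Because the expensive encoders run only offline, the online cost per query is one matrix-vector product over the index, which is what enables the thousands-fold speedup over full cross-attention scoring.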