Exploiting the Textual Potential from Vision-Language Pre-training for Text-based Person Search

Text-based Person Search (TPS) aims to retrieve pedestrians that match a text description rather than a query image. Recent Vision-Language Pre-training (VLP) models can bring transferable knowledge to downstream TPS tasks, yielding performance gains more efficiently. However, existing VLP-enhanced TPS methods use only the pre-trained visual encoder; they neglect the corresponding textual representation and break the modality alignment learned during large-scale pre-training. In this paper, we explore how to fully exploit the textual potential of VLP in TPS tasks. We build on our proposed VLP-TPS baseline model, which is the first TPS model in which both modalities are pre-trained. We propose Multi-Integrity Description Constraints (MIDC) to make the textual modality more robust by incorporating different components of the fine-grained corpus during training. Inspired by the prompt approach to zero-shot classification with VLP models, we propose the Dynamic Attribute Prompt (DAP), which provides a unified corpus of fine-grained attributes as language hints for the image modality. Extensive experiments show that our proposed TPS framework achieves state-of-the-art performance, exceeding the previous best method by a margin.

