Semantically Self-Aligned Network for Text-to-Image Part-aware Person Re-identification

Text-to-image person re-identification (ReID) aims to search for images containing a person of interest using textual descriptions. However, due to the significant modality gap and the large intra-class variance in textual descriptions, text-to-image ReID remains a challenging problem. Accordingly, in this paper, we propose a Semantically Self-Aligned Network (SSAN) to handle the above problems. First, we propose a novel method that automatically extracts semantically aligned part-level features from the two modalities. Second, we design a multi-view non-local network that captures the relationships between body parts, thereby establishing better correspondences between body parts and noun phrases. Third, we introduce a Compound Ranking (CR) loss that makes use of textual descriptions for other images of the same identity to provide extra supervision, thereby effectively reducing the intra-class variance in textual features. Finally, to expedite future research in text-to-image ReID, we build a new database named ICFG-PEDES. Extensive experiments demonstrate that SSAN outperforms state-of-the-art approaches by significant margins. Both the new ICFG-PEDES database and the SSAN code are available at https://github.com/zifyloo/SSAN.
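To make the idea behind the Compound Ranking (CR) loss more concrete, the following is a minimal sketch, not the authors' formulation. It assumes a hinge-style ranking term on cosine similarity for the image's own description, plus a second, down-weighted term for a description of another image of the same identity. The function names, the two margins, and the weight `weak_weight` are all hypothetical choices made for illustration; the exact loss used by SSAN is defined in the paper and in the linked repository and may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hinge_rank(anchor, positive, negative, margin):
    """Hinge ranking term: max(0, margin - sim(a, pos) + sim(a, neg))."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


def compound_ranking_sketch(image, own_text, same_id_text, other_id_text,
                            margin=0.2, weak_margin=0.1, weak_weight=0.5):
    """Illustrative compound ranking loss (all hyperparameters are assumptions).

    image         -- visual feature of one image
    own_text      -- textual feature of that image's own description
    same_id_text  -- textual feature describing ANOTHER image of the same identity
    other_id_text -- textual feature describing a different identity
    """
    # Strong term: the image should rank its own description above a
    # description of a different person.
    strong = hinge_rank(image, own_text, other_id_text, margin)
    # Weak term: a description of another image of the same person should
    # also rank above the different-identity description, with a smaller
    # margin and weight because it only loosely matches this exact image.
    weak = hinge_rank(image, same_id_text, other_id_text, weak_margin)
    return strong + weak_weight * weak


if __name__ == "__main__":
    loss = compound_ranking_sketch(
        image=[1.0, 0.0],
        own_text=[1.0, 0.0],
        same_id_text=[0.0, 1.0],
        other_id_text=[0.0, 1.0],
    )
    print(loss)
```

In this toy setup the second term is what pulls descriptions of the same identity toward each other, which is one plausible way to realize the reduction in intra-class textual variance that the abstract describes.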

