Person search aims to jointly localize and identify a query person in natural, uncropped images, and has been actively studied over the past few years. In this paper, we delve into the rich context information surrounding the target person, both globally and locally, which we refer to as scene and group context, respectively. Unlike previous works that treat the two types of context individually, we exploit them in a unified global-local context network (GLCNet) with the intuitive aim of feature enhancement. Specifically, re-ID embeddings and context features are learned simultaneously in a multi-stage fashion, ultimately yielding enhanced, discriminative features for person search. We conduct experiments on two person search benchmarks (i.e., CUHK-SYSU and PRW) and extend our approach to a more challenging setting (i.e., character search on MovieNet). Extensive experimental results demonstrate that the proposed GLCNet consistently improves over state-of-the-art methods on all three datasets. Our source code, pre-trained models, and the new dataset are publicly available at: https://github.com/ZhengPeng7/GLCNet.
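To make the feature-enhancement idea concrete, the following is a minimal sketch (not the authors' implementation) of how a per-person re-ID embedding could be fused with a global (scene) context feature and a local (group) context feature into a single discriminative embedding. All module names, dimensions, and the fusion design below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ContextEnhancedEmbedding(nn.Module):
    """Hypothetical fusion head: re-ID embedding + scene context + group context."""

    def __init__(self, reid_dim=256, scene_dim=256, group_dim=256, out_dim=256):
        super().__init__()
        # Projection heads for each context source (assumed design, not from the paper).
        self.scene_proj = nn.Linear(scene_dim, out_dim)
        self.group_proj = nn.Linear(group_dim, out_dim)
        self.fuse = nn.Sequential(
            nn.Linear(reid_dim + 2 * out_dim, out_dim),
            nn.ReLU(inplace=True),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, reid_feat, scene_feat, group_feat):
        # reid_feat:  (N, reid_dim)  per-person re-ID embedding
        # scene_feat: (N, scene_dim) global context pooled from the whole image
        # group_feat: (N, group_dim) local context pooled from co-occurring persons
        ctx = torch.cat([self.scene_proj(scene_feat), self.group_proj(group_feat)], dim=1)
        enhanced = self.fuse(torch.cat([reid_feat, ctx], dim=1))
        # L2-normalize so embeddings can be compared with cosine similarity at retrieval time.
        return nn.functional.normalize(enhanced, dim=1)


if __name__ == "__main__":
    model = ContextEnhancedEmbedding()
    n = 4  # e.g., four detected persons in a batch
    out = model(torch.randn(n, 256), torch.randn(n, 256), torch.randn(n, 256))
    print(out.shape)  # torch.Size([4, 256])
```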