Person re-identification aims to retrieve persons across highly varying settings, cameras, and scenarios, for which robust and discriminative representation learning is crucial. Most research learns representations from single images, ignoring any potential interactions between them. However, due to high intra-identity variations, ignoring such interactions typically leads to outlier features. To tackle this issue, we propose the Neighbor Transformer Network, or NFormer, which explicitly models interactions across all input images, thus suppressing outlier features and yielding more robust representations overall. As modelling interactions between an enormous number of images is a demanding task with many distractors, NFormer introduces two novel modules: the Landmark Agent Attention and the Reciprocal Neighbor Softmax. Specifically, the Landmark Agent Attention efficiently models the relation map between images through a low-rank factorization with a few landmarks in feature space. Moreover, the Reciprocal Neighbor Softmax restricts attention to relevant -- rather than all -- neighbors, which alleviates interference from irrelevant representations and further reduces the computational burden. In experiments on four large-scale datasets, NFormer achieves a new state-of-the-art. The code is released at \url{https://github.com/haochenheheda/NFormer}.
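To make the two modules concrete, the following is a minimal NumPy sketch of the ideas they name, not the paper's implementation: the relation map is factored through a handful of landmark vectors (here simply sampled rows of the feature matrix, whereas the paper selects landmark agents differently), and the softmax is masked to reciprocal k-nearest neighbors. All function names and the landmark-sampling strategy are illustrative assumptions.

```python
import numpy as np

def landmark_agent_attention(X, num_landmarks=5, seed=0):
    """Low-rank relation map via landmark agents (illustrative sketch).

    Instead of the full N x N affinity X @ X.T (cost O(N^2 d)), the map
    is factored through l << N landmarks, costing O(N l d) and having
    rank at most l.
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape
    # Assumption: landmarks are sampled rows of X; the paper's landmark
    # selection differs, this is only a stand-in for the factorization.
    idx = rng.choice(N, size=num_landmarks, replace=False)
    L = X[idx]                        # (l, d) landmark features
    left = X @ L.T / np.sqrt(d)       # (N, l) image-to-landmark relations
    right = L @ X.T / np.sqrt(d)      # (l, N) landmark-to-image relations
    return left @ right               # (N, N) rank-<=l relation map

def reciprocal_neighbor_softmax(A, k=3):
    """Sparse softmax over reciprocal k-nearest neighbors (sketch).

    An attention weight A[i, j] survives only if j is among i's top-k
    AND i is among j's top-k; all other entries are masked out before
    the row-wise softmax, giving sparse attention to relevant neighbors.
    """
    N = A.shape[0]
    topk = np.argsort(-A, axis=1)[:, :k]        # top-k indices per row
    nn = np.zeros_like(A, dtype=bool)
    nn[np.arange(N)[:, None], topk] = True      # one-directional k-NN mask
    mask = nn & nn.T                            # keep reciprocal pairs only
    np.fill_diagonal(mask, True)                # self-attention always kept
    masked = np.where(mask, A, -np.inf)
    masked -= masked.max(axis=1, keepdims=True) # numerical stability
    W = np.exp(masked)                          # masked entries become 0
    return W / W.sum(axis=1, keepdims=True)     # rows sum to 1

# Usage: refine N image representations jointly.
X = np.random.default_rng(1).normal(size=(10, 4))
A = landmark_agent_attention(X, num_landmarks=3, seed=1)
W = reciprocal_neighbor_softmax(A, k=3)
X_refined = W @ X   # each feature becomes a mix of its reciprocal neighbors
```

The low-rank factorization bounds the relation map's rank by the number of landmarks, and the reciprocal mask is symmetric by construction, which is what suppresses attention to one-sided (likely irrelevant) neighbors.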