For person re-identification, existing deep networks often focus on representation learning. However, without transfer learning, the learned model is fixed as is, and thus cannot adapt to various unseen scenarios. In this paper, beyond representation learning, we consider how to formulate person image matching directly in deep feature maps. We treat image matching as finding local correspondences in feature maps, and construct query-adaptive convolution kernels on the fly to achieve local matching. In this way, the matching process and its results are interpretable, and this explicit matching generalizes better than representation features to unseen scenarios, such as unknown misalignments and pose or viewpoint changes. To facilitate end-to-end training of this architecture, we further build a class memory module that caches feature maps of the most recent samples of each class, so as to compute image matching losses for metric learning. Through direct cross-dataset evaluation, the proposed Query-Adaptive Convolution (QAConv) method gains large improvements over popular learning methods (about 10\%+ mAP), and achieves results comparable to many transfer learning methods. Besides, a model-free temporal co-occurrence based score weighting method called TLift is proposed, which further improves the performance, achieving state-of-the-art results in cross-dataset person re-identification. Code is available at \url{https://github.com/ShengcaiLiao/QAConv}.
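The core idea of query-adaptive convolution, matching local features of a query image against a gallery feature map with kernels built on the fly from the query itself, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (which learns the similarity end to end with a class memory module); the function name, the use of $1\times1$ kernels, and the max-then-mean aggregation of local correspondence scores are simplifying assumptions for illustration only.

```python
import numpy as np

def qaconv_similarity(query_fm, gallery_fm):
    """Illustrative local matching with query-adaptive 1x1 kernels.

    query_fm, gallery_fm: feature maps of shape (C, H, W),
    e.g. from the last layer of a CNN backbone.
    Returns a scalar similarity score in [-1, 1].
    """
    C = query_fm.shape[0]
    q = query_fm.reshape(C, -1)    # (C, H*W) query local features
    g = gallery_fm.reshape(C, -1)  # (C, H*W) gallery local features
    # L2-normalize each local feature so a dot product is a cosine similarity
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-12)
    g = g / (np.linalg.norm(g, axis=0, keepdims=True) + 1e-12)
    # Each query location acts as a 1x1 convolution kernel applied to
    # every gallery location, yielding all local correspondence scores.
    corr = q.T @ g                 # (H*W, H*W)
    # Keep the best correspondence per location, in both directions,
    # and average into a single interpretable matching score.
    return 0.5 * (corr.max(axis=1).mean() + corr.max(axis=0).mean())
```

Because each query location finds its own best-matching gallery location, the score is robust to spatial misalignments in a way a fixed global representation is not, which mirrors the interpretability and generalization argument of the abstract.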