Existing person re-identification (re-id) methods either assume the availability of well-aligned person bounding box images as model input or rely on constrained attention selection mechanisms to calibrate misaligned images. They are therefore sub-optimal for re-id matching on arbitrarily aligned person images, which may exhibit large human pose variations and unconstrained auto-detection errors. In this work, we show the advantages of jointly learning attention selection and feature representation in a Convolutional Neural Network (CNN) by maximising the complementary information of different levels of visual attention subject to re-id discriminative learning constraints. Specifically, we formulate a novel Harmonious Attention CNN (HA-CNN) model that jointly learns soft pixel attention and hard regional attention while simultaneously optimising feature representations, with the aim of optimising person re-id in uncontrolled (misaligned) images. Extensive comparative evaluations validate the superiority of this new HA-CNN model for person re-id over a wide variety of state-of-the-art methods on three large-scale benchmarks: CUHK03, Market-1501, and DukeMTMC-ReID.
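
To make the two attention levels concrete, the following is a minimal NumPy sketch and not the HA-CNN implementation. It assumes that soft pixel attention can be modelled as an element-wise sigmoid mask over a feature map and that hard regional attention can be modelled as fixed-size crops around given centres. The function names, the `logits` and `centers` inputs, and the average pooling are all hypothetical; in a trained model the mask and the region locations would be predicted by the network and learned jointly with the features.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_pixel_attention(features, logits):
    """Reweight every element of a (C, H, W) feature map by a mask in (0, 1).

    Assumption: `logits` stands in for the output of a learned attention branch.
    """
    return features * sigmoid(logits)

def hard_regional_attention(features, centers, size):
    """Crop one (C, size, size) region per (row, col) centre.

    Assumption: `centers` stands in for learned region locations; crops are
    shifted so that they always stay inside the feature map.
    """
    _, h, w = features.shape
    half = size // 2
    regions = []
    for r, c in centers:
        top = int(np.clip(r - half, 0, h - size))
        left = int(np.clip(c - half, 0, w - size))
        regions.append(features[:, top:top + size, left:left + size])
    return np.stack(regions)  # shape (T, C, size, size)

def harmonious_features(features, logits, centers, size):
    """Concatenate a soft-attended global vector with per-region local vectors."""
    global_vec = soft_pixel_attention(features, logits).mean(axis=(1, 2))            # (C,)
    local_vecs = hard_regional_attention(features, centers, size).mean(axis=(2, 3))  # (T, C)
    return np.concatenate([global_vec, local_vecs.ravel()])

# Hypothetical usage: a 4-channel 8x6 feature map and two regions of size 4.
feats = np.ones((4, 8, 6))
descriptor = harmonious_features(feats, np.zeros((4, 8, 6)), [(0, 0), (7, 5)], 4)
print(descriptor.shape)
```

The sketch only illustrates why the two levels are complementary: the soft mask reweights individual pixels within the whole image, while the hard crops select coarse regions, and both contribute to a single descriptor.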