Discriminative representation is essential for maintaining a unique identity for each target in multiple object tracking (MOT). Some recent MOT methods extract features from the bounding-box region or the center point as identity embeddings. However, when targets are occluded, these coarse-grained global representations become unreliable. To this end, we propose exploring diverse fine-grained representations, which describe appearance comprehensively from both global and local perspectives. Fine-grained representation requires high feature resolution and precise semantic information. To effectively alleviate the semantic misalignment caused by indiscriminate aggregation of contextual information, we propose a Flow Alignment FPN (FAFPN) for multi-scale feature alignment and aggregation. It generates semantic flow between feature maps of different resolutions to transform their pixel positions. Furthermore, we present a Multi-head Part Mask Generator (MPMG) to extract fine-grained representations from the aligned feature maps. The multiple parallel branches of MPMG allow it to focus on different parts of a target and generate local masks without label supervision. The diverse details captured in these masks facilitate fine-grained representation. Finally, benefiting from a Shuffle-Group Sampling (SGS) training strategy that balances positive and negative samples, we achieve state-of-the-art performance on the MOT17 and MOT20 test sets. Even on DanceTrack, where target appearances are extremely similar, our method significantly outperforms ByteTrack by 5.0% on HOTA and 5.6% on IDF1. Extensive experiments show that diverse fine-grained representation makes Re-ID great again in MOT.
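To make the flow-alignment step concrete, the following is a minimal PyTorch-style sketch of the idea: a small convolution predicts a two-channel semantic flow field from a pair of feature maps, and the upsampled coarse map is warped with grid_sample before aggregation. The module name FlowAlign, the single-conv flow predictor, and the additive fusion are illustrative assumptions, not the exact FAFPN architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlowAlign(nn.Module):
    """Sketch of semantic-flow alignment between two feature maps of
    different resolutions (illustrative, not the paper's exact module)."""
    def __init__(self, channels):
        super().__init__()
        # Predict a 2-channel offset field (the "semantic flow") from the
        # concatenation of the upsampled coarse and the fine feature maps.
        self.flow_conv = nn.Conv2d(channels * 2, 2, kernel_size=3, padding=1)

    def forward(self, low_res, high_res):
        # Upsample the coarse map to the fine map's spatial size.
        h, w = high_res.shape[-2:]
        up = F.interpolate(low_res, size=(h, w),
                           mode='bilinear', align_corners=True)
        # Semantic flow: per-pixel offsets telling each fine-grid position
        # where to sample from in the upsampled coarse map.
        flow = self.flow_conv(torch.cat([up, high_res], dim=1))  # (N, 2, h, w)
        # Build a normalized sampling grid in [-1, 1] and shift it by the flow.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=flow.device),
            torch.linspace(-1, 1, w, device=flow.device),
            indexing='ij')
        base_grid = torch.stack((xs, ys), dim=-1).unsqueeze(0)  # (1, h, w, 2)
        # Convert pixel offsets to the normalized grid coordinate scale.
        norm = torch.tensor([(w - 1) / 2, (h - 1) / 2], device=flow.device)
        grid = base_grid + flow.permute(0, 2, 3, 1) / norm      # (N, h, w, 2)
        # Warp the upsampled coarse features so semantics line up pixel-wise.
        aligned = F.grid_sample(up, grid, mode='bilinear', align_corners=True)
        # Aggregate: aligned coarse semantics plus fine spatial detail.
        return aligned + high_res
```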
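The part-mask idea can likewise be sketched as a set of parallel heads, each predicting a spatial mask that drives mask-weighted pooling, so that each head yields one local embedding with no part labels involved. The class name, the number of heads, and the sigmoid masking are assumptions for illustration; the paper's MPMG may differ in detail.

```python
import torch
import torch.nn as nn

class MultiHeadPartMask(nn.Module):
    """Illustrative sketch of a multi-head part mask generator: each head
    predicts a spatial mask, and masked pooling yields one local part
    embedding per head, learned without part-level supervision."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Conv2d(channels, 1, kernel_size=1) for _ in range(num_heads))

    def forward(self, feat):                      # feat: (N, C, H, W)
        embeddings = []
        for head in self.heads:
            # Spatial attention mask for one target part, squashed to (0, 1).
            mask = torch.sigmoid(head(feat))      # (N, 1, H, W)
            # Mask-weighted global pooling -> one local part embedding.
            weighted = (feat * mask).flatten(2).sum(-1)          # (N, C)
            area = mask.flatten(2).sum(-1).clamp(min=1e-6)       # (N, 1)
            embeddings.append(weighted / area)                   # (N, C)
        # Concatenate part embeddings into one fine-grained identity vector;
        # a global embedding could be appended alongside these local ones.
        return torch.cat(embeddings, dim=1)       # (N, C * num_heads)
```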