Hierarchical Prompt Learning for Image- and Text-Based Person Re-Identification

Person re-identification (ReID) aims to retrieve target pedestrian images given either visual queries (image-to-image, I2I) or textual descriptions (text-to-image, T2I). Although both tasks share a common retrieval objective, they pose distinct challenges: I2I emphasizes discriminative identity learning, while T2I requires accurate cross-modal semantic alignment. Existing methods often treat these tasks separately, which may lead to representation entanglement and suboptimal performance. To address this, we propose a unified framework named Hierarchical Prompt Learning (HPL), which leverages task-aware prompt modeling to jointly optimize both tasks. Specifically, we first introduce a Task-Routed Transformer, which incorporates dual classification tokens into a shared visual encoder to route features to the I2I and T2I branches, respectively. On top of this, we develop a hierarchical prompt generation scheme that integrates identity-level learnable tokens with instance-level pseudo-text tokens. These pseudo-tokens are derived from image or text features via modality-specific inversion networks, injecting fine-grained, instance-specific semantics into the prompts. Furthermore, we propose a Cross-Modal Prompt Regularization strategy that enforces semantic alignment in the prompt token space, ensuring that pseudo-prompts preserve source-modality characteristics while enhancing cross-modal transferability. Extensive experiments on multiple ReID benchmarks validate the effectiveness of our method, which achieves state-of-the-art performance on both I2I and T2I tasks.
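
To make the hierarchical prompt idea concrete, the following is a minimal, dependency-free sketch rather than the authors' implementation. The abstract does not specify the inversion network architecture, the token ordering, or the form of the regularizer, so everything here is an assumption chosen for illustration: the inversion networks are single random linear maps, the prompt places identity-level tokens before instance-level pseudo-tokens, and the regularization term is a mean cosine distance between paired image-derived and text-derived pseudo-tokens. All function names (`make_inversion_net`, `invert`, `build_prompt`, `prompt_regularization`) are hypothetical.

```python
import math
import random


def linear(x, weight):
    """Bias-free linear map; weight is a list of rows, each of len(x)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def make_inversion_net(in_dim, token_dim, num_tokens, rng):
    """Hypothetical inversion network: one random linear map per pseudo-token."""
    return [
        [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(token_dim)]
        for _ in range(num_tokens)
    ]


def invert(feature, net):
    """Map one global feature vector to a list of pseudo-text tokens."""
    return [linear(feature, weight) for weight in net]


def build_prompt(identity_tokens, pseudo_tokens):
    """Assumed layout: identity-level tokens first, then instance-level ones."""
    return list(identity_tokens) + list(pseudo_tokens)


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)


def prompt_regularization(image_tokens, text_tokens):
    """Assumed loss: mean (1 - cosine similarity) over paired pseudo-tokens."""
    pairs = list(zip(image_tokens, text_tokens))
    return sum(1.0 - cosine(a, b) for a, b in pairs) / len(pairs)


rng = random.Random(0)
feat_dim, token_dim, num_id_tokens, num_pseudo = 8, 4, 2, 3

# Identity-level tokens would be learned per identity; random stand-ins here.
identity_tokens = [
    [rng.gauss(0.0, 1.0) for _ in range(token_dim)] for _ in range(num_id_tokens)
]

# One inversion network per modality (modality-specific, as in the abstract).
image_net = make_inversion_net(feat_dim, token_dim, num_pseudo, rng)
text_net = make_inversion_net(feat_dim, token_dim, num_pseudo, rng)

# Stand-ins for encoder outputs of one image and its paired description.
image_feature = [rng.gauss(0.0, 1.0) for _ in range(feat_dim)]
text_feature = [rng.gauss(0.0, 1.0) for _ in range(feat_dim)]

image_pseudo = invert(image_feature, image_net)
text_pseudo = invert(text_feature, text_net)

image_prompt = build_prompt(identity_tokens, image_pseudo)
loss = prompt_regularization(image_pseudo, text_pseudo)
print(len(image_prompt), round(loss, 4))
```

In this sketch the regularizer is zero when the two modalities produce identical pseudo-tokens and grows as they diverge, which is one simple way to read "semantic alignment in the prompt token space"; the actual method may use a different objective.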
