Given a natural language description, text-based person retrieval aims to identify images of a target person from a large-scale person image database. Existing methods generally suffer from a \textbf{color over-reliance problem}: the models rely heavily on color information when matching cross-modal data. Color is indeed an important basis for retrieval decisions, but over-reliance on it distracts the model from other key clues (e.g., texture and structural information) and thereby leads to sub-optimal retrieval performance. To solve this problem, we propose to \textbf{C}apture \textbf{A}ll-round \textbf{I}nformation \textbf{B}eyond \textbf{C}olor (\textbf{CAIBC}) via a jointly optimized multi-branch architecture for text-based person retrieval. CAIBC contains three branches: an RGB branch, a grayscale (GRS) branch, and a color (CLR) branch. To make full use of this all-round information in a balanced and effective way, a mutual learning mechanism enables the three branches, which attend to different aspects of the information, to communicate with and learn from each other. Extensive experiments on the CUHK-PEDES and RSTPReid datasets, in both \textbf{supervised} and \textbf{weakly supervised} text-based person retrieval settings, demonstrate that CAIBC significantly outperforms existing methods and achieves state-of-the-art performance on all three tasks.
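As a rough illustration of the multi-branch idea, the sketch below shows one way a grayscale input could be derived from an RGB image and how branches might be encouraged to agree with each other. The abstract does not specify the loss or the preprocessing, so everything here is an assumption: the BT.601 luminance weights, the pairwise KL objective (in the style of deep mutual learning), and the names `to_grayscale` and `mutual_learning_loss` are hypothetical and are not taken from CAIBC itself.

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB array to (H, W) luminance.
    Assumption: ITU-R BT.601 weights; the paper may use another conversion."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb @ weights

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """Mean KL(p || q) over rows of two row-stochastic matrices."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

def mutual_learning_loss(branch_scores):
    """Average KL divergence over all ordered pairs of branches.
    branch_scores maps a branch name to a (num_queries, num_gallery)
    matrix of text-image similarity scores (hypothetical interface)."""
    probs = {name: softmax(scores) for name, scores in branch_scores.items()}
    total, pairs = 0.0, 0
    for a in probs:
        for b in probs:
            if a != b:
                total += kl_divergence(probs[a], probs[b])
                pairs += 1
    return total / pairs

# Usage: three branches scoring 4 text queries against 6 gallery images.
rng = np.random.default_rng(0)
scores = {name: rng.normal(size=(4, 6)) for name in ("RGB", "GRS", "CLR")}
loss = mutual_learning_loss(scores)
```

In a setup like this, the mutual term would be added to each branch's own retrieval loss, so that no single branch (and hence no single cue such as color) dominates the joint decision.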