In recent years, Cross-Modal Retrieval (CMR) has made significant progress in multi-modal analysis. However, because collecting large-scale, well-annotated data is time-consuming and labor-intensive, the annotations of multi-modal data inevitably contain noise, which degrades retrieval performance. To tackle this problem, numerous robust CMR methods have been developed, including robust learning paradigms, label calibration strategies, and instance selection mechanisms. Unfortunately, they rarely achieve a high performance ceiling, reliable calibration, and full data utilization at the same time. To overcome these limitations, we propose a novel robust cross-modal learning framework, namely Neighbor-aware Instance Refining with Noisy Labels (NIRNL). Specifically, we first propose Cross-modal Margin Preserving (CMP), which adjusts the relative distance between positive and negative pairs to enhance the discrimination between sample pairs. We then propose Neighbor-aware Instance Refining (NIR), which partitions the training data into pure, hard, and noisy subsets via cross-modal neighborhood consensus. Finally, we construct tailored optimization strategies for each subset of this fine-grained partition, maximizing the utilization of all available data while mitigating error propagation. Extensive experiments on three benchmark datasets demonstrate that NIRNL achieves state-of-the-art performance and remains remarkably robust, especially under high noise rates.
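The neighborhood-consensus partitioning behind NIR can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `nir_partition`, the choice of cosine-similarity k-NN, and the thresholds `hi` and `lo` are all illustrative assumptions. The idea shown is that an instance whose image-space neighbors largely agree with its paired text's text-space neighbors is likely correctly matched (pure), while low agreement flags a likely noisy pair, with the remainder treated as hard.

```python
import numpy as np

def nir_partition(img_emb, txt_emb, k=5, hi=0.8, lo=0.3):
    """Toy sketch of subset partitioning via cross-modal neighborhood
    consensus. For instance i, compute the overlap between its k nearest
    neighbors in image-embedding space and in text-embedding space;
    high overlap -> pure, low overlap -> noisy, in between -> hard.
    k, hi, and lo are illustrative values, not taken from the paper."""
    def knn(emb):
        # cosine-similarity k-NN indices, excluding self
        e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
        sim = e @ e.T
        np.fill_diagonal(sim, -np.inf)
        return np.argsort(-sim, axis=1)[:, :k]

    nb_img, nb_txt = knn(img_emb), knn(txt_emb)
    # fraction of shared neighbors across the two modalities
    consensus = np.array([
        len(set(nb_img[i]) & set(nb_txt[i])) / k
        for i in range(len(img_emb))
    ])
    pure = np.where(consensus >= hi)[0]
    noisy = np.where(consensus <= lo)[0]
    hard = np.where((consensus > lo) & (consensus < hi))[0]
    return pure, hard, noisy
```

In a full pipeline, each subset would then receive its own loss: for example, standard supervision on the pure subset and a down-weighted or label-free objective on the noisy subset, so that all data contributes without propagating label errors.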