Source-free cross-modal knowledge transfer is a crucial yet challenging task, which aims to transfer knowledge from one source modality (e.g., RGB) to the target modality (e.g., depth or infrared) with no access to the task-relevant (TR) source data due to memory and privacy concerns. A recent attempt leverages the paired task-irrelevant (TI) data and directly matches the features from them to eliminate the modality gap. However, it ignores a pivotal clue that the paired TI data could be utilized to effectively estimate the source data distribution and better facilitate knowledge transfer to the target modality. To this end, we propose a novel yet concise framework to unlock the potential of paired TI data for enhancing source-free cross-modal knowledge transfer. Our work is buttressed by two key technical components. Firstly, to better estimate the source data distribution, we introduce a Task-irrelevant data-Guided Modality Bridging (TGMB) module. It translates the target modality data (e.g., infrared) into the source-like RGB images based on paired TI data and the guidance of the available source model to alleviate two key gaps: 1) inter-modality gap between the paired TI data; 2) intra-modality gap between TI and TR target data. We then propose a Task-irrelevant data-Guided Knowledge Transfer (TGKT) module that transfers knowledge from the source model to the target model by leveraging the paired TI data. Notably, due to the unavailability of labels for the TR target data and its less reliable prediction from the source model, our TGKT model incorporates a self-supervised pseudo-labeling approach to enable the target model to learn from its predictions. Extensive experiments show that our method achieves state-of-the-art performance on three datasets (RGB-to-depth and RGB-to-infrared).
翻译:暂无翻译