Progressive Learning for Image Retrieval with Hybrid-Modality Queries

Image retrieval with hybrid-modality queries, also known as composing text and image for image retrieval (CTI-IR), is a retrieval task in which the search intention is expressed in a more complex query format that involves both the vision and text modalities. For example, a target product image is retrieved using a query that consists of a reference product image together with text describing changes to certain attributes of that reference image. This is a more challenging image retrieval task because it requires both semantic space learning and cross-modal fusion. Previous approaches that attempt to deal with both aspects achieve unsatisfactory performance. In this paper, we decompose the CTI-IR task into a three-stage learning problem to progressively learn the complex knowledge needed for image retrieval with hybrid-modality queries. We first leverage the semantic embedding space for open-domain image-text retrieval, and then transfer the learned knowledge to the fashion domain with fashion-related pre-training tasks. Finally, we enhance the pre-trained model from single-query retrieval to hybrid-modality query retrieval for the CTI-IR task. Furthermore, because the contribution of each modality in the hybrid-modality query varies across retrieval scenarios, we propose a self-supervised adaptive weighting strategy that dynamically determines the importance of image and text in the hybrid-modality query for better retrieval. Extensive experiments show that our proposed model significantly outperforms state-of-the-art methods in the mean of Recall@K by 24.9% and 9.5% on the Fashion-IQ and Shoes benchmark datasets, respectively.
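
To make the retrieval setup and the reported metric concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that image and text embeddings already exist as lists of floats, and it takes the image weight `w_image` as a given number, whereas in the paper this weight comes from a self-supervised adaptive weighting strategy whose details the abstract does not specify. The function names, the linear fusion rule, and the default cutoffs in `ks` are all hypothetical choices made for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit length (returns a copy if the norm is zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def compose_query(image_emb, text_emb, w_image):
    """Fuse the two query modalities by a weighted sum (assumed fusion rule).

    w_image is the image weight in [0, 1]; the text weight is 1 - w_image.
    """
    img, txt = normalize(image_emb), normalize(text_emb)
    fused = [w_image * a + (1.0 - w_image) * b for a, b in zip(img, txt)]
    return normalize(fused)


def rank_gallery(query, gallery):
    """Return gallery indices sorted by descending cosine similarity."""
    scores = [
        sum(q * g for q, g in zip(query, normalize(item))) for item in gallery
    ]
    return sorted(range(len(gallery)), key=lambda i: -scores[i])


def recall_at_k(rankings, targets, k):
    """Fraction of queries whose target index appears in the top k results."""
    hits = sum(1 for ranking, t in zip(rankings, targets) if t in ranking[:k])
    return hits / len(targets)


def mean_recall(rankings, targets, ks=(1, 10, 50)):
    """Average Recall@K over several cutoffs (the cutoffs are an assumption)."""
    return sum(recall_at_k(rankings, targets, k) for k in ks) / len(ks)
```

With a three-item toy gallery, changing `w_image` changes which candidate ranks first, which is the behavior an adaptive weighting scheme would exploit:

```python
gallery = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image_heavy = rank_gallery(compose_query([1.0, 0.0], [0.0, 1.0], 0.9), gallery)
balanced = rank_gallery(compose_query([1.0, 0.0], [0.0, 1.0], 0.5), gallery)
print(image_heavy[0], balanced[0])
print(mean_recall([image_heavy, balanced], targets=[0, 0], ks=(1, 2)))
```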
