We investigate composed image retrieval with text feedback. Users gradually home in on the target of interest by moving from coarse to fine-grained feedback. However, existing methods focus only on the latter, i.e., fine-grained search, by harnessing positive and negative pairs during training. This pair-based paradigm considers only the one-to-one distance between a pair of specific points, which is not aligned with the one-to-many coarse-grained retrieval process and compromises the recall rate. To fill this gap, we introduce a unified learning approach that simultaneously models coarse- and fine-grained retrieval by considering multi-grained uncertainty. The key idea underpinning the proposed method is to treat fine- and coarse-grained retrieval as matching data points with small and large fluctuations, respectively. Specifically, our method contains two modules: uncertainty modeling and uncertainty regularization. (1) Uncertainty modeling simulates multi-grained queries by introducing identically distributed fluctuations in the feature space. (2) Building on uncertainty modeling, uncertainty regularization adapts the matching objective according to the fluctuation range. Compared with existing methods, the proposed strategy explicitly prevents the model from pushing away potential candidates in the early stage, and thus improves the recall rate. On three public datasets, i.e., FashionIQ, Fashion200k, and Shoes, the proposed method achieves +4.03%, +3.38%, and +2.40% Recall@50 over a strong baseline, respectively.
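The two modules can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the fluctuations are drawn as i.i.d. Gaussian noise, the matching objective is a standard in-batch contrastive loss over cosine similarity, and the regularizer is a generic heteroscedastic-style weighting (loss scaled by 1/(2σ²) plus a log-variance penalty). The function names `add_fluctuation`, `contrastive_loss`, and `regularized_loss` are hypothetical and are not taken from the paper.

```python
import math
import random


def add_fluctuation(feature, sigma, rng):
    """Uncertainty modeling (assumed form): jitter a feature vector with
    i.i.d. Gaussian noise of standard deviation sigma."""
    return [x + rng.gauss(0.0, sigma) for x in feature]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(query, targets, positive_index, temperature=0.1):
    """Softmax cross-entropy over cosine similarities: the query should
    match targets[positive_index] better than the other candidates."""
    logits = [cosine(query, t) / temperature for t in targets]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[positive_index]


def regularized_loss(loss, sigma):
    """Uncertainty regularization (assumed form): relax the matching loss
    as the fluctuation grows; the log term discourages an unbounded sigma.
    Requires sigma > 0."""
    var = sigma * sigma
    return loss / (2.0 * var) + 0.5 * math.log(var)


if __name__ == "__main__":
    rng = random.Random(0)
    query = [1.0, 0.2, 0.0]
    targets = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for sigma in (0.05, 1.0):  # small = fine-grained, large = coarse-grained
        noisy_query = add_fluctuation(query, sigma, rng)
        loss = contrastive_loss(noisy_query, targets, positive_index=0)
        print(sigma, regularized_loss(loss, sigma))
```

Under these assumptions, a small `sigma` keeps the query close to its original point and weights the matching loss heavily, which corresponds to strict one-to-one matching; a large `sigma` spreads the query over a wider region and down-weights the loss, so candidates near the target are not pushed away as aggressively.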