The huge domain gap between sketches and photos, together with the highly abstract nature of sketch representations, poses challenges for sketch-based image retrieval (\underline{SBIR}). Zero-shot sketch-based image retrieval (\underline{ZS-SBIR}) is more generic and practical, but it poses an even greater challenge because of the additional knowledge gap between seen and unseen categories. To mitigate both gaps simultaneously, we propose an \textbf{A}pproaching-and-\textbf{C}entralizing \textbf{Net}work (termed "\textbf{ACNet}") that jointly optimizes sketch-to-photo synthesis and image retrieval. The retrieval module guides the synthesis module to generate large amounts of diverse photo-like images that gradually approach the photo domain; these images in turn help the retrieval module learn domain-agnostic representations and category-agnostic common knowledge that generalize to unseen categories. The diverse images generated under retrieval guidance also effectively alleviate overfitting to concrete, category-specific training samples with high gradients. We further find that the proxy-based NormSoftmax loss is effective in the zero-shot setting, because its centralizing effect stabilizes our joint training and improves generalization to unseen categories. Our approach is simple yet effective: it achieves state-of-the-art performance on two widely used ZS-SBIR datasets and surpasses previous methods by a large margin.
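The abstract does not spell out the loss, so the following is only a minimal NumPy sketch of a proxy-based NormSoftmax loss as it is commonly formulated: cross-entropy over temperature-scaled cosine similarities between L2-normalized embeddings and one L2-normalized proxy per training class. The function name `norm_softmax_loss`, the temperature value 0.05, and the toy inputs are assumptions for illustration, not ACNet's implementation. Because every sample of a class is pulled toward the same shared proxy, embeddings tend to concentrate around class centers, which is one plausible reading of the "centralizing effect" mentioned above.

```python
import numpy as np


def norm_softmax_loss(embeddings, labels, proxies, temperature=0.05):
    """Proxy-based NormSoftmax loss (illustrative sketch, assumed formulation).

    embeddings:  (N, D) feature vectors, e.g. sketch or photo embeddings
    labels:      (N,) integer class indices
    proxies:     (C, D) one (normally learnable) proxy vector per training class
    temperature: divides the cosine similarities; 0.05 is an assumed value
    """
    # L2-normalize both sides so the logits are cosine similarities.
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    p = proxies / np.linalg.norm(proxies, axis=1, keepdims=True)
    logits = e @ p.T / temperature  # shape (N, C)

    # Numerically stable log-softmax over the class proxies.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy: pull each embedding toward the proxy of its own class.
    return -log_probs[np.arange(len(labels)), labels].mean()


# Usage: embeddings aligned with their class proxies yield a much lower loss
# than the same embeddings paired with the wrong labels.
proxies = np.eye(3)
emb = 2.0 * np.eye(3)
aligned = norm_softmax_loss(emb, np.array([0, 1, 2]), proxies)
misaligned = norm_softmax_loss(emb, np.array([1, 2, 0]), proxies)
print(aligned, misaligned)
```

Since both embeddings and proxies are normalized, rescaling an embedding does not change its loss; only its direction relative to the proxies matters.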