The practical value of existing supervised sketch-based image retrieval (SBIR) algorithms is largely limited by the need for intensive data collection and labeling. In this paper, we present the first attempt at unsupervised SBIR, removing the labeling cost (both category annotations and sketch-photo pairings) that is conventionally required for training. Existing single-domain unsupervised representation learning methods perform poorly in this setting, due to the unique cross-domain (sketch and photo) nature of the problem. We therefore introduce a novel framework that simultaneously performs sketch-photo domain alignment and semantic-aware representation learning. Technically, this is underpinned by joint distribution optimal transport (JDOT), which aligns data from the two domains and which we extend with trainable cluster prototypes and feature memory banks to further improve scalability and efficacy. Extensive experiments show that our framework achieves excellent performance in the new unsupervised setting, and performs comparably to or better than the state of the art in the zero-shot setting.