Instance-level image retrieval in fashion is a challenging problem of increasing importance for real-world visual fashion search. Cross-domain fashion retrieval aims to match unconstrained customer images, used as queries, against photographs provided by retailers; however, it is a difficult task because of the wide range of consumer-to-shop (C2S) domain discrepancies and because clothing images are vulnerable to various non-rigid deformations. To this end, we propose a novel multi-scale and multi-granularity feature learning network (MMFL-Net), which jointly learns global-local aggregated feature representations of clothing images in a unified framework to train a cross-domain model for C2S fashion visual similarity. First, a new semantic-spatial feature fusion module is designed to bridge the semantic-spatial gap by applying top-down and bottom-up bidirectional multi-scale feature fusion. Next, a multi-branch deep network architecture is introduced to capture global salient, part-informed, and local detailed information, and to extract robust and discriminative feature embeddings by integrating similarity learning of coarse-to-fine embeddings at multiple granularities. Finally, an improved trihard loss, center loss, and multi-task classification loss are adopted for MMFL-Net; together they jointly optimize intra-class and inter-class distances and thus explicitly improve the intra-class compactness and inter-class discriminability of the learned visual representations. Furthermore, the proposed model incorporates a multi-task attribute recognition and classification module that uses multi-label semantic attributes and product ID labels. Experimental results demonstrate that MMFL-Net achieves significant improvements over state-of-the-art methods on two datasets, DeepFashion-C2S and Street2Shop.
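The abstract names the loss terms but not their exact form or weighting. As a minimal sketch under stated assumptions, the NumPy code below combines the standard batch-hard triplet ("TriHard") loss, a standard center loss, a softmax cross-entropy over product IDs, and a sigmoid cross-entropy over multi-label attributes. It does not implement the paper's improved trihard loss; all function names, the margin, and the weights `center_weight` and `attr_weight` are hypothetical illustration choices.

```python
import numpy as np


def pairwise_distances(emb):
    """Euclidean distance matrix between all rows of emb (shape [N, D])."""
    emb = np.asarray(emb, dtype=float)
    sq = np.sum(emb ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    # Clamp small negative values caused by floating-point error before the sqrt.
    return np.sqrt(np.maximum(d2, 0.0))


def batch_hard_triplet_loss(emb, labels, margin=0.3):
    """Standard batch-hard ("TriHard") triplet loss, NOT MMFL-Net's improved variant.

    For every anchor, use its farthest same-ID sample and its closest other-ID sample.
    Assumes each batch holds several IDs with at least two images each (P x K sampling).
    """
    labels = np.asarray(labels)
    dist = pairwise_distances(emb)
    same = labels[:, None] == labels[None, :]
    hardest_pos = np.where(same, dist, -np.inf).max(axis=1)
    hardest_neg = np.where(~same, dist, np.inf).min(axis=1)
    return float(np.mean(np.maximum(hardest_pos - hardest_neg + margin, 0.0)))


def center_loss(emb, labels, centers):
    """Center loss: half the mean squared distance from each embedding to its class center."""
    diff = np.asarray(emb, dtype=float) - centers[np.asarray(labels)]
    return float(0.5 * np.mean(np.sum(diff ** 2, axis=1)))


def softmax_cross_entropy(logits, labels):
    """Product-ID classification loss (logits shifted by their max for numerical stability)."""
    labels = np.asarray(labels)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))


def sigmoid_bce(logits, targets):
    """Multi-label attribute loss: binary cross-entropy averaged over samples and attributes."""
    # Stable form of -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))].
    return float(np.mean(np.maximum(logits, 0.0) - logits * targets
                         + np.log1p(np.exp(-np.abs(logits)))))


def joint_loss(emb, id_logits, attr_logits, ids, attrs, centers,
               margin=0.3, center_weight=5e-4, attr_weight=1.0):
    """Hypothetical weighted sum of the four terms; the weights are illustrative only."""
    return (batch_hard_triplet_loss(emb, ids, margin)
            + center_weight * center_loss(emb, ids, centers)
            + softmax_cross_entropy(id_logits, ids)
            + attr_weight * sigmoid_bce(attr_logits, attrs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ids = np.array([0, 0, 1, 1, 2, 2, 3, 3])   # P=4 product IDs, K=2 images each
    emb = rng.normal(size=(8, 16))              # stand-in for fused multi-branch embeddings
    centers = rng.normal(size=(4, 16))          # one learnable center per product ID
    id_logits = rng.normal(size=(8, 4))
    attr_logits = rng.normal(size=(8, 5))       # 5 hypothetical binary attributes
    attrs = rng.integers(0, 2, size=(8, 5)).astype(float)
    print(joint_loss(emb, id_logits, attr_logits, ids, attrs, centers))
```

Running the script prints a single non-negative scalar. In a multi-branch network each branch would typically contribute its own embedding and classifier; how MMFL-Net aggregates the per-branch losses is not stated in the abstract.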