Complementary fashion recommendation aims to identify items from different categories (e.g., shirts, footwear) that "go well together" as an outfit. Most existing approaches learn representations for this task from labeled outfit datasets containing manually curated compatible item combinations. In this work, we propose to learn representations for compatibility prediction from in-the-wild street fashion images through self-supervised learning, leveraging the fact that people often wear compatible outfits. Our pretext task is formulated so that the representations of different items worn by the same person are closer to each other than to those of items worn by other people. Additionally, to reduce the domain gap between in-the-wild and catalog images at inference time, we introduce an adversarial loss that minimizes the difference in feature distributions between the two domains. We conduct experiments on two popular fashion compatibility benchmarks, Polyvore and Polyvore-Disjoint outfits, and outperform existing self-supervised approaches, with particularly significant gains in the cross-dataset setting where training and testing images come from different sources.
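The abstract describes two training signals: a contrastive pretext task that pulls together representations of different items worn by the same person, and an adversarial loss that aligns street-photo and catalog feature distributions. The following PyTorch sketch shows one plausible way to combine the two; the InfoNCE-style formulation, gradient-reversal discriminator, embedding dimension, and all names here are illustrative assumptions, not the paper's exact method.

```python
# Minimal sketch, assuming an InfoNCE pretext loss over in-batch negatives
# and a gradient-reversal domain discriminator. Not the paper's actual code.
import torch
import torch.nn.functional as F
from torch import nn

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer commonly used for adversarial domain alignment."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def pretext_loss(anchor, positive, temperature=0.07):
    """InfoNCE-style loss: items worn by the same person are positives;
    items worn by the other people in the batch act as negatives."""
    anchor = F.normalize(anchor, dim=1)        # (B, D)
    positive = F.normalize(positive, dim=1)    # (B, D)
    logits = anchor @ positive.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)

class DomainDiscriminator(nn.Module):
    """Classifies features as street vs. catalog; the reversed gradient
    pushes the encoder to make the two domains indistinguishable."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, feats, lambd=1.0):
        return self.net(GradReverse.apply(feats, lambd))

# Usage sketch: feats_a / feats_b stand in for embeddings of two items
# cropped from the same street photo; feats_street / feats_catalog stand
# in for encoder outputs from the two domains.
B, D = 32, 128
feats_a, feats_b = torch.randn(B, D), torch.randn(B, D)
feats_street, feats_catalog = torch.randn(B, D), torch.randn(B, D)

disc = DomainDiscriminator(D)
domain_feats = torch.cat([feats_street, feats_catalog])
domain_labels = torch.cat([torch.zeros(B, dtype=torch.long),
                           torch.ones(B, dtype=torch.long)])
adv_loss = F.cross_entropy(disc(domain_feats), domain_labels)

total_loss = pretext_loss(feats_a, feats_b) + adv_loss
```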