Many self-supervised learning (SSL) methods have been successful in learning semantically meaningful visual representations by solving pretext tasks. However, prior work in SSL focuses on tasks like object recognition or detection, which aim to learn object shapes and assume that the features should be invariant to concepts like colors and textures. As a result, these SSL methods perform poorly on downstream tasks where such concepts provide critical information. In this paper, we present an SSL framework that enables us to learn color- and texture-aware features without requiring any labels during training. Our approach consists of three self-supervised tasks, designed to capture different concepts neglected in prior work, from which we can select depending on the needs of the downstream task. Our tasks include learning to predict color histograms and to discriminate shapeless local patches and textures from each instance. We evaluate our approach on fashion compatibility using Polyvore Outfits and on In-Shop Clothing Retrieval using DeepFashion, improving upon prior SSL methods by 9.5-16%, and even outperforming some supervised approaches on Polyvore Outfits despite using no labels. We also show that our approach can be used for transfer learning, demonstrating that we can train on one dataset while achieving high performance on a different dataset.
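To make the color-histogram pretext task concrete, the sketch below shows one plausible way to construct the regression target: a normalized per-channel RGB histogram that a network could be trained to predict from an image. This is an illustrative NumPy sketch under our own assumptions (the function names, the bin count, and the MSE loss are hypothetical), not the paper's exact implementation.

```python
import numpy as np

def color_histogram_target(image, bins=8):
    """Build a normalized per-channel color histogram to serve as the
    regression target for a color-prediction pretext task.

    image: H x W x 3 uint8 array; returns a flat vector of length 3 * bins.
    """
    hists = []
    for c in range(3):  # one histogram per RGB channel
        h, _ = np.histogram(image[..., c], bins=bins, range=(0, 256))
        hists.append(h / h.sum())  # normalize to a probability distribution
    return np.concatenate(hists)

def histogram_loss(predicted, target):
    """Mean-squared error between a predicted and a target histogram."""
    return float(np.mean((predicted - target) ** 2))

# Toy usage: a solid-red image puts all red mass in the top bin and all
# green/blue mass in the bottom bin of their respective histograms.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[..., 0] = 255
target = color_histogram_target(img, bins=8)
```

In a full pipeline, a small head on top of the backbone would regress this 24-dimensional vector, encouraging the learned features to retain color information rather than discard it as a nuisance factor.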