Most existing distance metric learning approaches use fully labeled data to learn sample similarities in an embedding space. We present a self-training framework, SLADE, that improves retrieval performance by leveraging additional unlabeled data. We first train a teacher model on the labeled data and use it to generate pseudo labels for the unlabeled data. We then train a student model on both the labeled and pseudo-labeled data to produce the final feature embeddings. We use self-supervised representation learning to initialize the teacher model. To better handle the noisy pseudo labels generated by the teacher network, we design a new feature basis learning component for the student network, which learns basis functions of the feature representations for unlabeled data. The learned basis vectors better measure pairwise similarity and are used to select high-confidence samples for training the student network. We evaluate our method on standard retrieval benchmarks: CUB-200, Cars-196, and In-shop. Experimental results demonstrate that our approach significantly outperforms state-of-the-art methods.
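The teacher-to-student pipeline described above (pseudo-labeling the unlabeled data, then filtering by confidence before student training) can be sketched at a high level as follows. This is an illustrative toy only, assuming a 1-NN pseudo-labeler and an ad-hoc distance-based confidence score; SLADE's actual networks, losses, and feature basis learning are not reproduced here, and all names are hypothetical.

```python
import numpy as np

def pseudo_label(teacher_feats, teacher_labels, unlabeled_feats):
    """Assign each unlabeled sample the label of its nearest labeled
    neighbor in the teacher's embedding space (toy 1-NN pseudo-labeling)."""
    dists = np.linalg.norm(
        unlabeled_feats[:, None, :] - teacher_feats[None, :, :], axis=-1)
    nearest = dists.argmin(axis=1)
    # Toy confidence score: closer to a labeled sample -> higher confidence.
    confidence = 1.0 / (1.0 + dists.min(axis=1))
    return teacher_labels[nearest], confidence

# Toy 2-D embeddings standing in for a teacher model's feature space.
labeled = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = np.array([0, 1])
unlabeled = np.array([[0.5, 0.2], [9.5, 9.9], [5.0, 5.0]])

pseudo, conf = pseudo_label(labeled, labels, unlabeled)

# Keep only high-confidence pseudo labels for student training; the
# ambiguous mid-point sample is dropped.
keep = conf > 0.2
student_x = np.vstack([labeled, unlabeled[keep]])
student_y = np.concatenate([labels, pseudo[keep]])
```

In the paper, the confidence filtering is instead driven by the learned basis vectors, which give a better-calibrated pairwise-similarity measure than raw embedding distances.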