We present an analysis of embeddings extracted from different pre-trained models for content-based image retrieval. Specifically, we study embeddings from image classification and object detection models. We find that, even with additional human annotations such as bounding boxes and segmentation masks, embeddings from modern object detection models are significantly less discriminative for the retrieval task than those from their classification counterparts. At the same time, our analysis reveals that an object detection model can still help the retrieval task by acting as a hard attention module: it extracts object embeddings that focus on salient regions of the convolutional feature map. To extract object embeddings efficiently, we introduce a simple guided student-teacher training paradigm for learning discriminative embeddings within the object detection framework. We support our findings with strong experimental results.
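To make the hard-attention idea concrete, the following is a minimal sketch and not the method of the paper, whose details the abstract does not give. It assumes a detector supplies a box in image coordinates, that the feature map has a known stride relative to the image, and that the object embedding is obtained by average-pooling the feature-map cells under the box and L2-normalising the result. The names box_to_cells, object_embedding, and cosine are hypothetical helpers introduced only for illustration.

```python
import math


def box_to_cells(box, stride, height, width):
    """Map an image-space box (x1, y1, x2, y2) to feature-map cell bounds.

    Returns half-open bounds (r0, r1, c0, c1), clamped to the map and
    covering at least one cell.
    """
    x1, y1, x2, y2 = box
    c0 = min(max(0, math.floor(x1 / stride)), width - 1)
    r0 = min(max(0, math.floor(y1 / stride)), height - 1)
    c1 = min(width, max(math.ceil(x2 / stride), c0 + 1))
    r1 = min(height, max(math.ceil(y2 / stride), r0 + 1))
    return r0, r1, c0, c1


def object_embedding(feature_map, box, stride):
    """Pool only the cells under the box (hard attention), then L2-normalise.

    feature_map is a nested list indexed as [channel][row][col].
    Average pooling is an assumption made for this sketch.
    """
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    r0, r1, c0, c1 = box_to_cells(box, stride, height, width)
    n_cells = (r1 - r0) * (c1 - c0)
    pooled = [
        sum(channel[r][c] for r in range(r0, r1) for c in range(c0, c1)) / n_cells
        for channel in feature_map
    ]
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled


def cosine(a, b):
    """Cosine similarity of two already L2-normalised vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy 2-channel, 4x4 feature map with an assumed stride of 8 pixels.
# Channel 0 fires on the top-left 2x2 cells (the "object"), channel 1 elsewhere.
obj = [[1.0 if r < 2 and c < 2 else 0.0 for c in range(4)] for r in range(4)]
bg = [[0.0 if r < 2 and c < 2 else 1.0 for c in range(4)] for r in range(4)]
feature_map = [obj, bg]

object_vec = object_embedding(feature_map, (0, 0, 16, 16), stride=8)
global_vec = object_embedding(feature_map, (0, 0, 32, 32), stride=8)
similarity = cosine(object_vec, global_vec)
```

In this toy case the box-restricted embedding contains only the object channel, while pooling over the whole map mixes in background activations, so the two vectors differ. Retrieval would then rank gallery images by the cosine similarity of such embeddings.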