Deep Metric Learning trains a neural network to map input images to a lower-dimensional embedding space such that similar images are closer together than dissimilar images. When used for item retrieval, a query image is embedded using the trained model, and the items whose stored embeddings lie closest to it in a database are returned as the most similar items for the query. Especially in product retrieval, where a user searches for a certain product by taking a photo of it, the image background is usually not important and thus should not influence the embedding process. Ideally, the retrieval process always returns fitting items for the photographed object, regardless of the environment the photo was taken in. In this paper, we analyze the influence of the image background on Deep Metric Learning models by utilizing five common loss functions and three common datasets. We find that Deep Metric Learning networks are prone to so-called background bias, which can lead to a severe decrease in retrieval performance when the image background is changed during inference. We also show that replacing the background of images during training with random background images alleviates this issue. Since we use an automatic background removal method to do this background replacement, no additional manual labeling work or model changes are required, and inference time stays the same. Qualitative and quantitative analyses, for which we introduce a new evaluation metric, confirm that models trained with replaced backgrounds attend more to the main object in the image, benefiting item retrieval systems.
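To make the two mechanisms described above concrete, the following is a minimal sketch and not the implementation used in this paper. It assumes that a binary foreground mask is already available (for example, from an automatic background removal method), that images are plain nested lists of pixel values, and that "closest" means highest cosine similarity; the abstract does not specify the distance measure. The function names `replace_background`, `cosine_similarity`, and `retrieve` are hypothetical and chosen for illustration only.

```python
import math


def replace_background(image, mask, background):
    """Composite the masked object of `image` onto `background`.

    image, background: H x W nested lists of pixel values.
    mask: H x W nested lists, truthy where a pixel belongs to the object.
    All three inputs are assumed to have the same shape.
    """
    return [
        [fg if m else bg for fg, m, bg in zip(img_row, mask_row, bg_row)]
        for img_row, mask_row, bg_row in zip(image, mask, background)
    ]


def cosine_similarity(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query, database, k=1):
    """Return the keys of the k database embeddings most similar to `query`.

    database: dict mapping an item key to its stored embedding.
    """
    ranked = sorted(
        database,
        key=lambda item: cosine_similarity(query, database[item]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2 x 2 image: only the bottom-right pixel belongs to the object.
image = [[5, 5], [5, 9]]
mask = [[0, 0], [0, 1]]
background = [[1, 2], [3, 4]]
augmented = replace_background(image, mask, background)

# Toy database of two stored embeddings and one query embedding.
database = {"mug": [1.0, 0.0], "shoe": [0.0, 1.0]}
top_item = retrieve([0.9, 0.1], database, k=1)
```

In a training pipeline, `background` would be drawn at random for each sample, so the object stays fixed while its surroundings vary. Inference is unchanged, because the replacement is applied only to training images.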