Off-the-shelf convolutional neural network features achieve outstanding results in many image retrieval tasks. However, their invariance to the target data is fixed by the network architecture and training data. Existing image retrieval approaches therefore fine-tune or modify pre-trained networks to adapt to variations unique to the target data. In contrast, our method enhances the invariance of off-the-shelf features by aggregating features extracted from images augmented at test time, with augmentations guided by a policy learned through reinforcement learning. The learned policy assigns different magnitudes and weights to transformations selected from a list of candidate image transformations, and candidate policies are evaluated under a metric learning protocol to find the optimal one. The model converges quickly, and the cost of each policy iteration is minimal because we propose an off-line caching technique that greatly reduces the computational cost of extracting features from augmented images. Experimental results on large-scale trademark retrieval (METU trademark dataset) and landmark retrieval (ROxford5k and RParis6k scene datasets) tasks show that the learned ensemble of transformations is highly effective for improving performance, practical, and transferable.
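To make the aggregation mechanism concrete, the sketch below shows one plausible form of test-time-augmented feature aggregation under a learned policy. It is a minimal illustration, not the authors' implementation: the `extract_features` callable, the `POLICY` of (transformation, magnitude, weight) triples, and the specific transformations are all illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code) of aggregating
# off-the-shelf CNN features over policy-guided test-time augmentations.
import numpy as np
from PIL import Image, ImageEnhance

def rotate(img, m):
    # Magnitude m in [0, 1] scales the rotation angle (example range).
    return img.rotate(30 * m)

def contrast(img, m):
    # Magnitude m in [0, 1] scales the contrast change.
    return ImageEnhance.Contrast(img).enhance(1 + m)

# Hypothetical learned policy: each entry is
# (transformation, magnitude, aggregation weight).
POLICY = [
    (lambda img, m: img, 0.0, 1.0),  # identity: keep original-image features
    (rotate,   0.5, 0.7),
    (contrast, 0.3, 0.4),
]

def describe(img: Image.Image, extract_features) -> np.ndarray:
    """Weighted sum of features from the policy's augmented views,
    L2-normalized so descriptors are comparable under cosine similarity."""
    feats = [w * extract_features(t(img, m)) for t, m, w in POLICY]
    agg = np.sum(feats, axis=0)
    return agg / np.linalg.norm(agg)
```

Because the candidate transformations and magnitudes are drawn from a fixed list, features for every augmented view can be extracted once and cached off-line, as the abstract notes; each policy iteration then only re-weights cached vectors rather than re-running the network.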