Existing data-dependent hashing methods use large backbone networks with millions of parameters and are computationally complex. Existing knowledge distillation methods use the logits and other features of the deep (teacher) model as knowledge for the compact (student) model, which requires the teacher network to be fine-tuned on the target context in parallel with the student model. Training the teacher on the target context requires additional time and computational resources. In this paper, we propose context unaware knowledge distillation, which uses the knowledge of the teacher model without fine-tuning it on the target context. We also propose a new, efficient student model architecture for knowledge distillation. The proposed approach follows a two-step process. The first step is pre-training the student model via context unaware knowledge distillation from the teacher model. The second step is fine-tuning the student model on the context of image retrieval. To show the efficacy of the proposed approach, we compare the retrieval results, number of parameters, and number of operations of the student models with those of the teacher models under different retrieval frameworks, including deep Cauchy hashing (DCH) and central similarity quantization (CSQ). The experimental results confirm that the proposed approach provides a promising trade-off between retrieval performance and efficiency. The code used in this paper is released publicly at \url{https://github.com/satoru2001/CUKDFIR}.
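The first step of the two-step process described above can be illustrated with a minimal sketch of logit-based distillation from a frozen (not fine-tuned) teacher. This is an assumption-laden illustration in PyTorch, not the authors' implementation: the function name \texttt{distill\_step}, the temperature \texttt{T}, and the KL-divergence objective are illustrative choices; the exact distillation objective and student architecture are given in the released code at \url{https://github.com/satoru2001/CUKDFIR}.

\begin{verbatim}
# Minimal sketch of step 1 (context unaware distillation), assuming PyTorch.
# The teacher stays frozen; only the student is updated.
import torch
import torch.nn.functional as F

def distill_step(student, teacher, images, optimizer, T=4.0):
    """One distillation update: the student matches the frozen
    teacher's softened logits via KL divergence (hypothetical loss)."""
    teacher.eval()
    with torch.no_grad():          # teacher is never fine-tuned
        t_logits = teacher(images)
    s_logits = student(images)
    loss = F.kl_div(
        F.log_softmax(s_logits / T, dim=1),
        F.softmax(t_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
\end{verbatim}

In step 2, the pre-trained student would then be fine-tuned under a retrieval framework such as DCH or CSQ, replacing the distillation loss above with the corresponding hashing objective.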