While fine-tuning-based methods for few-shot object detection have achieved remarkable progress, a crucial challenge remains largely unaddressed: class-specific overfitting on base classes and sample-specific overfitting on novel classes. In this work we design a novel knowledge distillation framework to guide the learning of the object detector and thereby restrain overfitting in both the pre-training stage on base classes and the fine-tuning stage on novel classes. Specifically, we first present a novel Position-Aware Bag-of-Visual-Words model for learning a representative bag of visual words (BoVW) from an image set of limited size, which is then used to encode general images based on the similarities between the learned visual words and an image. We then perform knowledge distillation based on the principle that an image should have consistent BoVW representations in two different feature spaces. To this end, we pre-learn a feature space independently of the object detection task and encode images with BoVW in this space. The resulting BoVW representation of an image serves as distilled knowledge to guide the learning of the object detector: the features extracted by the detector for the same image are expected to yield BoVW representations consistent with this distilled knowledge. Extensive experiments validate the effectiveness of our method and demonstrate its superiority over other state-of-the-art methods.
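The core mechanism described above — encoding an image by its similarities to a learned codebook of visual words, then enforcing consistency between the encodings obtained in two feature spaces — can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the soft-assignment via temperature-scaled softmax, the KL divergence as the consistency loss, and the names `bovw_encode` and `consistency_loss` are all assumptions made for the example.

```python
import numpy as np

def bovw_encode(features, codebook, tau=0.1):
    """Encode an image as a BoVW histogram (illustrative sketch).

    features: (N, D) array of local features extracted from one image
    codebook: (K, D) array of learned visual words
    Returns a (K,) soft-assignment histogram over the visual words.
    """
    # L2-normalize so the dot product is cosine similarity
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
    sim = f @ w.T                              # (N, K) similarities
    p = np.exp(sim / tau)                      # temperature-scaled softmax
    p /= p.sum(axis=1, keepdims=True)          # soft assignment per feature
    return p.mean(axis=0)                      # pool into an image-level BoVW

def consistency_loss(p_teacher, p_student, eps=1e-8):
    """KL divergence between the BoVW representations of the same image
    computed in the pre-learned feature space (teacher, the distilled
    knowledge) and in the detector's feature space (student)."""
    return float(np.sum(
        p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps))
    ))
```

In this sketch the teacher histogram comes from the feature space pre-learned independently of detection, and the loss pushes the detector's features toward producing the same BoVW representation; the actual method additionally makes the codebook position-aware.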