Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting

We study multi-modal few-shot object detection (FSOD), which uses both few-shot visual examples and class semantic information for detection; the two sources are complementary by definition. Most previous work on multi-modal FSOD is fine-tuning-based, which is inefficient for online applications. Moreover, these methods usually require prior knowledge such as class names to extract class semantic embeddings, and such knowledge is hard to obtain for rare classes. Our approach is motivated by the high-level conceptual similarity between (metric-based) meta-learning and prompt-based learning, which learn generalizable few-shot and zero-shot object detection models, respectively, without fine-tuning. Specifically, we combine the few-shot visual classifier learned via meta-learning with the text classifier learned via prompt-based learning to build the multi-modal classifier and detection models. In addition, to fully exploit pre-trained language models, we propose meta-learning-based cross-modal prompting, which generates soft prompts for the novel classes present in the few-shot visual examples; these soft prompts are then used to learn the text classifier. We introduce knowledge distillation to learn the soft prompt generator without relying on human prior knowledge of class names, which may not be available for rare classes. Our insight is that the few-shot support images naturally carry context information and semantics related to the class. We comprehensively evaluate the proposed multi-modal FSOD models on multiple few-shot object detection benchmarks and obtain promising results.
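The idea of combining a metric-based few-shot visual classifier with a text classifier can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy 2-D embeddings, the cosine-similarity scoring, the temperature value, and the fusion rule (averaging the two similarity scores) are all assumptions made for illustration. In particular, the text embeddings are simply given here, whereas the abstract describes obtaining them from soft prompts generated from the support images.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Element-wise mean of a list of vectors (a class prototype)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multimodal_scores(query, support, text_embeddings, temperature=0.1):
    """Fuse a prototype-based visual classifier with a text classifier.

    query: feature vector of one candidate region (hypothetical input).
    support: dict mapping class name -> list of few-shot support features.
    text_embeddings: dict mapping class name -> text-side class embedding
        (assumed given; stands in for the output of a prompted language model).
    """
    classes = sorted(support)
    # Visual branch: similarity of the query to each class prototype.
    visual = [cosine(query, mean_vector(support[c])) for c in classes]
    # Text branch: similarity of the query to each class text embedding.
    textual = [cosine(query, text_embeddings[c]) for c in classes]
    # Assumed fusion rule: average the two similarities, then scale.
    fused = [(v + t) / (2 * temperature) for v, t in zip(visual, textual)]
    return dict(zip(classes, softmax(fused)))


# Toy data: two classes, two support examples each.
support = {
    "cat": [[1.0, 0.1], [0.9, 0.0]],
    "dog": [[0.0, 1.0], [0.1, 0.9]],
}
text_embeddings = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
scores = multimodal_scores([0.8, 0.2], support, text_embeddings)
```

Because both branches only compare the query against class representations computed from the support set, adding a novel class in this sketch requires no gradient updates, which is the property the abstract contrasts with fine-tuning-based methods.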

