Few-shot (FS) and zero-shot (ZS) learning are two different approaches to scaling temporal action detection (TAD) to new classes. The former adapts a pretrained vision model to a new task represented by as few as a single video per class, whilst the latter requires no training examples and instead exploits a semantic description of the new class. In this work, we introduce a new multi-modality few-shot (MMFS) TAD problem, which can be viewed as a marriage of FS-TAD and ZS-TAD that leverages few-shot support videos and new class names jointly. To tackle this problem, we further introduce a novel MUlti-modality PromPt mETa-learning (MUPPET) method. This is enabled by efficiently bridging pretrained vision and language models whilst maximally reusing already learned capacity. Concretely, we construct multi-modal prompts by mapping support videos into the textual token space of a vision-language model using a meta-learned, adapter-equipped visual semantics tokenizer. To tackle large intra-class variation, we further design a query feature regulation scheme. Extensive experiments on ActivityNet v1.3 and THUMOS14 demonstrate that MUPPET outperforms state-of-the-art alternative methods, often by a large margin. We also show that MUPPET can be easily extended to tackle the few-shot object detection problem, where it again achieves state-of-the-art performance on the MS-COCO dataset. The code will be available at https://github.com/sauradip/MUPPET
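As a rough illustration of the multi-modal prompt idea described above, the following minimal Python sketch maps each support video to a single token in a text-embedding space and concatenates those visual tokens with class-name token embeddings. It is not the MUPPET implementation: the helper names (`mean_pool`, `project`, `build_prompt`), the mean pooling over snippets, the single linear map standing in for the tokenizer, the random weights, and the toy dimensions are all assumptions made for illustration. In the actual method the tokenizer is meta-learned and adapter-equipped, which this sketch does not model.

```python
import random


def mean_pool(snippets):
    """Average a list of equal-length feature vectors into one vector."""
    n = len(snippets)
    return [sum(col) / n for col in zip(*snippets)]


def project(vec, weight):
    """Apply a linear map given as a list of rows (out_dim x in_dim)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def build_prompt(support_videos, class_name_tokens, weight):
    """Hypothetical multi-modal prompt: one visual token per support video,
    followed by the class-name token embeddings."""
    visual_tokens = [project(mean_pool(v), weight) for v in support_videos]
    return visual_tokens + class_name_tokens


random.seed(0)
VIS_DIM, TXT_DIM = 4, 3  # toy sizes, chosen only for this example

# Stand-in for a learned tokenizer: a random TXT_DIM x VIS_DIM matrix.
weight = [[random.uniform(-1, 1) for _ in range(VIS_DIM)] for _ in range(TXT_DIM)]

# Two support videos, each with five snippet features of size VIS_DIM.
support_videos = [
    [[random.random() for _ in range(VIS_DIM)] for _ in range(5)]
    for _ in range(2)
]

# Stand-in embeddings for the tokens of a class name.
class_name_tokens = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

prompt = build_prompt(support_videos, class_name_tokens, weight)
print(len(prompt), len(prompt[0]))  # 4 tokens, each of size 3
```

The resulting sequence would then be fed to the text encoder of a vision-language model in place of a purely textual prompt; that step is omitted here because it depends on the specific model.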