We introduce Few-Shot Video Object Detection (FSVOD) with three important contributions: 1) a large-scale video dataset, FSVOD-500, comprising 500 classes with class-balanced videos in each category for few-shot learning; 2) a novel Tube Proposal Network (TPN) that generates high-quality video tube proposals to aggregate feature representations of the target video object; 3) a strategically improved Temporal Matching Network (TMN+) that matches representative query tube features against support features with better discriminative ability. Our TPN and TMN+ are trained jointly and end-to-end. Extensive experiments demonstrate that our method produces significantly better detection results on two few-shot video object detection datasets than image-based methods and other naive video-based extensions. Code and datasets will be released at https://github.com/fanq15/FewX.
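The abstract does not specify how TPN aggregates tube features or how TMN+ scores them against supports. The following is a rough, hypothetical illustration of the general idea of tube-level few-shot matching and is not the authors' TPN or TMN+. It mean-pools per-frame features along one tube, builds one prototype per support class by averaging that class's support features, and labels the tube with the class whose prototype has the highest cosine similarity. The function names (`mean_pool`, `cosine`, `classify_tube`), the mean pooling, and the cosine scoring are all assumptions made for this sketch.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_pool(features: Sequence[Vector]) -> Vector:
    """Average equal-length feature vectors (e.g., one vector per frame of a tube)."""
    if not features:
        raise ValueError("need at least one feature vector")
    dim = len(features[0])
    return [sum(f[i] for f in features) / len(features) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_tube(
    tube_frame_feats: Sequence[Vector],
    support_feats: Dict[str, List[Vector]],
) -> Tuple[str, Dict[str, float]]:
    """Hypothetical tube-level matching: pooled query tube vs. per-class support prototypes."""
    # Aggregate the per-frame features of one tube into a single query vector.
    query = mean_pool(tube_frame_feats)
    # One prototype per class: the mean of its few support (shot) features.
    scores = {
        cls: cosine(query, mean_pool(shots)) for cls, shots in support_feats.items()
    }
    # Pick the class whose prototype is most similar to the query tube.
    best = max(scores, key=scores.get)
    return best, scores
```

For example, a tube whose three frame features are close to `[1.0, 0.1]` would be assigned to a support class whose shots are near `[0.9, 0.1]` rather than one near `[0.0, 1.0]`. Pooling over the whole tube rather than scoring each frame independently is the design choice this sketch is meant to show: evidence from several frames is combined before matching.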