Most existing works on few-shot object detection (FSOD) focus on a setting where both the pre-training and few-shot learning datasets come from a similar domain. However, few-shot algorithms matter across many domains, so evaluation needs to reflect this breadth of application. We propose a Multi-dOmain Few-Shot Object Detection (MoFSOD) benchmark consisting of 10 datasets from a wide range of domains to evaluate FSOD algorithms. We comprehensively analyze the impacts of freezing layers, different architectures, and different pre-training datasets on FSOD performance. Our empirical results reveal several key factors that previous works have not explored: 1) contrary to previous belief, on a multi-domain benchmark, fine-tuning (FT) is a strong baseline for FSOD, performing on par with or better than state-of-the-art (SOTA) algorithms; 2) utilizing FT as the baseline allows us to explore multiple architectures, and we find that they have a significant impact on downstream few-shot tasks, even when their pre-training performances are similar; 3) by decoupling pre-training from few-shot learning, MoFSOD allows us to explore the impact of different pre-training datasets, and the right choice can significantly boost downstream performance. Based on these findings, we list possible avenues of investigation for improving FSOD performance and propose two simple modifications to existing algorithms that achieve SOTA performance on the MoFSOD benchmark. The code is available at https://github.com/amazon-research/few-shot-object-detection-benchmark.
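To make the fine-tuning (FT) baseline and the freezing-layers axis concrete, the sketch below shows a generic few-shot fine-tuning setup with a torchvision Faster R-CNN. This is an illustration under stated assumptions, not the authors' implementation (see the linked repository for that); NUM_FEWSHOT_CLASSES and the freeze_backbone flag are hypothetical placeholders.

```python
# Minimal sketch of a fine-tuning (FT) baseline for few-shot detection.
# Assumes torchvision is installed; NUM_FEWSHOT_CLASSES is hypothetical.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_FEWSHOT_CLASSES = 5  # number of novel target-domain classes (hypothetical)

# Start from a detector pre-trained on a large source dataset (COCO here);
# the abstract's point 3) is that this pre-training choice matters.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the box predictor so the head matches the few-shot classes
# (+1 for the background class used by torchvision detectors).
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(
    in_features, NUM_FEWSHOT_CLASSES + 1
)

# Optionally freeze the backbone to study the effect of freezing layers;
# full fine-tuning (freeze_backbone=False) is the plain FT baseline.
freeze_backbone = False
if freeze_backbone:
    for p in model.backbone.parameters():
        p.requires_grad = False

# Optimize only the trainable parameters on the few-shot episode.
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=1e-3, momentum=0.9, weight_decay=1e-4)
```

Training then follows the standard torchvision detection loop on the few-shot samples; which layers to freeze and which pre-trained weights to start from are exactly the design choices the benchmark is built to compare.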