We tackle a new task: few-shot object counting and detection. Given a few exemplar bounding boxes of a target object class, we seek to count and detect all objects of that class. This task uses the same supervision as few-shot object counting but outputs object bounding boxes in addition to the total object count. To address this challenging problem, we introduce a novel two-stage training strategy and a novel uncertainty-aware few-shot object detector, Counting-DETR. The training strategy generates pseudo ground-truth bounding boxes, which are then used to train the detector. The detector leverages these pseudo ground-truth boxes but takes the necessary steps to account for their imperfection. To validate the performance of our method on the new task, we introduce two new datasets, FSCD-147 and FSCD-LVIS. Both datasets contain images with complex scenes, multiple object classes per image, and large variation in object shape, size, and appearance. Our proposed approach outperforms strong baselines adapted from few-shot object counting and few-shot object detection by a large margin in both counting and detection metrics. The code and models are available at \url{https://github.com/VinAIResearch/Counting-DETR}.