We tackle a new task of few-shot object counting and detection. Given a few exemplar bounding boxes of a target object class, we seek to count and detect all objects of the target class. This task shares the same supervision as few-shot object counting but additionally outputs the object bounding boxes along with the total object count. To address this challenging problem, we introduce a novel two-stage training strategy and a novel uncertainty-aware few-shot object detector: Counting-DETR. The former is aimed at generating pseudo ground-truth bounding boxes to train the latter. The latter leverages the pseudo ground-truth provided by the former but takes the necessary steps to account for the imperfection of the pseudo ground-truth. To validate the performance of our method on the new task, we introduce two new datasets named FSCD-147 and FSCD-LVIS. Both datasets contain images with complex scenes, multiple object classes per image, and a huge variation in object shapes, sizes, and appearance. Our proposed approach outperforms strong baselines adapted from few-shot object counting and few-shot object detection by a large margin in both counting and detection metrics. The code and models are available at https://github.com/VinAIResearch/Counting-DETR.
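To make the task's output concrete, the following minimal sketch shows one way the two outputs (bounding boxes and a total count) could be represented together. It is an illustration only, not the Counting-DETR implementation: the names `Detection`, `FSCDResult`, and `summarize`, the `(x_min, y_min, x_max, y_max)` box convention, and the score threshold of 0.5 are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    """One candidate object (hypothetical structure)."""
    box: Box
    score: float  # assumed confidence in [0, 1]


@dataclass
class FSCDResult:
    """Joint output of the task: detected boxes plus the total count."""
    boxes: List[Box]
    count: int


def summarize(detections: List[Detection], score_threshold: float = 0.5) -> FSCDResult:
    """Keep detections at or above the threshold and count them.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    kept = [d.box for d in detections if d.score >= score_threshold]
    return FSCDResult(boxes=kept, count=len(kept))


if __name__ == "__main__":
    candidates = [
        Detection((0, 0, 10, 10), 0.9),
        Detection((5, 5, 20, 20), 0.3),
        Detection((30, 30, 40, 40), 0.5),
    ]
    result = summarize(candidates)
    print(result.count, result.boxes)
```

In this sketch the count is derived from the retained boxes, so the two outputs always agree with each other; a real system may produce or calibrate them differently.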