Annotating bounding boxes for object detection is expensive, time-consuming, and error-prone. In this work, we propose a DETR-based framework called ComplETR that is designed to explicitly complete missing annotations in partially annotated dense-scene datasets. This reduces the need to annotate every object instance in the scene, thereby reducing annotation cost. ComplETR augments the object queries in the DETR decoder with patch information of objects in the image. Combined with a matching loss, it can effectively find objects that are similar to the input patch and complete the missing annotations. We show that our framework outperforms state-of-the-art methods such as Soft Sampling and Unbiased Teacher on its own, and can also be used in conjunction with these methods to further improve their performance. Our framework is also agnostic to the choice of downstream object detector; we show performance improvements for several popular detectors, such as Faster R-CNN, Cascade R-CNN, CenterNet2, and Deformable DETR, on multiple dense-scene datasets.
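To make the query-augmentation idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes a hypothetical PatchConditionedQueries module that encodes backbone features of a cropped annotated object into a single vector and adds it to the learned DETR object queries before decoding. All module names, dimensions, and the additive conditioning scheme are illustrative assumptions.

    import torch
    import torch.nn as nn

    class PatchConditionedQueries(nn.Module):
        """Hypothetical module: condition DETR object queries on a patch embedding."""
        def __init__(self, num_queries=100, d_model=256):
            super().__init__()
            # Standard learned object queries, as in DETR.
            self.object_queries = nn.Embedding(num_queries, d_model)
            # Assumed patch encoder: pools the cropped patch's backbone
            # features into one d_model-dimensional vector.
            self.patch_encoder = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(d_model, d_model)
            )

        def forward(self, patch_feats):
            # patch_feats: (B, d_model, h, w) backbone features of the
            # annotated object patch used as the "prompt".
            patch_emb = self.patch_encoder(patch_feats)           # (B, d_model)
            queries = self.object_queries.weight.unsqueeze(0)     # (1, Q, d_model)
            # Augment every object query with the patch information so the
            # decoder searches the scene for instances similar to the patch.
            return queries + patch_emb.unsqueeze(1)               # (B, Q, d_model)

    # Usage: pass the returned queries, together with the encoder memory of
    # the full image, to a standard DETR transformer decoder.
    queries = PatchConditionedQueries()(torch.randn(2, 256, 7, 7))
    print(queries.shape)  # torch.Size([2, 100, 256])

Under these assumptions, a matching loss between decoder outputs and the prompted patch would then supervise the model to recover unannotated instances of the same object category.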