Open-world object detection (OWOD), a more general and challenging goal, requires a model trained on data of known objects to detect both known and unknown objects and to incrementally learn to identify the unknown ones. Existing works, which employ a standard detection framework and a fixed pseudo-labelling mechanism (PLM), have the following problems: (i) adding unknown-object detection substantially reduces the model's ability to detect known objects; (ii) the PLM does not adequately utilize the prior knowledge in the inputs; (iii) the fixed selection scheme of the PLM cannot guarantee that the model is trained in the right direction. We observe that humans subconsciously prefer to focus on all foreground objects first and then identify each one in detail, rather than localizing and identifying a single object simultaneously, which alleviates confusion. This motivates us to propose a novel solution called CAT: LoCalization and IdentificAtion Cascade Detection Transformer, which decouples the detection process via a shared decoder used in a cascade decoding manner. Meanwhile, we propose a self-adaptive pseudo-labelling mechanism that combines model-driven and input-driven pseudo-labelling and self-adaptively generates robust pseudo-labels for unknown objects, significantly improving the ability of CAT to retrieve unknown objects. Comprehensive experiments on two benchmark datasets, MS-COCO and PASCAL VOC, show that our model outperforms the state of the art on all metrics in the tasks of OWOD, incremental object detection (IOD), and open-set detection.