We address the task of open-world class-agnostic object detection, i.e., detecting every object in an image by learning from a limited number of base object classes. State-of-the-art RGB-based models overfit to the training classes and often fail to detect novel-looking objects. This is because RGB-based models rely primarily on appearance similarity to detect novel objects and are prone to overfitting to shortcut cues such as textures and discriminative parts. To address these shortcomings of RGB-based object detectors, we propose incorporating geometric cues such as depth and normals, predicted by general-purpose monocular estimators. Specifically, we use the geometric cues to train an object proposal network for pseudo-labeling unannotated novel objects in the training set. Our resulting Geometry-guided Open-world Object Detector (GOOD) significantly improves detection recall for novel object categories and already performs well with only a few training classes. Using a single "person" class for training on the COCO dataset, GOOD surpasses state-of-the-art methods by 5.0% AR@100 in absolute terms, a relative improvement of 24%.
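The pseudo-labeling step can be pictured with a minimal sketch. The abstract does not say how proposals are selected, so everything below is an assumption for illustration: the function name `select_pseudo_labels`, the `top_k` and `iou_thresh` parameters, and the rule of keeping the highest-scoring proposals that do not overlap an annotated base-class box. Here `proposals` stands in for the output of a proposal network trained on geometric cues.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_pseudo_labels(
    annotated: List[Box],
    proposals: List[Tuple[Box, float]],
    top_k: int = 3,          # hypothetical budget of pseudo boxes per image
    iou_thresh: float = 0.5,  # hypothetical overlap threshold
) -> List[Box]:
    """Keep up to top_k high-scoring proposals that overlap neither an
    annotated base-class box nor an already selected pseudo box."""
    kept: List[Box] = []
    for box, _score in sorted(proposals, key=lambda p: p[1], reverse=True):
        if len(kept) == top_k:
            break
        if all(iou(box, ref) < iou_thresh for ref in annotated + kept):
            kept.append(box)
    return kept


if __name__ == "__main__":
    annotated = [(0.0, 0.0, 10.0, 10.0)]  # e.g. one labeled "person" box
    proposals = [
        ((1.0, 1.0, 10.0, 10.0), 0.9),    # overlaps the annotated box
        ((20.0, 20.0, 30.0, 30.0), 0.8),  # candidate novel object
        ((21.0, 21.0, 30.0, 30.0), 0.7),  # near-duplicate of the previous one
        ((40.0, 40.0, 50.0, 50.0), 0.6),  # another candidate novel object
    ]
    print(select_pseudo_labels(annotated, proposals))
```

In this sketch, the selected boxes would be added to the training set as class-agnostic "object" annotations alongside the base-class labels before the final detector is trained.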