We tackle the problem of novel class discovery and localization (NCDL). In this setting, we assume a source dataset with supervision for only some object classes. Instances of the remaining classes must be discovered, classified, and localized automatically, based on visual similarity and without any human supervision. To tackle NCDL, we propose a two-stage object detection network, Region-based NCDL (RNCDL), that uses a region proposal network to localize regions of interest (RoIs). We then train the network to classify each RoI either as one of the known classes seen in the source dataset or as one of the novel classes, subject to a long-tail distribution constraint on the class assignments that reflects the natural frequency of classes in the real world. Trained end-to-end with this objective, the detection network learns to classify all region proposals across a large variety of classes, including those that are not part of the labeled object class vocabulary. Experiments on the COCO and LVIS datasets show that our method is significantly more effective than multi-stage pipelines that rely on traditional clustering algorithms. Furthermore, we demonstrate the generality of our approach by applying it to the large-scale Visual Genome dataset, where our network successfully learns to detect various semantic classes without direct supervision.
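The abstract does not say how the long-tail constraint on class assignments is enforced. As an illustration only, the sketch below assumes one common choice: a Sinkhorn-Knopp style normalization that turns RoI classification logits into soft pseudo-labels whose per-class mass matches a Zipf-like prior. The function names, the power-law prior, and the parameters `epsilon` and `num_iters` are hypothetical and are not taken from the paper.

```python
import numpy as np


def long_tail_prior(num_classes, power=1.0):
    """Hypothetical Zipf-like prior: p_k is proportional to 1 / (k + 1)**power."""
    ranks = np.arange(1, num_classes + 1, dtype=np.float64)
    weights = ranks ** (-power)
    return weights / weights.sum()


def sinkhorn_assign(logits, class_prior, epsilon=1.0, num_iters=100):
    """Soft assignments whose class marginal approximately follows class_prior.

    logits:      (num_rois, num_classes) classification scores, one row per RoI.
    class_prior: (num_classes,) target fraction of RoIs per class, sums to 1.
    Returns a (num_rois, num_classes) matrix whose rows each sum to 1.
    """
    num_rois, _ = logits.shape
    # Subtract the global maximum before exponentiating for numerical stability.
    Q = np.exp((logits - logits.max()) / epsilon)
    Q /= Q.sum()
    roi_marginal = np.full(num_rois, 1.0 / num_rois)
    for _ in range(num_iters):
        # Rescale columns so the mass per class matches the long-tail prior.
        Q *= (class_prior / Q.sum(axis=0))[None, :]
        # Rescale rows so every RoI carries the same total mass.
        Q *= (roi_marginal / Q.sum(axis=1))[:, None]
    # Multiply by num_rois so each row is a probability distribution.
    return Q * num_rois


# Example: 200 RoIs assigned over 5 classes (random logits stand in for a model).
rng = np.random.default_rng(0)
example_logits = rng.normal(size=(200, 5))
example_prior = long_tail_prior(5)
pseudo_labels = sinkhorn_assign(example_logits, example_prior)
```

In a setup like this, `pseudo_labels` would serve as training targets for the classification head, so frequent classes receive more RoIs than rare ones instead of the uniform split that a balanced clustering objective would enforce.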