We tackle the problem of novel class discovery, detection, and localization (NCDL). In this setting, we assume a source dataset with labels for objects of commonly observed classes. Instances of other classes must be discovered, classified, and localized automatically based on visual similarity, without human supervision. To this end, we propose Region-based NCDL (RNCDL), a two-stage object detection network that uses a region proposal network to localize object candidates and is trained to classify each candidate either as one of the known classes seen in the source dataset or as one of an extended set of novel classes. A long-tail distribution constraint on the class assignments reflects the natural frequency of classes in the real world. Trained end-to-end with this objective, the network learns to classify all region proposals across a large variety of classes, including those that are not part of the labeled object class vocabulary. Our experiments on the COCO and LVIS datasets show that our method is significantly more effective than multi-stage pipelines that rely on traditional clustering algorithms or use pre-extracted crops. Furthermore, we demonstrate the generality of our approach by applying it to the large-scale Visual Genome dataset, where our network successfully learns to detect various semantic classes without explicit supervision.
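The abstract does not say how the long-tail constraint on class assignments is enforced. As a rough illustration only, the sketch below shows one common way such a constraint can be imposed: Sinkhorn-style alternating normalization that pushes the average soft assignment per class toward a target prior. The power-law prior, the function names `long_tail_prior` and `constrained_assignments`, and the random logits standing in for classification-head outputs are all assumptions made for this example, not details taken from the method described above.

```python
import math
import random


def long_tail_prior(num_classes, exponent=1.0):
    """Hypothetical power-law prior: class k gets mass proportional to 1 / (k + 1) ** exponent."""
    weights = [1.0 / (k + 1) ** exponent for k in range(num_classes)]
    total = sum(weights)
    return [w / total for w in weights]


def constrained_assignments(logits, class_prior, num_iters=100):
    """Soft-assign N proposals to K classes so that mean class usage matches class_prior.

    This is an assumed Sinkhorn-style procedure, not the paper's stated algorithm.
    """
    n, k = len(logits), len(class_prior)
    # Subtract the global maximum before exponentiating for numerical stability.
    peak = max(max(row) for row in logits)
    q = [[math.exp(v - peak) for v in row] for row in logits]
    for _ in range(num_iters):
        # Column step: rescale so the total mass of class j equals class_prior[j].
        col_sums = [sum(q[i][j] for i in range(n)) for j in range(k)]
        q = [[q[i][j] * class_prior[j] / col_sums[j] for j in range(k)] for i in range(n)]
        # Row step: rescale so every proposal carries the same mass 1 / n.
        q = [[v / (n * sum(row)) for v in row] for row in q]
    # Multiply by n so each row is a probability distribution over classes.
    return [[v * n for v in row] for row in q]


# Random logits stand in for the scores of 200 region proposals over 5 classes.
random.seed(0)
logits = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
prior = long_tail_prior(5)
soft_labels = constrained_assignments(logits, prior)
class_usage = [sum(row[j] for row in soft_labels) / len(soft_labels) for j in range(5)]
print([round(p, 3) for p in prior])
print([round(u, 3) for u in class_usage])
```

The two printed lists let a reader compare the target prior with the class usage that results from the constrained assignments. In a detector, soft labels of this kind could serve as pseudo-targets for the novel-class head, though whether RNCDL does exactly this is not stated in the text above.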