Instance segmentation of novel object instances in RGB images, given a few example images of each object, is a well-known problem in computer vision. Designing a model general enough to handle all kinds of novel objects without (re-)training has proven difficult. To address this, we present a new training-free framework called Novel Object Cyclic Threshold based Instance Segmentation (NOCTIS). NOCTIS integrates two pre-trained models: Grounded-SAM 2, which provides object proposals with precise bounding boxes and corresponding segmentation masks, and DINOv2, which provides robust class and patch embeddings thanks to its zero-shot capabilities. Internally, proposals are matched to objects through an object matching score based on the similarity of the class embeddings and the average maximum similarity of the patch embeddings, combined with a new cyclic thresholding (CT) mechanism that mitigates unstable matches caused by repetitive textures or visually similar patterns. Beyond CT, NOCTIS introduces: (i) an appearance score that is unaffected by object selection bias; (ii) the average confidence of each proposal's bounding box and mask as a scoring component; and (iii) an RGB-only pipeline that performs even better than RGB-D ones. We show empirically that NOCTIS, without further training or fine-tuning, outperforms the best RGB and RGB-D methods in terms of mean AP on the seven core datasets of the BOP 2023 challenge for the "Model-based 2D segmentation of unseen objects" task.
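To make the scoring idea concrete, the following is a minimal sketch of a proposal-object matching score built from the three ingredients named above. The abstract does not spell out how CT works or how the components are combined, so several choices here are assumptions made for illustration only: CT is modelled as a round-trip check (a proposal patch is kept only if its best template patch maps back to a nearby proposal patch), rejected patches contribute zero, and the three components are averaged with equal weight. All function and parameter names (`cyclic_mask`, `appearance_score`, `matching_score`, `max_cycle_dist`, `grid_w`) are hypothetical, and the embeddings are plain NumPy arrays standing in for DINOv2 outputs.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def cyclic_mask(sim, grid_w, max_cycle_dist):
    """Assumed form of cyclic thresholding (not confirmed by the abstract).

    sim has shape (P, T): similarity of P proposal patches to T template
    patches. Each proposal patch i is sent to its best template patch j,
    and j is sent back to its best proposal patch i_back. Patch i is kept
    when i and i_back lie within max_cycle_dist of each other on the
    proposal's patch grid (row-major layout, grid_w patches per row).
    """
    forward = sim.argmax(axis=1)      # best template patch per proposal patch
    backward = sim.argmax(axis=0)     # best proposal patch per template patch
    back = backward[forward]          # round-trip landing patch
    rows, cols = np.divmod(np.arange(sim.shape[0]), grid_w)
    back_rows, back_cols = np.divmod(back, grid_w)
    dist = np.hypot(rows - back_rows, cols - back_cols)
    return dist <= max_cycle_dist


def appearance_score(prop_patches, tmpl_patches, grid_w, max_cycle_dist=1.0):
    """Average maximum patch similarity, with cyclically unstable patches
    zeroed out (the zeroing is an assumption of this sketch)."""
    sim = l2_normalize(prop_patches) @ l2_normalize(tmpl_patches).T
    best = sim.max(axis=1)
    keep = cyclic_mask(sim, grid_w, max_cycle_dist)
    return float(np.where(keep, best, 0.0).mean())


def matching_score(prop_cls, tmpl_cls, prop_patches, tmpl_patches,
                   grid_w, box_conf, mask_conf, max_cycle_dist=1.0):
    """Equal-weight mean of class similarity, appearance score, and the
    average box/mask confidence (the weighting is an assumption)."""
    semantic = float(l2_normalize(prop_cls) @ l2_normalize(tmpl_cls))
    appearance = appearance_score(prop_patches, tmpl_patches,
                                  grid_w, max_cycle_dist)
    confidence = 0.5 * (box_conf + mask_conf)
    return (semantic + appearance + confidence) / 3.0
```

In this sketch, a proposal whose patches match a template only through a repeated texture tends to fail the round-trip check, which lowers its appearance score without any extra training. In practice the score would be computed against every template of every object, and each proposal would be assigned to the best-scoring object.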