Instance segmentation of novel object instances in RGB images, given a few example images of each object, is a well-known problem in computer vision. Designing a model general enough to handle all kinds of novel objects without (re-)training has proven difficult. To address this, we present a new training-free framework called Novel Object Cyclic Threshold based Instance Segmentation (NOCTIS). NOCTIS integrates two pre-trained models: Grounded-SAM 2, which provides object proposals with precise bounding boxes and corresponding segmentation masks, and DINOv2, which provides robust class and patch embeddings and is chosen for its zero-shot capabilities. Internally, proposals are matched to objects through an object matching score based on the similarity of the class embeddings and the average maximum similarity of the patch embeddings, combined with a new cyclic thresholding (CT) mechanism that mitigates unstable matches caused by repetitive textures or visually similar patterns. Beyond CT, NOCTIS introduces: (i) an appearance score that is unaffected by object selection bias; (ii) the average confidence of each proposal's bounding box and mask as a scoring component; and (iii) an RGB-only pipeline that performs even better than RGB-D ones. We show empirically that NOCTIS, without further training or fine-tuning, outperforms the best RGB and RGB-D methods in mean AP score on the seven core datasets of the BOP 2023 challenge for the "Model-based 2D segmentation of unseen objects" task.
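The matching score described above can be illustrated with a minimal NumPy sketch. The abstract does not define CT precisely, so the sketch assumes one plausible reading: a proposal patch keeps its maximum similarity only if the round trip (proposal patch to best template patch and back to that patch's best proposal patch) lands within a small distance of where it started on the patch grid. The function names, the `max_cycle_dist` parameter, and the equal-weight average of the three score components are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length for cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def cyclic_appearance_score(prop_patches, tmpl_patches, prop_xy, max_cycle_dist=1.0):
    """Average maximum patch similarity with an assumed cyclic threshold.

    prop_patches: (P, D) patch embeddings of a proposal
    tmpl_patches: (T, D) patch embeddings of an object template
    prop_xy:      (P, 2) patch-grid coordinates of the proposal patches
    max_cycle_dist: hypothetical threshold on the round-trip distance
    """
    sim = l2_normalize(prop_patches) @ l2_normalize(tmpl_patches).T  # (P, T)
    fwd = sim.argmax(axis=1)   # best template patch for each proposal patch
    bwd = sim.argmax(axis=0)   # best proposal patch for each template patch
    back = bwd[fwd]            # proposal patch reached after the round trip
    cycle_dist = np.linalg.norm(prop_xy - prop_xy[back], axis=1)
    keep = cycle_dist <= max_cycle_dist
    best = sim.max(axis=1)
    # Patches that fail the cyclic check contribute zero (an assumption).
    return float(np.where(keep, best, 0.0).mean())


def matching_score(prop_cls, tmpl_cls, appearance, confidence):
    """Combine class similarity, appearance, and proposal confidence.

    The equal-weight mean is an assumption made for illustration.
    """
    semantic = float(l2_normalize(prop_cls) @ l2_normalize(tmpl_cls))
    return (semantic + appearance + confidence) / 3.0


# Toy example: patch 3 repeats the texture of patch 0 but lies far away,
# so its round trip returns to patch 0 and it fails the cyclic check.
eye = np.eye(3)
prop = np.stack([eye[0], eye[1], eye[2], eye[0]])
tmpl = eye.copy()
xy = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 2.0], [0.0, 5.0]])
appearance = cyclic_appearance_score(prop, tmpl, xy)
score = matching_score(np.array([1.0, 0.0]), np.array([1.0, 0.0]), appearance, 0.9)
```

In the toy example, the repeated-texture patch is discarded rather than counted as a confident match, which is the kind of unstable match the CT mechanism is said to mitigate.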