How can we segment a varying number of objects when each specific object is its own separate class? To make the problem more realistic, how can we add and delete classes on the fly without retraining? This situation arises in robotic applications where no datasets of the objects exist, and in applications that involve thousands of objects (e.g., in logistics), where it is impossible to train a single model to learn all of them. Most current research on object segmentation for robotic grasping focuses on class-level object segmentation (e.g., box, cup, bottle), closed sets (specific objects of a dataset, for example the YCB dataset), or deep learning-based template matching. In this work, we are interested in open sets, where the number of classes is unknown and varying and there is no prior knowledge of the object types. We treat each specific object as its own separate class. Our goal is to develop a zero-shot object detector that requires no training and can add any object as a class from just a few captured images of it. Our main idea is to break the segmentation pipeline into two steps by cascading an unseen object segmentation network with a zero-shot classifier. We evaluate our zero-shot object detector on unseen datasets and compare it to a Mask R-CNN trained on those datasets. The results show that performance varies from practical to unsuitable depending on the environment setup and the objects being handled. The code is available in our DoUnseen library repository.
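The second step of the two-step idea, and the ability to add or delete classes without retraining, can be illustrated with a minimal sketch. It is not the DoUnseen API: the `Gallery` class, `detect` function, cosine-similarity matching rule, and `threshold` value are all hypothetical choices made for illustration. The sketch assumes that an unseen object segmentation network has already produced one crop per segment and that some feature extractor has turned each crop, and each reference image of an object, into an embedding vector; both of those stages are replaced here by hand-written toy vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Gallery:
    """Hypothetical store of reference embeddings, one entry per specific object."""

    def __init__(self):
        self._objects = {}

    def add_object(self, name, embeddings):
        # Adding a class only stores a few reference embeddings; nothing is trained.
        self._objects[name] = [list(e) for e in embeddings]

    def remove_object(self, name):
        # Deleting a class is a dictionary removal.
        self._objects.pop(name, None)

    def classify(self, embedding, threshold=0.5):
        # Return the object whose best reference exceeds the threshold by the
        # largest margin, or None if no object is similar enough.
        best_name, best_score = None, threshold
        for name, refs in self._objects.items():
            score = max((cosine(embedding, r) for r in refs), default=0.0)
            if score > best_score:
                best_name, best_score = name, score
        return best_name


def detect(segment_embeddings, gallery, threshold=0.5):
    """Label each segment embedding (assumed output of step one) using the gallery."""
    return [gallery.classify(e, threshold) for e in segment_embeddings]


# Toy 3-D embeddings standing in for real feature-extractor outputs.
gallery = Gallery()
gallery.add_object("mug", [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])
gallery.add_object("drill", [[0.0, 1.0, 0.0]])

segments = [[0.95, 0.05, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]]
labels = detect(segments, gallery)

# Remove a class on the fly and classify the same segments again.
gallery.remove_object("drill")
labels_after = detect(segments, gallery)
print(labels, labels_after)
```

In this toy run the third segment matches no stored object and is left unlabeled, and after `drill` is removed the second segment becomes unlabeled as well. The threshold and the max-over-references rule are design assumptions; a real system might average similarities or assign segments to objects jointly.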