Detecting both known and unknown objects is a fundamental skill for robot manipulation in unstructured environments. Open-set object detection (OSOD) is a promising way to address this problem, and it consists of two subtasks: separating objects from background, and open-set object classification. In this paper, we present Openset RCNN to address the challenging OSOD task. To disambiguate unknown objects from background in the first subtask, we propose a classification-free region proposal network (CF-RPN), which estimates the objectness score of each region purely from cues about the object's location and shape, preventing overfitting to the training categories. To identify unknown objects in the second subtask, we propose to represent them by the complement of the known-category regions in a latent space, which is accomplished by a prototype learning network (PLN). PLN performs instance-level contrastive learning to encode proposals into a latent space and builds a compact region centered on a prototype for each known category. Further, we note that detection performance on unknown objects cannot be evaluated without bias when commonly used object detection datasets are not fully annotated. Thus, we introduce a new benchmark by reorganizing GraspNet-1billion, a robotic grasp pose detection dataset with complete annotation. Extensive experiments demonstrate the merits of our method. Finally, we show that Openset RCNN can endow a robot with an open-set perception ability that supports robotic rearrangement tasks in cluttered environments. More details can be found at https://sites.google.com/view/openest-rcnn/
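The second subtask can be illustrated with a minimal sketch, under stated assumptions, of prototype-based open-set classification: a proposal embedding takes the label of its nearest prototype if it falls inside that prototype's compact region, and is otherwise treated as unknown. This is not the PLN itself. The helper names (`class_prototypes`, `classify_open_set`), the Euclidean distance, the fixed `radius`, the category names, and the use of per-class mean embeddings as prototypes are all assumptions made for illustration; the abstract does not specify them.

```python
import math
from typing import Dict, List, Sequence

UNKNOWN = "unknown"


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_prototypes(
    embeddings: Sequence[Sequence[float]], labels: Sequence[str]
) -> Dict[str, List[float]]:
    """Assumption: use the mean embedding of each known category as its prototype."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def classify_open_set(
    embedding: Sequence[float],
    prototypes: Dict[str, Sequence[float]],
    radius: float,
) -> str:
    """Return the nearest known label, or UNKNOWN if outside every compact region."""
    label, dist = min(
        ((lab, euclidean(embedding, proto)) for lab, proto in prototypes.items()),
        key=lambda pair: pair[1],
    )
    return label if dist <= radius else UNKNOWN


# Hypothetical 2-D embeddings for two known categories.
train = [[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 5.2]]
train_labels = ["mug", "mug", "box", "box"]
prototypes = class_prototypes(train, train_labels)

near_mug = classify_open_set([0.1, 0.1], prototypes, radius=1.0)
far_from_all = classify_open_set([2.5, 2.5], prototypes, radius=1.0)
```

In this sketch, `near_mug` falls inside the region around the `mug` prototype, while `far_from_all` lies in the complement of both known regions and is labeled unknown. A learned embedding and learned prototypes would replace the hand-written vectors in practice.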