Affordance detection refers to identifying the potential action possibilities of objects in an image, an important ability for robot perception and manipulation. To empower robots with this ability in unseen scenarios, we consider the challenging one-shot affordance detection problem in this paper: given a support image that depicts the action purpose, all objects in a scene sharing the common affordance should be detected. To this end, we devise a One-Shot Affordance Detection (OS-AD) network that first estimates the purpose and then transfers it to help detect the common affordance across all candidate images. Through collaborative learning, OS-AD can capture the common characteristics of objects with the same underlying affordance and learn to adapt well to unseen affordances. Besides, we build a Purpose-driven Affordance Dataset (PAD) by collecting and labeling 4k images covering 31 affordance categories and 72 object categories. Experimental results demonstrate the superiority of our model over previous representative methods in terms of both objective metrics and visual quality. The benchmark suite is available at ProjectPage.
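The two-stage flow described above (estimate the action purpose from the support image, then transfer it to score candidate images) can be sketched roughly as follows. The mean-pooling encoder and the cosine-similarity threshold here are hypothetical stand-ins for illustration only, not the actual OS-AD architecture:

```python
import numpy as np

def encode(image):
    """Hypothetical stand-in for a backbone: pool an (H, W, C) image
    into a single (C,) feature vector."""
    return image.mean(axis=(0, 1))

def estimate_purpose(support_image):
    """Stage 1: estimate the action purpose from the support image."""
    return encode(support_image)

def detect_common_affordance(purpose, candidate_images, threshold=0.5):
    """Stage 2: transfer the purpose to each candidate image and flag
    those whose features align with it (assumed cosine similarity)."""
    flags = []
    for img in candidate_images:
        feat = encode(img)
        sim = feat @ purpose / (
            np.linalg.norm(feat) * np.linalg.norm(purpose) + 1e-8
        )
        flags.append(bool(sim > threshold))
    return flags
```

In the actual network, the purpose transfer and the detection of common characteristics are learned jointly across candidate images rather than computed by a fixed similarity rule.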