Affordance detection refers to identifying the potential action possibilities of objects in an image, which is a crucial ability for robot perception and manipulation. To empower robots with this ability in unseen scenarios, we first study the challenging one-shot affordance detection problem in this paper, i.e., given a support image that depicts the action purpose, all objects in a scene with the common affordance should be detected. To this end, we devise a One-Shot Affordance Detection Network (OSAD-Net) that first estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images. Through collaborative learning, OSAD-Net can capture the common characteristics between objects having the same underlying affordance and learn a good adaptation capability for perceiving unseen affordances. Besides, we build a large-scale Purpose-driven Affordance Dataset v2 (PADv2) by collecting and labeling 30k images from 39 affordance and 103 object categories. With complex scenes and rich annotations, our PADv2 dataset can serve as a test bed to benchmark affordance detection methods and may also facilitate downstream vision tasks, such as scene understanding, action recognition, and robot manipulation. Specifically, we conduct comprehensive experiments on the PADv2 dataset, including 11 advanced models from several related research fields. Experimental results demonstrate the superiority of our model over previous representative ones in terms of both objective metrics and visual quality. The benchmark suite is available at https://github.com/lhc1224/OSAD_Net.
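The one-shot pipeline described above (estimate an action-purpose representation from the support image, then transfer it to score candidate objects for the common affordance) can be sketched as follows. This is a minimal illustrative sketch, not OSAD-Net's actual architecture: the feature extractor is omitted, and the purpose estimator, similarity scoring, and threshold are all assumptions introduced here for clarity.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Normalize feature vectors to unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)


def estimate_purpose(support_feat):
    # Stand-in for a learned purpose-estimation branch: here we simply
    # normalize the support-image feature into a "purpose" vector.
    return l2_normalize(support_feat)


def detect_common_affordance(purpose, candidate_feats, thresh=0.5):
    # Score each candidate object by cosine similarity to the purpose
    # vector and keep those above a (hypothetical) threshold.
    sims = l2_normalize(candidate_feats) @ purpose
    return sims, sims >= thresh


rng = np.random.default_rng(0)
support = rng.normal(size=64)           # feature of the support image
cands = rng.normal(size=(5, 64))        # features of 5 candidate objects
cands[2] = support + 0.05 * rng.normal(size=64)  # one object shares the affordance

purpose = estimate_purpose(support)
sims, detected = detect_common_affordance(purpose, cands)
```

In the real model, the purpose transfer and detection happen in feature space inside the network and produce segmentation masks rather than per-object scores; the sketch only conveys the support-to-query transfer idea.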