We propose AffordanceNet, a new deep learning approach to simultaneously detect multiple objects and their affordances from RGB images. Our AffordanceNet has two branches: an object detection branch to localize and classify the object, and an affordance detection branch to assign each pixel in the object to its most probable affordance label. The proposed framework employs three key components to effectively handle the multiclass problem in the affordance mask: a sequence of deconvolutional layers, a robust resizing strategy, and a multi-task loss function. Experimental results on public datasets show that our AffordanceNet outperforms recent state-of-the-art methods by a fair margin, while its end-to-end architecture allows inference at a speed of 150 ms per image. This makes our AffordanceNet well suited for real-time robotic applications. Furthermore, we demonstrate the effectiveness of AffordanceNet in different testing environments and in real robotic applications. The source code is available at https://github.com/nqanh/affordance-net
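The "robust resizing strategy" mentioned above must preserve discrete affordance labels when an object's mask is rescaled to a fixed size. A minimal sketch of one label-preserving approach, nearest-neighbor resizing with NumPy (an illustrative assumption, not the paper's exact procedure, which handles each class's binary map separately):

```python
import numpy as np

def resize_mask_nearest(mask, out_h, out_w):
    """Resize a 2-D integer affordance mask with nearest-neighbor sampling.

    Nearest-neighbor sampling keeps class indices intact; bilinear
    interpolation would blend neighboring labels into invalid values
    (e.g. averaging class 0 and class 2 into a spurious class 1).
    """
    in_h, in_w = mask.shape
    # Map each output row/column back to its nearest source index.
    rows = (np.arange(out_h) * in_h / out_h).astype(int)
    cols = (np.arange(out_w) * in_w / out_w).astype(int)
    return mask[rows[:, None], cols]

# A 4x4 mask with three affordance labels (0 = background, 1, 2).
mask = np.array([[0, 0, 2, 2],
                 [0, 0, 2, 2],
                 [1, 1, 1, 1],
                 [1, 1, 1, 1]])
big = resize_mask_nearest(mask, 8, 8)
# The upsampled mask contains only the original labels {0, 1, 2}.
```

The same label-preservation concern motivates the paper's per-class treatment of the mask: operations that are safe on a binary map (interpolation followed by thresholding) are unsafe on a map of mixed class indices.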