Affordance detection from visual input is a fundamental step in autonomous robotic manipulation. Existing solutions to the problem of affordance detection rely on convolutional neural networks. However, these networks do not encode the spatial arrangement of the input data and therefore miss parts-to-whole relationships, so they fall short when confronted with novel, previously unseen object instances or new viewpoints. One way to overcome such limitations is to resort to capsule networks. In this paper, we introduce the first affordance detection network based on dynamic tree-structured capsules for sparse 3D point clouds. We show that our capsule-based network outperforms current state-of-the-art models in viewpoint invariance and part segmentation of new object instances on a novel dataset used exclusively for evaluation and publicly available at github.com/gipfelen/DTCG-Net. In the experimental evaluation, we show that our algorithm outperforms current affordance detection methods when grasping previously unseen objects, thanks to the parts-to-whole representation enforced by our capsule network.
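To make the parts-to-whole idea concrete, the sketch below implements classic capsule routing-by-agreement (Sabour et al., 2017), in which lower-level "part" capsules vote for higher-level "whole" capsules and agreement strengthens the coupling. This is only an illustrative NumPy sketch of the general mechanism, not the paper's dynamic tree-structured routing in DTCG-Net; all shapes, names, and iteration counts are assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Non-linear squashing: keeps the vector's orientation, maps its norm into [0, 1)."""
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def route(u_hat, num_iters=3):
    """Route prediction vectors u_hat[i, j, :] from lower capsule i to higher capsule j.

    u_hat: array of shape (num_lower, num_higher, dim).
    Returns the (num_higher, dim) poses of the higher-level capsules.
    """
    num_lower, num_higher, _ = u_hat.shape
    b = np.zeros((num_lower, num_higher))                      # routing logits
    for _ in range(num_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)   # coupling coefficients (softmax over j)
        s = np.einsum('ij,ijd->jd', c, u_hat)                  # weighted sum of part votes
        v = squash(s)                                          # higher-level capsule outputs
        b = b + np.einsum('ijd,jd->ij', u_hat, v)              # agreement reinforces the routing
    return v

# Toy usage: 8 "part" capsules vote for 3 "whole" capsules in a 4-D pose space.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(8, 3, 4))
print(route(u_hat).shape)  # (3, 4)
```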