Conventional approaches that learn grasping affordance from demonstrations must explicitly predict grasping configurations, such as gripper approach angles or grasp preshapes. Classic motion planners can then sample trajectories using these predicted configurations. In this work, our goal is instead to integrate the two objectives of affordance discovery and affordance-aware policy learning in an end-to-end imitation learning framework based on deep neural networks. From a psychological perspective, attention and affordance are closely associated. We therefore propose to learn affordance cues as visual attention, which serves as a useful signal indicating how a demonstrator accomplishes tasks. To achieve this, we propose a contrastive learning framework consisting of a Siamese encoder and a trajectory decoder. We further introduce a coupled triplet loss that encourages the discovered cues to be more affordance-relevant. Our experimental results demonstrate that the model trained with the coupled triplet loss achieves the highest grasping success rate.
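The abstract does not specify the exact form of the coupled triplet loss, so the following is only a minimal sketch of the standard triplet margin loss that such a contrastive objective builds on. All names (`triplet_loss`, the margin value) are illustrative assumptions, not the authors' actual formulation; the "coupling" between affordance cues and trajectories is not reproduced here.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss on embedding vectors.

    Illustrative sketch only: in a contrastive affordance setup, the
    anchor and positive might be embeddings of the same affordance
    region, while the negative comes from an irrelevant region. The
    paper's coupled variant is not specified in this excerpt.
    """
    d_pos = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)  # anchor-negative distance
    # Penalize cases where the negative is not at least `margin`
    # farther from the anchor than the positive is.
    return max(0.0, d_pos - d_neg + margin)
```

With a Siamese encoder, the same network embeds all three inputs, and minimizing this loss pulls affordance-relevant pairs together while pushing irrelevant regions apart in the embedding space.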