Conventional works that learn grasping affordances from demonstrations must explicitly predict grasping configurations, such as gripper approach angles or grasp preshapes. Classic motion planners can then sample trajectories using these predicted configurations. In this work, our goal is instead to bridge the gap between affordance discovery and affordance-based policy learning by integrating the two objectives in an end-to-end imitation learning framework based on deep neural networks. From a psychological perspective, there is a close association between attention and affordance. Therefore, rather than explicitly modeling affordances, we propose to learn affordance cues as visual attention within an end-to-end neural network, where attention serves as a useful signal of how a demonstrator accomplishes tasks. To achieve this, we propose a contrastive learning framework that consists of a Siamese encoder and a trajectory decoder. We further introduce a coupled triplet loss to encourage the discovered affordance cues to be more affordance-relevant. Our experimental results demonstrate that our model with the coupled triplet loss achieves the highest grasping success rate in a simulated robot environment. Our project website can be accessed at https://sites.google.com/asu.edu/affordance-aware-imitation/project.
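For background, the standard triplet loss underlying such contrastive objectives can be sketched as follows. This is a minimal illustration of the generic formulation, not the paper's coupled variant; the embeddings and margin value here are hypothetical.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: pull the anchor embedding toward the
    positive sample and push it away from the negative by at least
    `margin` (squared Euclidean distance)."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings (hypothetical): positive lies close to the anchor,
# negative lies far away, so the margin is already satisfied.
a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])
n = np.array([0.0, 1.0])
print(triplet_loss(a, p, n))  # margin satisfied -> loss is 0.0
```

The coupled triplet loss in the paper builds on this idea to tie the learned attention to affordance-relevant regions.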