This paper presents a reinforcement learning framework that incorporates a Contextual Reward Machine for task-oriented grasping. The Contextual Reward Machine reduces task complexity by decomposing grasping tasks into manageable sub-tasks. Each sub-task is associated with a stage-specific context, including a reward function, an action space, and a state abstraction function. This contextual information enables efficient intra-stage guidance and improves learning efficiency by reducing the state-action space and guiding exploration within clearly defined boundaries. In addition, transition rewards are introduced to encourage or penalize transitions between stages, guiding the model toward desirable stage sequences and further accelerating convergence. When integrated with the Proximal Policy Optimization algorithm, the proposed method achieved a 95% success rate across 1,000 simulated grasping tasks encompassing diverse objects, affordances, and grasp topologies, outperforming state-of-the-art methods in both learning speed and success rate. The approach was transferred to a real robot, where it achieved a success rate of 83.3% in 60 grasping tasks over six affordances. These experimental results demonstrate superior accuracy, data efficiency, and learning efficiency, and underscore the model's potential to advance task-oriented grasping in both simulated and real-world settings.
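To make the structure concrete, the following is a minimal sketch of how stage-specific contexts and transition rewards could be organized in Python. All names here (StageContext, ContextualRewardMachine, step_reward) are hypothetical illustrations; the paper does not specify this interface.

```python
# A minimal sketch of the Contextual Reward Machine idea, under assumed names;
# not the authors' implementation.
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class StageContext:
    """Stage-specific context: reward function, reduced action space,
    and state abstraction, as described in the abstract."""
    reward_fn: Callable[[dict, int], float]      # intra-stage reward r(s, a)
    action_space: Tuple[int, ...]                # admissible actions for this stage
    abstract_state: Callable[[dict], tuple]      # state abstraction phi(s)


class ContextualRewardMachine:
    def __init__(self,
                 contexts: Dict[str, StageContext],
                 transition_rewards: Dict[Tuple[str, str], float]):
        self.contexts = contexts                      # one context per sub-task stage
        self.transition_rewards = transition_rewards  # bonus/penalty per stage change

    def step_reward(self, stage: str, next_stage: str,
                    state: dict, action: int) -> float:
        """Intra-stage reward, plus a transition reward whenever the stage changes,
        encouraging desirable stage sequences and penalizing undesirable ones."""
        r = self.contexts[stage].reward_fn(state, action)
        if next_stage != stage:
            # Unlisted transitions default to a penalty (assumed value).
            r += self.transition_rewards.get((stage, next_stage), -1.0)
        return r
```

In this sketch, the per-stage action space and state abstraction are what shrink the state-action space the policy must explore, while the transition-reward table supplies the inter-stage shaping signal; how these couple to the PPO update is left to the paper itself.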