Learning Quadruped Locomotion Policies with Reward Machines

Legged robots have been shown to be effective in navigating unstructured environments. Although there has been much success in learning locomotion policies for quadruped robots, there is little research on how to incorporate human knowledge to facilitate this learning process. In this paper, we demonstrate that human knowledge expressed as linear temporal logic (LTL) formulas can be applied to quadruped locomotion learning within a Reward Machine (RM) framework. Experimental results in simulation show that our RM-based approach makes it easy to define diverse locomotion styles and to learn policies for those styles efficiently.
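To make the Reward Machine idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes a reward machine is a finite-state machine whose transitions fire on high-level propositions and emit a reward. The proposition names (`FL_RR_contact`, `FR_RL_contact`), the two-state trot-like machine, and the `bonus` value are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical propositions: which diagonal pair of feet is on the ground.
FL_RR = "FL_RR_contact"  # front-left and rear-right feet in contact
FR_RL = "FR_RL_contact"  # front-right and rear-left feet in contact


@dataclass
class RewardMachine:
    """Finite-state machine whose transitions emit rewards."""

    initial_state: str
    # Maps (state, proposition) to (next_state, reward).
    transitions: dict
    state: str = field(init=False)

    def __post_init__(self):
        self.state = self.initial_state

    def reset(self):
        """Return to the initial state at the start of an episode."""
        self.state = self.initial_state

    def step(self, proposition):
        """Advance on an observed proposition and return the reward.

        Inputs with no matching transition keep the current state
        and yield zero reward.
        """
        next_state, reward = self.transitions.get(
            (self.state, proposition), (self.state, 0.0)
        )
        self.state = next_state
        return reward


def make_trot_machine(bonus=1.0):
    """Assumed trot-like style: reward alternating diagonal contacts."""
    return RewardMachine(
        initial_state="q0",
        transitions={
            ("q0", FR_RL): ("q1", bonus),
            ("q1", FL_RR): ("q0", bonus),
        },
    )


if __name__ == "__main__":
    rm = make_trot_machine()
    events = [FR_RL, FR_RL, FL_RR, FL_RR, FR_RL]
    print([rm.step(e) for e in events])
```

In this sketch, a different locomotion style would correspond to a different transition table, while the learning loop around the machine stays the same.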
