Brachiation is the primary form of locomotion for gibbons and siamangs, in which these primates swing from tree limb to tree limb using only their arms. It is challenging to control because of the limited control authority, the need for advance planning, and the precision required of the grasps. We present a novel approach to this problem using reinforcement learning, demonstrated on a finger-less 14-link planar model that learns to brachiate across challenging handhold sequences. Key to our method is the use of a simplified model, a point mass with a virtual arm, for which we first learn a policy that can brachiate across handhold sequences with a prescribed order. The simplified-model policy then facilitates learning the policy for the full model by providing an overall center-of-mass trajectory to imitate, as well as the timing of the holds. Lastly, the simplified model can also readily be used for planning suitable sequences of handholds in a given environment. Our results demonstrate brachiation motions with a variety of durations for the flight and hold phases, as well as emergent extra back-and-forth swings when these prove useful. The system is evaluated with a variety of ablations. The method enables future work towards more general 3D brachiation, as well as the use of simplified-model imitation in other settings.
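To make the simplified model concrete, the following is a minimal sketch of a point mass that alternates between a hold phase (swinging about a handhold) and a flight phase (ballistic motion). It is an illustration under stated assumptions, not the model used in this work: the arm is taken to be rigid and unactuated, the angle is measured from the downward vertical, and all names (`swing_step`, `release_state`, `flight_step`, `can_grab`) are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def swing_step(theta, omega, arm_length, dt):
    """Advance a passive pendulum by one semi-implicit Euler step (hold phase).

    theta is the arm angle from the downward vertical (rad), omega its rate.
    Assumption: rigid, unactuated arm; a learned policy would add control here.
    """
    alpha = -(G / arm_length) * math.sin(theta)
    omega = omega + alpha * dt
    theta = theta + omega * dt
    return theta, omega


def release_state(hold, theta, omega, arm_length):
    """Cartesian position and velocity of the point mass when it lets go."""
    hx, hy = hold
    pos = (hx + arm_length * math.sin(theta), hy - arm_length * math.cos(theta))
    vel = (arm_length * omega * math.cos(theta), arm_length * omega * math.sin(theta))
    return pos, vel


def flight_step(pos, vel, dt):
    """Advance the point mass by one ballistic step (flight phase)."""
    vx, vy = vel
    vy = vy - G * dt
    return (pos[0] + vx * dt, pos[1] + vy * dt), (vx, vy)


def can_grab(pos, hold, arm_length):
    """A handhold is reachable if it lies within one arm length of the mass."""
    return math.dist(pos, hold) <= arm_length
```

A rollout would call `swing_step` until the release decision, convert to Cartesian coordinates with `release_state`, and then call `flight_step` until `can_grab` succeeds for the next handhold in the prescribed order. The center-of-mass trajectory and hold timings recorded from such a rollout are the kind of guidance that the full model is described as imitating.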