Dynamic platforms that operate over many unique terrain conditions typically require multiple controllers. To transition safely between controllers, there must be an overlap of states between adjacent controllers. We develop a novel method for training Setup Policies that bridge the trajectories between pre-trained Deep Reinforcement Learning (DRL) policies. We demonstrate our method with a simulated biped traversing a difficult jump terrain, where a single policy fails to learn the task, and switching between pre-trained policies without Setup Policies also fails. We perform an ablation of key components of our system, and show that our method outperforms others that learn transition policies. We demonstrate our method with several difficult and diverse terrain types, and show that we can use Setup Policies as part of a modular control suite to successfully traverse a sequence of complex terrains. We show that using Setup Policies improves the success rate for traversing a single difficult jump terrain (from a 1.5% success rate without Setup Policies to 82%), and a sequence of various terrains (from 6.5% without Setup Policies to 29.1%).
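The switching scheme the abstract describes can be illustrated with a short sketch: run a behaviour policy over its terrain segment, then let a Setup Policy drive the robot into the overlap of states from which the next policy is known to succeed before handing over control. The Python below is a minimal illustration under assumed interfaces, not the paper's implementation: the `Policy` class, the `in_initial_set` membership check, and the environment `step` function are all hypothetical names introduced here.

```python
# Minimal sketch of switching between pre-trained DRL policies via a Setup
# Policy. All interfaces (Policy, act, in_initial_set, step) are assumed
# for illustration; they are not the paper's implementation.
from typing import Callable, List, Optional, Sequence, Tuple

State = Sequence[float]
Action = Sequence[float]

class Policy:
    """A pre-trained controller: maps a state to an action, and can report
    whether a state lies in the set of states it is known to succeed from."""
    def __init__(self, name: str,
                 act: Callable[[State], Action],
                 in_initial_set: Callable[[State], bool]):
        self.name = name
        self.act = act
        self.in_initial_set = in_initial_set

def traverse(state: State,
             segments: List[Tuple[Policy, Optional[Policy]]],
             step: Callable[[State, Action], State],
             steps_per_segment: int = 500,
             max_setup_steps: int = 200) -> State:
    """Run each terrain policy in turn; between segments, a Setup Policy
    bridges to the next policy, switching only once the state overlap
    is reached."""
    for i, (policy, setup) in enumerate(segments):
        # Phase 1: run the current behaviour over its terrain segment.
        for _ in range(steps_per_segment):
            state = step(state, policy.act(state))
        # Phase 2: the Setup Policy drives the state toward the next
        # policy's initial set, so the handover happens only when safe.
        nxt = segments[i + 1][0] if i + 1 < len(segments) else None
        if setup is not None and nxt is not None:
            for _ in range(max_setup_steps):
                if nxt.in_initial_set(state):
                    break  # state overlap reached: safe to hand over
                state = step(state, setup.act(state))
    return state
```

In this sketch the switch condition is an explicit membership test on the next policy's initial set; the abstract's reported failure of naive switching corresponds to handing over control without any such bridging phase.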