Evolution Strategy (ES) algorithms have shown promising results in training complex robotic control policies due to their massive parallelism capability, simple implementation, effective parameter-space exploration, and fast training time. However, a key limitation of ES is its scalability to large-capacity models, including modern neural network architectures. In this work, we develop Predictive Information Augmented Random Search (PI-ARS) to mitigate this limitation by leveraging recent advancements in representation learning to reduce the parameter search space for ES. Namely, PI-ARS combines a gradient-based representation learning technique, Predictive Information (PI), with a gradient-free ES algorithm, Augmented Random Search (ARS), to train policies that can process complex robot sensory inputs and handle highly nonlinear robot dynamics. We evaluate PI-ARS on a set of challenging visual-locomotion tasks in which a quadruped robot must walk on uneven stepping stones, quincuncial piles, and moving platforms, as well as complete an indoor navigation task. Across all tasks, PI-ARS demonstrates significantly better learning efficiency and performance than the ARS baseline. We further validate our algorithm by demonstrating that the learned policies transfer successfully to a real quadruped robot, for example achieving a 100% success rate on the real-world stepping stone environment, a dramatic improvement over the 40% success rate of prior work.
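The gradient-free half of the method, ARS, admits a compact sketch. The following is a minimal illustration of one ARS update (the basic variant that uses all sampled directions and normalizes by the reward standard deviation), not the paper's implementation; the toy quadratic reward stands in for robot rollouts, and all names are illustrative:

```python
import numpy as np

def ars_step(theta, rollout_reward, n_dirs=8, step_size=0.02, noise=0.03, rng=None):
    """One Augmented Random Search update: perturb parameters along random
    Gaussian directions, evaluate rollouts at +/- each perturbation, and step
    along the reward-weighted average direction, scaled by the reward std
    (the "augmented" part of ARS)."""
    rng = np.random.default_rng() if rng is None else rng
    deltas = rng.standard_normal((n_dirs,) + theta.shape)
    r_pos = np.array([rollout_reward(theta + noise * d) for d in deltas])
    r_neg = np.array([rollout_reward(theta - noise * d) for d in deltas])
    sigma = np.concatenate([r_pos, r_neg]).std() + 1e-8  # reward-scale normalizer
    update = ((r_pos - r_neg).reshape(-1, *([1] * theta.ndim)) * deltas).sum(axis=0)
    return theta + step_size / (n_dirs * sigma) * update

# Toy check: "rollout reward" is -||theta - target||^2, optimum at `target`.
target = np.array([1.0, -2.0, 0.5])
reward = lambda th: -np.sum((th - target) ** 2)
theta = np.zeros(3)
rng = np.random.default_rng(0)
for _ in range(300):
    theta = ars_step(theta, reward, rng=rng)
```

Because each update needs only scalar rollout returns, the 2 x `n_dirs` evaluations parallelize trivially across workers, which is the scalability property the abstract refers to; PI-ARS keeps this loop but searches over a smaller parameter space on top of the learned PI representation.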