In real-world environments, robots need to be resilient to damage and robust to unforeseen scenarios. Quality-Diversity (QD) algorithms have been successfully used to make robots adapt to damage within seconds by leveraging a diverse set of learned skills. A highly diverse skill set increases a robot's chances of overcoming new situations, since there are more potential alternatives for solving a new task. However, finding and storing a large behavioural diversity across multiple skills often increases computational complexity. Furthermore, planning in a large skill space is an additional challenge that grows with the number of skills. Hierarchical structures can help reduce this search and storage complexity by breaking skills down into primitive skills. In this paper, we introduce the Hierarchical Trial and Error algorithm, which uses a hierarchical behavioural repertoire to learn diverse skills and leverages them to make the robot adapt quickly in the physical world. We show that the hierarchical decomposition of skills enables the robot to learn more complex behaviours while keeping the learning of the repertoire tractable. Experiments with a hexapod robot show that, in the most challenging scenarios, our method solves maze navigation tasks with 20% fewer actions in simulation and 43% fewer actions in the physical world than the best baselines, while having 78% fewer complete failures.
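To make the storage argument concrete, the sketch below builds a two-level repertoire in which higher-level skills are stored only as references to lower-level primitives. It is a minimal illustration under stated assumptions, not the Hierarchical Trial and Error implementation: it assumes MAP-Elites-style grid archives (one elite per descriptor cell), 2D displacement descriptors, and that a composite skill's displacement is the sum of its primitives' displacements. The names `Repertoire`, `build_upper_level`, and the four primitives are hypothetical.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Hashable, Tuple

Cell = Tuple[int, int]


@dataclass
class Repertoire:
    """Hypothetical MAP-Elites-style archive: one elite per grid cell of a 2D descriptor."""
    cell_size: float
    # cell -> (fitness, descriptor, genome)
    elites: Dict[Cell, Tuple[float, Tuple[float, float], Hashable]] = field(default_factory=dict)

    def cell_of(self, descriptor: Tuple[float, float]) -> Cell:
        return (int(descriptor[0] // self.cell_size), int(descriptor[1] // self.cell_size))

    def add(self, descriptor, fitness, genome) -> bool:
        """Keep the candidate only if its cell is empty or it beats the current elite."""
        cell = self.cell_of(descriptor)
        current = self.elites.get(cell)
        if current is None or fitness > current[0]:
            self.elites[cell] = (fitness, descriptor, genome)
            return True
        return False


def build_upper_level(lower: Repertoire, cell_size: float, length: int) -> Repertoire:
    """Compose sequences of lower-level elites into a higher-level repertoire.

    Assumptions (illustrative only): a composite descriptor is the sum of the
    primitives' displacements and its fitness is their mean. The upper level
    stores only tuples of lower-level cells, not full controller parameters.
    """
    upper = Repertoire(cell_size)
    for combo in itertools.product(list(lower.elites.keys()), repeat=length):
        entries = [lower.elites[c] for c in combo]
        dx = sum(e[1][0] for e in entries)
        dy = sum(e[1][1] for e in entries)
        fitness = sum(e[0] for e in entries) / length
        upper.add((dx, dy), fitness, combo)
    return upper


# Hypothetical primitives: unit displacements in four directions.
lower = Repertoire(cell_size=1.0)
for name, disp in [("forward", (1.0, 0.0)), ("backward", (-1.0, 0.0)),
                   ("left", (0.0, 1.0)), ("right", (0.0, -1.0))]:
    lower.add(disp, fitness=1.0, genome=name)

upper = build_upper_level(lower, cell_size=1.0, length=2)
print(len(lower.elites), len(upper.elites))  # 4 9
```

Here the 16 candidate two-step sequences collapse into 9 descriptor cells, and each upper-level elite is just a pair of lower-level cell indices, so adding a level grows storage with the number of distinct behaviours rather than with the full parameterisation of every composite skill.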