Quality-Diversity algorithms provide efficient mechanisms to generate large collections of diverse and high-performing solutions, which have been shown to be instrumental for solving downstream tasks. However, most of these algorithms rely on a hand-coded behavioural descriptor to characterise diversity, and hence require prior knowledge about the considered tasks. In this work, we introduce Relevance-guided Unsupervised Discovery of Abilities, a Quality-Diversity algorithm that autonomously finds a behavioural characterisation tailored to the task at hand. In particular, our method introduces a custom diversity metric that leads to higher densities of solutions near the areas of interest in the learnt behavioural descriptor space. We evaluate our approach in a simulated robotic environment, where the robot has to autonomously discover its abilities based on its full sensory data. The evaluation covers three tasks: navigation to random targets, moving forward with a high velocity, and performing half-rolls. The experimental results show that our method discovers collections of solutions that are not only diverse, but also well-adapted to the considered downstream task.