Unsupervised Behaviour Discovery with Quality-Diversity Optimisation

Quality-Diversity algorithms are a class of evolutionary algorithms designed to find a collection of diverse and high-performing solutions to a given problem. In robotics, such algorithms can be used to generate a collection of controllers covering most of the possible behaviours of a robot. To do so, these algorithms associate a behavioural descriptor with each of these behaviours. Each behavioural descriptor is used to estimate the novelty of one behaviour compared to the others. In most existing algorithms, the behavioural descriptor needs to be hand-coded, which requires prior knowledge about the task to solve. In this paper, we introduce Autonomous Robots Realising their Abilities, an algorithm that uses a dimensionality reduction technique to automatically learn behavioural descriptors from raw sensory data. The performance of this algorithm is assessed on three robotic tasks in simulation. The experimental results show that it performs similarly to traditional hand-coded approaches without requiring any hand-coded behavioural descriptor. In the collection of diverse and high-performing solutions, it also manages to find behaviours that are novel with respect to more features than its hand-coded baselines. Finally, we introduce a variant of the algorithm that is robust to the dimensionality of the behavioural descriptor space.
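
The two ideas the abstract combines, learning a behavioural descriptor from raw sensory data and using it to estimate novelty, can be illustrated with a minimal sketch. This is not the paper's implementation: PCA is used here only as a stand-in for the unspecified dimensionality reduction technique, novelty is assumed to be the mean distance to the k nearest neighbours in descriptor space (a common choice), and the data and all function names (`fit_pca`, `encode`, `novelty_scores`) are hypothetical.

```python
import numpy as np


def fit_pca(sensory_data, n_components):
    """Fit a linear projection (PCA) on raw sensory data of shape (n, d).

    PCA is an assumption made for this sketch; the abstract only says
    "a dimensionality reduction technique".
    """
    mean = sensory_data.mean(axis=0)
    # Rows of vt are the principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(sensory_data - mean, full_matrices=False)
    return mean, vt[:n_components]


def encode(sensory_data, mean, components):
    """Project raw sensory data onto the learned low-dimensional descriptor space."""
    return (sensory_data - mean) @ components.T


def novelty_scores(descriptors, k=3):
    """Mean Euclidean distance from each descriptor to its k nearest neighbours."""
    diffs = descriptors[:, None, :] - descriptors[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)  # a behaviour is not its own neighbour
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)


rng = np.random.default_rng(0)
# Hypothetical data: 50 behaviours, each summarised by 20 raw sensor readings.
raw = rng.normal(size=(50, 20))
raw[0] += 10.0  # make one behaviour clearly different from the rest

mean, components = fit_pca(raw, n_components=2)
descriptors = encode(raw, mean, components)  # learned 2-D descriptors
scores = novelty_scores(descriptors, k=3)    # higher means more novel
```

In this toy setting the behaviour that was shifted away from the others receives the highest novelty score, without any descriptor having been specified by hand. A Quality-Diversity loop would use such scores, together with a performance measure, to decide which solutions to keep in its collection.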
