Quality-Diversity (QD) algorithms are a well-known approach to generating large collections of diverse and high-performing policies. However, QD algorithms are also known to be data-inefficient: they require large amounts of computational resources and are slow when applied to robotics tasks in practice. Policy evaluations are already commonly performed in parallel to speed up QD algorithms, but this offers limited gains on a single machine because most physics simulators run on CPUs. With recent advances in simulators that run on hardware accelerators, thousands of evaluations can be performed in parallel on a single GPU/TPU. In this paper, we present QDax, an implementation of MAP-Elites that leverages massive parallelism on accelerators to make QD algorithms more accessible. We first demonstrate the improvement in the number of evaluations per second that accelerated simulators offer. More importantly, we show that QD algorithms are ideal candidates for massive parallelism and can scale to run at interactive timescales. The increase in parallelism does not significantly affect the performance of QD algorithms, while reducing experiment runtimes by two orders of magnitude, turning days of computation into minutes. These results show that QD can now benefit from hardware acceleration, which contributed significantly to the rise of deep learning.
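To illustrate the parallelism the abstract describes, the snippet below is a minimal sketch (not QDax's actual API) of how a single-policy evaluation can be batched across thousands of policies with JAX's `vmap` and run in one accelerator call. The policy, task, and batch size are hypothetical placeholders standing in for a real simulator rollout.

```python
import jax
import jax.numpy as jnp

def evaluate_policy(params, key):
    """Hypothetical rollout: returns (fitness, behaviour descriptor)."""
    obs = jax.random.normal(key, (8,))      # stand-in for environment observations
    action = jnp.tanh(params["w"] @ obs)    # tiny linear policy
    fitness = -jnp.sum(action ** 2)         # stand-in for an episode return
    descriptor = action[:2]                 # stand-in for a behaviour descriptor
    return fitness, descriptor

# vmap turns the single-policy evaluation into a batched one; jit compiles
# it so a GPU/TPU evaluates thousands of policies in a single call.
batched_evaluate = jax.jit(jax.vmap(evaluate_policy))

num_policies = 4096  # arbitrary batch size for illustration
key = jax.random.PRNGKey(0)
keys = jax.random.split(key, num_policies)
params = {"w": jax.random.normal(key, (num_policies, 4, 8))}
fitnesses, descriptors = batched_evaluate(params, keys)  # shapes: (4096,), (4096, 2)
```

In QDax, the rollout inside `evaluate_policy` is a full episode in an accelerated simulator rather than this toy computation, but the batching principle is the same.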