Ensemble learning is a widely used technique in machine learning. The success of ensemble methods is largely attributed to their ability to handle a wide range of instances and complex problems that require different low-level approaches. However, ensemble methods are comparatively less popular in reinforcement learning owing to the high sample complexity and computational expense involved in obtaining a diverse ensemble. We present a novel training and model selection framework for model-free reinforcement learning algorithms that uses ensembles of policies obtained from a single training run. These policies are diverse in nature and are learned through directed perturbation of the model parameters at regular intervals. We show that learning and selecting an adequately diverse set of policies is required for a good ensemble, while extreme diversity can prove detrimental to overall performance. This selection of an adequately diverse set of policies is carried out by our policy selection framework. We evaluate our approach on challenging discrete and continuous control tasks and also discuss various ensembling strategies. Our framework is substantially sample-efficient and computationally inexpensive, and it outperforms state-of-the-art (SOTA) scores on Atari 2600 and MuJoCo.
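The sketch below illustrates the general idea described above: collecting policy snapshots from a single training run by applying a small directed perturbation to the parameters at regular intervals, then ensembling the snapshots at evaluation time. This is a minimal, hypothetical illustration, not the paper's exact algorithm; the names `PolicyNet`, `directed_perturbation`, `collect_snapshots`, and `ensemble_act`, the perturbation rule, and the majority-vote ensembling strategy are all assumptions made for exposition.

```python
# Minimal sketch (assumed, not the paper's implementation): snapshot a single
# training run at regular intervals, perturb the parameters in a directed way
# after each snapshot, and ensemble the snapshots by majority vote.
import copy
import numpy as np
import torch
import torch.nn as nn


class PolicyNet(nn.Module):
    """Toy discrete-action policy; the real architecture is task-dependent."""

    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


def directed_perturbation(policy: PolicyNet, scale: float = 0.01) -> None:
    """Nudge parameters along a fixed direction (here, their own sign).

    The abstract does not specify the perturbation rule, so this direction
    is purely a placeholder for whatever directed update the method uses.
    """
    with torch.no_grad():
        for p in policy.parameters():
            p.add_(scale * torch.sign(p))


def collect_snapshots(policy, train_step, total_steps=10_000, interval=2_000):
    """Single training run; snapshot and perturb every `interval` steps."""
    snapshots = []
    for step in range(1, total_steps + 1):
        train_step(policy)  # one ordinary model-free RL update (stubbed here)
        if step % interval == 0:
            snapshots.append(copy.deepcopy(policy))  # freeze a diverse policy
            directed_perturbation(policy)            # push training elsewhere
    return snapshots


def ensemble_act(snapshots, obs: np.ndarray) -> int:
    """Majority vote over the greedy actions of the selected snapshots."""
    obs_t = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
    votes = [int(p(obs_t).argmax(dim=-1)) for p in snapshots]
    return int(np.bincount(votes).argmax())


if __name__ == "__main__":
    policy = PolicyNet(obs_dim=4, n_actions=2)
    noop_update = lambda p: None  # placeholder for a real RL training step
    members = collect_snapshots(policy, noop_update)
    print(ensemble_act(members, np.zeros(4, dtype=np.float32)))
```

For continuous control, the same structure applies, but the vote would typically be replaced by averaging the member policies' actions; the policy selection step described in the paper would additionally filter the snapshot set before ensembling.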