The reinforcement learning field is strong on achievements but weak on reapplication: a computer that plays Go at a superhuman level is still terrible at Tic-Tac-Toe. This paper asks whether the method used to train a network affects its ability to generalize. Specifically, we explore core quality diversity algorithms, compare them against two recent algorithms, and propose a new algorithm to address shortcomings in existing methods. Although the results of these methods fall well below the performance we hoped for, our work raises important points about the choice of behavior criterion in quality diversity, the interaction of differential and evolutionary training methods, and the role of offline reinforcement learning and randomized learning in evolutionary search.