Reinforcement learning (RL) research focuses on general solutions that apply across domains, which gives practitioners methods they can use in almost any setting. However, recent studies often omit the engineering steps ("tricks") that may be needed to use RL effectively, such as reward shaping, curriculum learning, and splitting a large task into smaller chunks. Such tricks are common, if not necessary, for achieving state-of-the-art results and winning RL competitions. To ease the engineering effort, we distill descriptions of tricks from state-of-the-art results and study how much they improve a standard deep Q-learning agent. The long-term goal of this work is to enable combining proven RL methods with domain-specific tricks by providing a unified software framework and accompanying insights across multiple domains.