The Deep Q-Network proposed by Mnih et al. [2015] has become a benchmark and building block for much deep reinforcement learning research. However, replicating results for complex systems is often challenging, since original scientific publications are not always able to describe in detail every important parameter setting and software engineering solution. In this paper, we present results from our work reproducing the results of the DQN paper. We highlight key areas in the implementation that were not covered in great detail in the original paper, including termination conditions and gradient descent algorithms, to make it easier for researchers to replicate these results. Finally, we discuss methods for improving the computational performance and provide our own implementation, which is designed to work with a range of domains, not just the original Arcade Learning Environment [Bellemare et al., 2013].