Q-learning is one of the most well-known Reinforcement Learning algorithms, and there have been tremendous efforts to extend it with neural networks. Bootstrapped Deep Q-Learning Network is among them. It utilizes multiple neural network heads to introduce diversity into Q-learning. Diversity can be viewed as the number of reasonable moves an agent can take at a given state, analogous to the definition of the exploration ratio in RL. The performance of Bootstrapped Deep Q-Learning Network is therefore closely tied to the level of diversity within the algorithm. The original research pointed out that a random prior could improve the performance of the model. In this article, we further explore the possibility of replacing the priors with noise sampled from a Gaussian distribution, in order to introduce more diversity into the algorithm. We conduct our experiments on the Atari benchmark and compare our algorithm with both the original and other related algorithms. The results show that our modification of the Bootstrapped Deep Q-Learning algorithm achieves significantly higher evaluation scores across different types of Atari games. We therefore conclude that replacing priors with noise can improve the performance of Bootstrapped Deep Q-Learning by preserving the integrity of its diversity.
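To make the proposed modification concrete, the following is a minimal sketch, assuming a PyTorch implementation with illustrative hyperparameters (feature_dim, num_heads, sigma are not taken from the paper). It shows bootstrapped Q-heads whose outputs are perturbed by freshly sampled Gaussian noise in place of a fixed random-prior network; it is not the authors' exact architecture.

```python
import torch
import torch.nn as nn


class NoisyBootstrappedHeads(nn.Module):
    """Bootstrapped Q-heads with Gaussian noise replacing the random prior (sketch)."""

    def __init__(self, feature_dim: int, num_actions: int, num_heads: int = 10, sigma: float = 0.1):
        super().__init__()
        # One linear Q-head per bootstrap member, sharing the same input features.
        self.heads = nn.ModuleList(
            nn.Linear(feature_dim, num_actions) for _ in range(num_heads)
        )
        self.sigma = sigma  # scale of the Gaussian noise used instead of a prior network

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        q_values = []
        for head in self.heads:
            q = head(features)
            # Instead of adding the output of a fixed random-prior network,
            # perturb each head's Q-estimate with Gaussian noise.
            q = q + self.sigma * torch.randn_like(q)
            q_values.append(q)
        # Shape: (batch, num_heads, num_actions)
        return torch.stack(q_values, dim=1)
```

In use, each episode (or each update) can act on a single randomly chosen head, as in standard Bootstrapped DQN, while the injected noise keeps the heads from collapsing onto identical estimates.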