The success of deep learning in the computer vision and natural language processing communities can be attributed to very deep neural networks with millions or billions of parameters, trained on massive amounts of data. However, a similar trend has largely eluded deep reinforcement learning (RL), where larger networks do not lead to performance improvements. Previous work has shown that this is mostly due to instability when training deep RL agents with larger networks. In this paper, we attempt to understand and address the training of larger networks for deep RL. We first show that naively increasing network capacity does not improve performance. We then propose a novel method consisting of 1) wider networks with DenseNet connections, 2) decoupling representation learning from RL training, and 3) a distributed training method to mitigate overfitting. Using this three-fold technique, we show that we can train very large networks that yield significant performance gains. We present several ablation studies to demonstrate the efficacy of the proposed method and to offer intuition for the performance gains. We show that our proposed method outperforms baseline algorithms on several challenging locomotion tasks.
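To make the first ingredient concrete, the sketch below illustrates a DenseNet-style forward pass for a wide MLP, where each layer receives the concatenation of the input and all previous layers' outputs. This is a minimal illustration of the connection pattern only; the layer count, growth rate of 8, and ReLU activation are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def dense_mlp_forward(x, weights):
    # DenseNet-style MLP: each layer's input is the concatenation of the
    # raw input and the outputs of all preceding layers.
    features = [x]
    for W in weights:
        h = np.concatenate(features, axis=-1)
        features.append(np.maximum(h @ W, 0.0))  # ReLU activation (assumed)
    # The final representation concatenates every feature map.
    return np.concatenate(features, axis=-1)

# Hypothetical sizes: input dim 4, each layer adds 8 features ("growth rate").
rng = np.random.default_rng(0)
in_dims = [4, 12, 20]  # dims seen by layers 0, 1, 2: 4, 4+8, 4+8+8
weights = [rng.standard_normal((d, 8)) * 0.1 for d in in_dims]
out = dense_mlp_forward(rng.standard_normal((2, 4)), weights)
print(out.shape)  # (2, 28): 4 input features + 3 layers * 8 features
```

Because every layer sees all earlier features, widening the network grows the representation without forcing gradients through a long sequential path, which is the intuition behind using dense connections for wider RL networks.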