Deep reinforcement learning has achieved significant success in many decision-making tasks in various fields. However, it requires long training times for dense neural networks to obtain good performance. This hinders its applicability on low-resource devices where memory and computation are strictly constrained. As a step towards enabling deep reinforcement learning agents to run on low-resource devices, in this work we propose, for the first time, to dynamically train deep reinforcement learning agents with sparse neural networks from scratch. We adopt the evolution principles of dynamic sparse training in the reinforcement learning paradigm and introduce a training algorithm that jointly optimizes the sparse topology and the weight values to dynamically fit the incoming data. Our approach is easy to integrate into existing deep reinforcement learning algorithms and has several favorable properties. First, it allows significant compression of the network size, which substantially reduces memory and computation costs; this accelerates not only the agent's inference but also its training. Second, it speeds up the agent's learning process and reduces the number of required training steps. Third, it can achieve higher performance than training the dense counterpart network. We evaluate our approach on OpenAI Gym continuous control tasks. The experimental results show the effectiveness of our approach in achieving higher performance than one of the state-of-the-art baselines with a 50\% reduction in network size and floating-point operations (FLOPs). Moreover, our proposed approach can reach the same performance achieved by the dense network with a 40-50\% reduction in the number of training steps.
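To make the topology-evolution idea concrete, the sketch below shows one possible prune-and-regrow step in the spirit of dynamic sparse training (SET-style): the weakest active connections of a layer are dropped by magnitude and the same number of connections are regrown at random inactive positions, while the weight values continue to be trained by the reinforcement learning algorithm. The function name, the prune fraction, and the zero-initialization of regrown weights are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def evolve_sparse_topology(weights, mask, prune_fraction=0.05, rng=None):
    """Illustrative SET-style topology update for one layer.

    weights : 2D array of layer weights (inactive entries are zero).
    mask    : binary array of the same shape marking active connections.
    prune_fraction : fraction of active connections to replace per update
                     (value chosen here for illustration only).
    """
    rng = rng or np.random.default_rng()
    active = np.flatnonzero(mask)              # indices of active connections
    n_prune = int(prune_fraction * active.size)
    if n_prune == 0:
        return weights, mask

    # Prune: remove the n_prune active weights with the smallest magnitude.
    magnitudes = np.abs(weights.flat[active])
    to_prune = active[np.argsort(magnitudes)[:n_prune]]
    mask.flat[to_prune] = 0
    weights.flat[to_prune] = 0.0

    # Grow: activate the same number of randomly chosen inactive connections.
    # New weights start at zero here (one possible choice); their values are
    # then learned by subsequent gradient updates.
    inactive = np.flatnonzero(mask.flat == 0)
    to_grow = rng.choice(inactive, size=n_prune, replace=False)
    mask.flat[to_grow] = 1
    return weights, mask
```

In such a scheme, this update would be applied periodically (e.g., every few training episodes) to each sparse layer of the agent's networks, keeping the overall number of connections, and hence the memory and FLOP budget, fixed throughout training.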