Deep reinforcement learning has achieved great success in various fields owing to its strong decision-making ability. However, the policy learning process requires a large amount of training time, which leads to substantial energy consumption. Inspired by the redundancy of neural networks, we propose AcceRL, a lightweight parallel training framework based on neural network compression, to accelerate policy learning while preserving policy quality. Specifically, AcceRL speeds up experience collection by flexibly combining various neural network compression methods. Overall, AcceRL consists of five components: Actor, Learner, Compressor, Corrector, and Monitor. The Actor uses the Compressor to compress the Learner's policy network and interacts with the environment using the compressed policy. The generated experiences are then transformed by the Corrector with off-policy correction methods such as V-trace and Retrace, and the corrected experiences are fed to the Learner for policy learning. To the best of our knowledge, this is the first general reinforcement learning framework that incorporates multiple neural network compression techniques. Extensive experiments conducted on Gym tasks show that AcceRL reduces the actor's time cost by a factor of about 2.0 to 4.13 compared to traditional methods. Furthermore, AcceRL reduces the overall training time by about 29.8% to 40.3% compared to traditional methods while maintaining the same policy quality.
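For reference, the off-policy correction mentioned above can be illustrated with the standard V-trace target computation (as introduced in IMPALA). The sketch below is a minimal, generic implementation over a single trajectory; the function name, signature, and default clipping thresholds are illustrative assumptions and are not taken from the AcceRL implementation.

```python
import numpy as np

def vtrace_targets(rewards, values, bootstrap_value,
                   behaviour_logp, target_logp,
                   gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Minimal sketch of V-trace value targets for one trajectory of length T.

    rewards, values, behaviour_logp, target_logp: arrays of length T
    bootstrap_value: value estimate V(x_T) at the state after the trajectory
    """
    T = len(rewards)
    # Clipped importance weights between the target policy (Learner) and
    # the behaviour policy (compressed policy used by the Actor).
    ratios = np.exp(target_logp - behaviour_logp)
    rhos = np.minimum(rho_bar, ratios)
    cs = np.minimum(c_bar, ratios)

    # One-step temporal-difference errors weighted by the clipped ratios.
    values_tp1 = np.append(values[1:], bootstrap_value)
    deltas = rhos * (rewards + gamma * values_tp1 - values)

    # Backward recursion: v_s - V(x_s) = delta_s + gamma * c_s * (v_{s+1} - V(x_{s+1}))
    vs_minus_v = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):
        acc = deltas[t] + gamma * cs[t] * acc
        vs_minus_v[t] = acc
    return values + vs_minus_v
```

In this scheme the experiences collected with the compressed (behaviour) policy are re-weighted toward the Learner's current (target) policy, which is why the Corrector can safely reuse them for policy learning despite the policy mismatch introduced by compression.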