Deep reinforcement learning continues to show tremendous potential for achieving task-level autonomy; however, its computational and energy demands remain prohibitively high. In this paper, we tackle this problem by applying quantization to reinforcement learning. To that end, we introduce a novel Reinforcement Learning (RL) training paradigm, \textit{ActorQ}, that speeds up actor-learner distributed RL training. \textit{ActorQ} leverages 8-bit quantized actors to speed up data collection without affecting learning convergence. Our quantized distributed RL training system demonstrates end-to-end speedups \blue{between 1.5$\times$ and 5.41$\times$} and faster convergence than full-precision training on a range of tasks (DeepMind Control Suite) and different RL algorithms (D4PG, DQN). Furthermore, we compare the carbon emissions (kg of CO$_2$) of \textit{ActorQ} against standard reinforcement learning \blue{algorithms} on various tasks. Across various settings, we show that \textit{ActorQ} enables more environmentally friendly reinforcement learning, achieving \blue{carbon emission improvements between 1.9$\times$ and 3.76$\times$} compared to training RL agents in full precision. We believe this is the first of many future works on enabling computationally efficient, energy-efficient, and sustainable reinforcement learning. The source code is publicly available at \url{https://github.com/harvard-edge/QuaRL}.
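As a rough illustration of the actor-side quantization idea (a minimal sketch, not the paper's actual implementation), the snippet below uses PyTorch's post-training dynamic quantization to run a small policy network with 8-bit linear-layer weights during data collection, while a full-precision copy would remain with the learner. The network architecture and observation/action dimensions are placeholders.
\begin{verbatim}
import torch
import torch.nn as nn

# Hypothetical actor network; the real policies come from the
# underlying RL agents (e.g., D4PG or DQN), not this toy MLP.
actor = nn.Sequential(
    nn.Linear(24, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 6), nn.Tanh(),
)

# Dynamic quantization: linear-layer weights are stored as int8,
# making actor inference during environment rollouts cheaper.
quantized_actor = torch.quantization.quantize_dynamic(
    actor, {nn.Linear}, dtype=torch.qint8
)

obs = torch.randn(1, 24)        # dummy observation
action = quantized_actor(obs)   # 8-bit-weight inference on the actor
\end{verbatim}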