Randomized ensembled double Q-learning (REDQ) has recently achieved state-of-the-art sample efficiency on continuous-action reinforcement learning benchmarks. This superior sample efficiency is made possible by using a large Q-function ensemble. However, REDQ is much less computationally efficient than non-ensemble counterparts such as Soft Actor-Critic (SAC). To make REDQ more computationally efficient, we propose Dr.Q, a variant of REDQ that uses a small ensemble of dropout Q-functions. Our dropout Q-functions are simple Q-functions equipped with dropout connections and layer normalization. Despite its simple implementation, our experimental results indicate that Dr.Q is doubly (sample- and computationally) efficient: it achieves sample efficiency comparable to that of REDQ, much better computational efficiency than REDQ, and computational efficiency comparable to that of SAC.
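For concreteness, below is a minimal PyTorch sketch of what such a dropout Q-function could look like: a standard MLP critic whose hidden layers are followed by dropout and layer normalization. The layer ordering, hidden width, and dropout rate shown here are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn


class DropoutQFunction(nn.Module):
    """Sketch of a dropout Q-function: an MLP critic with dropout
    connections and layer normalization after each hidden layer.
    Hyperparameters below are illustrative, not the paper's values."""

    def __init__(self, state_dim: int, action_dim: int,
                 hidden_dim: int = 256, dropout_rate: float = 0.01):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.Dropout(p=dropout_rate),   # dropout connection
            nn.LayerNorm(hidden_dim),     # layer normalization
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.Dropout(p=dropout_rate),
            nn.LayerNorm(hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),     # scalar Q-value estimate
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))
```

A small ensemble would then consist of a few independently initialized `DropoutQFunction` instances, in place of REDQ's large Q-function ensemble.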