Recent advances in model-free deep reinforcement learning (DRL) show that simple model-free methods can be highly effective on challenging high-dimensional continuous control tasks. In particular, Truncated Quantile Critics (TQC) achieves state-of-the-art asymptotic training performance on the MuJoCo benchmark using a distributional representation of critics, and Randomized Ensembled Double Q-Learning (REDQ) achieves sample efficiency competitive with state-of-the-art model-based methods using a high update-to-data ratio and target randomization. In this paper, we propose a novel model-free algorithm, Aggressive Q-Learning with Ensembles (AQE), which improves on the sample efficiency of REDQ and the asymptotic performance of TQC, thereby providing state-of-the-art performance across all stages of training. Moreover, AQE is very simple, requiring neither a distributional representation of critics nor target randomization. The effectiveness of AQE is further supported by our extensive experiments, ablations, and theoretical results.
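To make the ensemble idea concrete, the following is a minimal PyTorch sketch of one way a pessimistic ensemble-based Bellman target can be formed: averaging the lowest K next-state Q-values among N critics, with no distributional critics or target randomization involved. The function name, the choice of keep, and the exact reduction are illustrative assumptions here, not the paper's verified update rule.

import torch

# Sketch (assumed, not the paper's exact rule): form a pessimistic Bellman
# target by averaging the K smallest of N ensemble Q-estimates per transition.
def ensemble_target(next_q: torch.Tensor, reward: torch.Tensor,
                    done: torch.Tensor, gamma: float = 0.99,
                    keep: int = 2) -> torch.Tensor:
    # next_q: (batch, N) Q-values of the sampled next action under N critics
    lowest_k, _ = torch.topk(next_q, keep, dim=1, largest=False)  # (batch, keep)
    pessimistic_q = lowest_k.mean(dim=1)                          # (batch,)
    # Standard bootstrapped target; done masks out the value of terminal states
    return reward + gamma * (1.0 - done) * pessimistic_q

# Usage: 10 critics, batch of 4 transitions (random placeholder values).
next_q = torch.randn(4, 10)
reward = torch.randn(4)
done = torch.zeros(4)
print(ensemble_target(next_q, reward, done).shape)  # torch.Size([4])

Averaging several low-ranked estimates, rather than taking a hard minimum over two critics, gives a tunable degree of pessimism: smaller keep is more conservative, larger keep closer to the plain ensemble mean.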