A key theme in the past decade has been that when large neural networks and large datasets combine, they can produce remarkable results. In deep reinforcement learning (RL), this paradigm is commonly made possible through experience replay, whereby a dataset of past experiences is used to train a policy or value function. However, unlike in supervised or self-supervised learning, an RL agent has to collect its own data, which is often limited. Thus, it is challenging to reap the benefits of deep learning, and even small neural networks can overfit at the start of training. In this work, we leverage the tremendous recent progress in generative modeling and propose Synthetic Experience Replay (SynthER), a diffusion-based approach to arbitrarily upsample an agent's collected experience. We show that SynthER is an effective method for training RL agents across offline and online settings. In offline settings, we observe drastic improvements both when upsampling small offline datasets and when training larger networks with additional synthetic data. Furthermore, SynthER enables online agents to train with a much higher update-to-data ratio than before, leading to a large increase in sample efficiency, without any algorithmic changes. We believe that synthetic training data could open the door to realizing the full potential of deep learning for replay-based RL algorithms from limited data.
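To make the upsampling idea concrete, the sketch below illustrates the general pattern the abstract describes: flatten each transition (s, a, r, s', done) into a vector, fit a generative model to the agent's real transitions, and sample additional synthetic transitions to enlarge the replay buffer before training the policy or value function. This is only an illustrative sketch, not the paper's implementation: the GaussianStandIn class is a hypothetical placeholder standing in for the diffusion model SynthER actually trains, and the buffer layout, dimensions, and upsample factor are assumptions chosen for the example.

```python
# Sketch of diffusion-style experience upsampling: fit a generative model to
# real transitions and append synthetic samples to the replay data.
# GaussianStandIn is a hypothetical placeholder, NOT the SynthER diffusion model.

import numpy as np


class GaussianStandIn:
    """Placeholder generative model: fits a diagonal Gaussian to transition
    vectors and samples from it. In SynthER this role is played by a
    denoising diffusion model trained on the same flattened transitions."""

    def fit(self, x: np.ndarray) -> None:
        self.mean = x.mean(axis=0)
        self.std = x.std(axis=0) + 1e-6  # avoid zero variance

    def sample(self, n: int) -> np.ndarray:
        return np.random.normal(self.mean, self.std, size=(n, self.mean.shape[0]))


def upsample_transitions(real: np.ndarray, upsample_factor: int) -> np.ndarray:
    """Train the generative model on real transitions and return the real
    data concatenated with synthetic samples."""
    model = GaussianStandIn()
    model.fit(real)
    synthetic = model.sample(upsample_factor * len(real))
    return np.concatenate([real, synthetic], axis=0)


if __name__ == "__main__":
    # Toy buffer: 1,000 transitions, each flattened to [s, a, r, s', done].
    rng = np.random.default_rng(0)
    obs_dim, act_dim = 4, 2
    transition_dim = obs_dim + act_dim + 1 + obs_dim + 1  # = 12
    real_buffer = rng.normal(size=(1_000, transition_dim))

    augmented = upsample_transitions(real_buffer, upsample_factor=9)
    print(augmented.shape)  # (10000, 12): ten times more data for the RL update
```

The key design point the abstract emphasizes is that the downstream RL algorithm is unchanged: the augmented buffer simply replaces (or supplements) the real one, which is what allows higher update-to-data ratios online and larger networks offline without algorithmic modifications.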