We propose a simple architecture for deep reinforcement learning by embedding inputs into a learned Fourier basis and show that it improves the sample efficiency of both state-based and image-based RL. We perform infinite-width analysis of our architecture using the Neural Tangent Kernel and theoretically show that tuning the initial variance of the Fourier basis is equivalent to functional regularization of the learned deep network. That is, these learned Fourier features allow for adjusting the degree to which networks underfit or overfit different frequencies in the training data, and hence provide a controlled mechanism to improve the stability and performance of RL optimization. Empirically, this allows us to prioritize learning low-frequency functions and speed up learning by reducing networks' susceptibility to noise in the optimization process, such as during Bellman updates. Experiments on standard state-based and image-based RL benchmarks show clear benefits of our architecture over the baselines. Website at https://alexanderli.com/learned-fourier-features
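The core idea — embedding inputs into a learned Fourier basis whose initial variance controls the frequency content the network fits — can be sketched as follows. This is a minimal illustration, not the paper's released implementation; the class and parameter names are hypothetical, and `sigma` stands in for the initial standard deviation of the Fourier basis discussed in the abstract.

```python
import numpy as np

class LearnedFourierFeatures:
    """Sketch of a learned Fourier feature embedding (illustrative only).

    Inputs x are projected by a trainable matrix B drawn from N(0, sigma^2),
    and the embedding is [sin(Bx), cos(Bx)]. A small sigma concentrates the
    initial basis on low frequencies, biasing the network toward learning
    low-frequency functions first.
    """

    def __init__(self, in_dim, n_features, sigma=1.0, seed=None):
        rng = np.random.default_rng(seed)
        # Trainable projection; sigma sets the initial frequency scale.
        self.B = rng.normal(0.0, sigma, size=(in_dim, n_features))

    def __call__(self, x):
        proj = x @ self.B                       # shape: (batch, n_features)
        return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

# Usage: embed a batch of 2 four-dimensional states into 32 features.
emb = LearnedFourierFeatures(in_dim=4, n_features=16, sigma=0.1, seed=0)
out = emb(np.zeros((2, 4)))
# At x = 0: sin(0) = 0 and cos(0) = 1, so the first 16 columns are 0
# and the last 16 are 1.
```

In a full RL agent, this embedding would replace the first layer of the policy or value network, with `B` updated by gradient descent along with the rest of the parameters.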