In this paper, we address two key challenges in the deep reinforcement learning setting, namely sample inefficiency and slow learning, with a dual-NN-driven learning approach. In the proposed approach, we use two deep NNs with independent initialization to robustly approximate the action-value function in the presence of image inputs. In particular, we develop a temporal difference (TD) error-driven learning approach, in which we introduce a set of linear transformations of the TD error to directly update the parameters of each layer in the deep NN. We show theoretically that the cost minimized by the error-driven learning (EDL) regime is an approximation of the empirical cost, and that the approximation error decreases as learning progresses, irrespective of the size of the network. Through simulation analysis, we show that the proposed method enables faster learning and convergence and requires a smaller buffer size (thereby increasing sample efficiency).
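A minimal sketch of a TD-error-driven, layer-wise update under assumed details is shown below. The network sizes, the tanh hidden layer, the fixed feedback matrices B1 and B2, and the specific form of the linear transformations (an outer product of the transformed TD error with each layer's input) are illustrative assumptions, not the paper's exact specification; the sketch only conveys the idea of updating every layer directly from a linear map of the TD error rather than through a backpropagated gradient chain.

```python
# Hypothetical sketch of an EDL-style, TD-error-driven update for a small
# fully connected Q-network. All names, shapes, and the exact linear maps
# of the TD error are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Two-layer Q-network: state -> hidden -> Q-values (one per action).
state_dim, hidden_dim, n_actions = 4, 32, 2
W1 = rng.normal(scale=0.1, size=(hidden_dim, state_dim))
W2 = rng.normal(scale=0.1, size=(n_actions, hidden_dim))

# Independently initialized second network used to form the TD target
# (the "dual NN" in the abstract).
W1_t = rng.normal(scale=0.1, size=W1.shape)
W2_t = rng.normal(scale=0.1, size=W2.shape)

# Fixed linear transformations of the scalar TD error, one per layer
# (assumed form; the paper's transformations may differ).
B1 = rng.normal(scale=0.1, size=hidden_dim)
B2 = rng.normal(scale=0.1, size=n_actions)

gamma, alpha = 0.99, 1e-2  # discount factor and step size


def q_values(W1, W2, s):
    """Forward pass: return Q-values and the hidden activation."""
    h = np.tanh(W1 @ s)
    return W2 @ h, h


def edl_update(s, a, r, s_next, done):
    """One TD-error-driven update: each layer moves by a linear map of delta."""
    global W1, W2
    q, h = q_values(W1, W2, s)
    q_next, _ = q_values(W1_t, W2_t, s_next)
    target = r + (0.0 if done else gamma * np.max(q_next))
    delta = target - q[a]  # scalar TD error

    # Layer-wise updates driven directly by the TD error:
    # transformed error (B_l * delta) outer-multiplied with the layer's input.
    W2 += alpha * np.outer(B2 * delta, h)
    W1 += alpha * np.outer(B1 * delta, s)


# Example transition with illustrative values.
s = rng.normal(size=state_dim)
s_next = rng.normal(size=state_dim)
edl_update(s, a=0, r=1.0, s_next=s_next, done=False)
```

In this sketch no gradient is propagated through the network; each layer's correction is computed locally from the TD error, which is the property the abstract attributes to the EDL regime.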