A core operation in reinforcement learning (RL) is finding an action that is optimal with respect to a learned value function. This operation is often challenging when the learned value function takes continuous actions as input. We introduce deep radial-basis value functions (RBVFs): value functions learned using a deep network with a radial-basis function (RBF) output layer. We show that the maximum action-value with respect to a deep RBVF can be approximated easily and accurately. Moreover, deep RBVFs can represent any true value function owing to their support for universal function approximation. We extend the standard DQN algorithm to continuous control by endowing the agent with a deep RBVF. We show that the resultant agent, called RBF-DQN, significantly outperforms value-function-only baselines, and is competitive with state-of-the-art actor-critic algorithms.
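To make the architecture concrete, below is a minimal numerical sketch of the kind of RBF output layer the abstract describes: Q(s, a) is a softmax-of-negative-distances weighted combination of per-centroid values, and the maximum action-value is approximated by evaluating Q at each centroid and taking the largest result. The specific functional form, the temperature `beta`, and all variable names here are illustrative assumptions, not code or notation taken from the paper.

```python
import numpy as np

def rbf_q_value(action, centroids, values, beta):
    """Illustrative normalized-RBF readout (assumed form, not the paper's code).

    centroids: (N, action_dim) array of state-dependent centers a_i(s)
               produced by the upstream deep network.
    values:    (N,) array of per-centroid values v_i(s).
    beta:      temperature controlling how sharply the weights concentrate.
    """
    dists = np.linalg.norm(centroids - action, axis=1)   # ||a - a_i(s)||
    logits = -beta * dists
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                             # softmax over centroids
    return float(weights @ values)

def approx_max_q(centroids, values, beta):
    """Approximate max_a Q(s, a) by checking Q only at the centroids;
    the approximation tightens as beta grows."""
    return max(rbf_q_value(c, centroids, values, beta) for c in centroids)

# Toy usage with hypothetical outputs of the state-conditioned network.
centroids = np.array([[0.2, -0.1], [0.8, 0.5], [-0.4, 0.9]])
values = np.array([1.0, 2.5, 0.7])
print(approx_max_q(centroids, values, beta=5.0))
```

In this sketch the search over a continuous action space reduces to a finite maximization over N centroid evaluations, which is the property that lets a DQN-style update (which needs max_a Q) be applied to continuous control.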