Understanding Deep Neural Function Approximation in Reinforcement Learning via $\epsilon$-Greedy Exploration

This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL) with $\epsilon$-greedy exploration in the online setting. This problem setting is motivated by the successful deep Q-networks (DQN) framework, which falls in this regime. In this work, we make an initial attempt at theoretically understanding deep RL from the perspective of function classes and neural network architectures (e.g., width and depth) beyond the "linear" regime. Specifically, we focus on the value-based algorithm with $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed by Besov (and Barron) function spaces, respectively, which aims at approximating an $\alpha$-smooth Q-function in a $d$-dimensional feature space. We prove that, with $T$ episodes, scaling the width $m = \widetilde{\mathcal{O}}(T^{\frac{d}{2\alpha + d}})$ and the depth $L=\mathcal{O}(\log T)$ of the neural network is sufficient for learning with sublinear regret in Besov spaces. Moreover, for a two-layer neural network endowed by the Barron space, scaling the width as $\Omega(\sqrt{T})$ is sufficient. To achieve this, the key issue in our analysis is how to estimate the temporal difference error under deep neural function approximation, since $\epsilon$-greedy exploration is not enough to ensure "optimism". Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$, and transforms it into a generalization problem under the non-iid setting. This may be of independent interest in RL theory for better understanding $\epsilon$-greedy exploration in deep RL.
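To make the scaling rules above concrete, the following is a minimal illustrative sketch, not the paper's algorithm. It assumes all hidden constants equal 1 and ignores the logarithmic factors hidden by $\widetilde{\mathcal{O}}$. The function names (`besov_width`, `besov_depth`, `barron_width`, `epsilon_greedy`) are hypothetical and introduced only for illustration.

```python
import math
import random


def besov_width(T, d, alpha):
    """Width m ~ T^(d / (2*alpha + d)); constants and log factors are assumed away."""
    return math.ceil(T ** (d / (2 * alpha + d)))


def besov_depth(T):
    """Depth L ~ log T; the constant factor is assumed to be 1."""
    return max(1, math.ceil(math.log(T)))


def barron_width(T):
    """Two-layer width ~ sqrt(T); the constant factor is assumed to be 1."""
    return math.ceil(math.sqrt(T))


def epsilon_greedy(q_values, epsilon, rng=random):
    """Standard epsilon-greedy rule: explore uniformly with probability epsilon,
    otherwise pick the action with the largest estimated Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    T, d, alpha = 10_000, 2, 1.0
    print("Besov width:", besov_width(T, d, alpha))
    print("Besov depth:", besov_depth(T))
    print("Barron width:", barron_width(T))
    print("action:", epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1))
```

Under these assumptions, a smoother target (larger $\alpha$) or a lower feature dimension $d$ gives a smaller exponent $\frac{d}{2\alpha + d}$, so the required width grows more slowly with the number of episodes $T$.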
