In this paper, we consider the stochastic iterative counterpart of the value iteration scheme, wherein only noisy and possibly biased approximations of the Bellman operator are available. We call this counterpart the approximate value iteration (AVI) scheme. Neural networks are often used as function approximators in order to counter Bellman's curse of dimensionality; in this paper, they are used to approximate the Bellman operator. Since neural networks are typically trained using sample data, errors and biases may be introduced. The design of AVI accounts for implementations with biased approximations of the Bellman operator and sampling errors. We present verifiable sufficient conditions under which AVI is stable (almost surely bounded) and converges to a fixed point of the approximate Bellman operator. To ensure the stability of AVI, we present three different yet related sets of sufficient conditions based on the existence of an appropriate Lyapunov function. These Lyapunov-function-based conditions are easily verifiable and new to the literature. Verifiability is further enhanced by the fact that a recipe for constructing the required Lyapunov function is also provided. We also show that the stability analysis of AVI readily extends to the general case of set-valued stochastic approximations. Finally, we show that AVI can also be used in more general settings, i.e., for finding fixed points of contractive set-valued maps.
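To make the setting concrete, the following is a minimal sketch of an AVI-style stochastic iteration on a toy two-state Markov chain. The transition matrix, reward vector, noise and bias magnitudes, and step-size schedule are all illustrative assumptions, not the paper's exact construction; the noisy operator simply stands in for a trained function approximator of the Bellman operator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-action MDP (illustrative assumption): 2 states.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # transition matrix
r = np.array([1.0, 0.0])     # reward vector
gamma = 0.9                  # discount factor

def bellman(V):
    # Exact Bellman operator (policy-evaluation form): T(V) = r + gamma * P V.
    return r + gamma * P @ V

def noisy_bellman(V):
    # Noisy and slightly biased approximation of T, standing in for a
    # sample-trained function approximator; magnitudes are illustrative.
    return bellman(V) + 0.01 + 0.05 * rng.standard_normal(V.shape)

# AVI iterate: V_{k+1} = V_k + a_k (T_hat(V_k) - V_k), with diminishing
# step sizes satisfying sum a_k = inf and sum a_k^2 < inf.
V = np.zeros(2)
for k in range(1, 20001):
    a_k = k ** -0.7
    V = V + a_k * (noisy_bellman(V) - V)

# Exact fixed point of T for comparison: V* = (I - gamma P)^{-1} r.
V_star = np.linalg.solve(np.eye(2) - gamma * P, r)
print(np.max(np.abs(V - V_star)))
```

Because the approximate operator carries a small bias, the iterates settle near a fixed point of the approximate (not the exact) Bellman operator, which is exactly the kind of limit point the abstract describes.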