Analyzing Approximate Value Iteration Algorithms

In this paper, we consider the stochastic iterative counterpart of the value iteration scheme, in which only noisy and possibly biased approximations of the Bellman operator are available. We call this counterpart the approximate value iteration (AVI) scheme. Neural networks are often used as function approximators to counter Bellman's curse of dimensionality. In this paper, they are used to approximate the Bellman operator. Since neural networks are typically trained using sample data, errors and biases may be introduced. The design of AVI accounts for implementations with biased approximations of the Bellman operator and sampling errors. We present verifiable sufficient conditions under which AVI is stable (almost surely bounded) and converges to a fixed point of the approximate Bellman operator. To ensure the stability of AVI, we present three different yet related sets of sufficient conditions that are based on the existence of an appropriate Lyapunov function. These Lyapunov-function-based conditions are easily verifiable and new to the literature. The verifiability is enhanced by the fact that a recipe for constructing the necessary Lyapunov function is also provided. We also show that the stability analysis of AVI can be readily extended to the general case of set-valued stochastic approximations. Finally, we show that AVI can also be used in more general circumstances, i.e., for finding fixed points of contractive set-valued maps.
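As a rough illustration of the kind of iteration the abstract describes, the following is a minimal sketch, not the paper's algorithm. It assumes a small tabular MDP, replaces the neural-network approximation with the exact Bellman operator plus a constant bias and Gaussian noise, and uses a hypothetical step-size schedule `(n + 1) ** -0.6`. The names `bellman`, `approximate_value_iteration`, `bias`, and `noise_std` are all illustrative and do not come from the paper.

```python
import random


def bellman(J, P, R, gamma):
    """Exact Bellman optimality operator for a small tabular MDP.

    P[s][a][t] is the probability of moving from state s to state t
    under action a; R[s][a] is the one-step reward.
    """
    n_states = len(J)
    n_actions = len(R[0])
    return [
        max(
            R[s][a] + gamma * sum(P[s][a][t] * J[t] for t in range(n_states))
            for a in range(n_actions)
        )
        for s in range(n_states)
    ]


def approximate_value_iteration(P, R, gamma, n_iters=5000,
                                bias=0.01, noise_std=0.1, seed=0):
    """Stochastic iteration driven by a noisy, biased Bellman operator.

    Assumption: the approximation error is modeled as a constant bias
    plus zero-mean Gaussian noise, standing in for a learned operator.
    """
    rng = random.Random(seed)
    J = [0.0] * len(R)
    for n in range(n_iters):
        # Diminishing step sizes: the sum diverges, the sum of squares converges.
        step = (n + 1) ** -0.6
        TJ = bellman(J, P, R, gamma)
        J = [
            j + step * (tj + bias + rng.gauss(0.0, noise_std) - j)
            for j, tj in zip(J, TJ)
        ]
    return J


# Hypothetical two-state, two-action MDP used only for illustration.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]
print(approximate_value_iteration(P, R, gamma=0.5))
```

With a constant bias `b`, the biased operator is still a contraction, and its fixed point is the exact value function shifted by `b / (1 - gamma)`. The iterates in this sketch therefore settle near a fixed point of the approximate operator rather than of the exact one, which is the behavior the abstract refers to.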

