On the Convergence of Reinforcement Learning in Nonlinear Continuous State Space Problems

We consider the problem of Reinforcement Learning (RL) for nonlinear stochastic dynamical systems. We show that in the RL setting there is an inherent "Curse of Variance" in addition to Bellman's infamous "Curse of Dimensionality". In particular, we show that the variance in the solution grows factorial-exponentially in the order of the approximation. A fundamental consequence is that, in order to control the explosive variance growth and thus ensure accuracy, the search in RL is restricted to "local" feedback solutions. We further show that the deterministic optimal control has a perturbation structure, in that the higher-order terms do not affect the calculation of the lower-order terms, which can be utilized in RL to obtain accurate local solutions.
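As a rough intuition for how variance can grow factorially with approximation order, the sketch below computes the exact variance of the monomial x^k for a standard Gaussian x. This is only an illustrative assumption about where such growth can come from, not the paper's derivation or its bound; the function names are hypothetical. It uses the standard identity E[x^k] = (k-1)!! for even k and 0 for odd k.

```python
from math import prod


def double_factorial(n: int) -> int:
    """Return n!! for odd n >= 1, and 1 for n <= 0 (so that (-1)!! = 1)."""
    return prod(range(n, 0, -2)) if n > 0 else 1


def gaussian_moment(k: int) -> int:
    """E[x^k] for x ~ N(0, 1): (k-1)!! when k is even, 0 when k is odd."""
    return double_factorial(k - 1) if k % 2 == 0 else 0


def monomial_variance(k: int) -> int:
    """Var(x^k) = E[x^(2k)] - (E[x^k])^2 for x ~ N(0, 1)."""
    return gaussian_moment(2 * k) - gaussian_moment(k) ** 2


# Exact variance of the order-k monomial for k = 1..6 (illustrative only).
variances = {k: monomial_variance(k) for k in range(1, 7)}
for order, var in variances.items():
    print(f"order {order}: Var(x^{order}) = {var}")
```

The variance rises from 1 at order 1 to 10170 at order 6, so a sample-based estimate of a higher-order term needs far more samples for the same accuracy. This is in keeping with the abstract's point that only low-order, local approximations can be estimated accurately.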

