From Reinforcement Learning to Optimal Control: A Unified Framework for Sequential Decisions

Powell, W. B. From Reinforcement Learning to Optimal Control: A Unified Framework for Sequential Decisions. 2019.

There are over 15 distinct communities that work in the general area of sequential decisions and information, often referred to as decisions under uncertainty or stochastic optimization. We focus on two of the most important fields: stochastic optimal control, with its roots in deterministic optimal control, and reinforcement learning, with its roots in Markov decision processes. Building on prior work, we describe a unified framework that covers all 15 communities, and note the strong parallels with the modeling framework of stochastic optimal control. By contrast, we make the case that the modeling framework of reinforcement learning, inherited from discrete Markov decision processes, is quite limited. Our framework (and that of stochastic control) is based on the core problem of optimizing over policies. We describe four classes of policies that we claim are universal, and show that each of these two fields has, in its own way, evolved to include examples of each of these four classes.
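The core problem the abstract names, optimizing over policies, can be made concrete with a toy sequential decision problem. The sketch below is illustrative only and not from the paper: it uses a simple inventory problem with an order-up-to rule parameterized by `theta` (one hypothetical example of a parameterized policy), and searches over `theta` by simulation. All names, prices, and the demand distribution are assumptions chosen for the example.

```python
import random

def demand(rng):
    # Exogenous information W_{t+1}: random demand (assumed uniform on 0..10)
    return rng.randint(0, 10)

def order_up_to(state, theta):
    # Policy X^pi(S_t | theta): order enough to raise inventory to theta
    # (a simple parameterized policy; theta is the tunable parameter)
    return max(0, theta - state)

def simulate(theta, horizon=50, seed=0):
    # Simulate one sample path and return the cumulative contribution
    rng = random.Random(seed)
    s, total = 0, 0.0
    for _ in range(horizon):
        x = order_up_to(s, theta)            # decision x_t = X^pi(S_t)
        w = demand(rng)                      # exogenous W_{t+1}
        sales = min(s + x, w)
        # Contribution: revenue - ordering cost - holding cost (assumed prices)
        total += 5.0 * sales - 2.0 * x - 0.1 * (s + x - sales)
        s = s + x - sales                    # transition S_{t+1} = S^M(S_t, x_t, W_{t+1})
    return total

def optimize_over_policies(thetas, n_reps=20):
    # max_pi E[ sum_t C(S_t, X^pi(S_t)) ]: here a grid search over theta,
    # with the expectation estimated by averaging sample paths
    def value(theta):
        return sum(simulate(theta, seed=k) for k in range(n_reps)) / n_reps
    return max(thetas, key=value)

best_theta = optimize_over_policies(range(0, 21))
```

The point is structural, not the toy numbers: the decision problem is modeled once (state, decision, exogenous information, transition, contribution), and "solving" it means searching the space of policies, here a one-parameter family, rather than computing values state by state.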


Published 2020-08-04 15:56