Echo State Networks (ESNs) are a type of single-layer recurrent neural network with randomly chosen internal weights and a trainable output layer. We prove under mild conditions that a sufficiently large Echo State Network (ESN) can approximate the value function of a broad class of stochastic and deterministic control problems. Such control problems are generally non-Markovian. We describe how the ESN can form the basis for novel (and computationally efficient) reinforcement learning algorithms in a non-Markovian framework. We demonstrate this theory with two examples. In the first, we use an ESN to solve a deterministic, partially observed control problem: a simple game we call `Bee World'. In the second example, we consider a stochastic control problem inspired by a market-making problem in mathematical finance. In both cases we compare the dynamics of the algorithms with analytic solutions to show that, even after only a single reinforcement policy iteration, the algorithms perform with reasonable skill.
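The ESN architecture described above (fixed random internal weights, trainable linear readout) can be sketched in a few lines. The following is a minimal illustrative example, not the paper's implementation: the reservoir size, input signal, target signal, spectral-radius scaling, and ridge parameter are all assumptions chosen for demonstration, and the readout is fit by ridge regression, a standard choice for ESNs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not taken from the paper).
n_in, n_res = 1, 200

# Randomly chosen, fixed internal weights, rescaled so the spectral
# radius is below 1 -- a common heuristic for the echo state property.
W = rng.standard_normal((n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))

def run_reservoir(inputs):
    """Drive the reservoir with an input sequence and collect its states."""
    x = np.zeros(n_res)
    states = []
    for u in inputs:
        x = np.tanh(W @ x + W_in @ np.atleast_1d(u))
        states.append(x.copy())
    return np.array(states)

# Only the linear output layer is trained, here by ridge regression,
# on a toy input/target pair (a phase-shifted sinusoid).
t = np.linspace(0, 8 * np.pi, 500)
u, target = np.sin(t), np.cos(t)
X = run_reservoir(u)
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ target)
pred = X @ W_out
```

Because the internal weights stay fixed, training reduces to a single linear least-squares solve, which is the source of the computational efficiency the abstract refers to.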