An Echo State Network (ESN) is a single-layer recurrent neural network with randomly chosen internal weights and a trainable output layer. We prove, under mild conditions, that a sufficiently large Echo State Network can approximate the value function of a broad class of stochastic and deterministic control problems. Such control problems are generally non-Markovian. We describe how the ESN can form the basis of novel and computationally efficient reinforcement learning algorithms in a non-Markovian framework. We demonstrate this theory with two examples. In the first, we use an ESN to solve a deterministic, partially observed control problem: a simple game we call `Bee World'. In the second, we consider a stochastic control problem inspired by a market making problem in mathematical finance. In both cases we compare the dynamics of the algorithms with analytic solutions, showing that even after a single policy iteration the algorithms arrive at a good policy.
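To make the ESN definition above concrete, here is a minimal sketch, assuming a standard formulation: a reservoir with fixed random internal weights `W` and input weights `W_in` (the spectral radius is rescaled below 1, a common sufficient condition for the echo state property), and an output layer `W_out` trained by ridge regression. The reservoir size, spectral radius, toy task, and regularisation constant are all illustrative choices, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 200      # reservoir size (hypothetical choice)
rho = 0.9    # target spectral radius, kept below 1

# Fixed random internal and input weights: never trained.
W = rng.standard_normal((N, N))
W *= rho / max(abs(np.linalg.eigvals(W)))   # rescale spectral radius
W_in = rng.standard_normal(N)

def run_reservoir(u):
    """Drive the reservoir with a scalar input sequence u; return all states."""
    x = np.zeros(N)
    states = []
    for u_t in u:
        x = np.tanh(W @ x + W_in * u_t)     # state update with fixed weights
        states.append(x.copy())
    return np.array(states)

# Toy task: predict the next value of a sine wave from its history.
t = np.linspace(0, 20 * np.pi, 2000)
u, y = np.sin(t[:-1]), np.sin(t[1:])

X = run_reservoir(u)
washout = 100                               # discard initial transient states
X, y = X[washout:], y[washout:]

# Ridge regression for the output layer: the only trained weights.
lam = 1e-6
W_out = np.linalg.solve(X.T @ X + lam * np.eye(N), X.T @ y)

mse = np.mean((X @ W_out - y) ** 2)
print(mse)
```

Only `W_out` is fit to data, which is what makes ESN training a single linear solve rather than backpropagation through time; the same readout-only training underlies using the reservoir state as a feature map for value-function approximation.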