The echo state network (ESN) is a special type of recurrent neural network for processing time-series data. However, because the sequential samples collected by an agent are strongly correlated, it is difficult for ESN-based policy control algorithms to use the recursive least squares (RLS) algorithm to update the ESN's parameters. To solve this problem, we propose two novel policy control algorithms, ESNRLS-Q and ESNRLS-Sarsa. First, to reduce the correlation of training samples, we use the leaky integrator ESN and the mini-batch learning mode. Second, to make RLS suitable for training the ESN in mini-batch mode, we present a new mean-approximation method for updating the RLS correlation matrix. Third, to prevent the ESN from over-fitting, we use L1 regularization. Finally, to prevent overestimation of the target state-action value, we employ the Mellowmax method. Simulation results show that our algorithms have good convergence performance.
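Two of the building blocks named above, the leaky integrator ESN state update and the Mellowmax operator, have widely used textbook forms that can be sketched in a few lines. The sketch below is illustrative only: the function names, the parameter names `leak` and `omega`, their default values, and the choice of `tanh` as the reservoir activation are assumptions, and the exact variants used in ESNRLS-Q and ESNRLS-Sarsa may differ. The mean-approximation RLS update is not shown, since its details are not given here.

```python
import numpy as np

def leaky_esn_step(x, u, W_in, W, leak=0.3):
    """One leaky-integrator reservoir update (common textbook form).

    x: current reservoir state, u: current input,
    W_in: input weights, W: recurrent reservoir weights.
    `leak` is an assumed leaking rate in [0, 1].
    """
    pre = W_in @ u + W @ x
    # Blend the old state with the new activation.
    return (1.0 - leak) * x + leak * np.tanh(pre)

def mellowmax(q, omega=5.0):
    """Mellowmax: log(mean(exp(omega * q))) / omega.

    Shifting by max(q) before exponentiating avoids overflow.
    `omega` is an assumed temperature-like parameter (> 0).
    """
    q = np.asarray(q, dtype=float)
    m = q.max()
    return m + np.log(np.mean(np.exp(omega * (q - m)))) / omega
```

For positive `omega`, `mellowmax` lies between the mean and the maximum of the action values, which is why it can serve as a softer alternative to the hard max when forming a target value.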