This paper studies adaptive optimal stationary control of continuous-time linear stochastic systems with both additive and multiplicative noise, using reinforcement learning techniques. Building on policy iteration, it proposes a novel off-policy reinforcement learning algorithm, named optimistic least-squares-based policy iteration. Starting from an initial admissible control policy, the algorithm iteratively finds near-optimal policies for the adaptive optimal stationary control problem directly from input/state data, without explicitly identifying any system matrices. Under mild conditions, the solutions it produces are proved to converge to a small neighborhood of the optimal solution with probability one. An application to a triple inverted pendulum example validates the feasibility and effectiveness of the algorithm.
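To make the policy-iteration idea concrete, the following is a minimal sketch of classical model-based policy iteration for a scalar system. It is not the data-driven algorithm proposed in the paper, which avoids using the system matrices. The scalar model dx = (a x + b u) dt + sigma x dw, the quadratic cost weights q and r, the omission of additive noise, and all function names are assumptions introduced purely for illustration.

```python
import math


def evaluate_policy(a, b, sigma, q, r, k):
    """Policy evaluation: cost weight p of the linear policy u = -k x.

    Assumed (hypothetical) model: dx = (a x + b u) dt + sigma x dw,
    with running cost q x^2 + r u^2.
    """
    # The closed loop is mean-square stable iff 2 (a - b k) + sigma^2 < 0;
    # this sketch uses that condition as the admissibility check.
    decay = 2.0 * (a - b * k) + sigma ** 2
    if decay >= 0.0:
        raise ValueError("gain k is not admissible for this model")
    # Scalar generalized Lyapunov equation:
    # (2 (a - b k) + sigma^2) p + q + r k^2 = 0
    return (q + r * k ** 2) / (-decay)


def policy_iteration(a, b, sigma, q, r, k0, iters=20):
    """Alternate policy evaluation and policy improvement from gain k0."""
    k = k0
    p = None
    for _ in range(iters):
        p = evaluate_policy(a, b, sigma, q, r, k)  # evaluate current gain
        k = b * p / r                              # improve the gain
    return k, p


def riccati_solution(a, b, sigma, q, r):
    """Positive root of (2 a + sigma^2) p + q - b^2 p^2 / r = 0, for comparison."""
    m = 2.0 * a + sigma ** 2
    return r * (m + math.sqrt(m ** 2 + 4.0 * b ** 2 * q / r)) / (2.0 * b ** 2)


if __name__ == "__main__":
    # Illustrative parameter values only; k0 = 3.0 is admissible for them.
    k, p = policy_iteration(1.0, 1.0, 0.5, 1.0, 1.0, k0=3.0)
    print(k, p, riccati_solution(1.0, 1.0, 0.5, 1.0, 1.0))
```

In this sketch the evaluation step needs a, b, and sigma explicitly. The abstract's point is that the proposed algorithm replaces that step with a least-squares estimate built from input/state data, so the model parameters never have to be identified.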