This paper studies the optimal stationary control of continuous-time linear stochastic systems with both additive and multiplicative noise, using reinforcement learning techniques. Based on policy iteration, a novel off-policy reinforcement learning algorithm, named optimistic least-squares-based policy iteration, is proposed. Starting from an initial admissible control policy, the algorithm iteratively finds near-optimal policies for the optimal stationary control problem directly from input/state data, without explicitly identifying any system matrices. Under mild conditions, the solutions produced by the algorithm are proved to converge to a small neighborhood of the optimal solution with probability one. An application to a triple inverted pendulum example validates the feasibility and effectiveness of the algorithm.
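For orientation, the policy iteration that the abstract builds on can be sketched in its model-based form. This is not the proposed optimistic least-squares-based algorithm, which is off-policy and works from input/state data without system matrices; it is only a minimal illustration under stated assumptions. The sketch assumes dynamics of the form dx = (Ax + Bu)dt + (Cx + Du)dw with a single scalar Wiener process w, a quadratic cost with weights Q and R, and a feedback law u = -Kx. The names `C`, `D`, `Q`, `R`, `K0`, and all function names are hypothetical, and additive noise is left out of the sketch.

```python
import numpy as np


def evaluate_policy(A, B, C, D, Q, R, K):
    """Policy evaluation: solve the generalized Lyapunov equation
    Ac'P + P Ac + Cc'P Cc + Q + K'RK = 0, with Ac = A - BK, Cc = C - DK.
    The equation is linear in P, so it is solved by vectorization."""
    n = A.shape[0]
    Ac = A - B @ K
    Cc = C - D @ K
    I = np.eye(n)
    # Row-major vec identity: vec(X P Y) = kron(X, Y.T) @ vec(P)
    M = np.kron(Ac.T, I) + np.kron(I, Ac.T) + np.kron(Cc.T, Cc.T)
    rhs = -(Q + K.T @ R @ K).reshape(-1)
    P = np.linalg.solve(M, rhs).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off


def improve_policy(B, C, D, R, P):
    """Policy improvement: K = (R + D'PD)^{-1} (B'P + D'PC)."""
    return np.linalg.solve(R + D.T @ P @ D, B.T @ P + D.T @ P @ C)


def policy_iteration(A, B, C, D, Q, R, K0, iters=20):
    """Alternate evaluation and improvement, starting from a gain K0 that is
    assumed admissible (mean-square stabilizing). Returns the last cost
    matrix P and the gain K improved from it."""
    K = K0
    for _ in range(iters):
        P = evaluate_policy(A, B, C, D, Q, R, K)
        K = improve_policy(B, C, D, R, P)
    return P, K


if __name__ == "__main__":
    # Hypothetical scalar example, chosen only so the iteration is easy to follow.
    A = np.array([[0.5]])
    B = np.array([[1.0]])
    C = np.array([[0.5]])
    D = np.array([[0.0]])
    Q = np.array([[1.0]])
    R = np.array([[1.0]])
    K0 = np.array([[2.0]])  # satisfies 2(A - B K0) + C^2 < 0 in this scalar case
    P, K = policy_iteration(A, B, C, D, Q, R, K0)
    print("P =", P, "K =", K)
```

The data-driven method described in the abstract replaces the model-based evaluation step with estimates formed from input/state data; that part is not reproduced here.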