This paper studies adaptive optimal stationary control of continuous-time linear stochastic systems with both additive and multiplicative noise, using reinforcement learning techniques. Building on policy iteration, a novel off-policy reinforcement learning algorithm, named optimistic least-squares-based policy iteration, is proposed. Starting from an initial admissible control policy, the algorithm iteratively finds near-optimal policies for the adaptive optimal stationary control problem directly from input/state data, without explicitly identifying any system matrices. Under mild conditions, the solutions it produces are proved to converge to a small neighborhood of the optimal solution with probability one. An application to a triple inverted pendulum example validates the feasibility and effectiveness of the proposed algorithm.
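For readers unfamiliar with policy iteration, the following minimal sketch shows the classical model-based iteration on which such methods build. It is not the optimistic least-squares-based algorithm of this paper: it assumes a deterministic scalar system dx/dt = a*x + b*u with known coefficients, a quadratic cost q*x^2 + r*u^2, and no noise, whereas the proposed method is off-policy, works from input/state data, and handles additive and multiplicative noise. The function names and numerical values are hypothetical and chosen only for illustration.

```python
import math


def policy_iteration(a, b, q, r, k0, iters=20):
    """Model-based policy iteration for the scalar system dx/dt = a*x + b*u.

    The policy is u = -k*x and the cost is the integral of q*x**2 + r*u**2.
    Illustrative assumption: a and b are known and there is no noise.
    """
    if a - b * k0 >= 0:
        # An admissible initial policy must stabilize the closed loop.
        raise ValueError("initial gain k0 is not stabilizing")
    k = k0
    history = []
    for _ in range(iters):
        # Policy evaluation: solve 2*(a - b*k)*p + q + r*k**2 = 0 for p.
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Policy improvement: new gain from the evaluated cost p.
        k = b * p / r
        history.append((p, k))
    return history


def riccati_solution(a, b, q, r):
    """Stabilizing root of the scalar Riccati equation 2*a*p - (b*p)**2/r + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


if __name__ == "__main__":
    steps = policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, k0=3.0)
    for i, (p, k) in enumerate(steps[:5]):
        print(f"iteration {i}: p = {p:.10f}, k = {k:.10f}")
    print("Riccati solution:", riccati_solution(1.0, 1.0, 1.0, 1.0))
```

In this sketch each evaluated cost p is no larger than the previous one and the sequence approaches the Riccati solution, which is the behavior that data-driven variants aim to reproduce approximately when the system matrices are unknown.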