This paper presents a control-theoretic framework that stably combines optimal feedback policies with online learning for the control of uncertain nonlinear systems. Given unknown parameters within a bounded range, the resulting adaptive control laws guarantee convergence of the closed-loop system to the state of zero cost. The framework can employ the certainty equivalence principle when designing optimal policies and value functions by adjusting the learning rate online, a mechanism needed to guarantee stable learning and control. The approach is demonstrated on the familiar mountain car problem, where it yields near-optimal behavior despite parametric uncertainty.