Nonlinear control systems in which the decision maker has only partial information are prevalent in a variety of applications. As a step toward studying such systems, this work explores reinforcement learning methods for finding the optimal policy in nearly linear-quadratic regulator systems. In particular, we consider a dynamic system that combines linear and nonlinear components and is governed by a policy with the same structure. Assuming that the nonlinear component comprises kernels with small Lipschitz coefficients, we characterize the optimization landscape of the cost function. Although the cost function is nonconvex in general, we establish local strong convexity and smoothness in the vicinity of the global optimizer. Additionally, we propose an initialization mechanism to leverage these properties. Building on these developments, we design a policy gradient algorithm that is guaranteed to converge to the globally optimal policy at a linear rate.
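To make the setting concrete, the following is a minimal sketch of gradient descent on the policy parameters of a toy nearly linear-quadratic problem. Everything specific in it is an assumption made for illustration and is not taken from the paper: the scalar dynamics `x' = a*x + b*u + c*phi(x)`, the choice `phi = tanh` as the kernel, the coefficient values, the finite horizon, the fixed set of initial states, and the starting point `[0.5, 0.0]`. The gradient is approximated by central finite differences on the rollout cost, which stands in for whatever gradient estimator the authors actually use. The policy mirrors the structure of the dynamics, `u = -k1*x - k2*phi(x)`.

```python
import numpy as np


def rollout_cost(theta, x0s, horizon=30):
    """Average finite-horizon quadratic cost of the policy u = -k1*x - k2*phi(x)."""
    a, b, c = 0.9, 0.5, 0.05  # hypothetical coefficients; c is small, so the system is "nearly" linear
    q, r = 1.0, 0.1           # hypothetical state and control cost weights
    k1, k2 = theta
    total = 0.0
    for x in x0s:
        for _ in range(horizon):
            phi = np.tanh(x)             # hypothetical nonlinear kernel
            u = -k1 * x - k2 * phi       # policy with the same linear + nonlinear structure
            total += q * x**2 + r * u**2
            x = a * x + b * u + c * phi  # linear part plus small nonlinear part
    return total / len(x0s)


def finite_difference_grad(f, theta, eps=1e-5):
    """Central-difference approximation of the gradient of f at theta."""
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad


def policy_gradient(theta0, x0s, lr=0.05, iters=300):
    """Plain gradient descent on the policy parameters (k1, k2)."""
    theta = np.array(theta0, dtype=float)
    costs = [rollout_cost(theta, x0s)]
    for _ in range(iters):
        g = finite_difference_grad(lambda th: rollout_cost(th, x0s), theta)
        theta = theta - lr * g
        costs.append(rollout_cost(theta, x0s))
    return theta, costs


if __name__ == "__main__":
    theta, costs = policy_gradient([0.5, 0.0], [-1.0, -0.5, 0.5, 1.0])
    print("final parameters:", theta)
    print("initial cost:", costs[0], "final cost:", costs[-1])
```

The starting point is chosen so that the closed loop is already stable, loosely echoing the role of an initialization step. The sketch makes no claim about reproducing the paper's convergence rate or its initialization mechanism.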