We introduce a new closed-loop architecture for the online solution of approximate optimal control problems for continuous-time systems. Specifically, we introduce the first algorithm that incorporates dynamic momentum into actor-critic structures to control continuous-time dynamic plants that are affine in the input. By incorporating dynamic momentum, our algorithm accelerates the convergence of the closed-loop system and achieves better transient performance than traditional gradient-descent-based techniques. In addition, by leveraging past recorded data with sufficiently rich information, we dispense with the persistence-of-excitation condition traditionally imposed on the regressors of the critic and the actor. Because our continuous-time momentum-based dynamics also incorporate periodic discrete-time resets that emulate restarting techniques used in the machine learning literature, we use tools from hybrid dynamical systems theory to establish asymptotic stability of the closed-loop system. We illustrate our results with a numerical example.
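To make the two ingredients above concrete (momentum dynamics with periodic resets, and a rank condition on recorded data in place of persistence of excitation), the following is a minimal sketch on a linear-in-parameters estimation problem of the kind a critic solves. It is not the paper's algorithm: it assumes a generic heavy-ball flow q' = p, p' = -gamma*p - k*grad J(q), with the momentum p reset to zero every `T_reset` seconds, and a least-squares loss J over a hypothetical stack of recorded regressors `Phi` and targets `y`. The function name, gains, and data are illustrative assumptions, and the flow is integrated with forward Euler.

```python
import numpy as np


def momentum_with_resets(Phi, y, k=1.0, gamma=1.0, T_reset=5.0, dt=1e-3, t_final=40.0):
    """Hypothetical sketch: heavy-ball flow with periodic momentum resets.

    Minimizes J(q) = 0.5 * ||Phi q - y||^2 over recorded data (Phi, y).
    This is an illustrative stand-in, not the paper's momentum dynamics.
    """
    # Richness condition on recorded data (full column rank), used here
    # in place of a persistence-of-excitation assumption.
    if np.linalg.matrix_rank(Phi) < Phi.shape[1]:
        raise ValueError("recorded data are not sufficiently rich")
    n = Phi.shape[1]
    q = np.zeros(n)  # parameter estimate (e.g., critic weights)
    p = np.zeros(n)  # momentum state
    timer = 0.0      # time elapsed since the last reset
    for _ in range(int(round(t_final / dt))):
        grad = Phi.T @ (Phi @ q - y)  # gradient of J over the recorded data
        # Forward-Euler step of the continuous-time flow (uses old q and p).
        q, p = q + dt * p, p + dt * (-gamma * p - k * grad)
        timer += dt
        if timer >= T_reset:
            p = np.zeros(n)  # discrete-time reset ("restart") of the momentum
            timer = 0.0
    return q


# Usage with synthetic, noise-free recorded data (illustrative only).
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0, 0.5])
Phi = rng.normal(size=(10, 3))
y = Phi @ theta_true
theta_hat = momentum_with_resets(Phi, y)
print(np.round(theta_hat, 3))
```

In this sketch the energy k*(J(q) - J*) + 0.5*||p||^2 is nonincreasing along the flow and cannot increase at a reset, which is a simple way to see why restarting the momentum does not destroy convergence; the paper's hybrid-systems analysis addresses its own, different dynamics.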