Neural ordinary differential equations (Neural ODEs) model continuous-time dynamics as differential equations parametrized by neural networks. Thanks to their modeling flexibility, they have been adopted for multiple tasks where the continuous-time nature of the process is especially relevant, such as system identification and time-series analysis. When applied in a control setting, they can be adapted to approximate optimal nonlinear feedback policies. This formulation follows the same approach as policy gradients in reinforcement learning, covering the case where the environment consists of known deterministic dynamics given by a system of differential equations. The white-box nature of the model specification allows the direct calculation of policy gradients through sensitivity analysis, avoiding the inexact and inefficient gradient estimation through sampling. In this work we propose the use of a neural control policy posed as a Neural ODE to solve general nonlinear optimal control problems while satisfying both state and control constraints, which are crucial for real-world scenarios. Since the state feedback policy partially modifies the model dynamics, the whole phase space of the system is reshaped during optimization. This approach is a sensible approximation to the historically intractable closed-loop solution of nonlinear control problems that efficiently exploits the availability of a dynamical system model.
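As a minimal sketch of the formulation described above (the symbols $x$, $u$, $f$, $L$, $\pi_\theta$, and the constraint sets $\mathcal{X}$, $\mathcal{U}$ are illustrative choices, not notation taken from the paper), the closed-loop problem can be written as
\[
\min_{\theta} \int_{0}^{T} L\bigl(x(t), u(t)\bigr)\,dt
\quad \text{s.t.} \quad
\dot{x}(t) = f\bigl(x(t), u(t)\bigr), \quad
u(t) = \pi_{\theta}\bigl(x(t)\bigr), \quad
x(t) \in \mathcal{X}, \; u(t) \in \mathcal{U},
\]
so that substituting the neural feedback policy into the known dynamics yields a Neural ODE $\dot{x} = f(x, \pi_{\theta}(x))$ whose sensitivities with respect to $\theta$ give the policy gradient directly.

The following is a hypothetical sketch (not the authors' code) of this idea in JAX: a small neural policy is placed inside known deterministic dynamics, the closed-loop ODE is integrated with a fixed-step Euler solver, and the policy gradient is obtained by differentiating the rollout cost. The network architecture, the Van der Pol dynamics, and the quadratic cost are all assumptions made for illustration, and constraint handling is omitted.

```python
import jax
import jax.numpy as jnp

def policy(theta, x):
    # Illustrative tanh network mapping state -> bounded control in [-1, 1].
    h = jnp.tanh(theta["W1"] @ x + theta["b1"])
    return jnp.tanh(theta["W2"] @ h + theta["b2"])

def dynamics(x, u):
    # Known deterministic model (example): a controlled Van der Pol oscillator.
    x1, x2 = x
    return jnp.array([x2, (1.0 - x1**2) * x2 - x1 + u[0]])

def rollout_cost(theta, x0, dt=0.01, steps=500):
    # Euler integration of the closed-loop Neural ODE x' = f(x, pi_theta(x)),
    # accumulating a quadratic running cost; gradients flow through the solver,
    # playing the role of the sensitivity analysis mentioned in the abstract.
    def step(carry, _):
        x, cost = carry
        u = policy(theta, x)
        x_next = x + dt * dynamics(x, u)
        cost = cost + dt * (x @ x + 0.1 * (u @ u))
        return (x_next, cost), None
    (_, cost), _ = jax.lax.scan(step, (x0, jnp.zeros(())), None, length=steps)
    return cost

k1, k2 = jax.random.split(jax.random.PRNGKey(0))
theta = {
    "W1": 0.1 * jax.random.normal(k1, (16, 2)), "b1": jnp.zeros(16),
    "W2": 0.1 * jax.random.normal(k2, (1, 16)), "b2": jnp.zeros(1),
}
grads = jax.grad(rollout_cost)(theta, jnp.array([1.0, 0.0]))  # policy gradient
```

In the approach the abstract describes, such gradients are obtained through ODE sensitivity analysis rather than a hand-rolled Euler loop, but the closed-loop structure being differentiated is the same.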