Optimisation of Structured Neural Controller Based on Continuous-Time Policy Gradient

This study presents a policy optimisation framework for structured nonlinear control of continuous-time (deterministic) dynamic systems. The proposed approach prescribes a structure for the controller based on relevant scientific knowledge (such as Lyapunov stability theory or domain experience) and parametrises the tunable elements inside that structure with neural networks. To optimise a cost expressed as a function of the neural network weights, the approach uses the continuous-time policy gradient method based on adjoint sensitivity analysis, which computes the cost gradient correctly and efficiently. This combines the stability, robustness, and physical interpretability of an analytically derived feedback controller structure with the representational flexibility and optimised performance provided by machine learning techniques. Such a hybrid paradigm for fixed-structure control synthesis is particularly useful for optimising adaptive nonlinear controllers to achieve improved performance in online operation, an area where existing theory dictates the design of the structure but offers little analytical understanding of how to tune the gains and the uncertainty model basis functions that govern the performance characteristics. Numerical experiments on aerospace applications illustrate the utility of the structured nonlinear controller optimisation framework.
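
As a rough illustration of the idea, and not the authors' implementation, the sketch below applies adjoint sensitivity analysis to a toy scalar problem. Everything in it is an assumption made for the example: the plant dx/dt = a*x + u, the structured controller u = -(a + softplus(theta))*x (stable for every theta, standing in for a Lyapunov-derived structure), the single scalar theta standing in for neural network weights, the quadratic running cost, and the fixed-step RK4 integrator.

```python
import math

# Illustrative constants (assumptions, not taken from the paper).
A, Q, R = 1.0, 1.0, 0.1      # unstable plant pole, state and control weights
X0, T, N = 1.0, 5.0, 2000    # initial state, horizon, number of RK4 steps


def softplus(z):
    return math.log1p(math.exp(z))


def sigmoid(z):              # derivative of softplus
    return 1.0 / (1.0 + math.exp(-z))


def rk4_step(f, y, h):
    """One classical Runge-Kutta step for a list-valued state y."""
    k1 = f(y)
    k2 = f([yi + 0.5 * h * ki for yi, ki in zip(y, k1)])
    k3 = f([yi + 0.5 * h * ki for yi, ki in zip(y, k2)])
    k4 = f([yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6.0 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def forward(theta):
    """Integrate the closed loop and the running cost; returns [x(T), J]."""
    s = softplus(theta)
    k = A + s                              # feedback gain, always > A
    def f(y):
        x = y[0]
        return [-s * x,                    # dx/dt = A*x - k*x
                (Q + R * k * k) * x * x]   # dJ/dt = Q*x^2 + R*u^2
    y = [X0, 0.0]
    for _ in range(N):
        y = rk4_step(f, y, T / N)
    return y


def adjoint_gradient(theta):
    """dJ/dtheta from a backward-in-time solve of the adjoint equation."""
    s, ds = softplus(theta), sigmoid(theta)
    k = A + s
    x_T = forward(theta)[0]
    def f(y):                              # derivatives w.r.t. forward time t
        x, lam, g = y
        dx = -s * x
        dlam = s * lam - 2.0 * (Q + R * k * k) * x       # -(df/dx)*lam - dL/dx
        dg = 2.0 * R * k * ds * x * x + lam * (-ds * x)  # dL/dth + lam*df/dth
        return [dx, dlam, dg]
    y = [x_T, 0.0, 0.0]                    # terminal conditions lam(T)=0, g(T)=0
    for _ in range(N):
        y = rk4_step(f, y, -T / N)         # negative step: integrate T -> 0
    return -y[2]                           # g(0) = -integral over [0, T]


def finite_difference_gradient(theta, eps=1e-5):
    """Central-difference check on the same discretised cost."""
    return (forward(theta + eps)[1] - forward(theta - eps)[1]) / (2.0 * eps)


if __name__ == "__main__":
    theta = 0.3
    print("adjoint gradient:", adjoint_gradient(theta))
    print("finite difference:", finite_difference_gradient(theta))
```

The two printed values should agree closely. With many parameters, the adjoint route still needs only one forward and one backward solve, whereas finite differences need two forward solves per parameter.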
