This work is concerned with efficiently training neural network-based feedback controllers for optimal control problems. We first conduct a comparative study of two mainstream approaches: offline supervised learning and online direct policy optimization. Although the training part of the supervised learning approach is relatively straightforward, the success of the method heavily depends on the optimal control dataset generated by open-loop optimal control solvers. In contrast, direct policy optimization turns the optimal control problem into an optimization problem directly, without requiring any pre-computed data, but the dynamics-related objective can be hard to optimize when the problem is complicated. Our results highlight the superiority of offline supervised learning in terms of both optimality and training time. To overcome the main challenges of the two approaches, namely the dataset and the optimization, we complement them and propose the Pre-train and Fine-tune strategy as a unified training paradigm for optimal feedback control, which further improves performance and robustness significantly. Our code is available at https://github.com/yzhao98/DeepOptimalControl.
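To make the two stages of the proposed paradigm concrete, below is a minimal sketch of pre-training a feedback controller by supervised regression on open-loop optimal data and then fine-tuning it by direct policy optimization through the rolled-out dynamics. All names (`policy`, `dynamics`, `running_cost`), the placeholder dynamics and cost, and the random stand-in dataset are assumptions for illustration only, not the implementation in the repository above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

state_dim, control_dim, horizon, dt = 4, 2, 50, 0.02

# Feedback controller u = pi(t, x), parameterized by a small neural network.
policy = nn.Sequential(
    nn.Linear(state_dim + 1, 64), nn.Tanh(),
    nn.Linear(64, control_dim),
)

def dynamics(x, u):
    # Placeholder linear dynamics x' = Ax + Bu; the real system is problem-specific.
    A = torch.eye(state_dim) * (-0.1)
    B = torch.zeros(state_dim, control_dim)
    B[:control_dim] = torch.eye(control_dim)
    return x @ A.T + u @ B.T

def running_cost(x, u):
    return (x ** 2).sum(-1) + 0.1 * (u ** 2).sum(-1)

# ---- Stage 1: offline supervised pre-training ----
# (t_i, x_i, u_i*) pairs would come from an open-loop optimal control solver;
# random stand-ins are used here only to keep the sketch runnable.
t_data = torch.rand(1024, 1) * horizon * dt
x_data = torch.randn(1024, state_dim)
u_star = torch.randn(1024, control_dim)

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for _ in range(200):
    pred = policy(torch.cat([t_data, x_data], dim=-1))
    loss = ((pred - u_star) ** 2).mean()   # regression onto optimal controls
    opt.zero_grad(); loss.backward(); opt.step()

# ---- Stage 2: online fine-tuning by direct policy optimization ----
# Roll the closed-loop system forward and differentiate the accumulated cost
# with respect to the policy parameters.
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
for _ in range(100):
    x = torch.randn(128, state_dim)        # batch of initial states
    total_cost = 0.0
    for k in range(horizon):
        t = torch.full((x.shape[0], 1), k * dt)
        u = policy(torch.cat([t, x], dim=-1))
        total_cost = total_cost + running_cost(x, u).mean() * dt
        x = x + dt * dynamics(x, u)        # explicit Euler step
    opt.zero_grad(); total_cost.backward(); opt.step()
```

In this reading, Stage 1 supplies a good initialization from the optimal control dataset, which is what makes the otherwise hard-to-optimize dynamics-related objective in Stage 2 tractable.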