In this work, we propose a framework to learn feedback control policies with guarantees on closed-loop generalization and adversarial robustness. These policies are learned directly from expert demonstrations, contained in a dataset of state-control input pairs, without any prior knowledge of the task or the system model. We use a Lipschitz-constrained loss minimization scheme to learn feedback policies with certified closed-loop robustness, wherein the Lipschitz constraint serves as a mechanism to tune the generalization performance and the robustness to adversarial disturbances. Our analysis exploits the Lipschitz property to obtain closed-loop guarantees on the generalization and robustness of the learned policies. In particular, we derive a finite-sample bound on the policy learning error and establish robust closed-loop stability under the learned control policy. We also derive bounds on the closed-loop regret with respect to the expert policy and on the deterioration of closed-loop performance under bounded (adversarial) disturbances to the state measurements. Numerical results validate our analysis and demonstrate the effectiveness of our robust feedback policy learning framework. Finally, our results suggest a tradeoff between nominal closed-loop performance and adversarial robustness: improvements in nominal closed-loop performance can be made only at the expense of robustness to adversarial perturbations.
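To make the idea of Lipschitz-constrained loss minimization concrete, the following is a minimal, hypothetical sketch of imitation learning from state-control pairs with a bounded-Lipschitz policy. It uses spectral normalization as one standard surrogate for enforcing the Lipschitz constraint; the paper's actual constrained-minimization scheme may differ, and the names `LipschitzPolicy`, `lip_bound`, and `train_step`, as well as the synthetic stand-in data, are illustrative assumptions rather than part of the original framework.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import spectral_norm

state_dim, ctrl_dim = 4, 1
lip_bound = 2.0  # tuning knob: smaller values favor robustness, larger values favor fit

class LipschitzPolicy(nn.Module):
    """Feedback policy u = pi(x) with Lipschitz constant at most lip_bound (approximately)."""
    def __init__(self):
        super().__init__()
        # Spectral normalization keeps each weight matrix at (approximately) unit
        # spectral norm via power iteration; with 1-Lipschitz ReLU activations,
        # the composed network is (approximately) 1-Lipschitz.
        self.net = nn.Sequential(
            spectral_norm(nn.Linear(state_dim, 64)), nn.ReLU(),
            spectral_norm(nn.Linear(64, 64)), nn.ReLU(),
            spectral_norm(nn.Linear(64, ctrl_dim)),
        )

    def forward(self, x):
        # Scaling a 1-Lipschitz map by lip_bound yields a lip_bound-Lipschitz policy.
        return lip_bound * self.net(x)

policy = LipschitzPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def train_step(states, expert_controls):
    """One step of loss minimization on expert (state, control-input) pairs."""
    opt.zero_grad()
    loss = F.mse_loss(policy(states), expert_controls)
    loss.backward()
    opt.step()
    return loss.item()

# Synthetic stand-in for the expert demonstration dataset.
states = torch.randn(256, state_dim)
expert_controls = torch.randn(256, ctrl_dim)
for _ in range(100):
    train_step(states, expert_controls)
```

In this sketch, `lip_bound` plays the role of the tuning mechanism described above: tightening it limits how sharply the learned policy can react to perturbed state measurements, at the possible cost of a larger imitation loss on the expert data.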