In this work, we propose and analyze a new framework to learn feedback control policies with provable guarantees on closed-loop performance and robustness to bounded (adversarial) perturbations. These policies are learned from expert demonstrations without any prior knowledge of the task, its cost function, or the system dynamics. In contrast to existing algorithms in imitation learning and inverse reinforcement learning, we use a Lipschitz-constrained loss minimization scheme to learn control policies with certified robustness. We establish robust stability of the closed-loop system under the learned control policy and derive an upper bound on its regret, which quantifies the sub-optimality of the closed-loop performance relative to the expert policy. We further derive a robustness bound on the deterioration of the closed-loop performance under bounded (adversarial) perturbations of the state measurements. Our results suggest an underlying tradeoff between nominal closed-loop performance and adversarial robustness: improvements in nominal performance come only at the expense of robustness to adversarial perturbations. Numerical results validate our analysis and demonstrate the effectiveness of our robust feedback policy learning framework.
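To make the idea of Lipschitz-constrained policy learning from demonstrations concrete, the following is a minimal illustrative sketch, not the paper's actual algorithm. It assumes a linear feedback policy u = Kx, fits K to synthetic expert state-action pairs by projected gradient descent on a least-squares imitation loss, and enforces a Lipschitz bound by clipping the singular values of K. The bound L, the problem dimensions, and the synthetic "expert" data are all hypothetical choices made for illustration.

```python
# Illustrative sketch (assumptions, not the paper's method): learn a linear
# feedback policy u = K x from expert demonstrations by least-squares imitation,
# enforcing the Lipschitz constraint ||K||_2 <= L via projection onto the
# spectral-norm ball.
import numpy as np

def project_spectral_norm(K, L):
    """Project K onto {K : ||K||_2 <= L} by clipping its singular values."""
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    return U @ np.diag(np.minimum(s, L)) @ Vt

def fit_lipschitz_policy(X, U, L, lr=1e-2, iters=2000):
    """Projected gradient descent on the imitation loss (1/N)*||X K^T - U||_F^2."""
    n, m = X.shape[1], U.shape[1]
    K = np.zeros((m, n))
    for _ in range(iters):
        grad = 2.0 * (X @ K.T - U).T @ X / X.shape[0]  # gradient w.r.t. K
        K = project_spectral_norm(K - lr * grad, L)    # Lipschitz projection
    return K

# Synthetic expert data: the expert acts with a gain K_star unknown to the learner.
rng = np.random.default_rng(0)
n, m, N = 4, 2, 500
K_star = rng.standard_normal((m, n))
X = rng.standard_normal((N, n))                          # sampled states
U = X @ K_star.T + 0.01 * rng.standard_normal((N, m))    # noisy expert actions

K_hat = fit_lipschitz_policy(X, U, L=1.0)
print("spectral norm of learned gain:", np.linalg.svd(K_hat, compute_uv=False)[0])
```

In this toy setting the projection keeps the learned gain's spectral norm at or below L, which caps the policy's sensitivity to state perturbations; the paper's framework develops this robustness-versus-imitation-accuracy tradeoff with formal regret and robustness guarantees.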