When applying imitation learning techniques to fit a policy from expert demonstrations, one can take advantage of prior stability/robustness assumptions on the expert's policy and incorporate such control-theoretic prior knowledge explicitly into the learning process. In this paper, we formulate the imitation learning of linear policies as a constrained optimization problem, and present efficient methods which can be used to enforce stability and robustness constraints during the learning process. Specifically, we show that one can guarantee closed-loop stability and robustness by imposing linear matrix inequality (LMI) constraints on the fitted policy. Then both the projected gradient descent method and the alternating direction method of multipliers (ADMM) can be applied to solve the resulting constrained policy fitting problem. Finally, we provide numerical results to demonstrate the effectiveness of our methods in producing linear policies with various stability and robustness guarantees.
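As a rough illustration of the kind of LMI-constrained fitting described above (not the paper's exact algorithm), the sketch below searches for a stabilizing linear gain near a fitted gain K_hat using the standard Lyapunov LMI in the lifted variables (Q, Y) with K = Y Q^{-1}; the system matrices A, B, the tolerance eps, and the surrogate objective are all illustrative assumptions.

```python
# A minimal sketch, assuming known system matrices (A, B) and a previously
# fitted gain K_hat; it finds a nearby gain satisfying the discrete-time
# stability LMI. The exact distance ||K - K_hat|| is nonconvex in (Q, Y),
# so a heuristic surrogate ||Y - K_hat Q|| is minimized instead.
import numpy as np
import cvxpy as cp

def stabilizing_gain_near(K_hat, A, B, eps=1e-6):
    n, m = A.shape[0], B.shape[1]
    Q = cp.Variable((n, n), symmetric=True)   # Q = P^{-1}, Lyapunov certificate
    Y = cp.Variable((m, n))                   # Y = K Q, lifted policy variable
    # Schur-complement form of (A + B K)^T P (A + B K) - P < 0
    lmi = cp.bmat([[Q, (A @ Q + B @ Y).T],
                   [A @ Q + B @ Y, Q]])
    constraints = [Q >> eps * np.eye(n), lmi >> eps * np.eye(2 * n)]
    objective = cp.Minimize(cp.norm(Y - K_hat @ Q, "fro"))
    cp.Problem(objective, constraints).solve()
    return Y.value @ np.linalg.inv(Q.value)   # recover K = Y Q^{-1}
```

A projected-gradient or ADMM scheme in the spirit of the abstract would alternate steps of this flavor: a gradient step on the imitation loss over K, followed by a (possibly approximate) projection onto the LMI-feasible set.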