Motivated by the recent empirical success of policy-based reinforcement learning (RL), there has been a growing research interest in studying the performance of policy-based RL methods on standard control benchmark problems. In this paper, we examine the effectiveness of policy-based RL methods on an important robust control problem, namely $\mu$ synthesis. We build a connection between robust adversarial RL and $\mu$ synthesis, and develop a model-free version of the well-known $DK$-iteration for solving state-feedback $\mu$ synthesis with static $D$-scaling. In the proposed algorithm, the $K$ step mimics the classical central path algorithm by incorporating a recently developed double-loop adversarial RL method as a subroutine, and the $D$ step is based on model-free finite-difference approximation. An extensive numerical study is also presented to demonstrate the utility of our proposed model-free algorithm. Our study sheds new light on the connections between adversarial RL and robust control.