This paper presents a robust reinforcement learning algorithm, robust deterministic policy gradient (RDPG), which reformulates the H-infinity control problem as a two-player zero-sum dynamic game between the user (the controller) and an adversary (the disturbance). The method combines deterministic policy gradients with deep reinforcement learning to train a robust policy that efficiently attenuates disturbances. A practical variant, robust deep deterministic policy gradient (RDDPG), incorporates twin-delayed updates for improved stability and sample efficiency. Experiments on an unmanned aerial vehicle demonstrate superior robustness and tracking accuracy under severe disturbance conditions.
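The zero-sum structure underlying the method can be illustrated with a minimal gradient descent-ascent sketch: the user descends a cost while the adversary ascends it, with a gamma-squared penalty on the disturbance as in H-infinity attenuation. This is a hand-made quadratic toy problem for intuition only, not the paper's actual networks or algorithm.

```python
# Toy two-player zero-sum game (assumed illustrative example, not RDPG itself):
# the user chooses u to minimize J, the adversary chooses w to maximize J,
# where J penalizes the disturbance by gamma^2, echoing the H-infinity setup.

GAMMA = 2.0  # disturbance-attenuation level (illustrative value)

def cost(u, w, gamma=GAMMA):
    # J(u, w) = u^2 + u*w - gamma^2 * w^2; the saddle point is at u = w = 0
    return u**2 + u * w - gamma**2 * w**2

def grad_u(u, w):
    # dJ/du: the user performs gradient *descent* on J
    return 2 * u + w

def grad_w(u, w, gamma=GAMMA):
    # dJ/dw: the adversary performs gradient *ascent* on J
    return u - 2 * gamma**2 * w

u, w = 1.0, -1.0   # arbitrary starting policies
lr = 0.05
for _ in range(2000):
    u -= lr * grad_u(u, w)   # user minimizes the cost
    w += lr * grad_w(u, w)   # adversary maximizes the cost

# Both players converge toward the saddle point (0, 0).
```

In the full method, `u` and `w` would be deterministic policy networks updated by their respective deterministic policy gradients rather than scalar decision variables.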