We devise a control-theoretic reinforcement learning approach that supports direct learning of the optimal policy. We establish several theoretical results for our approach, including the convergence and optimality of our control-theoretic operator and a new gradient ascent theorem for control-policy parameters, and we derive a specific gradient ascent algorithm from this theorem. As a representative example, we instantiate our approach within a particular control-theoretic framework and empirically evaluate it on several classical reinforcement learning tasks, demonstrating significant improvements in solution quality, sample complexity, and running time over state-of-the-art baseline methods.
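The abstract references gradient ascent on control-policy parameters without detailing the algorithm. As a hedged illustration only, the sketch below shows generic REINFORCE-style gradient ascent on softmax policy parameters for a toy bandit; the reward means, step size, and parameterization are all assumptions for the example, not the paper's control-theoretic method.

```python
# Minimal, hypothetical sketch of gradient ascent on policy parameters
# (generic REINFORCE-style update); NOT the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)

# Toy two-armed bandit: arm 1 pays more on average (assumed values).
TRUE_MEANS = np.array([0.2, 0.8])

def softmax(theta):
    z = theta - theta.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

theta = np.zeros(2)                  # policy parameters
alpha = 0.1                          # step size (assumed)

for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)               # sample an action from the policy
    r = rng.normal(TRUE_MEANS[a], 0.1)       # sample a noisy reward
    # Gradient of log pi(a | theta) for a softmax policy: e_a - probs.
    grad_log = -probs
    grad_log[a] += 1.0
    theta += alpha * r * grad_log            # ascend the return estimate

print("learned action probabilities:", softmax(theta))
```

Run as-is, the policy concentrates probability on the higher-paying arm, illustrating the ascent direction a policy-gradient theorem supplies.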