Reinforcement learning is a model-free optimal control method that optimizes a control policy through direct interaction with the environment. For reaching tasks that end in regulation, popular discrete-action methods are not well suited due to chattering in the goal state. We compare three different ways to solve this problem by combining reinforcement learning with classical LQR control. In particular, we introduce a method that integrates LQR control into the action set, allowing generalization and avoiding fixing the computed control in the replay memory if it is based on learned dynamics. We also embed LQR control into a continuous-action method. In all cases, we show that adding LQR control can improve performance, although the effect is more pronounced if it can be used to augment a discrete action set.
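To make the idea of augmenting a discrete action set with an LQR action concrete, the following is a minimal sketch, not the paper's implementation: it assumes a known (or learned) linearization x' = A x + B u around the goal, computes the infinite-horizon LQR gain, and maps one extra discrete action index to the regulation control u = -K x. All names (A, B, Q, R, BANG_BANG_ACTIONS, action_from_index) are illustrative.

```python
# Minimal sketch of an LQR-augmented discrete action set (illustrative only).
import numpy as np
from scipy.linalg import solve_discrete_are


def lqr_gain(A, B, Q, R):
    """Infinite-horizon discrete-time LQR gain K for x' = A x + B u."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


# Hypothetical linearized dynamics around the goal state.
A = np.array([[1.0, 0.05],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.05]])
Q = np.diag([10.0, 1.0])   # state cost
R = np.array([[0.1]])      # control cost
K = lqr_gain(A, B, Q, R)

# Fixed discrete actions the agent would normally choose from.
BANG_BANG_ACTIONS = [-2.0, 0.0, 2.0]


def action_from_index(index, state):
    """Map a discrete action index to a control; the last index is the LQR action.

    Storing only the index in the replay memory and recomputing u = -K x when
    the action is executed avoids fixing a stale control in replay if K is
    later updated from learned dynamics.
    """
    if index < len(BANG_BANG_ACTIONS):
        return BANG_BANG_ACTIONS[index]
    return (-K @ state).item()  # LQR regulation action
```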