Deep reinforcement learning (RL) algorithms can learn complex policies to optimize agent operation over time, and they have shown promising results on complicated problems in recent years. However, their application to real-world physical systems remains limited. Despite the advances in RL algorithms, industry often prefers traditional control strategies, which are simple, computationally efficient, and easy to adjust. In this paper, we first propose a new Q-learning algorithm for continuous action spaces, which bridges control and RL approaches and offers the best of both worlds. Our method can learn complex policies to achieve long-term goals while remaining easy to adjust to short-term requirements without retraining. Next, we present an approximation of our algorithm that can be applied to address the short-term requirements of any pre-trained RL agent. Our case studies demonstrate that both the proposed method and its practical approximation can achieve short-term and long-term goals without complex reward functions.
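To make the abstract's core idea concrete, the following is a minimal illustrative sketch, not the paper's actual algorithm: given a pre-trained critic over a continuous action space, a short-term requirement is folded in at decision time as a penalty on the action, so the policy can be adjusted without retraining. The names `q_value`, `short_term_penalty`, `select_action`, and the trade-off weight `lam` are all hypothetical placeholders introduced here for illustration.

```python
import numpy as np

def q_value(state, action):
    """Stand-in for a learned critic Q(s, a); here a toy quadratic
    whose long-term optimum is action = 0.5. The real critic would
    be a trained network, not this closed-form function."""
    return -(action - 0.5) ** 2 + 0.1 * state

def short_term_penalty(action, target=0.2):
    """Hypothetical short-term requirement: keep the action near
    `target` (e.g., an operator-imposed setpoint)."""
    return (action - target) ** 2

def select_action(state, lam=0.0, grid=np.linspace(-1.0, 1.0, 201)):
    """Greedy action over a discretized continuous action space.
    `lam` trades the long-term critic off against the short-term
    penalty; lam = 0 recovers the unmodified pre-trained policy,
    so no retraining is needed to change the short-term behavior."""
    scores = q_value(state, grid) - lam * short_term_penalty(grid)
    return grid[np.argmax(scores)]

state = 0.0
print(select_action(state, lam=0.0))  # long-term optimum, ~0.5
print(select_action(state, lam=5.0))  # pulled toward the short-term target 0.2
```

Under these assumptions, raising `lam` shifts the chosen action from the critic's long-term optimum toward the short-term target, which mirrors the adjustability the abstract claims without committing to the paper's specific formulation.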