Advances in reinforcement learning have driven progress in control theory. Reinforcement learning has successfully solved the inverted pendulum problem and, more recently, the double inverted pendulum problem. In reinforcement learning, an agent learns by interacting with the control system, with the goal of maximizing its cumulative reward. In this paper, we compare three reward functions for the cart position problem. We conclude that a discontinuous reward function, which gives the agent a non-zero reward only when it is within a given distance of the desired position, yields the best results of the three.
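To make the best-performing reward concrete, here is a minimal Python sketch of a distance-thresholded (discontinuous) reward. It assumes a one-dimensional cart position, an inclusive distance check, and a constant reward inside the band; the names `threshold_reward`, `tolerance`, and `bonus` are hypothetical, and the abstract does not state the actual reward magnitude or distance used.

```python
def threshold_reward(position: float, target: float, tolerance: float, bonus: float = 1.0) -> float:
    """Hypothetical discontinuous reward for the cart position problem.

    Returns `bonus` when the cart is within `tolerance` of `target`, else 0.0.
    The values of `tolerance` and `bonus` are assumptions, not taken from the paper.
    """
    # Distance between the cart's current position and the desired position.
    distance = abs(position - target)
    # Step function: the reward is non-zero only inside the tolerance band,
    # so it jumps discontinuously at distance == tolerance.
    return bonus if distance <= tolerance else 0.0


# Usage: a cart close to the target is rewarded; a distant one receives nothing.
print(threshold_reward(0.95, 1.0, 0.1))  # inside the band
print(threshold_reward(0.5, 1.0, 0.1))   # outside the band
```

Because the agent receives no signal outside the band, this design rewards only outcomes that are actually close to the goal, in contrast to a shaped reward that grows gradually as the cart approaches.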