Building on the idea of training CartPole with a Deep Q-learning agent, we obtain promising results in preventing the pole from falling. The capacity of reinforcement learning (RL) to learn from the interaction between the agent and its environment yields an optimal control strategy. In this paper, we aim to solve the classic pendulum swing-up problem, in which the learned controller must bring the pendulum to the upright position and keep it balanced. Because this problem has a continuous action domain, we use the Deep Deterministic Policy Gradient (DDPG) algorithm. The quality of the learned pendulum controller is demonstrated by an increasing average return, a decreasing loss, and a live video in the code part.
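To make the swing-up task and its continuous action concrete, the following is a minimal sketch of one simulation step. It assumes the dynamics, constants, and quadratic cost of the widely used Gym-style pendulum benchmark; the abstract does not state which environment or parameter values were actually used, so every name and number here is illustrative. The action is a single real-valued torque, which is why a discrete-action method such as Deep Q-learning does not apply directly.

```python
import math

# Assumed constants, following the common Gym-style pendulum benchmark.
# The abstract does not confirm any of these values.
G, M, L, DT = 10.0, 1.0, 1.0, 0.05   # gravity, mass, length, time step
MAX_TORQUE, MAX_SPEED = 2.0, 8.0     # action and angular-velocity limits


def angle_normalize(theta):
    """Wrap an angle to [-pi, pi); 0 corresponds to the upright position."""
    return ((theta + math.pi) % (2 * math.pi)) - math.pi


def step(theta, theta_dot, torque):
    """Advance the pendulum one time step.

    Returns (new_theta, new_theta_dot, reward). The torque is a continuous
    action and is clipped to [-MAX_TORQUE, MAX_TORQUE].
    """
    u = max(-MAX_TORQUE, min(MAX_TORQUE, torque))
    # The reward penalizes distance from upright, angular speed, and effort.
    reward = -(angle_normalize(theta) ** 2
               + 0.1 * theta_dot ** 2
               + 0.001 * u ** 2)
    # Angular acceleration from gravity and the applied torque.
    theta_ddot = 3 * G / (2 * L) * math.sin(theta) + 3.0 / (M * L ** 2) * u
    new_theta_dot = theta_dot + theta_ddot * DT
    new_theta_dot = max(-MAX_SPEED, min(MAX_SPEED, new_theta_dot))
    new_theta = theta + new_theta_dot * DT
    return new_theta, new_theta_dot, reward
```

Under these assumptions the reward is highest (zero) when the pendulum is upright, at rest, and no torque is applied, so an increasing average return indicates that the agent is learning to swing up and balance.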