Conventional control methods face obstacles from system complexity and heavy data requirements, so more modern and efficient control methods are needed. Off-policy, model-free reinforcement learning algorithms help avoid working with complex system models. They stand out in terms of speed and accuracy because they reuse past experience to learn optimal policies. In this study, three reinforcement learning algorithms, DDPG, TD3, and SAC, are used to train the Fetch robotic manipulator on four different tasks in the MuJoCo simulation environment. All three algorithms are off-policy and reach their desired targets by optimizing both policy and value functions. The efficiency and speed of these three algorithms are analyzed in a controlled environment.
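As a minimal illustration of the machinery shared by these off-policy algorithms, the sketch below shows the bootstrap target that TD3 uses to update its value functions: target-policy smoothing plus the clipped double-Q minimum. The function name, stub networks, and hyperparameter values are illustrative assumptions, not the study's implementation.

```python
import numpy as np

def td3_target(reward, next_state, done, policy, q1, q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0,
               rng=np.random.default_rng(0)):
    """Compute the TD3 bootstrap target y = r + gamma * (1 - d) * min(Q1', Q2')."""
    a = policy(next_state)                                   # target policy mu'(s')
    noise = np.clip(rng.normal(0.0, noise_std, np.shape(a)),
                    -noise_clip, noise_clip)                 # smoothing noise
    a_smooth = np.clip(a + noise, -act_limit, act_limit)     # keep action in bounds
    # Clipped double-Q: take the pessimistic (minimum) of the twin target critics.
    q_min = np.minimum(q1(next_state, a_smooth), q2(next_state, a_smooth))
    return reward + gamma * (1.0 - done) * q_min

# Tiny stub networks so the target can be evaluated without a simulator.
policy = lambda s: np.tanh(s)
q1 = lambda s, a: 1.0   # pretend the twin critics disagree;
q2 = lambda s, a: 2.0   # the minimum keeps the pessimistic estimate
s_next = np.zeros(2)

y_not_done = td3_target(0.5, s_next, done=0.0, policy=policy, q1=q1, q2=q2)
y_done = td3_target(0.5, s_next, done=1.0, policy=policy, q1=q1, q2=q2)
print(y_not_done, y_done)   # 0.5 + 0.99*min(1, 2) = 1.49 when not done; 0.5 at episode end
```

DDPG uses the same target without the smoothing noise or the twin-critic minimum, which is one source of the overestimation TD3 was designed to correct; SAC additionally subtracts an entropy term from the target.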