Learning agents can use Reinforcement Learning (RL) to decide their actions via a reward function. However, the learning process is strongly influenced by the selection of hyperparameter values used in the learning algorithm. This work proposes a method based on Deep Deterministic Policy Gradient (DDPG) and Hindsight Experience Replay (HER) that uses a Genetic Algorithm (GA) to fine-tune the hyperparameter values. The method (GA+DDPG+HER) was evaluated on six robotic manipulation tasks: FetchReach, FetchSlide, FetchPush, FetchPickAndPlace, DoorOpening, and AuboReach. Analysis of the results demonstrates a significant increase in performance and a decrease in learning time. We also compare GA+DDPG+HER with existing methods and provide evidence that it performs better.