In this paper, we propose SACHER (soft actor-critic (SAC) with hindsight experience replay (HER)), which constitutes a class of deep reinforcement learning (DRL) algorithms. SAC is an off-policy, model-free DRL algorithm based on the maximum entropy framework, and it outperforms earlier DRL algorithms in terms of exploration, robustness, and learning performance. However, in SAC, maximizing the entropy-augmented objective may degrade the optimality of the learning outcome. HER is a sample-efficient replay method that enhances the performance of off-policy DRL algorithms by allowing the agent to learn from both failures and successes. We apply HER to SAC and propose SACHER to improve the learning performance of SAC. More precisely, SACHER achieves the desired optimal outcome faster and more accurately than SAC, since HER improves the sample efficiency of SAC. We apply SACHER to the navigation and control problem of unmanned aerial vehicles (UAVs), where SACHER generates the optimal navigation path of the UAV in the presence of various obstacles during operation. Specifically, we show the effectiveness of SACHER in terms of the tracking error and cumulative reward in UAV operation by comparing them with those of two state-of-the-art DRL algorithms, SAC and deep deterministic policy gradient (DDPG). Note that SACHER can be applied to arbitrary UAV models in navigation and control problems.
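The hindsight-relabeling idea behind HER can be illustrated with a minimal sketch: transitions from a failed episode are stored a second time with the goal replaced by a goal the agent actually achieved, so the relabeled copies carry informative reward. All function names, the transition layout, and the sparse reward used below are illustrative assumptions, not the paper's implementation (which combines HER with SAC's entropy-regularized updates).

```python
def sparse_reward(achieved, goal, tol=0.05):
    """Illustrative sparse goal-reaching reward: 0 on success, -1 otherwise."""
    return 0.0 if abs(achieved - goal) <= tol else -1.0

def her_relabel_final(episode, reward_fn):
    """HER with the 'final' relabeling strategy (a sketch).

    For every transition, store (1) the original copy with the intended
    goal and (2) a hindsight copy that pretends the goal was the state
    actually achieved at the end of the episode, recomputing the reward.
    Each transition is a dict with keys: obs, action, achieved_goal, goal.
    """
    final_achieved = episode[-1]["achieved_goal"]
    out = []
    for tr in episode:
        # Original transition: reward w.r.t. the intended goal.
        out.append({**tr, "reward": reward_fn(tr["achieved_goal"], tr["goal"])})
        # Hindsight copy: treat the finally achieved goal as the target,
        # so even a failed episode yields at least one successful sample.
        out.append({**tr, "goal": final_achieved,
                    "reward": reward_fn(tr["achieved_goal"], final_achieved)})
    return out
```

With a toy failed episode (the intended goal 1.0 is never reached), the original copies all carry reward -1.0, while the hindsight copy of the last transition carries reward 0.0, which is the signal HER feeds back to the off-policy learner:

```python
episode = [
    {"obs": 0, "action": 0, "achieved_goal": 0.1, "goal": 1.0},
    {"obs": 1, "action": 1, "achieved_goal": 0.2, "goal": 1.0},
    {"obs": 2, "action": 0, "achieved_goal": 0.3, "goal": 1.0},
]
buffer = her_relabel_final(episode, sparse_reward)
```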