In this paper, we propose SACHER (soft actor-critic (SAC) with hindsight experience replay (HER)), a deep reinforcement learning (DRL) algorithm. SAC is an off-policy, model-free DRL algorithm based on the maximum entropy framework, and it outperforms earlier DRL algorithms in terms of exploration, robustness, and learning performance. However, in SAC, maximizing the entropy-augmented objective may degrade the optimality of the learning outcomes. HER is a sample-efficient replay method that enhances the performance of off-policy DRL algorithms by allowing them to learn from both failures and successes. We incorporate HER into SAC and propose SACHER to improve the learning performance of SAC. More precisely, SACHER achieves the desired optimal outcomes faster and more accurately than SAC, since HER improves the sample efficiency of SAC. We apply SACHER to the navigation and control problem of unmanned aerial vehicles (UAVs), where SACHER generates an optimal navigation path for the UAV in the presence of various obstacles during operation. Specifically, we demonstrate the effectiveness of SACHER in terms of tracking error and cumulative reward in UAV operation by comparing them with those of state-of-the-art DRL algorithms, SAC and deep deterministic policy gradient (DDPG). Note that the proposed SACHER-based approach to UAV navigation and control can be applied to arbitrary UAV models.
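For reference, the entropy-augmented objective maximized by SAC, stated here in its standard form for illustration rather than quoted from this paper, is

J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\!\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\!\left(\pi(\cdot \mid s_t)\right) \right],

where \rho_\pi denotes the state-action distribution induced by the policy \pi and \alpha is a temperature parameter weighting the entropy bonus. The \alpha-weighted entropy term encourages exploration, but it is also what can pull the learned policy away from the purely reward-optimal solution, which is the optimality degradation noted above.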