Deterministic and stochastic techniques in Deep Reinforcement Learning (Deep-RL) have become a promising solution for improving motion control and decision-making tasks for a wide variety of robots. Previous works have shown that these Deep-RL algorithms can be applied to perform mapless navigation of mobile robots in general. However, they tend to use simple sensing strategies, since it has been shown that they perform poorly with high-dimensional state spaces, such as the ones yielded by image-based sensing. This paper presents a comparative analysis of two Deep-RL techniques - Deep Deterministic Policy Gradient (DDPG) and Soft Actor-Critic (SAC) - when performing mapless navigation tasks for mobile robots. We aim to contribute by showing how the neural network architecture influences the learning itself, presenting quantitative results based on the navigation time and distance of aerial mobile robots for each approach. Overall, our analysis of six distinct architectures highlights that the stochastic approach (SAC) suits deeper architectures better, while the opposite holds for the deterministic approach (DDPG).
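To make the deterministic-versus-stochastic distinction concrete, the sketch below shows a minimal PyTorch rendition of a DDPG-style actor (a direct state-to-action mapping) and a SAC-style actor (a squashed Gaussian policy) with a configurable hidden-layer tuple. The observation and action dimensions, layer widths, and depths are illustrative assumptions, not the six architectures evaluated in the paper.

```python
# Hedged sketch: layer sizes, depths, and input/output dimensions below are
# illustrative assumptions, not the authors' exact configurations.
import torch
import torch.nn as nn


def mlp(sizes, activation=nn.ReLU):
    """Build a feed-forward trunk; network depth is controlled by len(sizes)."""
    layers = []
    for i in range(len(sizes) - 1):
        layers += [nn.Linear(sizes[i], sizes[i + 1]), activation()]
    return nn.Sequential(*layers)


class DeterministicActor(nn.Module):
    """DDPG-style actor: maps a state directly to a single action (no sampling)."""
    def __init__(self, obs_dim, act_dim, hidden=(256, 256)):
        super().__init__()
        self.trunk = mlp((obs_dim, *hidden))
        self.head = nn.Linear(hidden[-1], act_dim)

    def forward(self, obs):
        return torch.tanh(self.head(self.trunk(obs)))  # bounded action


class GaussianActor(nn.Module):
    """SAC-style stochastic actor: outputs mean and log-std, then samples."""
    def __init__(self, obs_dim, act_dim, hidden=(256, 256)):
        super().__init__()
        self.trunk = mlp((obs_dim, *hidden))
        self.mu = nn.Linear(hidden[-1], act_dim)
        self.log_std = nn.Linear(hidden[-1], act_dim)

    def forward(self, obs):
        h = self.trunk(obs)
        std = self.log_std(h).clamp(-20, 2).exp()
        dist = torch.distributions.Normal(self.mu(h), std)
        return torch.tanh(dist.rsample())  # squashed stochastic action


# Varying the `hidden` tuple, e.g. (256,), (256, 256), (256, 256, 256),
# is one way to realize the "deeper vs. shallower architecture" comparison.
obs = torch.randn(1, 10)                     # hypothetical 10-dim range/state input
print(DeterministicActor(10, 2)(obs).shape)  # torch.Size([1, 2])
print(GaussianActor(10, 2)(obs).shape)       # torch.Size([1, 2])
```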