We propose a deep reinforcement learning approach for solving a mapless navigation problem in warehouse scenarios. The automated guided vehicle is equipped with LiDAR and frontal RGB sensors and learns to reach the position underneath the target dolly. The challenges lie in the sparsity of positive samples for learning, multi-modal sensor perception under partial observability, the demand for accurate steering maneuvers, and long training cycles. To address these points, we propose NavACL-Q, an automatic curriculum learning method combined with a distributed soft actor-critic. The performance of the learning algorithm is evaluated exhaustively in a different warehouse environment to check both the robustness and generalizability of the learned policy. Results in NVIDIA Isaac Sim demonstrate that our trained agent significantly outperforms the map-based navigation pipeline provided by NVIDIA Isaac Sim, succeeding from larger agent-goal distances and relative orientations. Ablation studies further confirm that NavACL-Q greatly facilitates the whole learning process and that a pre-trained feature extractor markedly boosts training speed.