In the field of autonomous robotics, reinforcement learning (RL) is an increasingly popular method for solving the task of dynamic obstacle avoidance for mobile robots, autonomous ships, and drones. A common practice for training such agents is to use a training environment with random initialization of the agent and obstacles. Such approaches may suffer from low coverage of high-risk scenarios during training, leading to impaired final obstacle avoidance performance. This paper proposes a general training environment in which we gain control over the difficulty of the obstacle avoidance task by using short training episodes and assessing the difficulty with two metrics: the number of obstacles and a collision risk metric. We found that shifting the training towards greater task difficulty can substantially improve final performance. A baseline agent, trained in a traditional environment based on random initialization of the agent and obstacles and longer training episodes, achieves significantly weaker performance. To demonstrate the generalizability of the proposed approach, we designed two realistic use cases: a mobile robot and a maritime ship, each under the threat of approaching obstacles. In both applications, the previous results are confirmed, emphasizing the general applicability of the proposed approach, detached from a specific application context and independent of the agent's dynamics. We further added Gaussian noise to the sensor signals, resulting in only a marginal degradation of performance and thus indicating solid robustness of the trained agent.
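To make the idea concrete, below is a minimal sketch of what a difficulty-controlled scenario sampler for short training episodes could look like. All names here (`ScenarioSampler`, `collision_risk`, `risk_bias`, `noisy_observation`) and the simple distance-based risk proxy are illustrative assumptions, not the paper's actual implementation: candidate episodes are sampled at random, scored by a collision-risk metric, and the riskiest candidate is kept, biasing training towards harder scenarios; Gaussian observation noise mimics the noisy sensor signals used in the robustness evaluation.

```python
import numpy as np

class ScenarioSampler:
    """Hypothetical sampler that biases short episodes towards high risk."""

    def __init__(self, n_obstacles_range=(1, 8), risk_bias=3, rng=None):
        self.n_obstacles_range = n_obstacles_range
        self.risk_bias = risk_bias  # number of extra candidates; more -> harder episodes
        self.rng = rng or np.random.default_rng()

    def collision_risk(self, agent, obstacles):
        # Placeholder risk metric: inverse of the closest agent-obstacle distance.
        d_min = min(np.linalg.norm(agent[:2] - ob[:2]) for ob in obstacles)
        return 1.0 / (d_min + 1e-6)

    def sample_episode(self):
        # Draw several random candidate scenarios and keep the riskiest one,
        # shifting the training distribution towards greater task difficulty.
        candidates = []
        for _ in range(self.risk_bias + 1):
            n = self.rng.integers(*self.n_obstacles_range, endpoint=True)
            agent = self.rng.uniform(-10.0, 10.0, size=4)          # x, y, vx, vy
            obstacles = self.rng.uniform(-10.0, 10.0, size=(n, 4))  # one row per obstacle
            risk = self.collision_risk(agent, obstacles)
            candidates.append((risk, agent, obstacles))
        return max(candidates, key=lambda c: c[0])

def noisy_observation(obs, sigma, rng):
    # Additive Gaussian sensor noise, as in the robustness experiments.
    return obs + rng.normal(0.0, sigma, size=np.shape(obs))

# Usage: sample one high-risk episode and perturb the agent's observation.
sampler = ScenarioSampler()
risk, agent, obstacles = sampler.sample_episode()
obs = noisy_observation(agent, sigma=0.05, rng=sampler.rng)
```

A baseline corresponding to the paper's comparison would simply take the first random candidate (no risk-based selection) and run longer episodes; the curriculum effect reported in the paper comes entirely from shifting the sampled episodes towards higher difficulty.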