This study applied multi-agent reinforcement learning to indoor path planning for two unmanned aerial vehicles (UAVs). Each UAV was tasked with moving as quickly as possible from an initial position to a goal position, with the two positions paired at random, in an environment containing obstacles. To minimize training time and prevent damage to the UAVs, training was carried out in simulation. Because a multi-agent environment is non-stationary, in that the optimal behavior of one agent depends on the actions of the others, the action of the other UAV was included in each UAV's state space. Curriculum learning was performed in two stages to improve learning efficiency. The resulting goal rate was 89.0%, compared with 73.6% and 79.9% for the other learning strategies tested.
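The two mechanisms named in the abstract, including the other UAV's action in each UAV's state and training in two curriculum stages, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `augment_state` and `curriculum_stage`, the discrete action set, the one-hot encoding, and the episode-count rule for switching stages are all assumptions made for illustration; the abstract does not specify the action space, the state encoding, or what each curriculum stage contains.

```python
from typing import List, Sequence


def augment_state(own_obs: Sequence[float], other_action: int, n_actions: int) -> List[float]:
    """Append the other UAV's most recent action to this UAV's observation.

    Assumption: actions are discrete indices in [0, n_actions) and are
    encoded one-hot. The abstract only says the other UAV's action is
    part of the state space, not how it is encoded.
    """
    if not 0 <= other_action < n_actions:
        raise ValueError("other_action must be in [0, n_actions)")
    one_hot = [0.0] * n_actions
    one_hot[other_action] = 1.0
    return list(own_obs) + one_hot


def curriculum_stage(episode: int, stage1_episodes: int) -> int:
    """Return 1 while training is in the first stage, 2 afterwards.

    Assumption: the switch is triggered by a fixed episode count. The
    abstract does not state the switching criterion.
    """
    return 1 if episode < stage1_episodes else 2


# Hypothetical usage: a 3-value observation and 4 discrete actions.
state = augment_state([0.1, 0.2, 0.3], other_action=2, n_actions=4)
stage = curriculum_stage(episode=150, stage1_episodes=100)
```

With this encoding, each UAV's policy input grows by `n_actions` entries. That growth is the cost of making the other agent's behavior observable rather than treating it as unexplained environment noise.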