Pursuit-evasion is the problem of capturing mobile targets with one or more pursuers. We use deep reinforcement learning to pursue an omnidirectional target with multiple homogeneous agents that are subject to unicycle kinematic constraints. Shared experience is used to train a policy for a given number of pursuers, and each agent executes the policy independently at run-time. Training benefits from curriculum learning, a sweeping-angle ordering that locally represents neighboring agents, and a reward structure that combines individual and group rewards to encourage good formations. Simulated experiments with a reactive evader and up to eight pursuers show that our learning-based approach with non-holonomic agents performs on par with classical algorithms that assume omnidirectional agents, and outperforms their non-holonomic adaptations. The learned policy transfers successfully to the real world in a proof-of-concept demonstration with three motion-constrained pursuer drones.
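The sweeping-angle ordering mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation; the function name and pose representation are hypothetical. The idea assumed here is that each pursuer sorts its neighbors by the relative bearing at which a counter-clockwise sweep, starting from the agent's own heading, first reaches them, yielding a consistent local ordering of neighbor observations:

```python
import math

def sweep_ordered_neighbors(self_pose, neighbor_poses):
    """Order neighbors by sweeping angle relative to the agent's heading.

    Hypothetical helper: poses are (x, y, heading) tuples. Neighbors are
    sorted by the counter-clockwise angle, measured from the agent's
    heading, at which a sweep first reaches each neighbor.
    """
    x, y, theta = self_pose

    def sweep_angle(pose):
        nx, ny, _ = pose
        bearing = math.atan2(ny - y, nx - x) - theta
        return bearing % (2.0 * math.pi)  # wrap into [0, 2*pi)

    return sorted(neighbor_poses, key=sweep_angle)
```

Because the ordering is anchored to the agent's own heading, it is invariant to global rotations of the scene, which is one plausible reason such an ordering helps a locally executed policy generalize.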