Pursuit-evasion is the problem of capturing mobile targets with one or more pursuers. We use deep reinforcement learning to pursue an omnidirectional target with multiple homogeneous agents that are subject to unicycle kinematic constraints. Shared experience is used to train a policy for a given number of pursuers, and each agent executes the policy independently at run-time. Training benefits from curriculum learning, a sweeping-angle ordering that locally represents neighboring agents, and a reward structure that combines individual and group rewards to encourage good formations. Simulated experiments with a reactive evader and up to eight pursuers show that our learning-based approach with non-holonomic agents performs on par with classical algorithms that assume omnidirectional agents, and outperforms their non-holonomic adaptations. The learned policy transfers successfully to the real world in a proof-of-concept demonstration with three motion-constrained pursuer drones.
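The sweeping-angle ordering mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation; the function name and pose representation are hypothetical. The idea assumed here is that each pursuer sorts its neighbors by the relative bearing at which a counter-clockwise sweep, starting from the agent's own heading, first reaches them, yielding a consistent local ordering of neighbor observations:

```python
import math

def sweep_ordered_neighbors(self_pose, neighbor_poses):
    """Order neighbors by sweeping angle relative to the agent's heading.

    Hypothetical helper: poses are (x, y, heading) tuples. Neighbors are
    sorted by the counter-clockwise angle, measured from the agent's
    heading, at which a sweep first reaches each neighbor.
    """
    x, y, theta = self_pose

    def sweep_angle(pose):
        nx, ny, _ = pose
        bearing = math.atan2(ny - y, nx - x) - theta
        return bearing % (2.0 * math.pi)  # wrap into [0, 2*pi)

    return sorted(neighbor_poses, key=sweep_angle)
```

Because the ordering is anchored to the agent's own heading, it is invariant to global rotations of the scene, which is one plausible reason such an ordering helps a locally executed policy generalize.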