This paper proposes directionality reinforcement learning (DRL), a fully decentralized multi-agent reinforcement learning method in which cooperation is achieved through each agent's own learning alone, without communication and without observation. Concretely, DRL gives each learning agent the direction "agents have to learn to reach the farthest goal among reachable ones" so that the agents operate cooperatively. To investigate the effectiveness of DRL, this paper compares Q-learning agents with DRL against a previous learning agent on maze problems. The experimental results show that (1) DRL performs better than the previous method in terms of time spent, (2) the direction makes agents learn to yield to others, and (3) DRL suggests that multi-agent learning can be achieved at low cost for any number of agents.
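The direction rule quoted above can be illustrated with a minimal sketch. The abstract does not specify how an agent estimates distances or decides that a goal is reachable, so the names `steps_to_goal`, `reachable`, `pick_target_goal`, and `q_update` below are hypothetical, and the update shown is ordinary tabular Q-learning rather than the paper's exact procedure.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def pick_target_goal(steps_to_goal: Dict[str, int],
                     reachable: Dict[str, bool]) -> Optional[str]:
    """Pick the farthest goal among those the agent believes it can reach.

    Assumption: each agent keeps its own step estimates and reachability
    flags, learned without communicating with or observing other agents.
    """
    candidates = [g for g in steps_to_goal if reachable.get(g, False)]
    if not candidates:
        return None
    # The "direction": prefer the farthest reachable goal.
    return max(candidates, key=lambda g: steps_to_goal[g])


def q_update(q: Dict, state: str, action: str, reward: float,
             next_state: str, actions: List[str],
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    """Standard one-step tabular Q-learning update (not specific to DRL)."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


# Hypothetical example: goal C is farthest but not reachable, so B is chosen.
steps = {"A": 4, "B": 9, "C": 12}
flags = {"A": True, "B": True, "C": False}
target = pick_target_goal(steps, flags)

# The agent would then be rewarded only for reaching its chosen target.
q = defaultdict(float)
q_update(q, "s0", "right", 1.0, "s1", ["left", "right"])
```

In this sketch each agent runs the same two functions independently. Any cooperative behavior, such as yielding the nearer goal to another agent, would have to emerge from how the reachability flags are learned, which the abstract leaves unspecified.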