In the teacher-student framework, a more experienced agent (the teacher) accelerates the learning of another agent (the student) by suggesting actions to take in certain states. In cooperative multiagent reinforcement learning (MARL), where agents need to cooperate with one another, a student may fail to cooperate well with others even when it follows the teacher's suggested actions, because the policies of all agents keep changing until convergence. When the number of times that agents can communicate with one another is limited (i.e., there is a budget constraint), an advising strategy that uses actions as advice may not be good enough. We propose a partaker-sharer advising framework (PSAF) for cooperative MARL agents learning under a budget constraint. In PSAF, each Q-learner can decide when to ask for Q-values and when to share its own Q-values. We perform experiments on three typical multiagent learning problems. Evaluation results show that PSAF outperforms existing advising methods under both unlimited and limited budgets, and we analyze the impact of advising actions and of sharing Q-values on agents' learning.
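The idea of Q-learners that ask for and share Q-values under a communication budget can be illustrated with a minimal sketch. It is not the PSAF algorithm itself: the class name `BudgetedQLearner`, the separate `ask_budget` and `share_budget` counters, and the visit-count rule used to decide when to ask or share are all hypothetical choices made for illustration, since the abstract does not specify the decision criteria.

```python
import random
from collections import defaultdict


class BudgetedQLearner:
    """Tabular Q-learner that may ask peers for Q-values or share its own.

    Illustrative assumption (not taken from the paper): an agent asks in
    states it has rarely visited and shares in states it has visited often.
    """

    def __init__(self, n_actions, ask_budget, share_budget,
                 alpha=0.1, gamma=0.9, visit_threshold=5):
        self.n_actions = n_actions
        self.q = defaultdict(lambda: [0.0] * n_actions)  # state -> Q-values
        self.visits = defaultdict(int)                   # state -> update count
        self.ask_budget = ask_budget      # remaining requests this agent may make
        self.share_budget = share_budget  # remaining answers this agent may give
        self.alpha = alpha
        self.gamma = gamma
        self.visit_threshold = visit_threshold

    def wants_to_ask(self, state):
        # Hypothetical rule: ask only while the state is still unfamiliar.
        return self.ask_budget > 0 and self.visits[state] < self.visit_threshold

    def can_share(self, state):
        # Hypothetical rule: share only once the state is familiar.
        return self.share_budget > 0 and self.visits[state] >= self.visit_threshold

    def maybe_ask(self, state, peers):
        """Copy Q-values for `state` from the first willing peer, if any."""
        if not self.wants_to_ask(state):
            return False
        for peer in peers:
            if peer.can_share(state):
                self.q[state] = list(peer.q[state])
                self.ask_budget -= 1
                peer.share_budget -= 1
                return True
        return False

    def act(self, state, epsilon=0.1, rng=random):
        # Epsilon-greedy action selection over the current Q-values.
        if rng.random() < epsilon:
            return rng.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        self.visits[state] += 1
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

In this sketch a successful exchange consumes one unit of budget on both sides, so an agent stops asking once `ask_budget` reaches zero and stops answering once `share_budget` does. Unlike action advising, the receiving agent keeps the transferred Q-values and can act on them in later visits without communicating again.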