This paper investigates the dynamics of competition among organizations with unequal expertise. We use multi-agent reinforcement learning to simulate and understand the impact of various incentive schemes designed to offset such inequality. We design Touch-Mark, a game based on the well-known multi-agent particle environment, in which two teams (weak and strong) with unequal but changing skill levels compete against each other. To train agents for this game, we propose a novel controller-assisted multi-agent reinforcement learning algorithm, C-MADDPG, which equips each agent with an ensemble of policies together with a supervised controller that, by selectively partitioning the sample space, triggers intelligent role division among teammates. Using C-MADDPG as the underlying framework, we propose an incentive scheme for the weak team intended to make the final rewards of both teams equal. We find that, in spite of the incentive, the final reward of the weak team falls short of that of the strong team. On inspection, we find that an overall incentive scheme for the weak team does not incentivize the weaker agents within that team to learn and improve. To offset this, we specifically incentivize the weaker player to learn and, as a result, observe that beyond an initial phase the weak team performs on par with the strong team. The final goal of the paper is to formulate a dynamic incentive scheme that continuously balances the rewards of the two teams. We achieve this by devising an incentive scheme enriched with an RL agent that requires only minimal information from the environment.