We propose a model for multi-objective optimization, a credo, for agents in a system that are configured into multiple groups (i.e., teams). Our model of credo regulates how agents optimize their behavior for the groups to which they belong. We evaluate credo in the context of challenging social dilemmas with reinforcement learning agents. Our results indicate that the interests of teammates, or of the entire system, need not be fully aligned to achieve globally beneficial outcomes. We identify two credo configurations without full common interest that achieve high equality and significantly higher mean population rewards than when the interests of all agents are aligned.
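One way to picture a credo is as a convex weight vector that blends an agent's individual, team, and system-level rewards into the single scalar it optimizes. The following is a minimal sketch under that assumption; the names `Credo` and `blend` are illustrative and not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Credo:
    """Hypothetical credo: convex weights over three reward sources."""
    self_w: float    # weight on the agent's own reward
    team_w: float    # weight on the agent's team reward
    system_w: float  # weight on the system-wide reward

    def __post_init__(self):
        total = self.self_w + self.team_w + self.system_w
        if abs(total - 1.0) > 1e-9:
            raise ValueError("credo weights must sum to 1")

    def blend(self, r_self: float, r_team: float, r_system: float) -> float:
        """Scalar reward the agent actually optimizes under this credo."""
        return (self.self_w * r_self
                + self.team_w * r_team
                + self.system_w * r_system)

# Fully aligned interests: every agent optimizes only the system reward.
aligned = Credo(self_w=0.0, team_w=0.0, system_w=1.0)

# A credo without full common interest: mostly self-interested,
# with some weight on team and system rewards.
mixed = Credo(self_w=0.6, team_w=0.2, system_w=0.2)
print(mixed.blend(r_self=1.0, r_team=0.5, r_system=0.0))  # 0.7
```

Under this reading, the paper's finding is that certain `mixed`-style weightings outperform the fully `aligned` setting on mean population reward while keeping equality high.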