We propose credo, a model of multi-objective optimization for agents that are configured into multiple groups (i.e., teams) within a system. A credo regulates how agents optimize their behavior for the component groups they belong to. We evaluate credo with reinforcement learning agents in challenging social dilemmas. Our results indicate that the interests of teammates, or of the entire system, need not be fully aligned to reach globally beneficial outcomes. We identify two scenarios without full common interest that achieve high equality and significantly higher mean population reward than the setting in which the interests of all agents are aligned.
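The abstract does not spell out how a credo is parameterized. As a minimal sketch of one possible reading, the Python below assumes a credo is a triple of non-negative weights (self, team, system) that sum to 1, and that an agent's training signal is the correspondingly weighted mix of its own reward, its team's mean reward, and the population's mean reward. The names `Credo` and `credo_rewards`, the use of means, and the weight constraint are all illustrative assumptions and are not taken from the source.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Credo:
    """Hypothetical weights over the groups an agent belongs to (assumption)."""
    self_weight: float
    team_weight: float
    system_weight: float

    def __post_init__(self) -> None:
        weights = (self.self_weight, self.team_weight, self.system_weight)
        if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
            raise ValueError("credo weights must be non-negative and sum to 1")


def credo_rewards(
    rewards: Dict[str, float],
    teams: Dict[str, List[str]],
    credo: Credo,
) -> Dict[str, float]:
    """Mix each agent's own reward with its team mean and the system mean.

    Assumes every agent appears in exactly one team and all agents share
    the same credo; both are simplifications made for this sketch.
    """
    system_mean = sum(rewards.values()) / len(rewards)
    mixed: Dict[str, float] = {}
    for members in teams.values():
        team_mean = sum(rewards[agent] for agent in members) / len(members)
        for agent in members:
            mixed[agent] = (
                credo.self_weight * rewards[agent]
                + credo.team_weight * team_mean
                + credo.system_weight * system_mean
            )
    return mixed


if __name__ == "__main__":
    env_rewards = {"a": 4.0, "b": 0.0, "c": 2.0, "d": 2.0}
    team_map = {"t1": ["a", "b"], "t2": ["c", "d"]}
    # Half self-interested, half team-focused: not full common interest.
    print(credo_rewards(env_rewards, team_map, Credo(0.5, 0.5, 0.0)))
    # Fully system-focused: every agent receives the population mean.
    print(credo_rewards(env_rewards, team_map, Credo(0.0, 0.0, 1.0)))
```

Under these assumptions, the fully system-focused credo corresponds to the case where the interests of all agents are aligned, while mixed credos such as `Credo(0.5, 0.5, 0.0)` are examples of settings without full common interest.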