Cooperation in settings where agents have both common and conflicting interests (mixed-motive environments) has recently received considerable attention in multi-agent learning. However, the mixed-motive environments typically studied have a single cooperative outcome on which all agents can agree. Many real-world multi-agent environments are instead bargaining problems (BPs): they have several Pareto-optimal payoff profiles over which agents have conflicting preferences. We argue that typical cooperation-inducing learning algorithms fail to cooperate in BPs when there is room for normative disagreement, which results in multiple competing cooperative equilibria, and we illustrate this problem empirically. To remedy the issue, we introduce the notion of norm-adaptive policies. Norm-adaptive policies are capable of behaving according to different norms in different circumstances, creating opportunities for resolving normative disagreement. We develop a class of norm-adaptive policies and show in experiments that these significantly increase cooperation. However, norm-adaptiveness cannot address residual bargaining failure arising from a fundamental tradeoff between exploitability and cooperative robustness.
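To make the notion of a bargaining problem concrete, the following is a minimal sketch, not taken from the paper, of a two-player matrix game with hypothetical Bach-or-Stravinsky-style payoffs: it has two Pareto-optimal payoff profiles, and each player prefers a different one.

```python
# Illustrative (assumed) bargaining problem: a two-player matrix game with
# several Pareto-optimal payoff profiles over which the players' preferences conflict.
import numpy as np

# Hypothetical payoffs: rows = player 1's action, columns = player 2's action;
# each entry is (player 1's payoff, player 2's payoff).
payoffs = np.array([
    [(3, 1), (0, 0)],
    [(0, 0), (1, 3)],
])  # shape (2, 2, 2)

def is_pareto_optimal(profile, all_profiles):
    """A payoff profile is Pareto-optimal if no other profile weakly improves
    every player's payoff and strictly improves at least one player's payoff."""
    return not any(
        np.all(other >= profile) and np.any(other > profile)
        for other in all_profiles
    )

all_profiles = payoffs.reshape(-1, 2)
for a1 in range(2):
    for a2 in range(2):
        profile = payoffs[a1, a2]
        if is_pareto_optimal(profile, all_profiles):
            print(f"actions ({a1}, {a2}) -> payoffs "
                  f"{tuple(int(x) for x in profile)} are Pareto-optimal")

# Both (3, 1) and (1, 3) are Pareto-optimal, but player 1 prefers the first and
# player 2 the second: multiple competing cooperative outcomes, so coordinating
# on one of them requires resolving the players' normative disagreement.
```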