As machine learning agents act more autonomously in the world, they will increasingly interact with each other. Unfortunately, in many social dilemmas, such as the one-shot Prisoner's Dilemma, standard game theory predicts that ML agents will fail to cooperate with each other. Prior work has shown that one way to enable cooperative outcomes in the one-shot Prisoner's Dilemma is to make the agents mutually transparent, i.e., to give each agent access to the other's source code (Rubinstein 1998, Tennenholtz 2004), or, in the case of ML agents, to its weights. However, full transparency is often unrealistic, whereas partial transparency is commonplace. Moreover, it is challenging for agents to learn to cooperate in the full-transparency setting. In this paper, we introduce a more realistic setting in which agents observe only a single number indicating how similar they are to each other. We prove that this setting allows for the same set of cooperative outcomes as full transparency. We also demonstrate experimentally that cooperation can be learned using simple ML methods.
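To make the setting concrete, here is a minimal sketch of similarity-based play in the one-shot Prisoner's Dilemma. It is an illustration under assumptions, not the paper's construction: the payoff values, the `similarity` summary, and the one-parameter `threshold_policy` are all invented for exposition.

```python
import numpy as np

# Hypothetical sketch of the similarity-based setting: each agent is a
# policy mapping a scalar similarity signal to an action in a one-shot
# Prisoner's Dilemma. Row player's payoff, indexed [my_action][their_action],
# with 0 = cooperate, 1 = defect (a standard T > R > P > S payoff matrix).
PAYOFF = np.array([[3.0, 0.0],   # I cooperate
                   [4.0, 1.0]])  # I defect

def similarity(theta_a, theta_b):
    """One scalar summary of how alike two policies' parameters are
    (here a squashed negative distance in [0, 1]; any summary would do)."""
    return float(np.exp(-np.linalg.norm(theta_a - theta_b)))

def threshold_policy(theta, s):
    """Cooperate iff the observed similarity s reaches the agent's
    threshold theta[0] (a one-parameter policy, purely for illustration)."""
    return 0 if s >= theta[0] else 1

def play(theta_a, theta_b):
    s = similarity(theta_a, theta_b)  # both agents observe the same scalar
    a = threshold_policy(theta_a, s)
    b = threshold_policy(theta_b, s)
    return float(PAYOFF[a, b]), float(PAYOFF[b, a])

# Against an identical copy, similarity is maximal, so the threshold
# policy cooperates; against a pure defector (threshold above any
# attainable similarity), it defects and so avoids being exploited.
me = np.array([0.9])
twin = np.array([0.9])
defector = np.array([2.0])  # similarity never exceeds 1, so it always defects
print(play(me, twin))      # (3.0, 3.0): mutual cooperation
print(play(me, defector))  # (1.0, 1.0): mutual defection
```

The point of the threshold form is that cooperation is conditioned on the similarity signal rather than unconditional, so a cooperative policy earns the mutual-cooperation payoff against near-copies of itself while remaining safe against dissimilar opponents such as pure defectors.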