In this paper we present a Reinforcement Learning environment that leverages agent cooperation and communication, aimed at detecting, learning, and ultimately penalizing betrayal patterns that emerge in the behavior of self-interested agents. We describe the game rules, along with interesting cases of betrayal and the trade-offs that arise. Preliminary experimental investigations illustrate (a) betrayal emergence, (b) deceptive agents outperforming honest baselines, and (c) betrayal detection based on classification of behavioral features, which surpasses probabilistic detection baselines. Finally, we propose approaches for penalizing betrayal, list directions for future work, and suggest interesting extensions of the environment towards capturing and exploring increasingly complex patterns of social interactions.