This paper studies multi-agent systems that involve networks of self-interested agents. We propose a framework derived from Markov Decision Processes, called RepNet-MDP, tailored to domains in which agent reputation is a key driver of the interactions between agents. The framework builds on the principles of RepNet-POMDP, a framework developed by Rens et al. in 2018, but it addresses that framework's mathematical inconsistencies and alleviates its intractability by considering only fully observable environments. Furthermore, we use an online learning algorithm to find approximate solutions to RepNet-MDPs. In a series of experiments, RepNet agents are shown to adapt their behavior to the past behavior and reliability of the other agents in the network. Finally, our work identifies a limitation of the framework in its current formulation: its agents cannot learn in circumstances in which they are not a primary actor.