The widespread deployment of Machine Learning systems raises new challenges, such as handling interactions or competition between multiple learners. To address this, we study multi-agent sequential decision-making by considering principal-agent interactions in a tree structure. In this problem, a player's reward is influenced by the actions of her children, who are all self-interested and non-cooperative, which makes good decision-making difficult. Our main finding is that it is possible to steer all players towards the globally optimal set of actions simply by allowing single-step transfers between them. A transfer is established between a principal and one of her agents: the principal pays the proposed amount if the agent picks the recommended action. The analysis poses specific challenges due to the intricate interactions between the nodes of the tree and the propagation of regret within it. In a bandit setup, we propose algorithms under which every player ends up no-regret with respect to the optimal pair of actions and incentives. In the long run, allowing transfers makes the players act as if they were cooperating, even though they remain self-interested and non-cooperative: transfers restore efficiency.
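The abstract does not spell out the reward model, so the following is only a minimal sketch under assumed, hypothetical rewards of why single-step transfers can restore efficiency in a tree. It leaves out the bandit learning entirely: rewards are known, and the code computes by backward induction an analogue of the "optimal actions and incentives" benchmark. Each child is offered the smallest transfer that makes it weakly prefer the recommended action (ties assumed to go to the recommendation), and each principal then maximizes only its own utility. The tree, the `reward` function, and every helper name (`solve`, `play_with_transfers`, `play_selfishly`, `welfare`, `utilities`) are illustrative inventions, not the authors' model or algorithm.

```python
from itertools import product

# Hypothetical 4-node tree (not from the paper): R is the root principal,
# A and B are its agents, and C is A's agent. Every node picks 0 or 1.
CHILDREN = {"R": ["A", "B"], "A": ["C"], "B": [], "C": []}
ACTIONS = (0, 1)
NODES = list(CHILDREN)


def reward(node, a, kids):
    """Hypothetical reward r_v(a_v, actions of v's children)."""
    if node == "R":
        xa, xb = kids
        return 5 * xa + 4 * xb + (1 - a)
    if node == "A":
        (xc,) = kids
        return 3 * xc + (1 - a)
    if node == "B":
        return 2 * (1 - a)
    return 1 - a  # node "C"


def solve(node, G, plan, tau):
    """Backward induction from the leaves up.

    G[node][a]: best utility `node` can secure when playing `a` and paying its
    children the smallest transfers that make them follow its recommendations.
    plan[node][a]: the recommendations (one action per child) achieving it.
    tau[c][b]: smallest transfer making child c weakly prefer action b
    (ties are assumed to be broken in favour of the recommendation).
    """
    kids = CHILDREN[node]
    for c in kids:
        solve(c, G, plan, tau)
        best_c = max(G[c].values())  # c's utility if it ignores the offer
        tau[c] = {b: best_c - G[c][b] for b in ACTIONS}
    G[node], plan[node] = {}, {}
    for a in ACTIONS:
        scored = [
            (reward(node, a, recs) - sum(tau[c][b] for c, b in zip(kids, recs)), recs)
            for recs in product(ACTIONS, repeat=len(kids))
        ]
        G[node][a], plan[node][a] = max(scored)


def play_with_transfers(root="R"):
    """Root maximizes its own utility; each agent follows because its transfer
    exactly covers what it gives up. Returns (joint action, transfer received)."""
    G, plan, tau = {}, {}, {}
    solve(root, G, plan, tau)
    joint, paid = {}, {root: 0}
    stack = [(root, max(ACTIONS, key=lambda a: G[root][a]))]
    while stack:
        node, a = stack.pop()
        joint[node] = a
        for c, b in zip(CHILDREN[node], plan[node][a]):
            paid[c] = tau[c][b]
            stack.append((c, b))
    return joint, paid


def play_selfishly(node="R", joint=None):
    """No transfers: each node best-responds to its children's own choices."""
    joint = {} if joint is None else joint
    for c in CHILDREN[node]:
        play_selfishly(c, joint)
    kids = tuple(joint[c] for c in CHILDREN[node])
    joint[node] = max(ACTIONS, key=lambda a: reward(node, a, kids))
    return joint


def welfare(joint):
    """Sum of all players' rewards (transfers cancel out)."""
    return sum(reward(v, joint[v], tuple(joint[c] for c in CHILDREN[v])) for v in NODES)


def utilities(joint, paid):
    """Reward + transfer received - transfers paid to own children."""
    return {
        v: reward(v, joint[v], tuple(joint[c] for c in CHILDREN[v]))
        + paid[v] - sum(paid[c] for c in CHILDREN[v])
        for v in NODES
    }


if __name__ == "__main__":
    joint, paid = play_with_transfers()
    best = max(
        welfare(dict(zip(NODES, acts))) for acts in product(ACTIONS, repeat=len(NODES))
    )
    print("selfish welfare:", welfare(play_selfishly()))
    print("with transfers:", joint, "welfare:", welfare(joint), "optimum:", best)
    print("utilities:", utilities(joint, paid))
```

In this toy instance, purely selfish play has every node pick 0, for a total welfare of 5. With transfers, the nodes play R=0, A=1, B=1, C=1, which reaches the brute-force optimum of 13. Each agent ends up exactly at the utility it could obtain by refusing the offer, so following the recommendation is individually rational. The key design choice is that a child's minimal transfer equals its utility gap, and this makes each principal's own objective equal to its subtree's welfare up to a constant.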