Information is often stored in a distributed and proprietary form, and agents who own information are often self-interested and require incentives to reveal it. Suitable mechanisms are required to elicit and aggregate such distributed information for decision making. In this paper, we use simulations to study decision markets as mechanisms in a multi-agent learning system that aggregate distributed information for decision making in a contextual bandit problem. The system utilises strictly proper decision scoring rules to assess the accuracy of probabilistic reports from agents, which allows the agents to learn to solve the contextual bandit problem jointly. Our simulations show that our multi-agent system with distributed information can be trained as efficiently as a centralised counterpart with a single agent that receives all information. Moreover, we use our system to investigate scenarios with deterministic decision scoring rules, which are not incentive compatible. We observe the emergence of more complex dynamics with manipulative behaviour, which agrees with existing theoretical analyses.
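As a rough illustration of why the abstract separates strictly proper decision scoring rules from deterministic ones, the Python sketch below uses one construction known from the decision-market literature. The action is drawn from a policy that gives every action positive probability, the report for the action actually taken is scored with the logarithmic scoring rule, and that score is divided by the probability of taking the action. This is a single-forecaster simplification, not necessarily the rule used in this paper. The function names, the two-action example, and the "pick the highest reported success probability" deterministic rule are hypothetical.

```python
import math


def log_score(report, outcome):
    # Logarithmic scoring rule: strictly proper for a probability vector over outcomes.
    return math.log(report[outcome])


def decision_score(reports, policy, action, outcome):
    # Score only the report for the action actually taken, weighted by 1 / P(action).
    # Requires a full-support policy: every action must have positive probability.
    if policy[action] <= 0:
        raise ValueError("policy must assign positive probability to every action")
    return log_score(reports[action], outcome) / policy[action]


def expected_decision_score(reports, beliefs, policy):
    # Exact expectation over the random action and the outcome, where beliefs[a]
    # is the forecaster's true outcome distribution if action a is taken.
    return sum(
        p_a * q * decision_score(reports, policy, a, o)
        for a, p_a in policy.items()
        for o, q in enumerate(beliefs[a])
    )


def deterministic_expected_score(reports, beliefs, success=1):
    # Hypothetical deterministic rule: take the action with the highest reported
    # probability of the `success` outcome, then apply the unweighted log score
    # to that action's report only.
    chosen = max(reports, key=lambda a: reports[a][success])
    return sum(q * log_score(reports[chosen], o) for o, q in enumerate(beliefs[chosen]))


if __name__ == "__main__":
    # Full-support policy: truthful reports score higher than a misreport on action 0.
    beliefs = {0: [0.7, 0.3], 1: [0.4, 0.6]}
    policy = {0: 0.5, 1: 0.5}
    misreport = {0: [0.5, 0.5], 1: [0.4, 0.6]}
    print(expected_decision_score(beliefs, beliefs, policy))
    print(expected_decision_score(misreport, beliefs, policy))

    # Deterministic rule: understating action 0's success probability steers the
    # decision to the more predictable action 1 and raises the expected score.
    beliefs2 = {0: [0.5, 0.5], 1: [0.9, 0.1]}
    manipulated = {0: [0.95, 0.05], 1: [0.9, 0.1]}
    print(deterministic_expected_score(beliefs2, beliefs2))
    print(deterministic_expected_score(manipulated, beliefs2))
```

With the 1/P(action) weighting, the expected score reduces to the sum over actions of each action's expected log score, independent of the policy, so truthful forecasts maximise it for every action. Under the deterministic rule, the forecaster in this example earns more by misreporting so that a different, more predictable action is chosen. This is the kind of manipulation that makes such rules fail to be incentive compatible.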