We investigate the benefits of heterogeneity in multi-agent explore-exploit decision making, where the goal of the agents is to maximize cumulative group reward. To do so, we study a class of distributed stochastic bandit problems in which agents communicate over a multi-star network and make sequential choices among options in the same uncertain environment. In multi-agent bandit problems, agents typically use homogeneous decision-making strategies. However, group performance can be improved by incorporating heterogeneity into the choices agents make, especially when the network graph is irregular, i.e., when agents have different numbers of neighbors. We design and analyze new heterogeneous explore-exploit strategies, using the multi-star as the model irregular network graph. The key idea is to have center agents explore more than they would under the homogeneous strategy, so that they provide more useful data to the peripheral agents. For the case in which all agents broadcast their reward values and choices to their neighbors with the same probability, we provide theoretical guarantees that group performance improves under the proposed heterogeneous strategies relative to homogeneous strategies. We use numerical simulations to illustrate our results and to validate our theoretical bounds.
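To make the notion of an irregular graph concrete, the following minimal Python sketch builds a small multi-star graph and counts each agent's neighbors. The construction is an assumption made for illustration: the abstract does not define the multi-star precisely, so here the center agents are taken to be connected to one another and each peripheral agent is attached to exactly one center. The function name `multi_star` and the node labels are hypothetical.

```python
from itertools import combinations


def multi_star(num_centers, leaves_per_center):
    """Return adjacency sets for an illustrative multi-star graph."""
    centers = [f"c{i}" for i in range(num_centers)]
    adj = {c: set() for c in centers}

    # Assumption: every pair of center agents is connected.
    for a, b in combinations(centers, 2):
        adj[a].add(b)
        adj[b].add(a)

    # Each peripheral agent is attached to exactly one center.
    for i, c in enumerate(centers):
        for j in range(leaves_per_center):
            leaf = f"p{i}_{j}"
            adj[leaf] = {c}
            adj[c].add(leaf)
    return adj


graph = multi_star(num_centers=2, leaves_per_center=3)

# Number of neighbors per agent; unequal values mean the graph is irregular.
degrees = {node: len(neighbors) for node, neighbors in graph.items()}
is_irregular = len(set(degrees.values())) > 1
```

Under these assumptions each center has more neighbors than any peripheral agent, which is the asymmetry the heterogeneous strategies are meant to exploit: what a center observes and broadcasts can reach many agents, while a peripheral agent's observations reach only its own center.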