In multi-agent systems with a large number of agents, the contribution of each individual agent to the values of other agents is typically minimal (e.g., in aggregation systems such as Uber and Deliveroo). In this paper, we consider such multi-agent systems in which each agent is self-interested and makes a sequence of decisions, and we represent them as a Stochastic Non-atomic Congestion Game (SNCG). We derive key properties of equilibrium solutions in the SNCG model with non-atomic and nearly non-atomic agents. Building on these equilibrium properties, we provide a novel Multi-Agent Reinforcement Learning (MARL) mechanism that minimizes the variance across the values of agents in the same state. To demonstrate the utility of this new mechanism, we provide detailed results on a real-world taxi dataset and on a generic simulator for aggregation systems. We show that our approach reduces the variance in revenues earned by taxi drivers while still providing higher joint revenues than leading approaches.
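To make the variance-minimization idea concrete, the following is a minimal illustrative objective, assuming the mechanism augments a standard value-learning loss with a penalty on the spread of agent values within a state; the weight $\lambda$, the target $y$, the agent set $\mathcal{A}(s)$, and the per-agent values $V^i_\theta(s)$ are our own illustrative notation, not taken from the paper:

$$\mathcal{L}(\theta) \;=\; \mathbb{E}_{s}\Big[\big(y - \bar{V}_\theta(s)\big)^2\Big] \;+\; \lambda\, \mathrm{Var}_{i \in \mathcal{A}(s)}\big[V^i_\theta(s)\big], \qquad \bar{V}_\theta(s) \;=\; \frac{1}{|\mathcal{A}(s)|}\sum_{i \in \mathcal{A}(s)} V^i_\theta(s),$$

where $\mathcal{A}(s)$ denotes the set of agents in state $s$. The first term drives accurate value estimation, while the second penalizes dispersion among agents sharing a state, trading off joint revenue against fairness through $\lambda$.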