We present a multi-agent learning algorithm, ALMA-Learning, for efficient and fair allocations in large-scale systems. We circumvent the traditional pitfalls of multi-agent learning (e.g., the moving target problem, the curse of dimensionality, or the need for mutually consistent actions) by relying on the ALMA heuristic as a coordination mechanism for each stage game. ALMA-Learning is decentralized, observes only its own action/reward pairs, requires no inter-agent communication, and achieves near-optimal (<5% loss) and fair coordination in a variety of synthetic scenarios and a real-world meeting scheduling problem. Its lightweight nature and fast learning make ALMA-Learning ideal for on-device deployment.