Efficient coordination in multi-agent systems often incurs high communication overhead or slow convergence rates, making scalable welfare optimization difficult. We propose Single-Bit Coordination Dynamics for Pareto-Efficient Outcomes (SBC-PE), a decentralized learning algorithm requiring only a single-bit satisfaction signal per agent each round. Despite this extreme efficiency, SBC-PE guarantees convergence to the exact optimal solution in arbitrary finite games. We establish explicit regret bounds, showing expected regret grows only logarithmically with the horizon, i.e., O(log T). Compared with prior payoff-based or bandit-style rules, SBC-PE uniquely combines minimal signaling, general applicability, and finite-time guarantees. These results show scalable welfare optimization is achievable under minimal communication constraints.
翻译:多智能体系统中的高效协调通常伴随着高昂的通信开销或缓慢的收敛速度,这使得可扩展的福利优化变得困难。我们提出了用于帕累托高效结果的单比特协调动力学(SBC-PE),这是一种去中心化学习算法,每轮每个智能体仅需一个单比特的满意度信号。尽管这种效率极高,SBC-PE仍能保证在任意有限博弈中收敛到精确的最优解。我们建立了明确的遗憾界,表明期望遗憾仅随时间范围对数增长,即O(log T)。与先前的基于收益或赌博机式规则相比,SBC-PE独特地结合了最小化信号传输、普遍适用性和有限时间保证。这些结果表明,在最小通信约束下,可扩展的福利优化是可以实现的。