Learning what to say and how precisely: Efficient Communication via Differentiable Discrete Communication Learning

Effective communication in multi-agent reinforcement learning (MARL) is critical for success but constrained by bandwidth, yet past approaches have been limited to complex gating mechanisms that only decide whether to communicate, not how precisely. Learning to optimize message precision at the bit-level is fundamentally harder, as the required discretization step breaks gradient flow. We address this by generalizing Differentiable Discrete Communication Learning (DDCL), a framework for end-to-end optimization of discrete messages. Our primary contribution is an extension of DDCL to support unbounded signals, transforming it into a universal, plug-and-play layer for any MARL architecture. We verify our approach with three key results. First, through a qualitative analysis in a controlled environment, we demonstrate how agents learn to dynamically modulate message precision according to the informational needs of the task. Second, we integrate our variant of DDCL into four state-of-the-art MARL algorithms, showing it reduces bandwidth by over an order of magnitude while matching or exceeding task performance. Finally, we provide direct evidence for the "Bitter Lesson" in MARL communication: a simple Transformer-based policy leveraging DDCL matches the performance of complex, specialized architectures, questioning the necessity of bespoke communication designs.
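To make the gradient-flow problem concrete, the following minimal Python sketch may help. It is not the authors' implementation: the abstract does not specify how DDCL discretizes messages, so the sketch assumes a standard technique, subtractive dithered quantization, in which sender and receiver share a random offset. All names here (`quantize`, `send`, `receive`, `finite_difference`, `step`) are hypothetical and chosen only for illustration. The sketch shows two things: plain rounding has zero slope almost everywhere, and under the dithering assumption the receiver's reconstruction stays within half a grid step of the original value, so the channel can be treated as identity plus bounded noise. Because the transmitted index is an arbitrary integer, the input does not need to be bounded.

```python
import math
import random


def quantize(x, step):
    """Map a real value to the integer index of the nearest grid point."""
    return math.floor(x / step + 0.5)


def send(x, step, rng):
    """Sender (assumed scheme): add a shared dither, transmit the integer index.

    Returns the index and the dither; in practice the dither would come from
    a random source shared with the receiver rather than being transmitted.
    """
    dither = rng.uniform(-0.5, 0.5) * step
    return quantize(x + dither, step), dither


def receive(index, dither, step):
    """Receiver: rebuild the grid value and subtract the same dither."""
    return index * step - dither


def finite_difference(f, x, eps=1e-4):
    """Central-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


if __name__ == "__main__":
    rng = random.Random(0)
    step = 0.25  # coarser step means fewer distinct indices, finer means more

    # Hard rounding is piecewise constant, so its slope is zero away from
    # the jump points and carries no learning signal.
    slope = finite_difference(lambda v: quantize(v, 1.0) * 1.0, 0.3)
    print("slope of hard rounding at 0.3:", slope)

    # With subtractive dither the reconstruction error never exceeds step / 2,
    # regardless of how large the input is.
    worst = 0.0
    for _ in range(10_000):
        x = rng.uniform(-1000.0, 1000.0)
        index, dither = send(x, step, rng)
        worst = max(worst, abs(receive(index, dither, step) - x))
    print("worst reconstruction error:", worst, "bound:", step / 2)
```

In this assumed scheme, `step` plays the role of a precision knob: a larger step yields fewer distinct integer indices and so fewer bits per message, at the cost of more reconstruction noise. How the paper actually parameterizes and learns precision may differ.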
