Modern AI systems often comprise multiple learnable components that can be naturally organized as graphs. A central challenge is training such systems end to end without restrictive architectural or training assumptions. This problem falls squarely within the scope of collaborative Multi-Agent Reinforcement Learning (MARL). We introduce Reinforcement Networks, a general MARL framework that organizes agents as vertices of a directed acyclic graph (DAG). This structure extends hierarchical RL to arbitrary DAGs, enabling flexible credit assignment and scalable coordination while avoiding the strict topologies, fully centralized training, and other limitations of current approaches. We formalize training and inference methods for the Reinforcement Networks framework and connect it to the LevelEnv concept to support reproducible construction, training, and evaluation. We demonstrate the effectiveness of our approach on several collaborative MARL setups, developing Reinforcement Networks models that outperform standard MARL baselines. Beyond these empirical gains, Reinforcement Networks unify hierarchical, modular, and graph-structured views of MARL, opening a principled path toward designing and training complex multi-agent systems. We conclude with theoretical and practical directions, including richer graph morphologies, compositional curricula, and graph-aware exploration, that position Reinforcement Networks as a foundation for a new line of research in scalable, structured MARL.
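To make the core structural idea concrete, the following is a minimal sketch of agents organized as vertices of a DAG, with inference proceeding in topological order so that each agent consumes its parents' outputs. The class name `ReinforcementNetwork`, its methods, and the toy agents are illustrative assumptions for this sketch, not the framework's actual API.

```python
from collections import defaultdict, deque

class ReinforcementNetwork:
    """Hypothetical sketch: a DAG of agents. Each vertex holds an agent
    (any callable mapping inputs -> output); directed edges define how
    observations and messages flow between agents."""

    def __init__(self):
        self.agents = {}                  # vertex name -> agent callable
        self.parents = defaultdict(list)  # vertex name -> upstream vertices

    def add_agent(self, name, agent, parents=()):
        self.agents[name] = agent
        self.parents[name] = list(parents)

    def _topological_order(self):
        # Kahn's algorithm over the parent lists; rejects cyclic graphs.
        indegree = {v: len(self.parents[v]) for v in self.agents}
        children = defaultdict(list)
        for v, ps in self.parents.items():
            for p in ps:
                children[p].append(v)
        queue = deque(v for v, d in indegree.items() if d == 0)
        order = []
        while queue:
            v = queue.popleft()
            order.append(v)
            for c in children[v]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    queue.append(c)
        if len(order) != len(self.agents):
            raise ValueError("graph contains a cycle; a DAG is required")
        return order

    def forward(self, env_obs):
        # Root agents (no parents) read the environment observation;
        # every other agent consumes the outputs of its parent agents.
        outputs = {}
        for v in self._topological_order():
            ps = self.parents[v]
            inputs = env_obs if not ps else [outputs[p] for p in ps]
            outputs[v] = self.agents[v](inputs)
        return outputs

# Toy usage: two "scout" agents feed a "manager", generalizing a strict
# two-level hierarchy to an arbitrary DAG topology.
net = ReinforcementNetwork()
net.add_agent("scout_a", lambda obs: f"plan_a({obs})")
net.add_agent("scout_b", lambda obs: f"plan_b({obs})")
net.add_agent("manager", lambda msgs: f"act({msgs})",
              parents=("scout_a", "scout_b"))
print(net.forward("obs_t"))
```

In this sketch, root vertices read the raw environment observation while downstream vertices act on their parents' outputs, which is how a DAG topology subsumes a strict hierarchy as a special case.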