Autonomous control of multi-stage industrial processes requires both local specialization and global coordination. Reinforcement learning (RL) offers a promising approach, but its industrial adoption remains limited due to challenges such as reward design, modularity, and action space management. Many academic benchmarks differ markedly from industrial control problems, limiting their transferability to real-world applications. This study introduces an enhanced industry-inspired benchmark environment that combines tasks from two existing benchmarks, SortingEnv and ContainerGym, into a sequential recycling scenario with sorting and pressing operations. We evaluate two control strategies, a modular architecture with specialized agents and a monolithic agent governing the full system, and additionally analyze the impact of action masking. Our experiments show that without action masking, agents struggle to learn effective policies, with the modular architecture performing better. When action masking is applied, both architectures improve substantially, and the performance gap narrows considerably. These results highlight the decisive role of action space constraints and suggest that the advantages of specialization diminish as action complexity is reduced. The proposed benchmark thus provides a valuable testbed for exploring practical and robust multi-agent RL solutions in industrial automation, while contributing to the ongoing debate on centralization versus specialization.
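
To make the role of action masking concrete, the following is a minimal sketch of masking over a discrete action space: logits of currently invalid actions are set to negative infinity so their sampling probability becomes zero. The function names (`get_action_mask`, `sample_action`) and the toy six-action setup are hypothetical illustrations, not taken from the paper's benchmark or codebase.

```python
import numpy as np

def get_action_mask(num_actions: int, valid_actions: list[int]) -> np.ndarray:
    """Boolean mask: True where an action is currently admissible."""
    mask = np.zeros(num_actions, dtype=bool)
    mask[valid_actions] = True
    return mask

def sample_action(logits: np.ndarray, mask: np.ndarray,
                  rng: np.random.Generator) -> int:
    """Sample an action after masking: invalid logits are set to -inf,
    so their softmax probability is exactly zero."""
    masked_logits = np.where(mask, logits, -np.inf)
    probs = np.exp(masked_logits - masked_logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

# Toy usage: 6 discrete actions, of which only 3 are valid in this state.
rng = np.random.default_rng(0)
logits = rng.normal(size=6)
mask = get_action_mask(6, valid_actions=[0, 2, 5])
print(sample_action(logits, mask, rng))  # always returns 0, 2, or 5
```

In practice, such masks are typically supplied by the environment at each step and applied inside the policy (e.g., via maskable policy-gradient implementations), which shrinks the effective action space the agent must explore.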