In the rapidly evolving field of multi-agent reinforcement learning (MARL), understanding the dynamics of open systems is crucial. Openness in MARL refers to the dynamic nature of the agent population, tasks, and agent types within a system. Eck et al. (2023) [2] identify three types of openness: agent openness, where agents can enter or leave the system at any time; task openness, where new tasks emerge and existing ones evolve or disappear; and type openness, where the capabilities and behaviors of agents change over time. This report provides a conceptual and empirical review of the interplay between openness and the credit assignment problem (CAP). CAP involves determining the contribution of each individual agent to overall system performance, a task that becomes considerably harder in open environments. Traditional credit assignment (CA) methods often assume a static agent population, fixed and predefined tasks, and stationary agent types, which makes them ill-suited to open systems. We first conduct a conceptual analysis, introducing new sub-categories of openness to detail how events such as agent turnover or task cancellation break the assumptions of environmental stationarity and fixed team composition that underpin existing CAP methods. We then present an empirical study of representative temporal and structural CA algorithms in an open environment. The results demonstrate that openness directly causes credit misattribution, evidenced by unstable loss functions and significant performance degradation.
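To make the agent-openness failure mode concrete, here is a minimal Python sketch under stated assumptions. It uses difference rewards, D_i = G(z) - G(z_{-i}), as a stand-in credit assignment scheme (not necessarily one of the algorithms evaluated in this report), a hypothetical additive team reward, and hypothetical helper names (`team_reward`, `difference_rewards`, `to_slots`, `read_slots`). It shows how a learner that keeps a fixed slot-to-agent mapping, as a method assuming fixed team composition might, attributes credit to the wrong agents once one agent leaves.

```python
from typing import Dict, List


def team_reward(contrib: Dict[str, float]) -> float:
    """Hypothetical global reward G: the sum of each present agent's contribution."""
    return sum(contrib.values())


def difference_rewards(contrib: Dict[str, float]) -> Dict[str, float]:
    """Difference reward D_i = G(z) - G(z_{-i}) for every agent currently present."""
    total = team_reward(contrib)
    return {
        agent: total - team_reward({k: v for k, v in contrib.items() if k != agent})
        for agent in contrib
    }


def to_slots(credit: Dict[str, float], roster: List[str]) -> List[float]:
    """Pack per-agent credit into a fixed-order vector, as a static-team learner might."""
    return [credit[agent] for agent in roster]


def read_slots(vector: List[float], assumed_roster: List[str]) -> Dict[str, float]:
    """Unpack a credit vector using the roster the learner *believes* is current."""
    return dict(zip(assumed_roster, vector))


if __name__ == "__main__":
    roster_t0 = ["a", "b", "c"]
    contrib_t0 = {"a": 1.0, "b": 2.0, "c": 3.0}
    # With an unchanged roster, each agent receives its own credit.
    print(read_slots(to_slots(difference_rewards(contrib_t0), roster_t0), roster_t0))

    # Agent "a" leaves (agent openness), so the environment's roster shrinks.
    roster_t1 = ["b", "c"]
    contrib_t1 = {"b": 2.0, "c": 3.0}
    vec_t1 = to_slots(difference_rewards(contrib_t1), roster_t1)

    # The learner still reads slots with the old roster:
    # b's credit is attributed to "a" and c's credit to "b".
    print(read_slots(vec_t1, roster_t0))
```

The credit values themselves are computed correctly at both steps; the misattribution comes entirely from the stale assumption that slot i always belongs to the same agent, which is the kind of fixed-composition assumption that agent openness violates.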