Enforcing a fair workload allocation among multiple agents pursuing a shared objective is crucial for consistent and reliable runtime performance in learning-enabled, demand-side healthcare worker settings. Existing multi-agent reinforcement learning (MARL) approaches steer fairness by shaping rewards through post hoc orchestration, which provides no certifiable, self-enforcing fairness that individual agents cannot alter at runtime. We address this shortcoming in a setting where agents share resources, using a learning-enabled optimization scheme among self-interested decision makers whose individual actions affect those of other agents. This casts the problem in a generalized Nash equilibrium (GNE) game-theoretic framework, in which we steer the group policy to a safe and locally efficient equilibrium where no agent can improve its utility function by unilaterally changing its decisions. Our method, Fair-GNE, models MARL as a constrained GNE-seeking game and prescribes an equitable collective equilibrium that arises from the structure of the problem itself. We evaluate our hypothesis in our custom-designed high-fidelity resuscitation simulator. Across all our numerical experiments, Fair-GNE significantly improves workload balance over fixed-penalty baselines (0.89 vs.\ 0.33 JFI, $p < 0.01$) while maintaining 86\% task success, demonstrating statistically significant fairness gains through adaptive constraint enforcement. Together, these results present our formulation, evaluation metrics, and equilibrium-seeking method as a basis for clear, principled fairness enforcement in large multi-agent, learning-based healthcare systems.
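The workload-balance figures in the abstract are reported in JFI. Assuming JFI denotes Jain's fairness index, which the abstract does not spell out, the metric for per-agent workloads $x_1, \dots, x_n$ is $(\sum_i x_i)^2 / (n \sum_i x_i^2)$, ranging from $1/n$ (one agent carries everything) to $1$ (perfectly even). The sketch below is a minimal illustration of that formula; the function name and the workload values are hypothetical and are not taken from the Fair-GNE experiments.

```python
from typing import Sequence


def jain_fairness_index(workloads: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    `workloads` holds one non-negative workload value per agent.
    """
    n = len(workloads)
    if n == 0:
        raise ValueError("need at least one agent")
    total = sum(workloads)
    sum_sq = sum(x * x for x in workloads)
    if sum_sq == 0:
        # All agents idle: treated as perfectly balanced.
        # This is a convention assumed here, not one stated in the source.
        return 1.0
    return (total * total) / (n * sum_sq)


# Illustrative workloads only (not data from the paper).
balanced = jain_fairness_index([10, 10, 10])  # every agent does the same work
skewed = jain_fairness_index([30, 0, 0])      # one agent does all the work
print(f"balanced={balanced:.2f}, skewed={skewed:.2f}")
```

With three agents, the even split gives an index of 1 and the fully concentrated split gives the lower bound of 1/3, which shows how the index penalizes uneven allocation regardless of the total amount of work.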