Multi-Agent Reinforcement Learning for Autonomous Multi-Satellite Earth Observation: A Realistic Case Study

The exponential growth in the number of Low Earth Orbit (LEO) satellites has revolutionised Earth Observation (EO) missions, helping to address challenges in climate monitoring, disaster management, and other domains. However, autonomous coordination in multi-satellite systems remains a fundamental challenge. Traditional optimisation approaches struggle to meet the real-time decision-making demands of dynamic EO missions, which motivates the use of Reinforcement Learning (RL) and Multi-Agent Reinforcement Learning (MARL). In this paper, we investigate RL-based autonomous EO mission planning by first modelling single-satellite operations and then extending to multi-satellite constellations using MARL frameworks. We address key challenges, including energy and data storage limitations, uncertainties in satellite observations, and the complexities of decentralised coordination under partial observability. Using a near-realistic satellite simulation environment, we evaluate the training stability and performance of state-of-the-art RL and MARL algorithms, including PPO, IPPO, MAPPO, and HAPPO. Our results demonstrate that MARL can effectively balance imaging and resource management while addressing non-stationarity and reward interdependency in multi-satellite coordination. The insights gained from this study provide a foundation for autonomous satellite operations and offer practical guidelines for improving policy learning in decentralised EO missions.
