The exponential growth of Low Earth Orbit (LEO) satellites has revolutionised Earth Observation (EO) missions, addressing challenges in climate monitoring, disaster management, and other time-sensitive applications. However, autonomous coordination in multi-satellite systems remains a fundamental challenge. Traditional optimisation approaches struggle to meet the real-time decision-making demands of dynamic EO missions, motivating the use of Reinforcement Learning (RL) and Multi-Agent Reinforcement Learning (MARL). In this paper, we investigate RL-based autonomous EO mission planning by modelling single-satellite operations and extending to multi-satellite constellations using MARL frameworks. We address key challenges, including energy and data storage limitations, uncertainties in satellite observations, and the complexities of decentralised coordination under partial observability. Leveraging a near-realistic satellite simulation environment, we evaluate the training stability and performance of state-of-the-art MARL algorithms, including PPO, IPPO, MAPPO, and HAPPO. Our results demonstrate that MARL can effectively balance imaging and resource management while addressing non-stationarity and reward interdependency in multi-satellite coordination. The insights gained from this study provide a foundation for autonomous satellite operations and offer practical guidelines for improving policy learning in decentralised EO missions.