Autonomous spacecraft control for mission phases such as launch, ascent, stage separation, and orbit insertion remains a critical challenge due to the need for adaptive policies that generalize across dynamically distinct regimes. While reinforcement learning (RL) has shown promise in individual astrodynamics tasks, existing approaches often require separate policies for distinct mission phases, limiting adaptability and increasing operational complexity. This work introduces a transformer-based RL framework that unifies multi-phase trajectory optimization through a single policy architecture, leveraging the transformer's inherent capacity to model extended temporal contexts. Building on proximal policy optimization (PPO), our framework replaces conventional recurrent networks with a transformer encoder-decoder structure, enabling the agent to maintain coherent memory across mission phases spanning seconds to minutes during critical operations. By integrating a Gated Transformer-XL (GTrXL) architecture, the framework eliminates manual phase transitions while maintaining stability in control decisions. We validate our approach progressively: first demonstrating near-optimal performance on single-phase benchmarks (double integrator and Van der Pol oscillator), then extending to multi-phase waypoint navigation variants, and finally tackling a complex multi-phase rocket ascent problem that includes atmospheric flight, stage separation, and vacuum operations. Results demonstrate that the transformer-based framework not only matches analytical solutions in simple cases but also effectively learns coherent control policies across dynamically distinct regimes, establishing a foundation for scalable autonomous mission planning that reduces reliance on phase-specific controllers while maintaining compatibility with safety-critical verification protocols.
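The stability claim above rests on the GTrXL's GRU-style gating, which replaces the residual connections of a standard transformer layer so that, at initialization, the layer behaves close to an identity map over the sublayer input. The following is a minimal pure-Python sketch of that gating computation; the diagonal (per-dimension) weights and the helper name `gtrxl_gate` are illustrative assumptions, not the paper's implementation, and a real model would use full learned weight matrices.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gtrxl_gate(x, y, W, bg=2.0):
    """GRU-style gate combining sublayer input x with sublayer output y.

    W maps each gate name ('r', 'z', 'g') to a pair (w_y, u_x) of
    per-dimension weight lists (a diagonal-weight simplification for
    illustration). The bias bg shifts the update gate z toward zero,
    so the layer output starts close to x (an identity map), which is
    the mechanism credited with stabilizing early RL training.
    """
    out = []
    for i in range(len(x)):
        r = sigmoid(W['r'][0][i] * y[i] + W['r'][1][i] * x[i])       # reset gate
        z = sigmoid(W['z'][0][i] * y[i] + W['z'][1][i] * x[i] - bg)  # update gate
        h = math.tanh(W['g'][0][i] * y[i] + W['g'][1][i] * (r * x[i]))
        out.append((1.0 - z) * x[i] + z * h)  # convex blend of input and candidate
    return out
```

With a large bias `bg`, the update gate saturates near zero and the gated layer passes its input through almost unchanged, which is why gated layers can be stacked deeply without destabilizing the PPO policy early in training.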