A tether-net launched from a chaser spacecraft provides a promising method to capture and dispose of large space debris in orbit. This tether-net system is subject to several sources of uncertainty in sensing and actuation that affect the performance of its net launch and closing control. Earlier reliability-based optimization approaches for designing control actions, however, remain computationally prohibitive and difficult to generalize over varying launch scenarios and target (debris) states relative to the chaser. To search for a general and reliable control policy, this paper presents a reinforcement learning framework that integrates a proximal policy optimization (PPO2) approach with net dynamics simulations. The latter is used to evaluate episodes of net-based target capture and to estimate the capture quality index that serves as the reward feedback to PPO2. Here, the learned policy models the timing of the net closing action based on the state of the moving net and the target, under any given launch scenario. A stochastic state transition model is considered in order to incorporate synthetic uncertainties in state estimation and launch actuation. Along with a notable reward improvement during training, the trained policy demonstrates capture performance, over a wide range of launch/target scenarios, that is close to that obtained with reliability-based optimization run over an individual scenario.
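For concreteness, the following is a minimal sketch of the training setup described above, assuming a Gym-style wrapper around a net dynamics simulation and the PPO2 implementation from stable-baselines. `NetCaptureEnv` and all of its internals are hypothetical placeholders standing in for the actual simulator, not the authors' implementation.

```python
# Minimal sketch: PPO2 trained on a placeholder net-capture environment.
# Assumptions: observation = noisy 12-dimensional net/target state estimate,
# action = binary decision per step (wait vs. trigger net closing),
# reward = capture quality index evaluated at the end of each episode.
import gym
import numpy as np
from gym import spaces
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy


class NetCaptureEnv(gym.Env):
    """Hypothetical net-capture environment (placeholder dynamics)."""

    def __init__(self, horizon=200):
        self.horizon = horizon
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(12,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)  # 0 = wait, 1 = close the net

    def reset(self):
        self._t = 0
        # Stochastic launch actuation: perturb a nominal launch state.
        self._state = np.random.normal(0.0, 0.1, size=12)
        return self._observe()

    def step(self, action):
        self._t += 1
        if action == 1 or self._t >= self.horizon:
            # Episode ends at closure (or timeout); the terminal reward
            # stands in for the capture quality index of the episode.
            return self._observe(), self._capture_quality_index(), True, {}
        # Placeholder for one step of the net dynamics simulation.
        self._state = self._state + np.random.normal(0.0, 0.01, size=12)
        return self._observe(), 0.0, False, {}

    def _observe(self):
        # Synthetic state-estimation noise on top of the "true" state.
        return (self._state + np.random.normal(0.0, 0.01, size=12)).astype(np.float32)

    def _capture_quality_index(self):
        # Dummy stand-in; the paper derives this from the simulated capture.
        return float(-np.linalg.norm(self._state))


env = NetCaptureEnv()
model = PPO2(MlpPolicy, env, verbose=1)  # stable-baselines auto-wraps the env in a DummyVecEnv
model.learn(total_timesteps=100000)
```

Note that the reward in this sketch is sparse and episode-terminal, which mirrors the abstract's description of the capture quality index as feedback evaluated once per capture episode.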