Hierarchical Reinforcement Learning Framework for Stochastic Spaceflight Campaign Design

From arXiv: 31 pages, 4 figures. An earlier version was presented at the AIAA ASCEND conference 2020; under review by the Journal of Spacecraft and Rockets.

This paper develops a hierarchical reinforcement learning architecture for multi-mission spaceflight campaign design under uncertainty, covering vehicle design, infrastructure deployment planning, and space transportation scheduling. The problem involves a high-dimensional design space and is especially challenging when uncertainty is present. To tackle this challenge, the framework combines reinforcement learning (RL) with network-based mixed-integer linear programming (MILP) in a hierarchical structure: RL optimizes campaign-level decisions (e.g., the design of the vehicle used throughout the campaign and the destination demand assigned to each mission), whereas MILP optimizes the detailed mission-level decisions (e.g., when to launch what from where to where). As a case study, the framework is applied to a set of human lunar exploration campaign scenarios with uncertain in-situ resource utilization (ISRU) performance. The main value of this work is its integration of rapidly growing RL research with existing MILP-based space logistics methods through a hierarchical framework, which handles the otherwise intractable complexity of space mission design under uncertainty. We expect this unique framework to be a critical stepping stone for the emerging research direction of artificial intelligence for space mission design.
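
The two-level split described in the abstract can be illustrated with a small toy sketch. It is not the paper's implementation, and every name and number in it is a hypothetical chosen for illustration. Two simplifications are swapped in: an epsilon-greedy bandit stands in for the RL agent at the campaign level, and exhaustive enumeration over launch counts stands in for the network-based MILP at the mission level. The campaign has two missions, and an uncertain ISRU yield, revealed after the first mission, reduces what the second mission must deliver.

```python
import random

# Toy numbers chosen for illustration only; none come from the paper.
TOTAL_DEMAND = 10        # payload units the two-mission campaign must supply
LAUNCH_COST = 5.0        # cost of one launch
CAPACITY_COST = 1.5      # vehicle development cost per unit of capacity
MAX_LAUNCHES = 5         # launch limit per mission
INFEASIBLE_PENALTY = 100.0
ISRU_YIELDS = (0, 2, 4)  # equally likely payload units produced in situ

# Campaign-level actions: (vehicle capacity, demand assigned to mission 1).
ACTIONS = [(cap, d1) for cap in (3, 4, 5, 6) for d1 in (4, 6, 8, 10)]


def solve_mission(capacity, demand):
    """Lower level: fewest launches covering `demand` (stand-in for the MILP)."""
    for launches in range(MAX_LAUNCHES + 1):
        if launches * capacity >= demand:
            # Cost grows with the launch count, so the first feasible plan is optimal.
            return launches, launches * LAUNCH_COST
    return None, INFEASIBLE_PENALTY


def campaign_cost(action, rng):
    """Simulate one campaign: mission 1, uncertain ISRU yield, then mission 2."""
    capacity, demand_1 = action
    _, cost_1 = solve_mission(capacity, demand_1)
    isru_yield = rng.choice(ISRU_YIELDS)  # revealed only after mission 1
    demand_2 = max(0, TOTAL_DEMAND - demand_1 - isru_yield)
    _, cost_2 = solve_mission(capacity, demand_2)
    return CAPACITY_COST * capacity + cost_1 + cost_2


def train(episodes=3000, epsilon=0.1, seed=0):
    """Upper level: epsilon-greedy estimate of each action's expected cost."""
    rng = random.Random(seed)
    mean_cost = {a: 0.0 for a in ACTIONS}
    visits = {a: 0 for a in ACTIONS}
    for _ in range(episodes):
        untried = [a for a in ACTIONS if visits[a] == 0]
        if untried:
            action = untried[0]                       # try every action once
        elif rng.random() < epsilon:
            action = rng.choice(ACTIONS)              # explore
        else:
            action = min(ACTIONS, key=mean_cost.get)  # exploit lowest estimate
        cost = campaign_cost(action, rng)
        visits[action] += 1
        # Incremental update of the running mean cost for this action.
        mean_cost[action] += (cost - mean_cost[action]) / visits[action]
    best = min(ACTIONS, key=mean_cost.get)
    return best, mean_cost


if __name__ == "__main__":
    best_action, estimates = train()
    print("best (capacity, mission-1 demand):", best_action)
    print("estimated expected cost:", round(estimates[best_action], 2))
```

The point of the sketch is the division of labor: the upper level only ever sees the total campaign cost returned by the lower-level solver, so the learner searches a small campaign-level action space while the detailed scheduling is delegated to an exact optimizer. A full treatment would replace the bandit with a state-dependent RL policy and the enumeration with an actual MILP solver.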
