As space becomes more congested, on-orbit inspection is an increasingly relevant activity, whether the goal is to observe a defunct satellite in order to plan repairs or to de-orbit it. However, on-orbit inspection is itself challenging, typically requiring the careful coordination of multiple observer satellites. The task is further complicated by a highly nonlinear environment in which the target may be unknown or moving unpredictably, with no time for continuous command and control from the ground. There is therefore a need for autonomous, robust, decentralized solutions to the inspection task. To this end, we consider a hierarchical, learned approach for the decentralized planning of multi-agent inspection of a tumbling target. Our solution consists of two components: a viewpoint (high-level) planner trained using deep reinforcement learning, and a navigation planner that handles point-to-point navigation between pre-specified viewpoints. We present a novel problem formulation and methodology that is not only suited to deriving robust policies via reinforcement learning, but also extendable to unknown target geometries and to higher-fidelity information-theoretic objectives received directly from sensor inputs. Operating under limited information, our trained multi-agent high-level policies successfully contextualize information within the global hierarchical environment and are correspondingly able to inspect over 90% of non-convex tumbling targets, even in the absence of additional agent attitude control.