In recent years, with the large-scale deployment of spacecraft and the growing onboard capabilities of satellites, delay/disruption-tolerant networking (DTN) has emerged as a more robust communication protocol than TCP/IP under highly dynamic network conditions. Buffer management at DTN nodes remains an active area of research, because current implementations of the core DTN protocol still assume that every network node always has enough memory to store and forward bundles. Moreover, classical queuing theory does not apply to the dynamic management of DTN node buffers. This paper therefore proposes a centralized approach for automatically managing cognitive DTN nodes in low Earth orbit (LEO) satellite constellation scenarios, based on the advanced reinforcement learning (RL) strategy advantage actor-critic (A2C). The method explores training an intelligent agent in geosynchronous Earth orbit (GEO) to manage all DTN nodes in an LEO satellite constellation. The goal of the A2C agent is to maximize the delivery success rate and minimize the cost of network resource consumption while taking node memory utilization into account. The agent can dynamically adjust the radio data rate and drop bundles according to their priority. To measure the effectiveness of applying A2C to DTN node management in LEO satellite constellation scenarios, this paper compares the trained agent's policy with two non-RL policies: a random policy and a standard policy. Experiments show that the A2C policy balances delivery success rate against cost, and yields the highest reward and the lowest node memory utilization.
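To make the A2C learning signal concrete, here is a minimal sketch of a one-step advantage actor-critic update for a single transition, in plain Python. The reward shape (delivery ratio minus weighted cost and memory-utilization penalties), the weights `w_cost` and `w_mem`, and the helper names `reward` and `a2c_losses` are hypothetical illustrations rather than the paper's formulation; the paper's actual state encoding, action set, and network architecture are not reproduced here.

```python
import math

# Hypothetical reward: favor delivery success, penalize resource cost and buffer
# occupancy. The weights are illustrative assumptions, not values from the paper.
def reward(delivery_ratio, cost, memory_utilization, w_cost=0.5, w_mem=0.5):
    return delivery_ratio - w_cost * cost - w_mem * memory_utilization

def softmax(logits):
    # Subtract the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def a2c_losses(logits, action, r, value_s, value_next, gamma=0.99, done=False):
    """One-step A2C losses for a single transition (s, a, r, s').

    logits: actor outputs over a discrete action set (e.g. hypothetical actions
            such as "lower data rate", "raise data rate", "drop lowest-priority bundle").
    value_s, value_next: critic estimates V(s) and V(s').
    """
    # Bootstrapped one-step target; no bootstrap at episode end.
    target = r + (0.0 if done else gamma * value_next)
    # TD error as the advantage estimate A(s, a) = r + gamma * V(s') - V(s).
    advantage = target - value_s
    probs = softmax(logits)
    # Policy-gradient loss; in an autograd framework the advantage is treated
    # as a constant (detached) when differentiating this term.
    actor_loss = -math.log(probs[action]) * advantage
    # Critic regresses V(s) toward the one-step target.
    critic_loss = advantage ** 2
    # Policy entropy, commonly added as a bonus to encourage exploration.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return actor_loss, critic_loss, advantage, entropy
```

For example, with uniform logits over two actions, `a2c_losses([0.0, 0.0], 0, r=1.0, value_s=0.5, value_next=0.0, done=True)` gives an advantage of 0.5 and a critic loss of 0.25: the action did better than the critic expected, so its probability would be pushed up.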