Data packet routing in aeronautical ad-hoc networks (AANETs) is challenging due to their highly dynamic topology. In this paper, we invoke deep reinforcement learning for routing in AANETs, aiming to minimize the end-to-end (E2E) delay. Specifically, a deep Q-network (DQN) is conceived for capturing the relationship between the optimal routing decision and the local geographic information observed by the forwarding node. The DQN is trained offline on historical flight data and then stored on board each airplane to assist its routing decisions during flight. To boost the learning efficiency and the online adaptability of the proposed DQN-routing, we further exploit knowledge of the system's dynamics by using a deep value network (DVN) conceived with a feedback mechanism. Our simulation results show that both DQN-routing and DVN-routing achieve lower E2E delay than the benchmark protocol, and that DVN-routing performs similarly to the optimal routing that relies on perfect global information.
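To make the routing mechanism concrete, the following is a minimal, hypothetical sketch of the kind of decision rule the abstract describes: an offline-trained Q-network maps a forwarding node's local geographic observation of each candidate neighbor to a Q-value, and the packet is greedily forwarded to the neighbor with the highest value. All dimensions, weights, and names here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 4      # e.g. relative position/velocity of a candidate w.r.t. the destination
MAX_NEIGHBORS = 3  # candidate next hops currently visible to the forwarding node

# In the paper's setting these weights would come from offline training on
# historical flight data; here they are random placeholders.
W1 = rng.normal(size=(STATE_DIM, 8))
b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1))
b2 = np.zeros(1)

def q_value(obs: np.ndarray) -> float:
    """Q-value (e.g. negated expected E2E delay) of forwarding to one neighbor."""
    h = np.maximum(0.0, obs @ W1 + b1)  # ReLU hidden layer
    return float(h @ W2 + b2)

def choose_next_hop(neighbor_obs: list) -> int:
    """Greedy routing decision: pick the neighbor with the highest Q-value."""
    return int(np.argmax([q_value(o) for o in neighbor_obs]))

# Example: score three candidate neighbors from local observations.
observations = [rng.normal(size=STATE_DIM) for _ in range(MAX_NEIGHBORS)]
best = choose_next_hop(observations)
```

The key property this sketch illustrates is that the decision uses only locally observable geographic state, so no global topology exchange is needed at forwarding time.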