Highly dynamic mobile ad-hoc networks (MANETs) remain one of the most challenging environments in which to develop and deploy robust, efficient, and scalable routing protocols. In this paper, we present DeepCQ+, a routing protocol that integrates emerging multi-agent deep reinforcement learning (MADRL) techniques into existing Q-learning-based routing protocols and their variants in a novel manner, and achieves persistently higher performance across a wide range of topology and mobility configurations. While keeping the overall structure of Q-learning-based routing protocols, DeepCQ+ replaces statically configured parameterized thresholds and hand-written rules with carefully designed MADRL agents, so that no a priori configuration of such parameters is required. Extensive simulation shows that DeepCQ+ yields significantly higher end-to-end throughput with lower overhead and no apparent degradation of end-to-end delay (hop count) compared to its Q-learning-based counterparts. Qualitatively, and perhaps more significantly, DeepCQ+ maintains remarkably similar performance gains in many scenarios it was not trained on, in terms of network size, mobility conditions, and traffic dynamics. To the best of our knowledge, this is the first successful application of the MADRL framework to the MANET routing problem that demonstrates a high degree of scalability and robustness even in environments outside the trained range of scenarios. This implies that our MADRL-based DeepCQ+ design significantly improves on the Q-learning-based CQ+ baseline and increases its practicality and explainability, because real-world MANET environments will likely vary outside the trained range of scenarios. Additional techniques to further increase the performance and scalability gains are discussed.
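To make the baseline concrete, the following is a minimal illustrative sketch (not the paper's implementation) of the kind of Q-learning-based next-hop selection that DeepCQ+ builds on: each node maintains a Q-value per (destination, neighbor) pair and forwards to the neighbor with the highest estimated delivery quality. The class and parameter names (`QRoutingNode`, `alpha`, `epsilon`) are hypothetical; the epsilon-greedy exploration knob is an example of the hand-tuned parameters that a trained MADRL policy could subsume.

```python
import random

class QRoutingNode:
    """Hypothetical sketch of a Q-learning routing agent at one node."""

    def __init__(self, node_id, neighbors, alpha=0.5, epsilon=0.1):
        self.node_id = node_id
        self.neighbors = list(neighbors)
        self.alpha = alpha      # learning rate (hand-tuned in Q-routing)
        self.epsilon = epsilon  # exploration rate (another hand-tuned knob)
        # q[dest][neighbor]: estimated quality of routing to dest via neighbor.
        self.q = {}

    def select_next_hop(self, dest):
        """Epsilon-greedy choice among current neighbors."""
        table = self.q.setdefault(dest, {n: 0.0 for n in self.neighbors})
        if random.random() < self.epsilon:
            return random.choice(self.neighbors)
        return max(self.neighbors, key=lambda n: table.get(n, 0.0))

    def update(self, dest, neighbor, reward):
        """Q-learning-style update from delivery feedback (e.g., ACKs)."""
        table = self.q.setdefault(dest, {n: 0.0 for n in self.neighbors})
        old = table.get(neighbor, 0.0)
        table[neighbor] = old + self.alpha * (reward - old)
```

In this sketch, DeepCQ+ would replace the fixed `epsilon`/threshold logic inside `select_next_hop` with a learned MADRL policy, while keeping the surrounding per-node, per-neighbor protocol structure intact.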