This paper proposes a novel scalable reinforcement learning approach for simultaneous routing and spectrum access in wireless ad-hoc networks. In most previous works on reinforcement learning for network optimization, the network topology is assumed to be fixed, and a different agent is trained for each transmission node -- this limits scalability and generalizability. Further, routing and spectrum access are typically treated as separate tasks. Moreover, the optimization objective is usually a cumulative metric along the route, e.g., the number of hops or the delay. In this paper, we account for the physical-layer signal-to-interference-plus-noise ratio (SINR) in a wireless network and further show that a bottleneck objective, such as the minimum SINR along the route, can also be optimized effectively using reinforcement learning. Specifically, we propose a scalable approach in which a single agent is associated with each flow and makes routing and spectrum access decisions as it moves along the frontier nodes. The agent is trained according to the physical-layer characteristics of the environment using a novel reward scheme based on a Monte Carlo estimation of the future bottleneck SINR. It learns to avoid interference by intelligently making joint routing and spectrum allocation decisions based on the geographical location information of the neighbouring nodes.
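To make the bottleneck objective concrete, the following minimal Python sketch computes per-link SINR, the bottleneck (minimum) SINR of a route, and a simple Monte Carlo estimate of the expected future bottleneck SINR from sampled route completions. The function names and the rollout-averaging scheme are illustrative assumptions, not the paper's exact formulation.

```python
def link_sinr(signal_power, interference_powers, noise_power):
    """SINR of a single link: received signal power over the
    sum of interference powers plus noise power (linear scale)."""
    return signal_power / (sum(interference_powers) + noise_power)

def bottleneck_sinr(route_sinrs):
    """Bottleneck objective: the minimum SINR over all links of a route."""
    return min(route_sinrs)

def monte_carlo_bottleneck_estimate(partial_route_sinrs, sampled_rollouts):
    """Estimate the expected future bottleneck SINR of a partially built
    route by averaging the bottleneck over sampled completions
    (an illustrative stand-in for the paper's reward scheme)."""
    estimates = [bottleneck_sinr(partial_route_sinrs + rollout)
                 for rollout in sampled_rollouts]
    return sum(estimates) / len(estimates)
```

For example, a route with link SINRs `[3.0, 1.5, 2.0]` has a bottleneck SINR of `1.5`; the Monte Carlo estimate simply averages that bottleneck over hypothetical route completions.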