Reinforcement Learning Random Access for Delay-Constrained Heterogeneous Wireless Networks: A Two-User Case

In this paper, we investigate the random access problem for delay-constrained heterogeneous wireless networks. As a first attempt to study this new problem, we consider a network with two users who deliver delay-constrained traffic to an access point (AP) over a shared, unreliable collision channel. We assume that one user (user 1) adopts ALOHA, and we optimize the random access scheme of the other user (user 2). The most intriguing part of this problem is that user 2 has no information about user 1 yet must maximize the system timely throughput. Such a paradigm of collaboratively sharing spectrum is envisioned by DARPA to better match supply and demand dynamically in the future [1], [2]. We first propose a Markov Decision Process (MDP) formulation to derive a model-based upper bound, which quantifies the performance gap of any designed scheme. We then use reinforcement learning (RL) to design an R-learning-based [3]-[5] random access scheme, called TSRA. Finally, we carry out extensive simulations showing that TSRA achieves close-to-upper-bound performance and outperforms the existing baseline DLMA [6], which is our scheme's counterpart for delay-unconstrained heterogeneous wireless networks. All source code is publicly available at https://github.com/DanzhouWu/TSRA.
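To make the R-learning ingredient concrete, the following is a minimal sketch of a textbook tabular R-learning update (an average-reward RL rule) applied to a toy two-user collision channel. It is not the TSRA implementation from the repository above. The function names `r_learning_step` and `simulate`, the single-state model, the step sizes, and the reward definition (1 when exactly one user transmits) are all assumptions made for illustration. The toy deliberately ignores packet deadlines and channel unreliability.

```python
import random


def r_learning_step(Q, rho, s, a, r, s_next, alpha=0.1, beta=0.01):
    """Apply one tabular R-learning update in place on Q; return the new rho.

    Q is a list of per-state action-value lists, and rho is the running
    estimate of the average reward per slot. (Hypothetical helper.)
    """
    best_next = max(Q[s_next])
    best_here = max(Q[s])
    was_greedy = Q[s][a] == best_here  # checked before Q[s][a] is changed
    Q[s][a] += alpha * (r - rho + best_next - Q[s][a])
    if was_greedy:
        # The average-reward estimate is only updated on greedy actions.
        rho += beta * (r - rho + best_next - best_here)
    return rho


def simulate(p1=0.3, steps=5000, eps=0.1, seed=0):
    """Toy model (assumption): user 1 runs ALOHA with transmit probability p1;
    user 2 learns whether to wait (action 0) or transmit (action 1)."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0]]  # a single state with two actions
    rho = 0.0
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(2)  # explore
        else:
            a = max((0, 1), key=lambda act: Q[0][act])  # exploit
        user1_tx = rng.random() < p1
        # A slot is successful when exactly one user transmits (no collision).
        r = 1.0 if (a == 1) != user1_tx else 0.0
        rho = r_learning_step(Q, rho, 0, a, r, 0)
    return Q, rho


if __name__ == "__main__":
    Q, rho = simulate()
    print("Q-values:", Q[0], "average-reward estimate:", rho)
```

In this toy setting user 2 never observes `p1` directly and learns only from per-slot rewards, which mirrors the abstract's premise that user 2 has no information about user 1. A faithful model would also need a state that tracks packet deadlines and channel outcomes.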
