We study a delay-constrained grant-free random access system with a multi-antenna base station. The users randomly generate data packets with expiration deadlines, which are then transmitted from data queues on a first-in first-out basis. To deliver a packet, a user needs to succeed in both the random access phase (sending a pilot without collision) and the data transmission phase (achieving a required data rate under imperfect channel information) before the packet expires. We develop a distributed, cross-layer policy that allows the users to dynamically and independently choose their pilots and transmit powers to achieve a high effective sum throughput while accounting for fairness. Our policy design involves three key components: 1) a proxy of the instantaneous data rate that depends only on macroscopic environment variables and transmission decisions, accounting for pilot collisions and imperfect channel estimation; 2) a quantitative, instantaneous measure of fairness within each communication round; and 3) a deep learning-based, multi-agent control framework with centralized training and distributed execution. The proposed framework benefits from an accurate, differentiable objective function for training, thereby achieving higher sample efficiency than a conventional application of model-free, multi-agent reinforcement learning algorithms. The performance of the proposed approach is verified by simulations under highly dynamic and heterogeneous scenarios.