In this paper, we investigate the problem of fast spectrum sharing in vehicle-to-everything (V2X) communication. To improve the overall spectrum efficiency, vehicle-to-vehicle (V2V) links reuse the spectrum of vehicle-to-infrastructure (V2I) links. We formulate this as a deep reinforcement learning problem and solve it with proximal policy optimization (PPO). Training a well-performing agent typically requires a large number of interactions with the environment, so simulation-based training is commonly used in communication networks. However, because of the reality gap between simulated and real environments, an agent that performs well in simulation may suffer severe performance degradation when deployed directly in the real world. As a preliminary step toward addressing this issue, we propose an algorithm based on meta reinforcement learning, which enables the agent to rapidly adapt to a new task using knowledge extracted from similar tasks, requiring fewer interactions and less training time. Numerical results show that our method achieves near-optimal performance and exhibits rapid convergence.
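To illustrate the "meta-train on similar tasks, then adapt quickly to a new one" idea described above, the following is a minimal sketch of a Reptile-style meta-reinforcement-learning loop on a toy family of bandit tasks. It is not the authors' algorithm: the paper uses PPO on a V2X spectrum-sharing environment, whereas the environment, policy network, inner-loop REINFORCE update, and all hyperparameters below are simplifying assumptions chosen only to keep the example short and runnable.

```python
# Sketch only: Reptile-style meta-RL on toy bandit tasks (assumed setup,
# not the paper's PPO-based method or its V2X simulator).
import copy
import torch
import torch.nn as nn
from torch.distributions import Categorical

N_ARMS = 5          # actions (stand-ins for, e.g., candidate spectrum choices)
INNER_STEPS = 20    # policy-gradient steps per task (inner loop)
META_ITERS = 200    # meta-training iterations (outer loop)
META_LR = 0.1       # Reptile interpolation rate

def sample_task(rng):
    """A 'task' is a bandit with one good arm; tasks differ in which arm it is."""
    probs = torch.full((N_ARMS,), 0.2)
    probs[torch.randint(N_ARMS, (1,), generator=rng)] = 0.9
    return probs

def inner_loop(policy, probs, steps=INNER_STEPS, lr=1e-2):
    """Adapt a copy of the policy to one task with plain REINFORCE."""
    opt = torch.optim.SGD(policy.parameters(), lr=lr)
    for _ in range(steps):
        logits = policy(torch.ones(1))           # constant dummy state
        dist = Categorical(logits=logits)
        action = dist.sample()
        reward = torch.bernoulli(probs[action])  # stochastic task reward
        loss = -dist.log_prob(action) * reward   # REINFORCE objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy

rng = torch.Generator().manual_seed(0)
meta_policy = nn.Linear(1, N_ARMS)               # tiny policy network

for _ in range(META_ITERS):
    task = sample_task(rng)
    adapted = inner_loop(copy.deepcopy(meta_policy), task)
    # Reptile meta-update: move meta-parameters toward the task-adapted ones.
    with torch.no_grad():
        for p_meta, p_task in zip(meta_policy.parameters(), adapted.parameters()):
            p_meta += META_LR * (p_task - p_meta)

# After meta-training, a handful of inner-loop steps on a *new* task should
# already yield a good policy, mirroring the fast-adaptation benefit claimed
# in the abstract.
```

The same outer/inner structure carries over if the inner loop is replaced by PPO updates on a spectrum-sharing simulator; only the task sampler and the policy update rule change.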