This paper studies the problem of minimizing the age of information (AoI) in cellular vehicle-to-everything (C-V2X) communications. To provide minimal AoI and high reliability for vehicles' safety information, non-orthogonal multiple access (NOMA) is exploited. We reformulate a resource allocation problem that involves half-duplex transceiver selection, broadcast coverage optimization, power allocation, and resource block scheduling. First, to obtain the optimal solution, we formulate the problem as a mixed-integer nonlinear programming (MINLP) problem and then study its NP-hardness. The NP-hardness result motivates us to design simple solutions. Consequently, we model the problem as a single-agent Markov decision process so that it can be solved efficiently using fingerprint deep reinforcement learning (DRL) techniques such as deep Q-network (DQN) methods. Nevertheless, applying DQN is not straightforward because of the curse of dimensionality implied by the large, mixed action space, which contains both discrete and continuous optimization decisions. Therefore, to solve this mixed discrete/continuous problem efficiently, we propose a decomposition technique that first solves the discrete subproblem using a matching algorithm based on state-of-the-art stable roommate matching and then solves the continuous subproblem using a DRL algorithm based on deep deterministic policy gradient (DDPG). We validate the proposed method through Monte Carlo simulations, which show that the decomposed matching and DRL algorithm successfully minimizes the AoI and achieves almost 66% performance gain compared to the best benchmarks for various vehicle speeds, transmission powers, and packet sizes. Further, we prove the existence of an optimal value of broadcast coverage at which the learning algorithm provides the optimal AoI.
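As a point of reference for the metric being minimized, the following is a minimal sketch of how AoI evolves at a receiver. It assumes a discrete-time (slotted) model in which the age grows by one slot whenever no update arrives and drops to the delay of a freshly received packet otherwise. The slotted model, the function names `simulate_aoi` and `average_aoi`, and the input encoding are illustrative assumptions and are not taken from the paper's formulation.

```python
from typing import List, Optional


def simulate_aoi(deliveries: List[Optional[int]], initial_age: int = 1) -> List[int]:
    """Return the AoI at the end of each slot.

    deliveries[t] is None when no packet is received in slot t; otherwise it
    is the delay (in slots) of the packet received in slot t, i.e. how old
    that packet's information already is on arrival (assumed encoding).
    """
    age = initial_age
    trajectory = []
    for delay in deliveries:
        if delay is None:
            # No update received: the information at the receiver gets older.
            age += 1
        else:
            # Update received: the age resets to the packet's delay, unless
            # the packet is staler than what the receiver already holds.
            age = min(age + 1, delay)
        trajectory.append(age)
    return trajectory


def average_aoi(trajectory: List[int]) -> float:
    """Time-average AoI over the simulated horizon."""
    return sum(trajectory) / len(trajectory)


if __name__ == "__main__":
    # Two empty slots, a fresh packet, one empty slot, then a 2-slot-old packet.
    traj = simulate_aoi([None, None, 1, None, 2])
    print(traj, average_aoi(traj))
```

Under these assumptions, more frequent and more reliable deliveries keep the trajectory low, which is the quantity that a resource allocation policy would aim to reduce.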