Distributed Reinforcement Learning for Privacy-Preserving Dynamic Edge Caching

Mobile edge computing (MEC) is a prominent computing paradigm that broadens the application areas of wireless communication. Because user equipment and MEC servers have limited capacity, edge caching (EC) optimization is crucial for making effective use of caching resources in MEC-enabled wireless networks. However, content popularity varies in complex ways over space and time, and user privacy must be preserved; both pose significant challenges to EC optimization. In this paper, we propose a privacy-preserving distributed deep deterministic policy gradient (P2D3PG) algorithm to maximize the cache hit rates of devices in MEC networks. Specifically, we treat content popularity as dynamic, complicated, and unobservable, and formulate the maximization of device cache hit rates as distributed problems under privacy-preservation constraints. We convert these distributed optimizations into distributed model-free Markov decision process problems and introduce a privacy-preserving federated learning method for popularity prediction. We then develop the P2D3PG algorithm, based on distributed reinforcement learning, to solve the distributed problems. Simulation results show that the proposed approach achieves a higher EC hit rate than the baseline methods while preserving user privacy.
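To make the abstract's two central quantities concrete, the following minimal sketch shows how a cache hit rate can be measured and how per-device popularity estimates might be combined without sharing raw request logs. It is not the P2D3PG algorithm: the weighted averaging is a plain FedAvg-style stand-in for the paper's federated learning method, the top-k caching rule replaces the learned policy, and all function names and the toy data are assumptions made for illustration.

```python
from typing import List, Sequence, Set


def federated_average(local_params: Sequence[Sequence[float]],
                      sample_counts: Sequence[int]) -> List[float]:
    """Combine per-device parameter vectors, weighting each by its sample count.

    Assumption: a FedAvg-style weighted mean; only parameters are exchanged,
    never the devices' raw request histories.
    """
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("sample_counts must sum to a positive number")
    averaged = [0.0] * len(local_params[0])
    for params, count in zip(local_params, sample_counts):
        for i, value in enumerate(params):
            averaged[i] += value * count / total
    return averaged


def top_k_cache(popularity: Sequence[float], capacity: int) -> Set[int]:
    """Pick the `capacity` content indices with the highest predicted popularity.

    Assumption: a greedy rule standing in for the learned caching policy.
    """
    ranked = sorted(range(len(popularity)),
                    key=lambda i: popularity[i], reverse=True)
    return set(ranked[:capacity])


def cache_hit_rate(requests: Sequence[int], cached: Set[int]) -> float:
    """Fraction of requests served from the cache (0.0 if there are no requests)."""
    if not requests:
        return 0.0
    hits = sum(1 for content_id in requests if content_id in cached)
    return hits / len(requests)


# Toy example: two devices, four contents, cache capacity of two items.
local_estimates = [[0.5, 0.25, 0.25, 0.0],   # device A, 1 local sample
                   [0.0, 0.5, 0.0, 0.5]]     # device B, 3 local samples
global_estimate = federated_average(local_estimates, [1, 3])
cached_items = top_k_cache(global_estimate, capacity=2)
hit_rate = cache_hit_rate([1, 3, 0, 1], cached_items)
```

In the paper's setting the cache contents would be chosen by the reinforcement learning agent rather than by a fixed top-k rule; the hit-rate function here only shows the kind of objective such an agent would be rewarded on.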
