In this paper, we propose a federated deep reinforcement learning framework to solve a multi-objective optimization problem, in which we minimize the expected long-term task completion delay and energy consumption of IoT devices by jointly optimizing offloading decisions, computation resource allocation, and transmit power allocation. Since the formulated problem is a mixed-integer non-linear programming (MINLP) problem, we first cast it as a multi-agent distributed deep reinforcement learning (DRL) problem and address it using a double deep Q-network (DDQN), where the actions are the offloading decisions. The immediate cost of each agent is obtained by solving either the transmit power optimization or the local computation resource optimization, depending on the selected offloading decision (action). Then, to improve the learning speed of the IoT devices (agents), we incorporate federated learning (FDL) at the end of each episode. FDL enhances the scalability of the proposed DRL framework, creates a context for cooperation between agents, and mitigates their privacy concerns. Our numerical results demonstrate the efficacy of the proposed federated DDQN framework in terms of learning speed compared with federated deep Q-network (DQN) and non-federated DDQN algorithms. In addition, we investigate the impact of the batch size, the number of network layers, and the DDQN target network update frequency on the learning speed of the FDL.
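The two core mechanisms named above — the DDQN update and the episode-end federated step — can be sketched in a few lines. This is an illustrative sketch only (the function names, shapes, and the use of plain arrays in place of neural networks are assumptions, not details from the paper): DDQN decouples action selection (online network) from action evaluation (target network), and a FedAvg-style step averages per-agent parameters at the end of each episode.

```python
import numpy as np

# --- DDQN target (illustrative; plain Q-value arrays stand in for networks) ---
# Double DQN: the online net selects the next action, the target net evaluates it:
#   y = r + gamma * Q_target(s', argmax_a Q_online(s', a))
def ddqn_target(cost, q_online_next, q_target_next, gamma=0.99):
    a_star = int(np.argmax(q_online_next))         # online net picks the action
    return cost + gamma * q_target_next[a_star]    # target net evaluates it

# --- Episode-end federated step (FedAvg-style parameter averaging) ---
def federated_average(agent_weights):
    """Element-wise average of per-agent lists of layer-weight arrays."""
    n = len(agent_weights)
    return [sum(layers) / n for layers in zip(*agent_weights)]

# Toy usage: three agents, each holding two "layers" of parameters.
agents = [[np.full((2, 2), v), np.full((2,), v)] for v in (1.0, 2.0, 3.0)]
global_model = federated_average(agents)  # each layer averaged across agents
```

In the paper's setting the averaged parameters would be broadcast back to the agents, which is what creates the cooperation between agents without sharing their raw local experience.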