Multi-Agent Meta-Reinforcement Learning for Self-Powered and Sustainable Edge Computing Systems

The stringent requirements of mobile edge computing (MEC) applications and functions call for high-capacity, densely deployed MEC hosts in upcoming wireless networks. However, operating such high-capacity MEC hosts can significantly increase energy consumption. To offset this, a base station (BS) unit can act as a self-powered BS. In this paper, an effective energy dispatch mechanism for self-powered wireless networks with edge computing capabilities is studied. First, a two-stage linear stochastic programming problem is formulated with the goal of minimizing the total energy consumption cost of the system while fulfilling the energy demand. Second, a semi-distributed data-driven solution is proposed by developing a novel multi-agent meta-reinforcement learning (MAMRL) framework to solve the formulated problem. In particular, each BS plays the role of a local agent that explores a Markovian behavior for both energy consumption and generation, and each BS transfers time-varying features to a meta-agent. Subsequently, the meta-agent optimizes (i.e., exploits) the energy dispatch decision by accepting only the observations from each local agent together with its own state information. Meanwhile, each BS agent estimates its own energy dispatch policy by applying the parameters learned by the meta-agent. Finally, the proposed MAMRL framework is benchmarked by analyzing deterministic, asymmetric, and stochastic environments in terms of non-renewable energy usage, energy cost, and accuracy. Experimental results show that, compared to other baseline methods, the proposed MAMRL model can reduce non-renewable energy usage by up to 11% and energy cost by 22.4% (with 95.8% prediction accuracy).
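
To make the phrase "two-stage linear stochastic programming" concrete, the following is a minimal sketch of a generic two-stage problem with simple recourse, solved by sample-average approximation. It is not the paper's formulation: the single decision variable `x` (energy committed ahead of time), the prices `c_ahead` and `c_realtime`, and the scenario pairs of demand and renewable generation are all hypothetical names and values chosen only for illustration.

```python
def expected_cost(x, scenarios, c_ahead, c_realtime):
    """First-stage cost plus the average second-stage (recourse) cost.

    x          -- energy committed before uncertainty is revealed (hypothetical)
    scenarios  -- list of (demand, renewable_generation) samples (hypothetical)
    c_ahead    -- unit price of energy committed in the first stage (assumed)
    c_realtime -- unit price of energy bought to cover any shortfall (assumed)
    """
    recourse = sum(c_realtime * max(d - g - x, 0.0) for d, g in scenarios)
    return c_ahead * x + recourse / len(scenarios)


def solve_dispatch(scenarios, c_ahead, c_realtime):
    """Pick the first-stage commitment x >= 0 with the lowest sample-average cost.

    The objective is piecewise linear and convex in x, so a minimizer lies at 0
    or at one of the scenario net demands (d - g); we evaluate each candidate.
    """
    candidates = {0.0} | {max(d - g, 0.0) for d, g in scenarios}
    return min(
        candidates,
        key=lambda x: expected_cost(x, scenarios, c_ahead, c_realtime),
    )


# Illustrative scenarios: (demand, renewable generation), made-up numbers.
scenarios = [(10.0, 4.0), (10.0, 6.0), (10.0, 2.0), (10.0, 10.0)]
x_star = solve_dispatch(scenarios, c_ahead=1.0, c_realtime=3.0)
print(x_star, expected_cost(x_star, scenarios, 1.0, 3.0))
```

The first stage fixes `x` before demand and generation are known, and the second stage pays for whatever shortfall each scenario reveals. The abstract proposes a learning-based MAMRL solution rather than direct enumeration like this, so the sketch only illustrates the problem class, not the proposed method.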
