Deep Reinforcement Learning (DRL) solutions are becoming pervasive at the edge of the network as they enable autonomous decision-making in dynamic environments. However, to adapt to an ever-changing environment, a DRL solution implemented on an embedded device has to keep taking occasional exploratory actions even after initial convergence. In other words, the device has to occasionally take random actions and update the value function, i.e., re-train the Artificial Neural Network (ANN), to ensure its performance remains optimal. Unfortunately, embedded devices often lack the processing power and energy required to train the ANN. The energy aspect is particularly challenging when the edge device is powered solely by Energy Harvesting (EH). To overcome this problem, we propose a two-part algorithm in which the DRL process is trained at the sink. The weights of the fully trained underlying ANN are then periodically transferred to the EH-powered embedded device taking actions. Using an EH-powered sensor, a dataset of real-world measurements, and optimizing for the Age of Information (AoI) metric, we demonstrate that such a DRL solution can operate without any performance degradation, requiring only a few ANN updates per day.
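As a rough illustration of the sink/device split described above, the minimal Python sketch below shows an edge device that only runs the value network's forward pass (with an occasional exploratory action), while training and the periodic weight transfer are assumed to happen at the sink. The names used here (QNetwork, device_act, set_weights) are illustrative assumptions, not identifiers from the paper.

```python
# Minimal sketch of the sink/device split; not the paper's actual implementation.
import numpy as np

class QNetwork:
    """Small MLP value function; the EH-powered device only needs its forward pass."""
    def __init__(self, n_states, n_actions, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (n_states, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, n_actions))

    def q_values(self, state):
        # Cheap forward pass: the only computation the embedded device performs.
        h = np.tanh(state @ self.w1)
        return h @ self.w2

    def get_weights(self):
        return (self.w1.copy(), self.w2.copy())

    def set_weights(self, weights):
        # Called on the device when the sink pushes a freshly trained ANN.
        self.w1, self.w2 = weights


def device_act(net, state, epsilon=0.05, rng=None):
    """Edge device policy: greedy w.r.t. the received ANN, plus occasional exploration."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(net.w2.shape[1]))
    return int(np.argmax(net.q_values(state)))


# Hypothetical usage: the sink trains sink_net with any DRL algorithm on the
# experience reported by the device, then pushes the weights a few times per day.
sink_net = QNetwork(n_states=4, n_actions=3)
device_net = QNetwork(n_states=4, n_actions=3)
device_net.set_weights(sink_net.get_weights())   # periodic ANN update over the air
action = device_act(device_net, np.zeros(4))
```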