In this paper, a lifelong learning problem is studied for an Internet of Things (IoT) system. In the considered model, each IoT device seeks to balance the tradeoff between its information freshness and its energy consumption by controlling its computational resource allocation at each time slot in a dynamic environment. An unmanned aerial vehicle (UAV) is deployed as a flying base station to enable the IoT devices to adapt to novel environments. To this end, a new lifelong reinforcement learning algorithm, executed at the UAV, is proposed to adapt the operation of each device at every UAV visit. By exploiting experience from previously visited devices and environments, the UAV helps devices adapt faster to future states of their environment. To do so, a knowledge base shared by all devices is maintained at the UAV. Simulation results show that the proposed algorithm converges $25\%$ to $50\%$ faster than a policy gradient baseline that optimizes each device's decision-making problem in isolation.
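The core mechanism described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `KnowledgeBase` class, the nearest-neighbor warm-start rule, and the REINFORCE-style update over a binary "allocate compute / idle" action are all illustrative assumptions standing in for the paper's lifelong reinforcement learning algorithm and shared knowledge base.

```python
import numpy as np

class KnowledgeBase:
    """Hypothetical knowledge base kept at the UAV: stores
    (environment-feature, policy-parameter) pairs from visited devices."""
    def __init__(self):
        self.entries = []  # list of (feature_vector, theta)

    def add(self, feature, theta):
        self.entries.append((np.asarray(feature, dtype=float), np.copy(theta)))

    def warm_start(self, feature, dim):
        """Return the parameters of the most similar previously visited
        environment, or zeros if the knowledge base is empty."""
        if not self.entries:
            return np.zeros(dim)
        feature = np.asarray(feature, dtype=float)
        dists = [np.linalg.norm(f - feature) for f, _ in self.entries]
        return np.copy(self.entries[int(np.argmin(dists))][1])

def reinforce_step(theta, states, actions, returns, lr=0.1):
    """One REINFORCE update for a Bernoulli policy over a binary
    per-slot decision (a stand-in for the device's computational
    resource allocation control)."""
    grad = np.zeros_like(theta)
    for s, a, G in zip(states, actions, returns):
        p = 1.0 / (1.0 + np.exp(-theta @ s))   # P(allocate | state)
        grad += (a - p) * s * G                # grad of log pi, scaled by return
    return theta + lr * grad / len(states)

# At each visit, the UAV warm-starts the device's policy from the
# knowledge base instead of learning from scratch, then fine-tunes
# with local rollouts and writes the updated parameters back.
```

Warm-starting from the nearest stored environment is what would let a device skip the early, slow phase of policy-gradient learning, which is consistent with the $25\%$ to $50\%$ convergence speedup reported in the simulations.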