Interest in remote monitoring has grown thanks to recent advances in Internet-of-Things (IoT) paradigms. New applications have emerged that use small devices, called sensor nodes, capable of collecting data from the environment and processing it. However, these nodes must process and transmit ever more data over ever longer operational periods. At the same time, battery technologies have not improved fast enough to keep pace with these growing needs. This makes energy consumption an increasingly challenging issue, and miniaturized energy harvesting devices have therefore emerged to complement traditional energy sources. Nevertheless, the harvested energy fluctuates significantly during node operation, increasing the uncertainty about the energy actually available. Recently, energy management approaches have been developed, in particular ones based on reinforcement learning. In reinforcement learning, however, the algorithm's performance depends heavily on the reward function. In this paper, we present two contributions. First, we explore five different reward functions to identify the most suitable variables to include in such functions to obtain the desired behaviour. Experiments were conducted using the Q-learning algorithm to adjust the node's energy consumption depending on the harvested energy. Results with the five reward functions illustrate how the choice of reward function impacts the energy consumption of the node. Second, we propose two additional reward functions that find a compromise between energy consumption and node performance using a non-fixed balancing parameter. Our simulation results show that the proposed reward functions adjust the node's performance depending on the battery level and reduce the learning time.
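The abstract does not define the reward functions themselves, so the following is only a minimal sketch of the general idea: tabular Q-learning that picks an energy-consumption level for a harvesting node, with a reward whose balancing weight between performance and energy is not fixed but follows the battery level. Everything concrete here is a hypothetical assumption, not the paper's method: the battery discretization (`N_BATTERY_LEVELS`), the duty-cycle actions (`ACTIONS`), the harvest and consumption model in `step`, the choice `w = battery_frac`, and the hyperparameters.

```python
import random

# Hypothetical discretization: battery level in 10 bins, 5 duty-cycle actions.
N_BATTERY_LEVELS = 10
ACTIONS = [0.2, 0.4, 0.6, 0.8, 1.0]  # hypothetical duty cycles (fraction of time active)


def reward(battery_frac: float, duty_cycle: float) -> float:
    """Hypothetical reward balancing performance against energy use.

    The balancing weight w is non-fixed: it equals the battery fraction, so a
    full battery mostly rewards performance and an empty one mostly rewards
    saving energy.
    """
    w = battery_frac
    performance = duty_cycle  # assumed proportional to duty cycle
    energy_cost = duty_cycle  # assumed proportional to duty cycle
    return w * performance - (1.0 - w) * energy_cost


def step(battery: float, duty_cycle: float, rng: random.Random) -> float:
    """Hypothetical battery model: add fluctuating harvest, subtract consumption."""
    harvested = rng.uniform(0.0, 0.1)  # fluctuating harvested energy (assumed)
    consumed = 0.1 * duty_cycle        # consumption scales with duty cycle (assumed)
    return min(1.0, max(0.0, battery + harvested - consumed))


def to_state(battery: float) -> int:
    """Map a battery fraction in [0, 1] to one of N_BATTERY_LEVELS bins."""
    return min(int(battery * N_BATTERY_LEVELS), N_BATTERY_LEVELS - 1)


def q_learning(episodes=200, steps=100, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration over battery states."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_BATTERY_LEVELS)]
    for _ in range(episodes):
        battery = rng.uniform(0.0, 1.0)  # random initial battery level
        for _ in range(steps):
            s = to_state(battery)
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
            r = reward(battery, ACTIONS[a])
            battery = step(battery, ACTIONS[a], rng)
            s_next = to_state(battery)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
    return q


if __name__ == "__main__":
    q_table = q_learning()
    # Greedy duty cycle learned for each battery bin (lowest to highest).
    policy = [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q_table]
    print(policy)
```

Tying the weight to the battery level is one simple way to make the balancing parameter non-fixed: the same reward pushes the agent toward a higher duty cycle when energy is plentiful and toward conservation when the battery runs low, without retuning a constant.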