Considering grant-free transmissions in low-power IoT networks where the time-frequency distribution of interference is unknown, we address the problem of Dynamic Resource Configuration (DRC), which can be formulated as a Markov decision process (MDP). Unfortunately, off-the-shelf methods based on single-objective reinforcement learning cannot guarantee energy-efficient transmission, especially when all frequency-domain channels in a time interval suffer from interference. Therefore, we propose a novel DRC scheme in which configuration policies are optimized within a Multi-Objective Reinforcement Learning (MORL) framework. Numerical results show that the average decision error rate achieved by the MORL-based DRC can fall below 12% of that yielded by the conventional R-learning-based approach.
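To illustrate what optimizing a configuration policy against more than one objective can look like, the following is a minimal toy sketch and not the scheme proposed here. It assumes a hypothetical setup of 3 channels and 2 transmit power levels, a reward vector of (transmission success, negative energy), made-up success probabilities, and linear scalarization with a preference vector `weights`. Linear scalarization is only one common MORL technique, and the abstract does not say which the authors use. The sketch also drops state transitions entirely, reducing the MDP to a single-state (bandit-style) problem. All names (`transmit`, `scalarize`, `learn_config`, `ENERGY_COST`) are hypothetical.

```python
import random

# Hypothetical toy model (not from the paper): 3 frequency-domain channels and
# 2 transmit power levels; an action is a (channel, power_level) pair.
NUM_CHANNELS = 3
ENERGY_COST = (1.0, 2.0)  # assumed energy per transmission at low / high power
ACTIONS = [(c, p) for c in range(NUM_CHANNELS) for p in range(len(ENERGY_COST))]


def transmit(action, interfered, rng):
    """Return the reward vector (success, -energy) of one grant-free transmission.

    Assumed success probabilities: 0.9 on a clean channel; 0.6 (high power)
    or 0.2 (low power) on an interfered channel.
    """
    channel, power = action
    if channel in interfered:
        p_success = 0.6 if power == 1 else 0.2
    else:
        p_success = 0.9
    success = 1.0 if rng.random() < p_success else 0.0
    return (success, -ENERGY_COST[power])


def scalarize(values, weights):
    """Linear scalarization: weighted sum of the per-objective values."""
    return sum(w * v for w, v in zip(weights, values))


def learn_config(weights, interfered, steps=20000, eps=0.1, seed=0):
    """Epsilon-greedy learning of vector-valued action values (single state)."""
    rng = random.Random(seed)
    q = {a: [0.0, 0.0] for a in ACTIONS}  # one estimate per objective
    counts = {a: 0 for a in ACTIONS}

    def greedy():
        # Pick the action whose scalarized value estimate is largest.
        return max(ACTIONS, key=lambda a: scalarize(q[a], weights))

    for _ in range(steps):
        a = rng.choice(ACTIONS) if rng.random() < eps else greedy()
        r = transmit(a, interfered, rng)
        counts[a] += 1
        # Sample-average update, applied to each objective separately.
        q[a] = [qi + (ri - qi) / counts[a] for qi, ri in zip(q[a], r)]
    return greedy()


print(learn_config((1.0, 0.1), interfered={0, 1}))
print(learn_config((1.0, 0.1), interfered={0, 1, 2}))
print(learn_config((1.0, 1.0), interfered={0, 1, 2}))
```

Under these assumed numbers, the expected scalarized values favour the clean channel 2 when one exists. When every channel is interfered, the outcome depends on the preference vector: with weights (1.0, 0.1), high power scores 0.6 - 0.2 = 0.4 against 0.2 - 0.1 = 0.1 for low power, while with weights (1.0, 1.0), low power scores -0.8 against -1.4 for high power. The value estimates are kept per objective rather than collapsed into a single reward, so the same learned estimates can be re-ranked under a different preference vector.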