A Reinforcement Learning (RL) system depends on a set of initial conditions (hyperparameters) that affect the system's performance. However, finding a good choice of hyperparameters is a challenging problem, and tuning them typically requires manual or automated searches to find optimal values. A notable limitation is the high cost of evaluating complex models, which makes the tuning process computationally expensive and time-consuming. In this paper, we propose a framework that integrates complex event processing and temporal models to alleviate this cost. Through this combination, it is possible to gain insights into a running RL system efficiently and unobtrusively via data stream monitoring, and to create abstract representations that allow reasoning about the historical behaviour of the RL system. The obtained knowledge is exploited to provide feedback to the RL system for optimising its hyperparameters while making effective use of parallel resources. We introduce a novel history-aware epsilon-greedy logic for hyperparameter optimisation that, instead of keeping hyperparameters fixed for the whole training, adjusts them at runtime based on an analysis of the agent's performance over time windows within a single agent's lifetime. We tested the proposed approach in a 5G mobile communications case study that uses DQN (Deep Q-Network), a deep RL algorithm, for its decision-making. Our experiments demonstrate the effects of history-based hyperparameter tuning on training stability and reward values. The encouraging results show that the proposed history-aware framework significantly improves performance compared to traditional hyperparameter tuning approaches.
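To illustrate the history-aware epsilon-greedy idea, the sketch below adjusts epsilon at runtime by comparing the agent's mean reward over two consecutive time windows of its lifetime. This is a minimal, hypothetical rendering of the mechanism described above, not the authors' implementation: the class name, window size, and adjustment step are assumptions for illustration only.

```python
import random
from collections import deque

class HistoryAwareEpsilon:
    """Hypothetical sketch: adapt epsilon from the agent's reward history
    instead of using a static value or a fixed decay schedule."""

    def __init__(self, epsilon=1.0, eps_min=0.05, eps_max=1.0,
                 window=100, step=0.05):
        self.epsilon = epsilon
        self.eps_min, self.eps_max = eps_min, eps_max
        self.window = window          # length of one time window (assumed)
        self.step = step              # epsilon adjustment per decision (assumed)
        self.rewards = deque(maxlen=2 * window)  # two consecutive windows

    def record(self, reward):
        """Store a reward observed from the monitored data stream;
        re-evaluate epsilon once two full windows are available."""
        self.rewards.append(reward)
        if len(self.rewards) == self.rewards.maxlen:
            self._adjust()

    def _adjust(self):
        """Compare the two most recent windows: if performance improved,
        exploit more (lower epsilon); if it degraded, explore more."""
        history = list(self.rewards)
        old = sum(history[:self.window]) / self.window
        new = sum(history[self.window:]) / self.window
        if new > old:
            self.epsilon = max(self.eps_min, self.epsilon - self.step)
        else:
            self.epsilon = min(self.eps_max, self.epsilon + self.step)

    def select(self, q_values):
        """Standard epsilon-greedy action selection over Q-values."""
        if random.random() < self.epsilon:
            return random.randrange(len(q_values))
        return max(range(len(q_values)), key=lambda a: q_values[a])
```

In an actual DQN training loop, `record` would be fed from the event-processing pipeline that monitors the reward stream, so the adjustment happens unobtrusively alongside training rather than through repeated full-training evaluations.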