Advances in reinforcement learning research have demonstrated how different agent-based models can learn to perform a task optimally within a given environment. Reinforcement learning solves unsupervised problems in which agents move through a state-action-reward loop to maximize their overall reward, which in turn optimizes the solution of a specific problem in a given environment. However, these algorithms are designed based on our understanding of the actions that should be taken in a real-world environment to solve a specific problem. One such problem is the ability to identify, recommend and execute an action within a system where the users are the subject, such as in education. In recent years, the use of blended learning approaches, which integrate face-to-face learning with online learning, has increased in the education context. Additionally, online platforms used for education require the automation of certain functions, such as the identification, recommendation or execution of actions that can benefit the user, in this case the student or learner. As promising as these scientific advances are, research is still needed in a variety of areas to ensure the successful deployment of these agents within education systems. Therefore, the aim of this study was to contextualise and simulate the cumulative reward within an environment for an intervention recommendation problem in the education context.
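
To make the state-action-reward loop and the notion of cumulative reward concrete, the following is a minimal sketch rather than the study's actual environment. The engagement states, the two actions, the transition probabilities, the reward values and the discount factor `gamma` are all assumptions invented for illustration.

```python
import random

# Hypothetical toy setting: the state is a learner's engagement level.
STATES = ["low", "medium", "high"]             # indices 0, 1, 2
ACTIONS = ["no_intervention", "send_reminder"]  # indices 0, 1


def step(state, action, rng):
    """Perform one state-action-reward transition and return (next_state, reward).

    All probabilities and reward values here are illustrative assumptions.
    """
    # Assumed: a reminder raises engagement with probability 0.6, otherwise 0.2.
    p_up = 0.6 if action == 1 else 0.2
    if rng.random() < p_up:
        next_state = min(state + 1, len(STATES) - 1)
    else:
        next_state = max(state - 1, 0)
    # Assumed reward: the new engagement level minus a small intervention cost.
    reward = next_state - (0.1 if action == 1 else 0.0)
    return next_state, reward


def simulate(policy, n_steps=50, gamma=0.95, seed=0):
    """Run one episode from the 'low' state and return the discounted cumulative reward."""
    rng = random.Random(seed)
    state, total, discount = 0, 0.0, 1.0
    for _ in range(n_steps):
        action = policy(state)
        state, reward = step(state, action, rng)
        total += discount * reward
        discount *= gamma
    return total


def never_intervene(state):
    """Baseline policy: take no action in any state."""
    return 0


def always_remind(state):
    """Simple policy: recommend a reminder in every state."""
    return 1


if __name__ == "__main__":
    for policy in (never_intervene, always_remind):
        runs = [simulate(policy, seed=s) for s in range(100)]
        print(policy.__name__, sum(runs) / len(runs))
```

Averaging the return over many seeds gives a rough way to compare intervention policies in simulation before any learning algorithm is introduced.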