In a classical learning setting, a student collects data, or observations, about a system and estimates a quantity of interest about it. Correctional learning is a cooperative teacher-student framework in which a teacher, who has knowledge about the system, can observe and alter (correct) the observations received by the student in order to improve the student's estimate. In this paper, we show that the teacher's help reduces the variance of the student's estimate. We further formulate the online problem, in which the teacher must decide at each time instant whether or not to change the observations, as a Markov decision process, from which the optimal policy is derived using dynamic programming. We validate the framework in numerical experiments and compare the optimal online policy with the one from the batch setting.
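To make the variance claim concrete, the following is a minimal toy sketch of a batch-style correction and is not the paper's algorithm. It assumes a Bernoulli system with success probability `p_true` known to the teacher, a student who estimates that probability by the sample mean, and a teacher who may flip at most `budget` observations per batch. The function names, the correction budget, and the greedy flipping rule are all illustrative assumptions.

```python
import random
import statistics


def student_estimate(observations):
    """Student's estimate: the sample mean of the binary observations."""
    return sum(observations) / len(observations)


def teacher_correct(observations, p_true, budget):
    """Flip at most `budget` observations so that the number of ones
    moves toward round(p_true * n). Hypothetical greedy rule."""
    corrected = list(observations)
    target_ones = round(p_true * len(corrected))
    for _ in range(budget):
        ones = sum(corrected)
        if ones == target_ones:
            break
        # Too many ones: flip a 1 to 0; too few: flip a 0 to 1.
        flip_from = 1 if ones > target_ones else 0
        idx = corrected.index(flip_from)
        corrected[idx] = 1 - flip_from
    return corrected


def run_trials(p_true=0.3, n=50, budget=3, trials=2000, seed=0):
    """Return the empirical variance of the student's estimate
    without and with the teacher's corrections."""
    rng = random.Random(seed)
    plain, helped = [], []
    for _ in range(trials):
        obs = [1 if rng.random() < p_true else 0 for _ in range(n)]
        plain.append(student_estimate(obs))
        helped.append(student_estimate(teacher_correct(obs, p_true, budget)))
    return statistics.pvariance(plain), statistics.pvariance(helped)


if __name__ == "__main__":
    var_plain, var_helped = run_trials()
    print(f"variance without teacher: {var_plain:.6f}")
    print(f"variance with teacher:    {var_helped:.6f}")
```

In this sketch the teacher sees the whole batch before correcting it. The online problem described in the abstract, where the decision is made at each time instant, would need a sequential policy instead, which this example does not attempt to model.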