Developing interactive software, such as websites or games, is a particularly engaging way to learn computer science. However, teaching and giving feedback on such software is time-consuming -- standard approaches require instructors to manually grade student-implemented interactive programs. As a result, online platforms that serve millions, like Code.org, are unable to provide any feedback on assignments for implementing interactive programs, which critically hinders students' ability to learn. One approach toward automatic grading is to learn an agent that interacts with a student's program and explores states indicative of errors via reinforcement learning. However, existing work on this approach only provides binary feedback of whether a program is correct or not, while students require finer-grained feedback on the specific errors in their programs to understand their mistakes. In this work, we show that exploring to discover errors can be cast as a meta-exploration problem. This enables us to construct a principled objective for discovering errors and an algorithm for optimizing this objective, which provides fine-grained feedback. We evaluate our approach on a set of over 700K real anonymized student programs from a Code.org interactive assignment. Our approach provides feedback with 94.3% accuracy, improving over existing approaches by 17.7% and coming within 1.5% of human-level accuracy. Project web page: https://ezliu.github.io/dreamgrader.