Novice programmers often struggle with the formal syntax of programming languages. To assist them, we design a novel programming language correction framework amenable to reinforcement learning. The framework allows an agent to mimic human actions for text navigation and editing. We demonstrate that the agent can be trained through self-exploration directly from the raw input, that is, program text itself, without any knowledge of the formal syntax of the programming language. We leverage expert demonstrations for one tenth of the training data to accelerate training. The proposed technique is evaluated on 6975 erroneous C programs with typographic errors, written by students during an introductory programming course. Our technique fixes 14% more programs and 29% more compiler error messages relative to those fixed by a state-of-the-art tool, DeepFix, which uses a fully supervised neural machine translation approach.
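To make the idea of an agent that navigates and edits program text concrete, the following is a minimal sketch of such an editing environment. It is an illustration under stated assumptions, not the framework evaluated above: the action names, the reward (the drop in error count after each step), and the `toy_error_count` checker, which stands in for a real compiler, are all hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical action set; the abstract does not enumerate the real one.
MOVE_LEFT, MOVE_RIGHT, INSERT_CLOSE_PAREN, DELETE_TOKEN = range(4)


def toy_error_count(tokens: List[str]) -> int:
    """Stand-in for a compiler: counts unmatched parentheses and braces."""
    closers = {")": "(", "}": "{"}
    openers = set(closers.values())
    stack: List[str] = []
    errors = 0
    for tok in tokens:
        if tok in openers:
            stack.append(tok)
        elif tok in closers:
            if stack and stack[-1] == closers[tok]:
                stack.pop()
            else:
                errors += 1  # closer with no matching opener
    return errors + len(stack)  # leftover openers are errors too


class EditEnv:
    """Token sequence plus a cursor that the agent moves and edits at."""

    def __init__(self, tokens: List[str],
                 error_count: Callable[[List[str]], int] = toy_error_count):
        self.tokens = list(tokens)
        self.error_count = error_count
        self.cursor = 0

    def step(self, action: int) -> Tuple[int, bool]:
        """Apply one action; return (reward, done)."""
        before = self.error_count(self.tokens)
        if action == MOVE_LEFT:
            self.cursor = max(0, self.cursor - 1)
        elif action == MOVE_RIGHT:
            self.cursor = min(max(len(self.tokens) - 1, 0), self.cursor + 1)
        elif action == INSERT_CLOSE_PAREN:
            self.tokens.insert(self.cursor, ")")
        elif action == DELETE_TOKEN and self.tokens:
            del self.tokens[self.cursor]
            self.cursor = min(self.cursor, max(len(self.tokens) - 1, 0))
        after = self.error_count(self.tokens)
        # Assumed reward: number of errors removed by this step.
        return before - after, after == 0


# Usage: move the cursor to the ';' and insert the missing ')'.
env = EditEnv(["printf", "(", "x", ";"])
for _ in range(3):
    env.step(MOVE_RIGHT)
reward, done = env.step(INSERT_CLOSE_PAREN)
```

In this sketch the environment never inspects a grammar; it only sees tokens and an error count, which mirrors the claim that the agent learns from the program text itself. A learned policy would replace the hand-written action sequence in the usage lines.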