Beginning programmers struggle with the complex grammar of modern programming languages like Java, and make many syntax errors. The diagnostic syntax error messages from compilers and IDEs are sometimes useful, but often the messages are cryptic and puzzling. Automated repair suggestions for syntax errors could help students and save instructors' time. Large samples of student errors and fixes are now available, offering the possibility of data-driven machine-learning approaches to help students fix syntax errors. Current machine-learning approaches do a reasonable job of fixing syntax errors in shorter programs, but do not work as well even for moderately longer programs. We introduce SYNFIX, a machine-learning-based tool that substantially improves on the state of the art by learning to use compiler diagnostics, employing a very large neural model that leverages unsupervised pre-training, and relying on multi-label classification rather than autoregressive synthesis to generate the (repaired) output. We describe SYNFIX's architecture in detail, and provide a detailed evaluation. We have built SYNFIX into a free, open-source version of Visual Studio Code; we make all our source code and models freely available.