Automatic code generation produces program code from a given natural language description. The current mainstream approach uses neural networks to encode the natural language description, output an abstract syntax tree (AST) at the decoder, and then convert the AST into program code. While the generated code largely conforms to specific syntax rules, two problems are still ignored. One is the absence of program testing, an essential step in complete code implementation; the other is the exclusive focus on the syntactic compliance of the generated code, while ignoring the more important functional requirements of the program. This paper proposes a CodeGen-Test model, which adds a program testing step and incorporates program testing information to iteratively generate code that meets the functional requirements of the program, thereby improving the quality of code generation. The paper also proposes a new evaluation metric, test accuracy (Test-Acc), defined as the proportion of generated code that passes its program tests. Unlike previous metrics, which evaluate the quality of generated code only from the perspective of character similarity, Test-Acc evaluates it from the perspective of program functionality. The paper evaluates the CodeGen-Test model on a Python dataset, "hearthstone legend". The experimental results show the proposed method effectively improves the quality of generated code: compared with the existing optimal model, the CodeGen-Test model improves the BLEU score by 0.2%, the ROUGE-L score by 0.3%, and Test-Acc by 6%.
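To make the iterative generate-test-regenerate idea concrete, the following is a minimal sketch of such a loop, not the paper's actual architecture: `generate_code` and `run_tests` are hypothetical stand-ins for the neural decoder and the program testing step, and the feedback-conditioning interface is an assumption.

```python
# Hedged sketch of an iterative code-generation loop that feeds program
# testing information back into the generator. `generate_code` and
# `run_tests` are illustrative placeholders, not the paper's interfaces.
def codegen_test_loop(description, generate_code, run_tests, max_rounds=3):
    """Regenerate code up to max_rounds times, conditioning on test feedback."""
    feedback = None
    code = ""
    for _ in range(max_rounds):
        code = generate_code(description, feedback)  # condition on prior test info
        passed, feedback = run_tests(code)           # feedback, e.g. error messages
        if passed:
            break  # code already meets the functional requirements
    return code
```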
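Since Test-Acc is defined as the proportion of generated programs that pass their tests, it can be computed roughly as sketched below. This assumes each generated sample is paired with runnable Python test code that fails via a non-zero exit status; all names here are illustrative.

```python
# Hedged sketch of the Test-Acc metric: the fraction of generated programs
# whose paired tests run to completion without error. Assumes tests signal
# failure with assertions/exceptions (non-zero exit code).
import os
import subprocess
import tempfile

def test_acc(generated_programs, test_suites, timeout=5):
    """Return the proportion of generated programs that pass their tests."""
    passed = 0
    for program, tests in zip(generated_programs, test_suites):
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(program + "\n" + tests)  # run the program with its tests
            path = f.name
        try:
            result = subprocess.run(
                ["python", path], capture_output=True, timeout=timeout
            )
            if result.returncode == 0:
                passed += 1
        except subprocess.TimeoutExpired:
            pass  # non-terminating code counts as a failed test
        finally:
            os.unlink(path)
    return passed / len(generated_programs)
```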