Research on language modeling of source code has advanced considerably in recent years, with applications ranging from code suggestion and completion to code summarization. However, synthesizing complete programs in industry-grade programming languages remains an open problem. In this work, we introduce and experimentally validate a variational autoencoder model for program synthesis in industry-grade programming languages. The model exploits the inherent tree structure of code and can be combined with gradient-free optimization techniques, such as evolutionary methods, to generate programs that maximize a given fitness function, for instance the number of test cases passed. A demonstration is available at https://tree2tree.app
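The combination of a learned latent space with gradient-free search can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `toy_decode` is a hypothetical stand-in for the trained decoder (the real model would decode a latent vector into a syntax tree), `TEST_CASES` is an invented test suite, and `evolve_latent` is a plain elitist evolutionary loop with Gaussian mutation, not necessarily the evolutionary method the authors use.

```python
import random
from typing import Callable, List, Tuple

Vector = List[float]
Program = Callable[[int], int]

# Hypothetical test suite: the target behaviour is f(x) = 2 * x + 3.
TEST_CASES: List[Tuple[int, int]] = [(0, 3), (1, 5), (2, 7)]


def toy_decode(z: Vector) -> Program:
    """Stand-in for the trained decoder (assumption, not the paper's model).

    The real decoder would map z to a syntax tree; here z is simply
    rounded to the two coefficients of a linear function.
    """
    a, b = round(z[0]), round(z[1])
    return lambda x: a * x + b


def fitness(z: Vector) -> int:
    """Number of test cases passed by the program decoded from z."""
    program = toy_decode(z)
    return sum(1 for x, expected in TEST_CASES if program(x) == expected)


def evolve_latent(
    score: Callable[[Vector], float],
    dim: int,
    population_size: int = 64,
    elite: int = 8,
    sigma: float = 0.5,
    generations: int = 30,
    seed: int = 0,
) -> Vector:
    """Elitist evolutionary search over latent vectors; no gradients needed."""
    if not 0 < elite < population_size:
        raise ValueError("elite must be between 1 and population_size - 1")
    rng = random.Random(seed)
    # Start from samples of a standard normal, the usual VAE prior.
    population = [
        [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(population_size)
    ]
    for _ in range(generations):
        # Keep the best candidates unchanged so the best score never drops.
        parents = sorted(population, key=score, reverse=True)[:elite]
        # Fill the rest of the population with mutated copies of the parents.
        children = [
            [x + rng.gauss(0.0, sigma) for x in rng.choice(parents)]
            for _ in range(population_size - elite)
        ]
        population = parents + children
    return max(population, key=score)


best = evolve_latent(fitness, dim=2)
print(fitness(best), "of", len(TEST_CASES), "test cases passed")
```

The search only ever calls `score` on decoded candidates, which is why a non-differentiable objective such as a count of passed tests is usable here. Because that count is piecewise constant, the search can stall on plateaus; a smoother score is a possible variation, though whether the paper uses one is not stated in the abstract.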