During software development, source code spends most of its time in a broken or incomplete state. This poses a challenge for machine learning for code, since high-performing models typically rely on graph-structured representations of programs derived from traditional program analyses, and such analyses may be undefined for broken or incomplete code. We extend the notion of program graphs to work-in-progress code by learning to predict edge relations between tokens, training on well-formed code before transferring to work-in-progress code. We consider two tasks in a work-in-progress scenario: code completion, and localizing and repairing variable misuse. We demonstrate that training relation-aware models with fine-tuned edges consistently improves performance on both tasks.
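To make the obstacle concrete, the following minimal sketch shows a conventional, parser-based edge extraction that works on well-formed code and fails on work-in-progress code. It is an illustration only and not the analysis or model used in this work: the function name `same_name_edges` and the choice of edge relation (linking consecutive occurrences of the same variable name) are assumptions made for the example, which relies on Python's standard `ast` module.

```python
import ast


def same_name_edges(source):
    """Link consecutive occurrences of the same variable name.

    Hypothetical helper for illustration. Returns a list of
    ((line, col), (line, col)) pairs in source order. Raises
    SyntaxError when the source does not parse, which is the case
    where this kind of analysis is undefined.
    """
    tree = ast.parse(source)
    # Collect every Name node with its position, sorted in source order.
    names = sorted(
        (node.lineno, node.col_offset, node.id)
        for node in ast.walk(tree)
        if isinstance(node, ast.Name)
    )
    last_seen = {}
    edges = []
    for line, col, name in names:
        if name in last_seen:
            edges.append((last_seen[name], (line, col)))
        last_seen[name] = (line, col)
    return edges


well_formed = "x = 1\ny = x + 2\nprint(x, y)\n"
work_in_progress = "x = 1\ny = x + \nprint(x, y"

print(same_name_edges(well_formed))

try:
    same_name_edges(work_in_progress)
except SyntaxError as err:
    # The parser rejects the incomplete program, so no edges are produced.
    print("analysis undefined:", err.msg)
```

Because the parser rejects the incomplete program outright, no graph is available at all. This is the gap that predicting edge relations directly from tokens is meant to fill.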