We advance the state of the art in the accuracy of code prediction (next-token prediction) used in autocomplete systems. First, we report that the recently proposed Transformer architecture, even out of the box, outperforms previous neural and non-neural systems for code prediction. We then show that by making the Transformer architecture aware of the syntactic structure of code, we further increase the margin by which a Transformer-based system outperforms previous systems. With this, it outperforms an RNN-based system (similar to Hellendoorn et al., 2018) by 18.3% in accuracy, the Deep3 system (Raychev et al., 2016) by 14.1%, and an adaptation of Code2Seq (Alon et al., 2018) for code prediction by 14.4%. We present in the paper several ways of communicating the code structure to the Transformer, which is fundamentally built for processing sequence data. We provide a comprehensive experimental evaluation of our proposal, along with alternative design choices, on a standard Python dataset as well as on a Facebook-internal Python corpus. Our code and data-preparation pipeline will be made available as open source.