In recent years, the use of deep learning in language models has gained much attention. Some research projects claim they can generate text that reads as if written by humans, enabling new possibilities in many application areas. Among the areas related to language processing, one of the most notable for applying this type of modeling is programming languages. For years, the Machine Learning community has researched this software engineering area, pursuing goals such as applying different approaches to auto-complete, generate, fix, or evaluate code written by humans. Given the increasing popularity of deep-learning-enabled language models, we detected a lack of empirical papers comparing different deep learning architectures for creating and using language models based on programming code. This paper compares different neural network architectures, namely AWD-LSTM, AWD-QRNN, and Transformer, while using transfer learning and different tokenizations, to see how they behave when building language models from a Python dataset for code generation and fill-in-the-mask tasks. Considering the results, we discuss each approach's strengths and weaknesses and the gaps we find in evaluating the language models or applying them in a real programming context.