As the complexity of modern software continues to escalate, software engineering has become an increasingly daunting and error-prone endeavor. In recent years, the field of Neural Code Intelligence (NCI) has emerged as a promising solution, leveraging the power of deep learning techniques to tackle analytical tasks on source code with the goal of improving programming efficiency and minimizing human errors within the software industry. Pretrained language models have become a dominant force in NCI research, consistently delivering state-of-the-art results across a wide range of tasks, including code summarization, generation, and translation. In this paper, we present a comprehensive survey of the NCI domain, including a thorough review of pretraining techniques, tasks, datasets, and model architectures. We hope this paper will serve as a bridge between the natural language and programming language communities, offering insights for future research in this rapidly evolving field.