Programming language (PL) models pretrained on large-scale code corpora have shown considerable potential for automating software engineering tasks such as code completion, code translation, and program synthesis. However, current approaches mainly rely on supervised fine-tuning objectives borrowed from text generation, neglecting sequence-level properties of code, including but not limited to compilability as well as syntactic and functional correctness. To address this limitation, we propose PPOCoder, a new framework for code generation that combines pretrained PL models with Proximal Policy Optimization (PPO) deep reinforcement learning and incorporates execution feedback into model optimization as an external source of knowledge. PPOCoder is transferable across different code generation tasks and PLs. Extensive experiments on three code generation tasks demonstrate the effectiveness of the proposed approach compared to SOTA methods, improving compilation success rates and functional correctness across different PLs. Our code can be found at https://github.com/reddy-lab-code-research/PPOCoder.
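Below is a minimal sketch of the core idea: using execution feedback (here, a compilability check) as a sequence-level reward during PPO fine-tuning of a pretrained code model. The `g++ -fsyntax-only` check, the reward values, and the `policy_model` / `ppo_update` placeholders are illustrative assumptions, not the actual PPOCoder implementation; see the repository linked above for the real reward design and training loop.

```python
import os
import subprocess
import tempfile


def compilation_reward(source_code: str) -> float:
    """Hypothetical sequence-level reward: +1 if the generated C++ code
    compiles, -1 otherwise. PPOCoder combines execution-based signals of
    this kind (compilability, functional correctness) into its reward."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "candidate.cpp")
        with open(src, "w") as f:
            f.write(source_code)
        # Syntax/semantic check only, no object file is produced.
        result = subprocess.run(["g++", "-fsyntax-only", src], capture_output=True)
        return 1.0 if result.returncode == 0 else -1.0


def train_step(policy_model, prompts, ppo_update):
    """Sketch of one RL step: sample code from the pretrained PL model
    (the policy), score it with execution feedback, and apply a PPO update.
    `policy_model.generate` and `ppo_update` are placeholders for the
    actual model interface and clipped policy-gradient optimizer."""
    samples = [policy_model.generate(p) for p in prompts]  # rollout
    rewards = [compilation_reward(s) for s in samples]     # execution feedback
    ppo_update(policy_model, prompts, samples, rewards)    # PPO optimization step
```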