Programming is a powerful and ubiquitous problem-solving tool. Developing systems that can assist programmers or even generate programs independently could make programming more productive and accessible, yet so far incorporating innovations in AI has proven challenging. Recent large-scale language models have demonstrated an impressive ability to generate code and are now able to complete simple programming tasks. However, these models still perform poorly when evaluated on more complex, unseen problems that require problem-solving skills beyond simply translating instructions into code. For example, competitive programming problems, which require an understanding of algorithms and complex natural language, remain extremely challenging. To address this gap, we introduce AlphaCode, a code generation system that can create novel solutions to problems that require deeper reasoning. In simulated evaluations on recent programming competitions on the Codeforces platform, AlphaCode achieved on average a ranking in the top 54.3% in competitions with more than 5,000 participants. We found that three key components were critical to achieving good and reliable performance: (1) an extensive and clean competitive programming dataset for training and evaluation, (2) large and efficient-to-sample transformer-based architectures, and (3) large-scale model sampling to explore the search space, followed by filtering based on program behavior down to a small set of submissions.
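To make component (3) concrete, the following is a minimal Python sketch of the sample-then-filter idea: draw many candidate programs from a model, execute each on the problem's example tests, and keep only the candidates whose observed behavior matches the expected outputs. Everything here is a hypothetical stand-in, not AlphaCode's actual implementation: the `sample` callable abstracts the transformer sampler, `run_candidate` assumes candidates are self-contained Python programs reading stdin and writing stdout, and the parameter values are illustrative.

```python
import os
import subprocess
import tempfile
from typing import Callable, List, Tuple

# Hypothetical stand-in for the trained model's sampler: maps a problem
# description to one candidate program's source code.
SampleFn = Callable[[str], str]


def run_candidate(source: str, stdin_data: str, timeout_s: float = 2.0) -> str:
    """Execute one candidate program on one test input and capture its stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run(
            ["python3", path],
            input=stdin_data,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        return result.stdout.strip()
    except subprocess.TimeoutExpired:
        return ""  # treat timeouts as incorrect behavior
    finally:
        os.unlink(path)


def sample_and_filter(
    sample: SampleFn,
    problem: str,
    example_tests: List[Tuple[str, str]],  # (input, expected output) pairs
    num_samples: int = 1000,               # illustrative; real systems sample far more
    max_submissions: int = 10,
) -> List[str]:
    """Draw many candidates, keep those whose behavior matches the example tests."""
    survivors: List[str] = []
    for _ in range(num_samples):
        candidate = sample(problem)
        if all(
            run_candidate(candidate, tin) == tout.strip()
            for tin, tout in example_tests
        ):
            survivors.append(candidate)
        if len(survivors) >= max_submissions:
            break
    return survivors
```

The design point this sketch captures is that filtering is behavioral rather than textual: candidates are judged by running them against example tests, which prunes the large sampled search space down to a small set of plausible submissions.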