Improvements in the performance of computing systems, driven by Moore's Law, have transformed society. As these hardware-driven gains slow, it becomes even more important for software developers to focus on performance and efficiency during development. While several studies have demonstrated the potential of improved code efficiency (e.g., generational improvements 2x better than those from hardware), unlocking these gains in practice has been challenging. Reasoning about algorithmic complexity and the interaction of coding patterns with hardware is difficult for the average programmer, especially when combined with pragmatic constraints around development velocity and multi-person development. This paper seeks to address this problem. We analyze a large competitive programming dataset from the Google Code Jam competition and find that efficient code is indeed rare, with a 2x runtime difference between the median and the 90th percentile of solutions. We propose using machine learning to automatically provide prescriptive feedback in the form of hints that guide programmers towards writing high-performance code. To learn these hints automatically from the dataset, we propose a novel discrete variational auto-encoder in which each discrete latent variable represents a different learned category of code edit that increases performance. We show that this method represents the multi-modal space of code efficiency edits better than a sequence-to-sequence baseline and generates a distribution of more efficient solutions.
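As a rough illustration of the runtime gap described above, the following minimal sketch compares the median runtime of a set of solutions with the runtime at the fastest-decile boundary. It assumes that "90th percentile" refers to solutions faster than 90% of submissions; the function name `median_to_top_decile_ratio` and the sample runtimes are hypothetical and are not taken from the Google Code Jam dataset.

```python
import statistics


def median_to_top_decile_ratio(runtimes_s):
    """Return median runtime divided by the runtime at the fastest-decile boundary.

    Assumption: "90th percentile of solutions" means solutions faster than 90%
    of submissions, i.e. the 10th percentile of the runtime distribution.
    """
    ordered = sorted(runtimes_s)
    median = statistics.median(ordered)
    # quantiles(n=10) returns the nine decile cut points of the runtimes;
    # the first cut point is the 10th percentile (the fast end).
    fast_decile = statistics.quantiles(ordered, n=10, method="inclusive")[0]
    return median / fast_decile


# Hypothetical runtimes in seconds for one problem (illustrative only).
sample_runtimes = [1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ratio = median_to_top_decile_ratio(sample_runtimes)
print(f"median / fastest-decile runtime: {ratio:.2f}x")
```

A ratio near 2 on real data would correspond to the gap reported above, although the exact percentile convention used in the paper may differ from the one assumed here.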