Today's programmers, especially data science practitioners, make heavy use of data-processing libraries (APIs) such as PyTorch, TensorFlow, NumPy, and Pandas. Program synthesizers can provide significant coding assistance to this community of users; however, program synthesis can also be slow because of the enormous search spaces involved. In this work, we examine ways in which machine learning can be used to accelerate enumerative program synthesis. We present a deep-learning-based model that predicts the sequence of API functions needed to go from a given input to a desired output, both of which are numeric vectors. Our work is based on two insights. First, it is possible to learn, from a large number of input-output examples, to predict the API function likely to be needed in a given situation. Second, and crucially, it is also possible to learn to compose API functions into a sequence, given an input and the desired final output, without explicitly knowing the intermediate values. We show that predictions from our model variants can speed up an enumerative program synthesizer. These speedups significantly outperform previous approaches (e.g., DeepCoder) that use ML models in enumerative synthesis.
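The idea of letting predictions steer an enumerative search can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the five toy list functions stand in for real tensor API calls, the `scores` dictionary is a hand-written stand-in for a learned model's output, and the names `API`, `run`, and `synthesize` are hypothetical. It also simplifies the approach by ranking individual functions rather than predicting a full sequence.

```python
from itertools import product

# Toy stand-ins for tensor API functions (hypothetical, not the paper's set).
API = {
    "reverse": lambda v: v[::-1],
    "sort": lambda v: sorted(v),
    "double": lambda v: [2 * x for x in v],
    "cumsum": lambda v: [sum(v[: i + 1]) for i in range(len(v))],
    "negate": lambda v: [-x for x in v],
}


def run(program, vector):
    """Apply a sequence of function names to a vector, left to right."""
    for name in program:
        vector = API[name](vector)
    return vector


def synthesize(inp, out, scores=None, max_len=3):
    """Enumerate programs by increasing length until one maps inp to out.

    scores: optional dict of per-function likelihoods (assumed to come
    from some predictor); higher-scored functions are enumerated first.
    Returns (program, candidates_tried), or (None, candidates_tried).
    """
    names = list(API)
    if scores is not None:
        names.sort(key=lambda n: -scores.get(n, 0.0))
    tried = 0
    for length in range(1, max_len + 1):
        for program in product(names, repeat=length):
            tried += 1
            if run(program, list(inp)) == list(out):
                return program, tried
    return None, tried


if __name__ == "__main__":
    inp, out = [3, 1, 2], [-1, -3, -6]
    # Hand-written scores standing in for model predictions (assumption).
    guess = {"sort": 0.9, "cumsum": 0.8, "negate": 0.7}
    print("unguided:", synthesize(inp, out))
    print("guided:  ", synthesize(inp, out, scores=guess))
```

Both calls search the same space and return a program that reproduces the output; the guided call reaches one after testing fewer candidates, because the functions it needs are enumerated first.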