Open-domain code generation is a challenging problem because the functions and classes in use are frequently changed and extended by programming communities. We consider the challenge of generating code for unknown libraries without additional training. In this paper, we explore a code generation framework that, like human programmers, can consult relevant API documentation to handle unknown libraries. As a first step in this direction, we implement a model that extracts relevant code signatures from API documentation based on a natural language intent and copies primitives from the extracted signatures. Moreover, to evaluate both code generation for unknown libraries and our framework, we extend an existing open-domain code generation dataset and resplit it so that the evaluation data consist only of examples that use libraries absent from the training data. Experiments on our new split show that, as expected, baseline encoder-decoder models cannot generate code that uses primitives of unknown libraries. In contrast, our model outperforms the baseline on the new split and can properly generate unknown primitives when the extracted code signatures are noiseless.
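To make the resplit concrete, the following is a minimal sketch of a library-disjoint split, not the procedure actually used to build the dataset. The record layout (`intent`, `code`, `libraries`), the function name `split_by_library`, and the toy examples are all hypothetical. The sketch also assumes that an example using any held-out library goes to the evaluation set, even if it uses a seen library as well.

```python
from typing import Any, Dict, List, Set, Tuple

# Hypothetical record layout: {"intent": str, "code": str, "libraries": set of str}
Example = Dict[str, Any]


def split_by_library(
    examples: List[Example], held_out: Set[str]
) -> Tuple[List[Example], List[Example]]:
    """Send every example that touches a held-out library to the evaluation set."""
    train: List[Example] = []
    evaluation: List[Example] = []
    for ex in examples:
        if set(ex["libraries"]) & held_out:
            # Uses at least one library the model must not see during training.
            evaluation.append(ex)
        else:
            train.append(ex)
    return train, evaluation


# Toy data, invented for illustration only.
examples = [
    {"intent": "sort a list", "code": "sorted(xs)", "libraries": set()},
    {"intent": "parse a json string", "code": "json.loads(s)", "libraries": {"json"}},
    {"intent": "make a zero array", "code": "numpy.zeros(3)", "libraries": {"numpy"}},
    {
        "intent": "load json into an array",
        "code": "numpy.array(json.loads(s))",
        "libraries": {"json", "numpy"},
    },
]

train, evaluation = split_by_library(examples, held_out={"numpy"})

# Collect every library that still occurs in the training portion.
train_libs = set().union(*(ex["libraries"] for ex in train))
```

Under this assumption, no held-out library occurs anywhere in the training portion, so any held-out primitive in the evaluation examples is unseen by construction.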