Code examples are a crucial part of good documentation. They help developers understand the documentation and use the corresponding code unit (e.g., a method) properly. However, much official documentation still lacks (good) code examples, which several studies have identified as a common documentation issue. Hence, in this paper, we consider automatic code example generation for documentation, a direction less explored by existing research. We employ Codex, a GPT-3-based model pre-trained on both natural and programming languages, to generate code examples from source code and documentation given as input. Our preliminary investigation on 40 scikit-learn methods reveals that this approach is able to generate good code examples: 72.5% of the generated examples executed without error (passability), and 82.5% properly dealt with the target method and documentation (relevance). We also find that incorporating error logs (produced by the compiler while executing a failed code example) into the input further improves passability from 72.5% to 87.5%. Thus, our investigation lays the groundwork for documentation-specific code example generation and warrants in-depth future studies.
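The generate, execute, and retry-with-error-log procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `generate` is a hypothetical stand-in for a call to a code model such as Codex, the prompt layout in `build_prompt` is invented for illustration, and `max_retries` is an assumed parameter.

```python
import subprocess
import sys
from typing import Callable, List, Tuple


def run_example(code: str, timeout: float = 60.0) -> Tuple[bool, str]:
    """Run a candidate example in a fresh Python process.

    Returns (passed, error_log), where error_log is the captured stderr.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, f"Timed out after {timeout} seconds"
    return proc.returncode == 0, proc.stderr


def build_prompt(source: str, doc: str,
                 failed_code: str = "", error_log: str = "") -> str:
    """Assemble the model input (hypothetical layout, not the paper's prompt)."""
    parts = ["# Source code:", source, "# Documentation:", doc]
    if error_log:
        # Feed the failed attempt and its error log back to the model.
        parts += ["# Previous attempt:", failed_code, "# Error log:", error_log]
    parts.append("# Write a runnable code example for the method above:")
    return "\n".join(parts)


def generate_example(generate: Callable[[str], str], source: str, doc: str,
                     max_retries: int = 1) -> Tuple[str, bool]:
    """Generate an example; on failure, retry with the error log in the input."""
    code = generate(build_prompt(source, doc))
    passed, log = run_example(code)
    retries = 0
    while not passed and retries < max_retries:
        code = generate(build_prompt(source, doc, code, log))
        passed, log = run_example(code)
        retries += 1
    return code, passed


def passability(outcomes: List[bool]) -> float:
    """Fraction of generated examples that executed without error."""
    return sum(outcomes) / len(outcomes)
```

Under this sketch, passability over a set of target methods is simply the share of `passed` flags that are true. Relevance, by contrast, depends on whether the example actually exercises the target method and its documentation, so an execution check alone does not capture it.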