Jupyter notebooks allow data scientists to write machine learning code together with its documentation in cells. In this paper, we propose a new task of code documentation generation (CDG) for computational notebooks. In contrast to previous CDG tasks, which focus on generating documentation for a single code snippet, in a computational notebook the documentation in one markdown cell often corresponds to multiple code cells, and these code cells have an inherent structure. We propose a new model (HAConvGNN) that uses a hierarchical attention mechanism to take into account both the relevant code cells and the relevant code tokens when generating the documentation. Tested on a new corpus constructed from well-documented Kaggle notebooks, our model outperforms the baseline models.
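To make the idea of hierarchical attention over code cells and code tokens concrete, the following is a minimal sketch in plain Python. It is not the HAConvGNN architecture itself: it assumes simple dot-product attention, a single decoder query vector, and pre-computed token embeddings, and all names (`attend`, `hierarchical_attention`, the toy vectors) are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, vectors):
    """Dot-product attention (an assumption, not the paper's scoring function).

    Returns the attention weights and the weighted sum of `vectors`.
    """
    weights = softmax([dot(query, v) for v in vectors])
    dim = len(vectors[0])
    context = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return weights, context


def hierarchical_attention(query, cells):
    """Two-level attention over a list of code cells.

    `cells` is a list of cells; each cell is a list of token embeddings.
    Low level: attend over the tokens inside each cell to get one summary per cell.
    High level: attend over the cell summaries to get one context vector.
    """
    cell_summaries = [attend(query, tokens)[1] for tokens in cells]
    cell_weights, context = attend(query, cell_summaries)
    return cell_weights, context


# Toy example: two code cells with 2-dimensional token embeddings (made-up values).
query = [1.0, 0.0]
cells = [
    [[1.0, 0.0], [0.5, 0.5]],              # cell 0: tokens aligned with the query
    [[0.0, 1.0], [0.0, 2.0], [0.1, 0.9]],  # cell 1: tokens mostly orthogonal to it
]
cell_weights, context = hierarchical_attention(query, cells)
```

In this toy setup the cell whose tokens align with the query receives the larger cell-level weight, which is the intuition behind weighting relevant cells and relevant tokens separately before generating each documentation word.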