Computational notebooks allow data scientists to express their ideas through a combination of code and documentation. However, data scientists often pay attention only to the code and neglect creating or updating their documentation during quick iterations. Inspired by human documentation practices learned from 80 highly voted Kaggle notebooks, we design and implement Themisto, an automated documentation generation system, to explore how human-centered AI systems can support human data scientists in the machine learning code documentation scenario. Themisto facilitates the creation of documentation via three approaches: a deep-learning-based approach to generate documentation for source code, a query-based approach to retrieve online API documentation for source code, and a user prompt approach to nudge users to write documentation. We evaluated Themisto in a within-subjects experiment with 24 data science practitioners and found that automated documentation generation techniques reduced the time for writing documentation, reminded participants to document code they would otherwise have ignored, and improved participants' satisfaction with their computational notebooks.