Error Identification Strategies for Python Jupyter Notebooks

Computational notebooks, such as Jupyter or Colab, combine text and data analysis code. They have become ubiquitous in data science and exploratory data analysis. Since these notebooks present a different programming paradigm than conventional IDE-driven programming, it is plausible that debugging in computational notebooks might also be different. More specifically, since creating notebooks blends domain knowledge, statistical analysis, and programming, the ways in which notebook users find and fix errors of these different kinds might also differ. In this paper, we present an exploratory, observational study of how notebook users find and understand potential errors in notebooks. We presented users with notebooks pre-populated with common notebook errors: errors rooted in the statistical data analysis, in the knowledge of domain concepts, or in the programming. We then analyzed the strategies our study participants used to find these errors and determined how successful each strategy was at identifying them. Our findings indicate that while the notebook programming environment is different from the environments used for traditional programming, debugging strategies remain quite similar. It is our hope that the insights presented in this paper will help both notebook tool designers and educators make changes that let data scientists discover errors more easily in the notebooks they write.
