Computational notebooks have emerged as the platform of choice for data science and analytical workflows, enabling rapid iteration and exploration. By keeping intermediate program state in memory and segmenting units of execution into so-called "cells", notebooks allow users to execute their workflows interactively and enjoy particularly tight feedback. However, as cells are added, removed, reordered, and rerun, this hidden intermediate state accumulates in a way that is not necessarily correlated with the notebook's visible code, making execution behavior difficult to reason about, and leading to errors and lack of reproducibility. We present NBSafety, a custom Jupyter kernel that uses runtime tracing and static analysis to automatically manage lineage associated with cell execution and global notebook state. NBSafety detects and prevents errors that users make during unaided notebook interactions, all while preserving the flexibility of existing notebook semantics. We evaluate NBSafety's ability to prevent erroneous interactions by replaying and analyzing 666 real notebook sessions. Of these, NBSafety identified 117 sessions with potential safety errors, and in the remaining 549 sessions, the cells that NBSafety identified as resolving safety issues were more than $7\times$ more likely to be selected by users for re-execution compared to a random baseline, even though the users were not using NBSafety and were therefore not influenced by its suggestions.
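The stale-state problem described above can be made concrete with a small sketch. This is not NBSafety's implementation: `ToyNotebook`, `reads_and_writes`, `stale_symbols`, and `refresher_cells` are hypothetical names introduced purely for illustration. The sketch assumes a deliberately coarse rule in which every name a cell reads counts as a dependency of every name that cell writes, and it checks only direct dependencies. It uses Python's standard `ast` module for the static part and an execution counter for the runtime part.

```python
import ast


def reads_and_writes(source):
    """Statically collect the names a cell reads and the names it assigns."""
    reads, writes = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                writes.add(node.id)
            else:
                reads.add(node.id)
    return reads, writes


class ToyNotebook:
    """Hypothetical stand-in for a kernel that records lineage per symbol."""

    def __init__(self):
        self.env = {}          # shared in-memory state across cells
        self.counter = 0       # global execution counter
        self.updated_at = {}   # symbol -> counter value at its last write
        self.deps = {}         # symbol -> symbols it was computed from
        self.cells = {}        # cell id -> most recently executed source

    def run(self, cell_id, source):
        """Execute a cell and record which symbols it refreshed."""
        self.counter += 1
        self.cells[cell_id] = source
        reads, writes = reads_and_writes(source)
        exec(source, self.env)
        for name in writes:
            self.updated_at[name] = self.counter
            # Coarse assumption: every read feeds every write in the cell.
            self.deps[name] = reads - {name}

    def stale_symbols(self):
        """Symbols with a direct dependency that was rewritten after them."""
        return {
            sym
            for sym, parents in self.deps.items()
            if any(self.updated_at.get(p, 0) > self.updated_at[sym] for p in parents)
        }

    def refresher_cells(self):
        """Cells whose re-execution would rewrite at least one stale symbol."""
        stale = self.stale_symbols()
        return {
            cid
            for cid, src in self.cells.items()
            if reads_and_writes(src)[1] & stale
        }


nb = ToyNotebook()
nb.run("c1", "x = 1")
nb.run("c2", "y = x + 1")
nb.run("c1", "x = 10")        # user edits and reruns the first cell only
print(nb.stale_symbols())     # y was computed from the old x
print(nb.refresher_cells())   # rerunning c2 would bring y up to date
```

After the third call, `y` still holds the value computed from the old `x`, even though the visible code suggests otherwise. Flagging `y` as stale and `c2` as the cell that resolves it mirrors, in miniature, the kind of suggestion the abstract attributes to NBSafety.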