Sensemaking is the iterative process of identifying, extracting, and explaining insights from data, where each iteration is referred to as the "sensemaking loop." Although recent work observes snapshots of the sensemaking loop within computational notebooks, none measures how sensemaking behaviors shift over time between exploration and explanation. This gap limits our ability to understand the full scope of the sensemaking process, and thus our ability to design tools that fully support it. We contribute the first quantitative method for characterizing how sensemaking evolves within data science computational notebooks. To this end, we conducted a quantitative study of 2,574 Jupyter notebooks mined from GitHub. First, we identify data science-focused notebooks that have undergone significant iterations. Second, we present regression models that automatically characterize sensemaking activity within individual notebooks by assigning each notebook a score representing its position on the sensemaking spectrum. Third, we use these regression models to calculate and analyze shifts in notebook scores across GitHub versions. Our results show that notebook authors engage in a diverse range of sensemaking tasks over time, such as annotation, branching analysis, and documentation. Finally, we propose design recommendations for extending notebook environments to support the sensemaking behaviors we observed.