Computational notebooks have become popular for Exploratory Data Analysis (EDA), augmented by LLM-based code generation and result interpretation. Effective LLM assistance hinges on selecting informative context -- the minimal set of cells whose code, data, or outputs suffice to answer a prompt. As notebooks grow long and messy, users can lose track of their mental model of the analysis. They thus fail to curate appropriate contexts for LLM tasks, causing frustration and tedious prompt engineering. We conducted a formative study (n=6) that surfaced challenges in LLM context selection and mental model maintenance. To address these challenges, we introduce NoteEx, a JupyterLab extension that provides a semantic visualization of the EDA workflow, allowing analysts to externalize their mental model, specify analysis dependencies, and interactively select task-relevant contexts for LLMs. A user study (n=12) comparing NoteEx against a baseline shows that NoteEx improved mental model retention and context selection, leading to more accurate and relevant LLM responses.