Notebooks provide an interactive environment in which programmers develop code, analyse data, and interleave visualizations. Despite this flexibility, a major pitfall that data scientists encounter is unexpected behaviour caused by the out-of-order execution model unique to notebooks. As a result, data scientists face challenges involving notebook correctness, reproducibility, and cleaning. In this paper, we propose a framework that performs static analysis on notebooks, incorporating their unique execution semantics. Our framework is general in the sense that it accommodates a wide range of analyses, useful for various notebook use cases. We have instantiated our framework on a diverse set of analyses and have evaluated them on 2211 real-world notebooks. Our evaluation demonstrates that the vast majority (98.7%) of notebooks can be analysed in less than a second, well within the time frame required by interactive notebook clients.
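To make the out-of-order execution pitfall concrete, the following is a minimal sketch of a cell-level definition/use check. It is an illustration only and not the framework proposed here: the helper names `defs_and_uses` and `undefined_uses` are hypothetical, cells are assumed to be plain Python source strings, and the analysis is a coarse, flow-insensitive approximation (a name both read and assigned in the same cell is treated as defined there).

```python
import ast
import builtins


def defs_and_uses(source):
    """Return (names defined, names read) for one cell's source code."""
    defs, uses = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defs.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                uses.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defs.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds the top-level name "a"
                defs.add((alias.asname or alias.name).split(".")[0])
    return defs, uses


def undefined_uses(cells, order):
    """For an execution order (a list of cell indices), report the names
    each cell reads that no previously executed cell has defined."""
    defined = set(dir(builtins))
    problems = []
    for idx in order:
        defs, uses = defs_and_uses(cells[idx])
        missing = sorted(uses - defs - defined)
        if missing:
            problems.append((idx, missing))
        defined |= defs
    return problems


# Hypothetical three-cell notebook
cells = [
    "import math\nradius = 2.0",
    "area = math.pi * radius ** 2",
    "print(area)",
]

print(undefined_uses(cells, [0, 1, 2]))  # top-to-bottom order: []
print(undefined_uses(cells, [2, 0, 1]))  # out of order: [(2, ['area'])]
```

Because the check only parses source and never runs a cell, it can be evaluated for any candidate execution order without executing the notebook, which is the property that makes static approaches attractive for interactive use.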