Jupyter Notebook is a popular tool among data analysts and data scientists. It combines code, documentation, and visualizations in a single interactive environment, which makes code easy to reuse. Code reuse can improve programming efficiency, but it can also reduce readability, security, and overall performance. To understand these potential negative impacts, we conduct a large-scale exploratory study of how the Jupyter Notebook development community reuses code from Stack Overflow. We identified 1,097,470 Jupyter Notebook clone pairs that reuse Stack Overflow code snippets, and the average code snippet has 7.91 code quality violations. Our research also offers insight into why Jupyter Notebook developers choose to reuse code and into the potential drawbacks of this practice.