A Dataset of Generalizable Python Code Change Patterns

Mining repetitive code changes from version control history is a common way of discovering previously unknown change patterns. Such change patterns can be used in code recommender systems or automated program repair techniques. While such tools and datasets exist for Java, there is little work on finding and recommending such changes in Python. In this paper, we present a dataset of manually vetted, generalizable, repetitive Python code change patterns. We create a coding guideline to identify generalizable change patterns that can be used in automated tooling. We take the change patterns mined in recent work on repetitive changes in Python projects and use our coding guideline to review them manually. For each change pattern, we also record a description of the change and why it is applied, along with other characteristics such as the number of projects it occurs in. This review process allows us to identify and share 72 Python change patterns that can be used to build and advance Python developer support tools.