Just-in-Time Code Duplicates Extraction

Refactoring is a critical task in software maintenance, usually performed to enforce better design and coding practices and to cope with design defects. The Extract Method refactoring is widely used to merge duplicate code fragments into a single new method. Several studies have attempted to recommend Extract Method refactoring opportunities using different techniques, including program slicing, program dependency graph analysis, change history analysis, structural similarity, and feature extraction. However, irrespective of the technique, most existing approaches interfere with the developer's workflow: they require the developer to stop coding and analyze the suggested opportunities, and they consider all refactoring suggestions across the entire project without focusing on the development context. To increase the adoption of the Extract Method refactoring, in this paper we investigate the effectiveness of machine learning and deep learning algorithms for recommending it while preserving the developer's workflow. The proposed approach mines previously applied Extract Method refactorings and extracts their features to train a deep learning classifier that detects similar refactoring opportunities in the user's code. We implemented our approach as a plugin for IntelliJ IDEA called AntiCopyPaster. To develop the approach, we trained and evaluated several popular models on a dataset of 18,942 code fragments from 13 open-source Apache projects. The results show that the best model is a Convolutional Neural Network (CNN), which recommends appropriate Extract Method refactorings with an F-measure of 0.82. We also conducted a qualitative study with 72 developers to evaluate the usefulness of the plugin. The results show that developers tend to appreciate the idea behind the approach and are satisfied with various aspects of the plugin's operation.
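For readers unfamiliar with the reported metric, the F-measure is commonly the balanced F1 score, the harmonic mean of precision and recall over a classifier's predictions. The sketch below assumes that balanced definition (the abstract does not state the weighting), and the function name `f_measure` and the confusion counts in the usage line are hypothetical illustrations, not data from the paper's evaluation.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    tp: fragments correctly recommended for Extract Method
    fp: fragments recommended that should not have been
    fn: fragments that should have been recommended but were missed
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined),
        # so F1 is conventionally 0. This also avoids division by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not the paper's results).
print(round(f_measure(tp=40, fp=10, fn=10), 2))  # prints 0.8
```

Because it is a harmonic mean, F1 stays low unless both precision and recall are high, which is why it is a common single-number summary for recommenders that must avoid both noisy suggestions and missed opportunities.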

翻译：重新定位是软件维护中的一项关键任务,通常是为了在应对设计缺陷的同时实施更好的设计和编码做法,而通常是为了在应对设计缺陷的同时实施更好的设计和编码做法。抽取方法的重新定位被广泛用于将重复的代码碎片合并成单一的新方法。一些研究试图建议采用不同技术,包括程序剪切、程序依赖图分析、改变历史分析、结构相似性和特征提取等方法, 来重新定位抽取。然而, 不论采用何种方法, 大多数现有方法都会干扰开发者的工作流程: 它们要求开发者停止编码和分析所建议的机会, 并且也考虑整个项目中的所有再添加质量建议, 而不侧重于开发环境。为了更多地采用抽取方法的重新设定机会, 本文中我们的目的是在保持开发者工作流程的同时, 调查机器学习的有效性和深度学习算法。拟议的方法依赖于开采之前应用的抽取方法的模型, 来训练一个深层次的电算器, 在用户代码中, 我们用IntellJ Ral Restrial 数据网络应用了一个合适的插件, 我们用经过培训的版本的 Restrial AnPADIDLA As As Exlistal Produstrual 。