We have developed AntiCopyPaster, a plugin for IntelliJ IDEA that tracks code pasted inside the IDE and suggests appropriate Extract Method refactorings to combat the propagation of duplicates. To implement the plugin, we gathered a dataset of code fragments that should and should not be extracted, compiled a list of code metrics that can influence the decision, and trained several popular machine learning classifiers, of which a gradient boosting classifier showed the best results. When a developer pastes a code fragment, the plugin searches for duplicates in the currently opened file. If there are any, it waits for a short period to allow the developer to edit the code. If the duplicates are still present after the delay, AntiCopyPaster calculates the metrics for the fragment and infers the decision: if the fragment should be extracted, the plugin suggests refactoring it. This helps developers keep their code clean and saves future maintenance time by letting them refactor promptly, without losing context. The plugin and its source code are available on GitHub at https://github.com/JetBrains-Research/anti-copy-paster.
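The paste-handling flow described above can be illustrated with a minimal sketch. It is not taken from the plugin's source: the function names, the three metrics, and the whitespace-insensitive text matching are all assumptions made for illustration, and the `classifier` argument stands in for the trained gradient boosting model. The delay is modeled by passing in the file contents as they are once the waiting period has elapsed.

```python
import re
from dataclasses import dataclass
from typing import Callable


def normalize(code: str) -> str:
    """Collapse whitespace so formatting differences do not hide duplicates."""
    return re.sub(r"\s+", " ", code).strip()


def count_occurrences(fragment: str, file_text: str) -> int:
    """Count non-overlapping, whitespace-insensitive occurrences of a fragment.

    Hypothetical duplicate search: plain text matching, chosen for brevity.
    """
    needle = normalize(fragment)
    if not needle:
        return 0
    return normalize(file_text).count(needle)


@dataclass
class Metrics:
    """Illustrative metrics only; the abstract does not list the real ones."""
    lines: int      # non-blank lines in the fragment
    chars: int      # characters in the fragment, outer whitespace stripped
    branches: int   # occurrences of if/for/while/switch keywords


def compute_metrics(fragment: str) -> Metrics:
    non_blank = [line for line in fragment.splitlines() if line.strip()]
    branches = len(re.findall(r"\b(?:if|for|while|switch)\b", fragment))
    return Metrics(len(non_blank), len(fragment.strip()), branches)


def should_suggest_extraction(
    fragment: str,
    file_text_after_delay: str,
    classifier: Callable[[Metrics], bool],
) -> bool:
    """Decide whether to suggest Extract Method for a pasted fragment.

    `file_text_after_delay` is the file as it looks after the waiting period,
    so duplicates the developer has already edited away are ignored.
    """
    # The pasted copy plus at least one other copy must still be present.
    if count_occurrences(fragment, file_text_after_delay) < 2:
        return False
    # Placeholder for the trained model: any callable from Metrics to bool.
    return classifier(compute_metrics(fragment))
```

In this sketch a simple rule such as `lambda m: m.lines >= 3` can be passed as `classifier` to exercise the flow; in the plugin the decision comes from the trained model, not from a fixed threshold.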