Code completion is widely used by software developers to provide coding suggestions given a partially written code snippet. Beyond traditional code completion methods, which support only single-token completion at a limited set of positions, recent studies have demonstrated longer completions at more flexible positions. However, such frequently triggered, longer completions reduce overall precision because they produce more invalid results. Moreover, the approaches proposed in different studies are largely incompatible with one another. It is therefore vital to develop an ensemble framework that combines results from multiple models to exploit the strengths and offset the weaknesses of each. This paper conducts a coding simulation to collect data from the code context and from different code completion models, and then applies the data to two tasks. First, we introduce an acceptance model that dynamically controls whether completion results are displayed to the developer. It uses simulation features to predict whether a correct result exists among the outputs of these models. Our best model reduces the percentage of false-positive completions from 55.09% to 17.44%. Second, we design a fusion ranking scheme that automatically determines the priority of completion results and reorders the candidates from multiple code completion models. The scheme handles a variety of models flexibly, regardless of the type or length of their completion results. Integrating this ranking scheme with two frequency models and a GPT-2-style language model, together with the acceptance model, yields increases of 27.80% and 37.64% in TOP1 and TOP5 accuracy, respectively. In addition, we propose a new code completion evaluation metric, the Benefit-Cost Ratio (BCR), which accounts for both the benefit of saved keystrokes and the hidden cost of browsing the completion list, and therefore more closely reflects developers' real experience.
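The abstract names three mechanisms (an acceptance gate, a fusion ranking over several models, and the BCR metric) without giving their formulas. The Python sketch below only illustrates how such pieces could fit together; it is not the paper's method. Assumptions: the acceptance model outputs a probability `p_correct` that is compared with a fixed threshold; the fusion ranking is replaced by a simple weighted sum of per-model confidence scores for identical suggestions; and BCR is approximated as total keystrokes saved divided by the number of list entries the developer reads. All names (`Candidate`, `fuse_rank`, `suggest`, `top_k_hit`, `bcr`, the model labels and weights) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str     # suggested completion string
    model: str    # hypothetical label of the model that produced it
    score: float  # assumed model confidence in [0, 1]


def fuse_rank(candidates, model_weights):
    """Hypothetical fusion: merge identical suggestions across models and
    rank them by the weighted sum of their scores (ties broken alphabetically)."""
    fused = defaultdict(float)
    for c in candidates:
        fused[c.text] += model_weights.get(c.model, 1.0) * c.score
    return sorted(fused, key=lambda t: (-fused[t], t))


def suggest(candidates, p_correct, model_weights, threshold=0.5, k=5):
    """Hypothetical acceptance gate: show the top-k fused list only if the
    acceptance model's probability that a correct candidate exists is high enough."""
    if p_correct < threshold:
        return []  # suppress the popup to avoid a likely false positive
    return fuse_rank(candidates, model_weights)[:k]


def top_k_hit(ranked, truth, k):
    """TOPk accuracy for one query: is the expected completion in the first k?"""
    return truth in ranked[:k]


def bcr(saved_keystrokes, items_read, cost_per_item=1.0):
    """Assumed Benefit-Cost Ratio: keystrokes saved over browsing cost,
    where browsing cost is proportional to the number of entries read."""
    cost = cost_per_item * sum(items_read)
    return sum(saved_keystrokes) / cost if cost else 0.0


if __name__ == "__main__":
    cands = [
        Candidate("append(", "freq_a", 0.6),
        Candidate("append(", "gpt2", 0.7),
        Candidate("extend(", "gpt2", 0.2),
        Candidate("add(", "freq_b", 0.5),
    ]
    weights = {"freq_a": 1.0, "freq_b": 1.0, "gpt2": 1.5}
    print(suggest(cands, 0.9, weights, k=2))  # ['append(', 'add(']
    print(suggest(cands, 0.3, weights))       # []
```

Merging identical strings before scoring is one simple way to stay agnostic to the type and length of each model's output, which the abstract lists as a requirement; a learned ranker would replace the fixed weights.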