Code completion is usually cast as a language modelling problem, i.e., continuing an input in a left-to-right fashion. However, in practice, some parts of the completion (e.g., string literals) may be very hard to predict, whereas subsequent parts directly follow from the context. To handle this, we instead consider the scenario of generating code completions with "holes" inserted in places where the model is uncertain. We develop Grammformer, a Transformer-based model that guides code generation by the programming language grammar, and compare it to a variety of more standard sequence models. We train the models on code completion for C# and Python given partial code context. To evaluate models, we consider both ROUGE and a new metric, RegexAcc, which measures success at generating completions that match long outputs with as few holes as possible. In our experiments, Grammformer generates 10-50% more accurate completions compared to traditional generative models and 37-50% longer sketches compared to sketch-generating baselines trained with similar techniques.
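To make the metric concrete, the following is a minimal sketch of how a RegexAcc-style check could be computed. It assumes holes are marked with a placeholder token and that each hole is treated as a non-empty wildcard; the token name, the non-greedy matching, and the helper names are illustrative assumptions, not the paper's exact definition.

```python
import re

HOLE = "<HOLE>"  # assumed placeholder token; the paper's actual hole marker may differ


def sketch_matches(sketch: str, ground_truth: str) -> bool:
    """Treat each hole as a wildcard and test whether the sketch
    matches the ground-truth completion end to end."""
    parts = sketch.split(HOLE)
    # Escape literal fragments so code characters like '(' match verbatim,
    # then join them with a non-greedy wildcard standing in for each hole.
    pattern = ".+?".join(re.escape(p) for p in parts)
    # DOTALL lets a hole span multiple lines of code.
    return re.fullmatch(pattern, ground_truth, flags=re.DOTALL) is not None


def regex_acc(sketches, ground_truths) -> float:
    """Fraction of sketches whose induced regex matches the true completion."""
    hits = sum(sketch_matches(s, g) for s, g in zip(sketches, ground_truths))
    return hits / len(sketches)


# Example: the string literal is hard to predict, so the model emits a hole
# there, while the rest of the call follows from context.
assert sketch_matches('logger.info(<HOLE>, user_id)',
                      'logger.info("login failed", user_id)')
```

Under this reading, a sketch with fewer holes yields a more specific regex, so matching long ground-truth outputs with few holes is rewarded, in line with the metric's stated goal.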