Prompt-tuned Code Language Model as a Neural Knowledge Base for Type Inference in Statically-Typed Partial Code

Partial code usually involves non-fully-qualified type names (non-FQNs) and undeclared receiving objects. Resolving the FQNs of these non-FQN types and undeclared receiving objects (referred to as type inference) is a prerequisite for effective search and reuse of partial code. Existing dictionary-lookup-based methods build a symbolic knowledge base of API names and code contexts, which incurs significant compilation overhead and is sensitive to unseen API names and code context variations. In this paper, we formulate type inference as a cloze-style fill-in-the-blank language task. Building on source code naturalness, our approach fine-tunes a code masked language model (MLM) as a neural knowledge base of code elements, using a novel "pre-train, prompt and predict" paradigm on raw source code. Our approach is lightweight and has minimal requirements for code compilation. Unlike existing symbolic name and context matching for type inference, our prompt-tuned code MLM packs FQN syntax and usage into its parameters and supports fuzzy neural type inference. We systematically evaluate our approach on a large amount of source code from GitHub and Stack Overflow. Our results confirm the effectiveness of our approach design and its practicality for partial code type inference. As the first of its kind, our neural type inference method opens the door to many innovative ways of using partial code.
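To make the cloze-style formulation concrete, the following is a minimal sketch of how a simple type name in partial code might be turned into a fill-in-the-blank prompt. It is not the authors' implementation: the prompt template, the `<mask>` token, the fixed number of mask slots, and the helper names `build_cloze_prompt` and `fill_prompt` are all assumptions made for illustration, and the "predictions" are supplied by hand rather than by a masked language model.

```python
import re
from typing import Sequence

MASK = "<mask>"  # assumed mask token; the real one depends on the model's tokenizer


def build_cloze_prompt(code: str, simple_name: str, n_masks: int = 3) -> str:
    """Replace the first whole-word use of `simple_name` with masked package
    slots followed by the simple name, e.g. `List` -> `<mask>.<mask>.List`.

    Hypothetical template: a fixed number of slots is a simplification.
    """
    if n_masks < 1:
        raise ValueError("n_masks must be at least 1")
    blanks = ".".join([MASK] * n_masks)
    pattern = re.compile(r"\b" + re.escape(simple_name) + r"\b")
    # A callable replacement avoids any backslash/group interpretation.
    prompt, n_subs = pattern.subn(
        lambda _m: f"{blanks}.{simple_name}", code, count=1
    )
    if n_subs == 0:
        raise ValueError(f"{simple_name!r} does not occur in the code")
    return prompt


def fill_prompt(prompt: str, predictions: Sequence[str]) -> str:
    """Fill mask slots left to right; stands in for the MLM's predictions."""
    for token in predictions:
        prompt = prompt.replace(MASK, token, 1)
    return prompt


partial_code = "List<String> names = new ArrayList<>();"
prompt = build_cloze_prompt(partial_code, "List", n_masks=2)
print(prompt)
print(fill_prompt(prompt, ["java", "util"]))
```

In a real system the mask slots would be filled by the prompt-tuned code MLM rather than by hand-written tokens, and the number of slots would have to accommodate FQNs of varying length.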
