Universal Representation for Code

Learning from source code usually requires a large amount of labeled data. Beyond the scarcity of such data, a model trained this way is highly task-specific and transfers poorly to other tasks. In this work, we present effective pre-training strategies on top of a novel graph-based code representation to produce universal representations for code. Specifically, our graph-based representation captures important semantic relations between code elements (e.g., control flow and data flow). We pre-train graph neural networks on this representation to extract universal code properties. The pre-trained model can then be fine-tuned to support various downstream applications. We evaluate our model on two real-world datasets spanning over 30M Java methods and 770K Python methods. Through visualization, we reveal discriminative properties in our universal code representation. Comparing against multiple benchmarks, we demonstrate that the proposed framework achieves state-of-the-art results on method name prediction and code graph link prediction.
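To make the idea of a graph-based code representation concrete, here is a minimal sketch using only Python's standard `ast` module. It is not the construction used in this work, which the abstract does not specify. The function name `build_code_graph` and the edge kinds `"child"` and `"data_flow"` are hypothetical, data flow is crudely approximated by linking successive occurrences of the same identifier, and control-flow edges are omitted entirely.

```python
import ast
from collections import defaultdict


def build_code_graph(source: str):
    """Build a toy code graph from Python source (illustrative assumption only).

    Returns (nodes, edges) where
      nodes: list of AST node type names, indexed by node id
      edges: set of (src_id, dst_id, kind), kind in {"child", "data_flow"}
    """
    tree = ast.parse(source)
    nodes = []
    ids = {}  # maps id(ast_node) -> graph node index

    # Number every AST node once. Some small nodes (e.g. Load/Store contexts)
    # may be shared objects, so the guard avoids numbering them twice.
    for node in ast.walk(tree):
        if id(node) not in ids:
            ids[id(node)] = len(nodes)
            nodes.append(type(node).__name__)

    edges = set()

    # Syntactic structure: parent -> child edges.
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            edges.add((ids[id(node)], ids[id(child)], "child"))

    # Approximate data flow: link each occurrence of a variable name
    # to its next occurrence in source order.
    occurrences = defaultdict(list)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            occurrences[node.id].append(node)
    for occ in occurrences.values():
        occ.sort(key=lambda n: (n.lineno, n.col_offset))
        for earlier, later in zip(occ, occ[1:]):
            edges.add((ids[id(earlier)], ids[id(later)], "data_flow"))

    return nodes, edges


if __name__ == "__main__":
    snippet = "x = 1\ny = x + 2\nprint(y)\n"
    nodes, edges = build_code_graph(snippet)
    for src, dst, kind in sorted(edges):
        print(f"{nodes[src]}({src}) -[{kind}]-> {nodes[dst]}({dst})")
```

A graph of this shape (typed nodes plus typed edges) is the kind of input a graph neural network could be pre-trained on; a realistic pipeline would add control-flow edges and a more careful data-flow analysis.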


