Efficient Framework for Learning Code Representations through Semantic-Preserving Program Transformations - 专知论文


learning · contrastive · transformation · contrastive learning · annotation

October 9, 2020

Efficient Framework for Learning Code Representations through Semantic-Preserving Program Transformations


Recent techniques for learning code representations depend mostly on human-annotated (labeled) data. In this work, we propose Corder, a self-supervised learning system that learns to represent code without labeled data. The key innovation is that we train the source code model by asking it to recognize similar and dissimilar code snippets through a contrastive learning paradigm. We use a set of semantic-preserving transformation operators to generate snippets that are syntactically diverse but semantically equivalent. The contrastive learning objective simultaneously maximizes agreement between different views of the same snippet and minimizes agreement between transformed views of different snippets. We train different instances of Corder on three neural network encoders (Tree-based CNN, ASTNN, and Code2vec) over 2.5 million unannotated Java methods mined from GitHub. Our results show that Corder pre-training improves code classification and method name prediction by large margins. Furthermore, the code vectors generated by Corder can be applied to code clustering, where they significantly outperform the other baselines.
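The contrastive objective described above can be illustrated with a minimal sketch. The abstract does not state the exact loss, so this example assumes an NT-Xent-style loss of the kind used in SimCLR. The function names (`cosine`, `nt_xent_loss`), the `temperature` value, and the toy two-dimensional vectors are all hypothetical stand-ins for the output of a real code encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(views_a, views_b, temperature=0.5):
    """NT-Xent-style contrastive loss (an assumed choice, not confirmed by the source).

    views_a[i] and views_b[i] are embeddings of two transformed views of
    the same snippet i; every other embedding in the batch is a negative.
    """
    embeddings = views_a + views_b
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same snippet
        # Similarities to every other embedding (positive and negatives).
        logits = [
            cosine(embeddings[i], embeddings[k]) / temperature
            for k in range(2 * n)
            if k != i
        ]
        pos_logit = cosine(embeddings[i], embeddings[positive]) / temperature
        # Negative log-softmax of the positive pair among all candidates.
        total += math.log(sum(math.exp(x) for x in logits)) - pos_logit
    return total / (2 * n)


# Toy embeddings: two snippets, two views each.
views_a = [[1.0, 0.0], [0.0, 1.0]]
aligned_b = [[0.9, 0.1], [0.1, 0.9]]      # views agree with their own snippet
mismatched_b = [[0.1, 0.9], [0.9, 0.1]]   # views resemble the other snippet

loss_aligned = nt_xent_loss(views_a, aligned_b)
loss_mismatched = nt_xent_loss(views_a, mismatched_b)
print(loss_aligned < loss_mismatched)
```

In this sketch the loss is lower when the two views of each snippet are embedded close together and far from the views of other snippets, which is the behavior the objective is meant to encourage in the encoder.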

