Program representation learning is a fundamental task in software engineering applications. With the availability of "big code" and the development of deep learning techniques, various program representation learning models have been proposed to capture the semantic properties of programs and applied to different software engineering tasks. However, no previous study has comprehensively assessed how well these deep models generalize across tasks, so the pros and cons of the models remain unclear. In this experience paper, we bridge this gap by systematically evaluating the performance of eight program representation learning models on three common tasks; six of the models are based on abstract syntax trees (ASTs) and two are based on the plain text of source code. We explain the criteria for selecting the models and tasks, as well as the method for enabling end-to-end learning in each task. The results show that the models perform inconsistently across the tasks and that the performance of the AST-based models is generally unstable from task to task. To further explain the results, we apply a prediction attribution technique to identify which input elements the models capture and rely on for their predictions in each task. Based on the findings, we discuss some general principles for better capturing the information in source code, and we hope to inspire researchers to improve program representation learning methods for software engineering tasks.
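
To make the attribution idea concrete, below is a minimal sketch (not necessarily the paper's exact technique) of occlusion-based prediction attribution: mask each token of a snippet and measure the drop in the model's predicted probability. The `model_prob` function here is a hypothetical stand-in for any trained classifier over token sequences.

```python
# Occlusion-based attribution sketch: a token's importance is the drop in
# predicted probability when that token is masked out.

def model_prob(tokens):
    # Hypothetical toy "model": scores how strongly the snippet looks like
    # a sorting routine, only so the sketch runs end to end.
    keywords = {"for", "if", "swap", "range"}
    return sum(t in keywords for t in tokens) / max(len(tokens), 1)

def occlusion_attribution(tokens, mask="<unk>"):
    base = model_prob(tokens)
    scores = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + [mask] + tokens[i + 1:]
        scores.append((tokens[i], base - model_prob(occluded)))
    return scores  # higher score = token mattered more for the prediction

code = "for i in range ( n ) : if a [ i ] > a [ j ] : swap".split()
for token, score in sorted(occlusion_attribution(code), key=lambda s: -s[1])[:5]:
    print(f"{token:>6}  {score:+.3f}")
```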

Related Content

Despite being able to capture a range of features of the data, high-accuracy models trained with supervision tend to make similar predictions. This seemingly implies that high-performing models share similar biases regardless of training methodology, which would limit ensembling benefits and render low-accuracy models as having little practical use. Against this backdrop, recent work has made very different training techniques, such as large-scale contrastive learning, yield competitively high accuracy on generalization and robustness benchmarks. This motivates us to revisit the assumption that models necessarily learn similar functions. We conduct a large-scale empirical study of models across hyper-parameters, architectures, frameworks, and datasets. We find that model pairs that diverge more in training methodology display categorically different generalization behavior, producing increasingly uncorrelated errors. We show these models specialize in subdomains of the data, leading to higher ensemble performance: with just two models (each with ImageNet accuracy ~76.5%), we can create an ensemble reaching 83.4% (a +7% boost). Surprisingly, we find that even significantly lower-accuracy models can be used to improve high-accuracy models. Finally, we show that diverging training methodologies yield representations that capture overlapping (but not supersetting) feature sets which, when combined, lead to increased downstream performance.
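
The ensembling effect can be illustrated with a small sketch: average the class probabilities of two models and compare the combined accuracy against each member's. The simulated "models" below are placeholders for, say, a supervised network and a contrastively trained one; all numbers are synthetic.

```python
# Toy demonstration: two models with uncorrelated errors ensemble to a
# higher accuracy than either member.
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 10
labels = rng.integers(0, k, n)

def noisy_probs(correct_rate):
    # Simulate a model that is right `correct_rate` of the time, with its
    # mistakes spread randomly over the classes.
    probs = rng.dirichlet(np.ones(k), n)
    hit = rng.random(n) < correct_rate
    probs[hit, labels[hit]] += 1.0        # boost true class where "correct"
    return probs / probs.sum(1, keepdims=True)

a, b = noisy_probs(0.765), noisy_probs(0.765)
acc = lambda p: (p.argmax(1) == labels).mean()
print("model A:", acc(a), "model B:", acc(b))
print("ensemble (mean of probs):", acc((a + b) / 2))
# With uncorrelated errors, the ensemble exceeds either member, which is
# the effect the study measures on real models.
```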

Model efficiency is a critical aspect of developing and deploying machine learning models. Inference time and latency directly affect the user experience, and some applications have hard latency requirements. In addition to inference costs, model training also has direct financial and environmental impacts. Although there are numerous well-established metrics (cost indicators) for measuring model efficiency, researchers and practitioners often assume that these metrics are correlated with each other and report only a few of them. In this paper, we thoroughly discuss common cost indicators, their advantages and disadvantages, and how they can contradict each other. We demonstrate how incomplete reporting of cost indicators can lead to partial conclusions and a blurred or incomplete picture of the practical considerations of different models. We further present suggestions to improve the reporting of efficiency metrics.
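
As a hedged illustration of why one indicator is not enough, the sketch below builds two toy PyTorch models with nearly identical parameter counts (and similar FLOPs) but potentially very different wall-clock latency; reporting only one of the metrics would hide the difference.

```python
# Two cost indicators that need not agree: parameter count vs. latency.
import time
import torch
import torch.nn as nn

# Roughly the same parameter budget: one wide-and-shallow, one narrow-and-deep.
wide = nn.Sequential(nn.Linear(512, 4096), nn.ReLU(), nn.Linear(4096, 512))
layers = []
for _ in range(16):
    layers += [nn.Linear(512, 512), nn.ReLU()]
deep = nn.Sequential(*layers)

def params(m):
    return sum(p.numel() for p in m.parameters())

def latency(m, runs=50):
    x = torch.randn(32, 512)
    with torch.no_grad():
        m(x)  # warm-up run, excluded from timing
        start = time.perf_counter()
        for _ in range(runs):
            m(x)
    return (time.perf_counter() - start) / runs

for name, m in [("wide", wide), ("deep", deep)]:
    print(f"{name}: {params(m) / 1e6:.2f}M params, {latency(m) * 1e3:.2f} ms/batch")
```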

The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This survey provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has, without exaggeration, revolutionized the fields of natural language processing (NLP), information retrieval (IR), and beyond. In this survey, we provide a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and researchers who wish to pursue work in this area. We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage ranking architectures and learned dense representations that attempt to perform ranking directly. There are two themes that pervade our survey: techniques for handling long documents, beyond the typical sentence-by-sentence processing approaches used in NLP, and techniques for addressing the tradeoff between effectiveness (result quality) and efficiency (query latency). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the field is heading.
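
As a small, hedged sketch of the first of those two categories (reranking), the snippet below scores candidate passages against a query with a publicly available cross-encoder checkpoint via the Hugging Face transformers library; any equivalent checkpoint would do.

```python
# Cross-encoder reranking: a first-stage retriever returns candidates, and a
# transformer rescores each (query, passage) pair jointly.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # public MS MARCO cross-encoder
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

query = "what causes memory leaks in python"
candidates = [
    "Reference cycles that the garbage collector cannot free cause leaks.",
    "Python was first released in 1991 by Guido van Rossum.",
]

inputs = tok([query] * len(candidates), candidates,
             padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    scores = model(**inputs).logits.squeeze(-1)  # higher = more relevant
for s, c in sorted(zip(scores.tolist(), candidates), reverse=True):
    print(f"{s:+.2f}  {c}")
```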

A code completion system suggests future code elements to developers given a partially complete code snippet. Code completion is one of the most useful features in Integrated Development Environments (IDEs). Currently, most code completion techniques predict a single token at a time. In this paper, we take a further step and discuss the possibility of directly completing a whole line of code instead of a single token. We believe suggesting longer code sequences can further improve the efficiency of developers. Recently, neural language models have been adopted as a preferred approach for code completion, and we believe these models can still be applied to full-line code completion with a few improvements. We conduct our experiments on two real-world Python corpora and evaluate existing neural models based on source code tokens or syntactical actions. The results show that neural language models can achieve acceptable results on our tasks, with significant room for improvement.
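
One straightforward way to turn a next-token model into a full-line completer is to decode greedily until a newline token or a length budget is reached. The sketch below shows that loop; `next_token_probs` is a hypothetical toy model standing in for a trained neural language model.

```python
# Full-line completion as repeated next-token prediction.

def next_token_probs(prefix):
    # Toy bigram-style model over a tiny vocabulary, only to make the
    # decoding loop concrete and runnable.
    table = {
        "for": {"i": 0.9, "\n": 0.1},
        "i": {"in": 0.8, "=": 0.2},
        "in": {"range(n):": 1.0},
        "range(n):": {"\n": 1.0},
    }
    return table.get(prefix[-1], {"\n": 1.0})

def complete_line(prefix, max_tokens=10):
    tokens = list(prefix)
    for _ in range(max_tokens):
        probs = next_token_probs(tokens)
        token = max(probs, key=probs.get)  # greedy decoding
        if token == "\n":                  # stop at end of line
            break
        tokens.append(token)
    return " ".join(tokens)

print(complete_line(["for"]))  # -> "for i in range(n):"
```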

There has been appreciable progress in unsupervised network representation learning (UNRL) approaches over graphs recently, with flexible random-walk approaches, new optimization objectives, and deep architectures. However, there is no common ground for systematic comparison of embeddings to understand their behavior for different graphs and tasks. In this paper, we theoretically group different approaches under a unifying framework and empirically investigate the effectiveness of different network representation methods. In particular, we argue that most UNRL approaches either explicitly or implicitly model and exploit the context information of a node. Consequently, we propose a framework that casts a variety of approaches -- random walk based, matrix factorization based, and deep learning based -- into a unified context-based optimization function. We systematically group the methods based on their similarities and differences, and we study the differences among these methods in detail, which we later use to explain their performance differences on downstream tasks. We conduct a large-scale empirical study considering 9 popular and recent UNRL techniques and 11 real-world datasets with varying structural properties, on two common tasks -- node classification and link prediction. We find that there is no single method that is a clear winner and that the choice of a suitable method is dictated by certain properties of the embedding methods, the task, and the structural properties of the underlying graph. In addition, we report common pitfalls in the evaluation of UNRL methods and offer suggestions for experimental design and interpretation of results.
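
The unified context-based view can be sketched in a few lines: sample (node, context) pairs from random walks and fit embeddings so that co-occurring nodes get high dot products, here with a context window of one and a single negative sample per pair. Real methods differ in exactly these choices; this is only an illustrative toy.

```python
# Context-based embedding sketch: skip-gram-style updates over random walks.
import numpy as np

rng = np.random.default_rng(0)
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
emb = rng.normal(scale=0.1, size=(5, 8))  # one 8-d vector per node

def random_walk(start, length=5):
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

sigmoid = lambda x: 1 / (1 + np.exp(-x))
for _ in range(2000):
    walk = random_walk(rng.integers(5))
    for u, v in zip(walk, walk[1:]):      # window size 1 for brevity
        neg = rng.integers(5)             # one random negative sample
        for tgt, label in ((v, 1.0), (neg, 0.0)):
            g = sigmoid(emb[u] @ emb[tgt]) - label
            emb[u] -= 0.05 * g * emb[tgt]
            emb[tgt] -= 0.05 * g * emb[u]

# Nodes adjacent in the graph should now have larger dot products than
# distant ones.
print((emb[0] @ emb[1]).round(2), (emb[0] @ emb[4]).round(2))
```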

There has been considerable growth and interest in industrial applications of machine learning (ML) in recent years. ML engineers are consequently in high demand across the industry, yet improving their efficiency remains a fundamental challenge. Automated machine learning (AutoML) has emerged as a way to save time and effort on repetitive tasks in ML pipelines, such as data pre-processing, feature engineering, model selection, hyperparameter optimization, and prediction result analysis. In this paper, we investigate the current state of AutoML tools aiming to automate these tasks. We evaluate the tools on many datasets and in different data segments to examine their performance, and we compare their advantages and disadvantages across different test cases.
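
A hedged sketch of such an evaluation harness appears below: each candidate pipeline is run on several datasets under the same cross-validation protocol. In a real study, the fixed scikit-learn pipelines would be replaced by the AutoML tools' own search over preprocessing, models, and hyperparameters.

```python
# Minimal tool-comparison harness: same datasets, same CV protocol.
from sklearn.datasets import load_breast_cancer, load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

candidates = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
datasets = {"breast_cancer": load_breast_cancer(return_X_y=True),
            "wine": load_wine(return_X_y=True)}

for dname, (X, y) in datasets.items():
    for cname, model in candidates.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{dname:>13} | {cname}: {scores.mean():.3f} +/- {scores.std():.3f}")
```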

Deep learning has penetrated all aspects of our lives and brought us great convenience. However, the process of building a high-quality deep learning system for a specific task is not only time-consuming but also requires many resources and relies on human expertise, which hinders the development of deep learning in both industry and academia. To alleviate this problem, a growing number of research projects focus on automated machine learning (AutoML). In this paper, we provide a comprehensive and up-to-date study of the state of the art in AutoML. First, we introduce AutoML techniques in detail, organized according to the machine learning pipeline. Then we summarize existing Neural Architecture Search (NAS) research, one of the most popular topics in AutoML, and compare the models generated by NAS algorithms with human-designed models. Finally, we present several open problems for future research.
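
To make the NAS idea concrete, below is a toy sketch of its simplest baseline, random search: sample architectures from a small search space, train each briefly, and keep the best. Published NAS methods replace this loop with reinforcement learning, evolution, or gradient-based search.

```python
# Random-search NAS baseline over a tiny MLP search space.
import random

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

random.seed(0)
best_score, best_arch = 0.0, None
for _ in range(5):
    # Search space: depth 1-3, width per layer in {32, 64, 128}.
    arch = tuple(random.choice([32, 64, 128])
                 for _ in range(random.randint(1, 3)))
    net = MLPClassifier(hidden_layer_sizes=arch, max_iter=300, random_state=0)
    net.fit(X_tr, y_tr)
    score = net.score(X_val, y_val)
    if score > best_score:
        best_score, best_arch = score, arch
    print(arch, f"{score:.3f}")
print("best architecture:", best_arch, f"(val acc {best_score:.3f})")
```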

Previous work on event extraction has mainly focused on predicting event triggers and argument roles, treating entity mentions as being provided by human annotators. This is unrealistic, as entity mentions are usually predicted by existing toolkits whose errors can propagate to event trigger and argument role recognition. Some recent work has addressed this problem by jointly predicting entity mentions, event triggers, and arguments. However, such work is limited to using discrete engineering features to represent contextual information for the individual tasks and their interactions. In this work, we propose a novel model to jointly perform predictions for entity mentions, event triggers, and arguments based on shared hidden representations from deep learning. The experiments demonstrate the benefits of the proposed method, leading to state-of-the-art performance for event extraction.
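
A structural sketch of this idea (not the paper's exact architecture) is shown below: a single shared encoder feeds separate heads for entity mentions, event triggers, and argument roles, so that one summed loss over all three tasks lets them inform each other. All sizes are illustrative.

```python
# Joint prediction over shared hidden representations.
import torch
import torch.nn as nn

class JointEventModel(nn.Module):
    def __init__(self, vocab=5000, dim=128, n_ent=5, n_trig=9, n_arg=7):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True, bidirectional=True)
        self.entity_head = nn.Linear(2 * dim, n_ent)    # per-token entity tags
        self.trigger_head = nn.Linear(2 * dim, n_trig)  # per-token trigger tags
        self.argument_head = nn.Linear(4 * dim, n_arg)  # trigger-argument pairs

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))         # shared representation
        ent = self.entity_head(h)
        trig = self.trigger_head(h)
        # Score every (trigger position, argument position) pair.
        n = h.size(1)
        pairs = torch.cat([h.unsqueeze(2).expand(-1, -1, n, -1),
                           h.unsqueeze(1).expand(-1, n, -1, -1)], dim=-1)
        return ent, trig, self.argument_head(pairs)

model = JointEventModel()
ent, trig, arg = model(torch.randint(0, 5000, (2, 12)))
print(ent.shape, trig.shape, arg.shape)  # (2,12,5) (2,12,9) (2,12,12,7)
```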

Neural word embeddings have been widely used in biomedical Natural Language Processing (NLP) applications, since they provide vector representations of words that capture their semantic properties and the linguistic relationships between them. Many biomedical applications train word embeddings on different textual sources and apply these embeddings to downstream tasks. However, there has been little work on comprehensively evaluating the word embeddings trained from these resources. In this study, we provide a comprehensive empirical evaluation of word embeddings trained from four different resources: clinical notes, biomedical publications, Wikipedia, and news. We perform the evaluation both qualitatively and quantitatively. In the qualitative evaluation, we manually inspect the five most similar medical words for a given set of target medical words, and we then analyze the word embeddings through visualization. The quantitative evaluation falls into two categories: extrinsic and intrinsic. Based on the evaluation results, we draw the following conclusions. First, the embeddings trained on EHR and PubMed capture the semantics of medical terms better than GloVe and Google News embeddings and find more relevant similar medical terms. Second, the medical semantic similarity captured by the embeddings trained on EHR and PubMed is closer to human experts' judgments than that of the GloVe and Google News embeddings. Third, there is no consistent global ranking of word embedding quality across downstream biomedical NLP applications; however, adding word embeddings as extra features improves results on most downstream tasks. Finally, word embeddings trained on a corpus from a similar domain do not necessarily perform better than other word embeddings on any given downstream biomedical task.
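
The qualitative part of such an evaluation reduces to a nearest-neighbor query, sketched below with tiny hand-made vectors standing in for embeddings trained on EHR, PubMed, GloVe, or Google News corpora.

```python
# Nearest neighbors by cosine similarity, the core of the manual inspection.
import numpy as np

vocab = ["diabetes", "insulin", "hyperglycemia", "aspirin", "guitar"]
emb = np.array([[0.9, 0.8, 0.1],   # toy 3-d vectors; real ones are 100-300d
                [0.8, 0.9, 0.0],
                [0.85, 0.7, 0.2],
                [0.3, 0.4, 0.1],
                [0.0, 0.1, 0.9]])

def most_similar(word, topn=3):
    q = emb[vocab.index(word)]
    sims = emb @ q / (np.linalg.norm(emb, axis=1) * np.linalg.norm(q))
    order = [i for i in np.argsort(-sims) if vocab[i] != word]
    return [(vocab[i], round(float(sims[i]), 3)) for i in order[:topn]]

print(most_similar("diabetes"))
# A domain-appropriate embedding should rank insulin/hyperglycemia above
# unrelated terms, which is what the manual inspection in the study checks.
```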

Related Papers

No One Representation to Rule Them All: Overlapping Features of Training Methods
Raphael Gontijo-Lopes, Yann Dauphin, Ekin D. Cubuk · Oct 26

The Efficiency Misnomer
Mostafa Dehghani, Anurag Arnab, Lucas Beyer, Ashish Vaswani, Yi Tay · Oct 25

Pretrained Transformers for Text Ranking: BERT and Beyond
Jimmy Lin, Rodrigo Nogueira, Andrew Yates · Oct 13, 2020

Towards Full-line Code Completion with Neural Language Models
Wenhan Wang, Sijie Shen, Ge Li, Zhi Jin · Sep 18, 2020

A Comparative Study for Unsupervised Network Representation Learning
Megha Khosla, Vinay Setty, Avishek Anand · Mar 11, 2020

Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools
Anh Truong, Austin Walters, Jeremy Goodsitt, Keegan Hines, C. Bayan Bruss, Reza Farivar · Sep 3, 2019

AutoML: A Survey of the State-of-the-Art
Xin He, Kaiyong Zhao, Xiaowen Chu · Aug 14, 2019

One for All: Neural Joint Modeling of Entities and Events
Trung Minh Nguyen, Thien Huu Nguyen · Dec 1, 2018

A Comparison of Word Embeddings for the Biomedical Natural Language Processing
Yanshan Wang, Sijia Liu, Naveed Afzal, Majid Rastegar-Mojarad, Liwei Wang, Feichen Shen, Hongfang Liu · Feb 1, 2018