Machine Learning for Software Engineering (ML4SE) is an actively growing research area that focuses on methods that help programmers in their work. To be applicable in practice, these methods need to achieve sufficient quality to help rather than distract developers. While the development of new approaches to code representation and data collection improves the overall quality of the models, it does not take into account the information available from the project at hand. In this work, we investigate how a model's quality can be improved by targeting a specific project. We develop a framework to assess the quality improvements that models gain after fine-tuning for the method name prediction task on a particular project. We evaluate three models of different complexity and compare their quality in three settings: trained on a large dataset of Java projects, further fine-tuned on the data from a particular project, and trained from scratch on this data. We show that per-project fine-tuning can greatly improve the models' quality, as it allows them to capture the project's domain and naming conventions. We open-source the tool we used for data collection, as well as the code to run the experiments: https://zenodo.org/record/6040745.
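The per-project setting described above amounts to continuing training of an already pre-trained model on the methods mined from a single project. The sketch below illustrates that step only in outline; the model architecture, the dataset layout, and all hyperparameters here are hypothetical placeholders and not the paper's actual implementation (which is available at the Zenodo link above). It assumes examples are already tokenized and padded to fixed lengths.

```python
# Minimal sketch of per-project fine-tuning for method name prediction.
# Assumptions (not from the paper): `model` maps padded method-body token ids
# to per-position logits over the name sub-token vocabulary, and every example
# is a pair of fixed-length tensors (body_ids, name_ids).
import torch
from torch.utils.data import DataLoader, Dataset


class ProjectMethodsDataset(Dataset):
    """(method body tokens, method name sub-tokens) pairs mined from one project."""

    def __init__(self, examples):
        self.examples = examples  # list of (body_ids, name_ids) tensor pairs

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        return self.examples[idx]


def fine_tune(model, project_examples, epochs=5, lr=1e-4, device="cpu"):
    """Continue training a model pre-trained on a large Java corpus
    on the methods of a single project."""
    model.to(device).train()
    loader = DataLoader(ProjectMethodsDataset(project_examples),
                        batch_size=32, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for body_ids, name_ids in loader:
            body_ids, name_ids = body_ids.to(device), name_ids.to(device)
            logits = model(body_ids)  # (batch, name_len, vocab_size)
            loss = torch.nn.functional.cross_entropy(
                logits.reshape(-1, logits.size(-1)), name_ids.reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```

In this formulation, the "trained from scratch" baseline from the abstract corresponds to running the same loop on a randomly initialized model, while the fine-tuning setting starts from weights learned on the large multi-project dataset.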