When learning from very small data sets, the resulting models can make many mistakes. For example, consider learning predictors for open source project health. The training data for this task may be very small (e.g., five years of data, collected monthly, yields just 60 rows of training data). Using this data, prior work produced learned predictors with unacceptably large errors. We show that these high error rates can be tamed by better configuration of the control parameters of the machine learners. For example, we present here a {\em landscape analytics} method (called SNEAK) that (a)~clusters the data to find the general landscape of the hyperparameters; then (b)~explores a few representatives from each part of that landscape. SNEAK is both faster and more effective than prior state-of-the-art hyperparameter optimization algorithms (FLASH, HYPEROPT, OPTUNA, and differential evolution). More importantly, the configurations found by SNEAK had far less error than those found by other methods. We conjecture that SNEAK works so well because it finds the most informative regions of the hyperparameter space, then jumps to those regions. Other methods (which do not reflect over the landscape) can waste time exploring less informative options. From this, we draw the following conclusions. Firstly, for predicting open source project health, we recommend landscape analytics (e.g., SNEAK). Secondly, and more generally, when learning from very small data sets, we recommend using hyperparameter optimization (e.g., SNEAK) to select the control parameters of the learners. Due to its speed and implementation simplicity, we suggest SNEAK might also be useful in other ``data-light'' SE domains. To assist other researchers in repeating, improving, or even refuting our results, all our scripts and data are available on GitHub at https://github.com/zxcv123456qwe/niSneak
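The two-step procedure described above (cluster the candidate configurations to map the landscape, then evaluate only a few representatives) can be sketched as follows. This is a minimal illustrative sketch, not the authors' SNEAK implementation: plain k-means stands in for the clustering step, and `toy_loss` is a hypothetical objective introduced purely for demonstration.

```python
import random

def toy_loss(cfg):
    # Hypothetical objective over a normalized config (x, y) in [0,1]^2;
    # the (fictional) best setting sits near (0.3, 0.7).
    x, y = cfg
    return (x - 0.3) ** 2 + (y - 0.7) ** 2

def dist2(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def centroid(pts):
    return tuple(sum(p[i] for p in pts) / len(pts) for i in range(len(pts[0])))

def landscape_search(n_candidates=200, k=8, seed=1):
    rng = random.Random(seed)
    cands = [(rng.random(), rng.random()) for _ in range(n_candidates)]
    # (a) cluster candidate configurations to map the general landscape
    # (plain k-means here; the abstract does not specify SNEAK's clusterer)
    centers = rng.sample(cands, k)
    groups = [[] for _ in range(k)]
    for _ in range(15):
        groups = [[] for _ in range(k)]
        for p in cands:
            groups[min(range(k), key=lambda i: dist2(p, centers[i]))].append(p)
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    # (b) evaluate only one representative (the centroid) per cluster,
    # instead of paying for the objective on every candidate
    reps = [c for c, g in zip(centers, groups) if g]
    return min(reps, key=toy_loss)

best = landscape_search()
```

The key cost saving is in step (b): the (possibly expensive) objective is called `k` times rather than once per candidate, which is why landscape-style methods can be faster than exhaustive or purely sequential optimizers.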