评估早期鸟类营养评估(预测项目质量) (Assessing the Early Bird Heuristic (for Predicting Project Quality)) - 专知论文

会员服务 ·

0

Projection · MoDELS · INFORMS · SimPLe · Learning ·

2023 年 1 月 11 日

Assessing the Early Bird Heuristic (for Predicting Project Quality)

翻译：评估早期鸟类营养评估(预测项目质量)

N. C. Shrikanth,Tim Menzies

from arxiv, 38 pages (Accepted TOSEM Jan 2023)

Before researchers rush to reason across all available data or try complex methods, perhaps it is prudent to first check for simpler alternatives. Specifically, if the historical data has the most information in some small region, perhaps a model learned from that region would suffice for the rest of the project. To support this claim, we offer a case study with 240 projects, where we find that the information in those projects "clump" towards the earliest parts of the project. A quality prediction model learned from just the first 150 commits works as well, or better than state-of-the-art alternatives. Using just this "early bird" data, we can build models very quickly and very early in the project life cycle. Moreover, using this early bird method, we have shown that a simple model (with just a few features) generalizes to hundreds of projects. Based on this experience, we doubt that prior work on generalizing quality models may have needlessly complicated an inherently simple process. Further, prior work that focused on later-life cycle data needs to be revisited since their conclusions were drawn from relatively uninformative regions. Replication note: all our data and scripts are available here: https://github.com/snaraya7/early-bird

翻译：在研究人员匆忙地对所有现有数据进行解释或尝试复杂的方法之前,也许明智的做法是首先检查更简单的替代方法。具体地说,如果历史数据在某些小地区拥有最多的信息,也许一个从该地区学到的模型就足以满足项目的其余部分。为了支持这一主张,我们提供了240个项目的案例研究,我们发现这些项目中的信息“聚集”到项目的最早部分。仅仅从最初的150个项目中获得的高质量预测模型也投入了工作,或者比最先进的替代方法更好。仅仅使用这一“早期鸟”数据,我们就可以在项目生命周期中非常迅速和非常早地建立模型。此外,我们使用这一早期的鸟方法,我们展示了一个简单的模型(只有几个特点)可以概括成百个项目。根据这一经验,我们怀疑以前关于普及质量模型的工作可能毫无疑问地复杂一个内在简单的过程。此外,由于从相对不具有说明性的区域得出结论,需要重新审视以前侧重于较晚寿命周期数据的工作。再重复说明:我们所有的数据和脚本都在这里提供: https://gibal/libarmas/coms。

0

相关内容

Projection

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

104+阅读 · 2022年2月10日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

数据科学导论，54页ppt，Introduction to Data Science

数据科学导论，54页ppt，Introduction to Data Science

专知会员服务

42+阅读 · 2020年7月27日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

浅海声场时频特性与warping变换方法应用研究

国家自然科学基金

0+阅读 · 2014年12月31日

嗜热链球菌基因多样性及其风味物质代谢多样性的研究

国家自然科学基金

0+阅读 · 2014年12月31日

延边地区脂联素信号传导通路基因Poly-miRTS与冠心病的相关研究

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

BER通路基因miRNA结合位点基因多态性与结直肠癌易感性的关联及功能研究

国家自然科学基金

0+阅读 · 2013年12月31日

STIM1突变与核浆钙信号调控

国家自然科学基金

0+阅读 · 2012年12月31日

miRNA调控hERG基因表达的机制及其在药物获得性长QT综合征的作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

Cayley图的匹配可扩性和semi-Cayley图的谱

国家自然科学基金

0+阅读 · 2011年12月31日

下丘脑CRHR介导抑郁症发病的表观遗传机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

UGT基因簇进化及调控研究

国家自然科学基金

0+阅读 · 2009年12月31日

Investigating the contribution of authors' and papers' characteristics to predict the scholars' h-index

Arxiv

0+阅读 · 2023年3月7日

An Efficient Data Integration Scheme for Synthesizing Information from Multiple Secondary Datasets for the Parameter Inference of the Main Analysis

Arxiv

0+阅读 · 2023年3月6日

Model-matching Principle Applied to the Design of an Array-based All-neural Binaural Rendering System for Audio Telepresence

Arxiv

0+阅读 · 2023年3月6日

Robustness, Evaluation and Adaptation of Machine Learning Models in the Wild

Arxiv

7+阅读 · 2023年3月5日

Calibration of Quantum Decision Theory: Aversion to Large Losses and Predictability of Probabilistic Choices

Arxiv

0+阅读 · 2023年3月3日

Box constraints and weighted sparsity regularization for identifying sources in elliptic PDEs

Arxiv

0+阅读 · 2023年3月3日

On the Security Vulnerabilities of Text-to-SQL Models

Arxiv

1+阅读 · 2023年3月3日

Trust in Human-AI Interaction: Scoping Out Models, Measures, and Methods

Arxiv

22+阅读 · 2022年4月30日

Forecasting: theory and practice

Arxiv

57+阅读 · 2022年1月5日

Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond

Arxiv

21+阅读 · 2021年9月2日

VIP会员

文章信息

相关主题

相关VIP内容

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

【干货书】深度学习合成数据，354页pdf，Synthetic Data for Deep Learning

专知会员服务

104+阅读 · 2022年2月10日

2020数据工程师成长路线图

专知会员服务

41+阅读 · 2020年9月6日

数据科学导论，54页ppt，Introduction to Data Science

数据科学导论，54页ppt，Introduction to Data Science

专知会员服务

42+阅读 · 2020年7月27日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【哈佛大学商学院课程Fall 2019】机器学习可解释性

【哈佛大学商学院课程Fall 2019】机器学习可解释性

专知会员服务

105+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《俄罗斯的未来战争方式第三部分：军事改革》报告

《俄罗斯的未来战争方式第一部分：国家动员》报告

《在单一作战合成环境（SSE）中运用人工智能与大型语言模型以提供灵活人文地形及可信角色组》报告

《俄罗斯的未来战争方式第二部分：核威慑》报告

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

VCIP 2022 Call for Special Session Proposals

VCIP 2022 Call for Special Session Proposals

CCF多媒体专委会

1+阅读 · 2022年4月1日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

相关论文

Investigating the contribution of authors' and papers' characteristics to predict the scholars' h-index

Arxiv

0+阅读 · 2023年3月7日

An Efficient Data Integration Scheme for Synthesizing Information from Multiple Secondary Datasets for the Parameter Inference of the Main Analysis

Arxiv

0+阅读 · 2023年3月6日

Model-matching Principle Applied to the Design of an Array-based All-neural Binaural Rendering System for Audio Telepresence

Arxiv

0+阅读 · 2023年3月6日

Robustness, Evaluation and Adaptation of Machine Learning Models in the Wild

Arxiv

7+阅读 · 2023年3月5日

Calibration of Quantum Decision Theory: Aversion to Large Losses and Predictability of Probabilistic Choices

Arxiv

0+阅读 · 2023年3月3日

Box constraints and weighted sparsity regularization for identifying sources in elliptic PDEs

Arxiv

0+阅读 · 2023年3月3日

On the Security Vulnerabilities of Text-to-SQL Models

Arxiv

1+阅读 · 2023年3月3日

Trust in Human-AI Interaction: Scoping Out Models, Measures, and Methods

Arxiv

22+阅读 · 2022年4月30日

Forecasting: theory and practice

Arxiv

57+阅读 · 2022年1月5日

Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond

Arxiv

21+阅读 · 2021年9月2日

相关基金

浅海声场时频特性与warping变换方法应用研究

国家自然科学基金

0+阅读 · 2014年12月31日

嗜热链球菌基因多样性及其风味物质代谢多样性的研究

国家自然科学基金

0+阅读 · 2014年12月31日

延边地区脂联素信号传导通路基因Poly-miRTS与冠心病的相关研究

国家自然科学基金

0+阅读 · 2014年12月31日

长链非编码RNA CAR intergenic 10在细胞衰老中的作用和机制

国家自然科学基金

1+阅读 · 2013年12月31日

BER通路基因miRNA结合位点基因多态性与结直肠癌易感性的关联及功能研究

国家自然科学基金

0+阅读 · 2013年12月31日

STIM1突变与核浆钙信号调控

国家自然科学基金

0+阅读 · 2012年12月31日

miRNA调控hERG基因表达的机制及其在药物获得性长QT综合征的作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

Cayley图的匹配可扩性和semi-Cayley图的谱

国家自然科学基金

0+阅读 · 2011年12月31日

下丘脑CRHR介导抑郁症发病的表观遗传机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

UGT基因簇进化及调控研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员