Escaping the Time Pit: Pitfalls and Guidelines for Using Time-Based Git Data - 专知论文 (Zhuanzhi Papers)



March 21, 2021

Escaping the Time Pit: Pitfalls and Guidelines for Using Time-Based Git Data


Samuel W. Flint, Jigyasa Chauhan, Robert Dyer

From arXiv. Accepted to the 18th International Conference on Mining Software Repositories (MSR 2021).

Many software engineering research papers rely on time-based data (e.g., commit timestamps, issue report creation/update/close dates, release dates). Like most real-world data, however, time-based data is often dirty. To date, no studies have quantified how frequently the software engineering research community uses such data, or investigated the sources of dirty time-based data and quantified how often it occurs. Depending on the research task and method used, including such dirty data could affect the research results. This paper presents the first survey of papers that utilize time-based data, published in the Mining Software Repositories (MSR) conference series. Out of the 690 technical track and data papers published in MSR 2004-2020, we found that at least 35% utilized time-based data. We then used the Boa and Software Heritage infrastructures to help identify and quantify several sources of dirty commit timestamp data. Finally, we provide guidelines/best practices for researchers utilizing time-based data from Git repositories.
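
To make the idea of "dirty" commit timestamps concrete, here is a minimal sketch in Python. The abstract does not list the specific sources of dirty data or the paper's guidelines, so the three checks below (a timestamp earlier than Git's first release in April 2005, a timestamp later than the time of analysis, and a commit dated earlier than one of its parents) are illustrative assumptions, not the authors' method. The function name `flag_suspicious_commits` and the input layout are hypothetical.

```python
from datetime import datetime, timezone

# Assumption: Git's first release (April 2005) serves as a lower bound.
# Histories imported from older systems (CVS, SVN) can legitimately predate
# it, so a hit here is a prompt for manual review, not proof of an error.
GIT_FIRST_RELEASE = datetime(2005, 4, 7, tzinfo=timezone.utc)


def flag_suspicious_commits(commits, now=None):
    """Return {commit_id: [reasons]} for commits whose timestamps look dirty.

    `commits` maps a commit id to (unix_timestamp, [parent_ids]).
    This layout is hypothetical and chosen only for the example.
    """
    now = now or datetime.now(timezone.utc)
    flagged = {}
    for cid, (ts, parents) in commits.items():
        when = datetime.fromtimestamp(ts, tz=timezone.utc)
        reasons = []
        if when < GIT_FIRST_RELEASE:
            reasons.append("before Git existed")
        if when > now:
            reasons.append("in the future")
        for parent in parents:
            # A child dated earlier than its parent is out of order.
            if parent in commits and ts < commits[parent][0]:
                reasons.append(f"older than parent {parent}")
        if reasons:
            flagged[cid] = reasons
    return flagged
```

A study could run such a check before computing any time-based metric and then decide, per research task, whether to exclude, repair, or simply report the flagged commits.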



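Related content

MSR

The Mining Software Repositories (MSR) conference analyzes the rich data available in software repositories to uncover interesting and actionable information about software systems and projects. Official website: http://www.msrconf.org/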
