One truism of deep learning is that the automatic feature engineering performed in the first layers of these networks excuses data scientists from tedious manual feature engineering before running DL. For the specific case of deep learning for defect prediction, we show that this truism is false. Specifically, when we preprocess data with a novel oversampling technique called fuzzy sampling, as part of a larger pipeline called GHOST (Goal-oriented Hyper-parameter Optimization for Scalable Training), we do significantly better than the prior DL state of the art on 14/20 defect data sets. Our approach also yields these state-of-the-art results with significantly faster deep learners. These results present a cogent case for applying oversampling before deep learning on software defect prediction datasets.