Sampling To Improve Predictions For Underrepresented Observations In Imbalanced Data - Zhuanzhi Papers

December 16, 2021

Sampling To Improve Predictions For Underrepresented Observations In Imbalanced Data

Rune D. Kjærsgaard,Manja G. Grønberg,Line K. H. Clemmensen

From arXiv. Presented at the Workshop on Data-Centric AI (NeurIPS 2021); v2/v3 fixed incorrect axis labels.

Data imbalance is common in production data, where controlled production settings require data to fall within a narrow range of variation, and data are collected for quality assessment rather than for data-analytic insight. This imbalance degrades the predictive performance of models on underrepresented observations. We propose sampling to adjust for this imbalance, with the goal of improving the performance of models trained on historical production data. We investigate three sampling approaches for this adjustment. The goal is to downsample the covariates in the training data and then fit a regression model. We examine how the model's predictive power changes when it is trained on the sampled data rather than on the original data. We apply our methods to a large biopharmaceutical manufacturing data set from an advanced simulation of penicillin production and find that fitting a model on the sampled data gives a small reduction in overall predictive performance but systematically better performance on underrepresented observations. In addition, the results emphasize the need for alternative, fair, and balanced model evaluations.
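
The abstract does not spell out the three sampling approaches, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes a single covariate and one simple scheme: split the covariate range into equal-width bins and randomly keep at most `per_bin` observations per bin, so densely sampled operating conditions are thinned while sparse ones are kept whole; then fit an ordinary least-squares line. It also shows one possible "balanced" evaluation, an RMSE computed per bin and averaged with equal bin weights. The names `bin_downsample`, `fit_line`, and `balanced_rmse`, and the synthetic data in the demo, are hypothetical.

```python
import math
import random
from collections import defaultdict


def bin_index(x, lo, hi, n_bins):
    """Map x to one of n_bins equal-width bins over [lo, hi] (hypothetical helper)."""
    if hi == lo:
        return 0
    i = int((x - lo) / (hi - lo) * n_bins)
    return min(i, n_bins - 1)  # put x == hi into the last bin


def bin_downsample(xs, ys, n_bins=10, per_bin=50, seed=0):
    """Keep at most `per_bin` randomly chosen points from each covariate bin.

    Dense bins (typical production conditions) are thinned; bins with
    `per_bin` or fewer points (underrepresented conditions) are kept whole.
    """
    lo, hi = min(xs), max(xs)
    bins = defaultdict(list)
    for x, y in zip(xs, ys):
        bins[bin_index(x, lo, hi, n_bins)].append((x, y))
    rng = random.Random(seed)
    kept = []
    for members in bins.values():
        if len(members) > per_bin:
            members = rng.sample(members, per_bin)
        kept.extend(members)
    return [x for x, _ in kept], [y for _, y in kept]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def balanced_rmse(xs, ys, model, n_bins=10):
    """RMSE computed within each covariate bin, then averaged with equal bin weights."""
    a, b = model
    lo, hi = min(xs), max(xs)
    sq_errors = defaultdict(list)
    for x, y in zip(xs, ys):
        sq_errors[bin_index(x, lo, hi, n_bins)].append((y - (a + b * x)) ** 2)
    per_bin = [math.sqrt(sum(e) / len(e)) for e in sq_errors.values()]
    return sum(per_bin) / len(per_bin)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical imbalanced data: most covariates sit in a narrow band near 0.
    xs = [rng.gauss(0, 0.3) for _ in range(2000)] + [rng.uniform(-3, 3) for _ in range(40)]
    ys = [math.exp(x) + rng.gauss(0, 0.1) for x in xs]
    full = fit_line(xs, ys)
    sx, sy = bin_downsample(xs, ys, n_bins=10, per_bin=30)
    sampled = fit_line(sx, sy)
    # For brevity both models are scored on the full data; use held-out data in practice.
    print("full data model, balanced RMSE:", balanced_rmse(xs, ys, full))
    print("sampled model,   balanced RMSE:", balanced_rmse(xs, ys, sampled))
```

Equal-width bins and a fixed per-bin cap are only one design choice; with several covariates one would bin a multivariate space (or use another density-aware scheme), and the equal-weight per-bin score is just one way to keep the dense region from dominating the evaluation.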
