数据质量对机器学习绩效的影响 (The Effects of Data Quality on Machine Learning Performance) - 专知论文

会员服务 ·

0

Performer · Learning · 测试数据 · Machine Learning · Extensibility ·

2022 年 8 月 29 日

The Effects of Data Quality on Machine Learning Performance

翻译：数据质量对机器学习绩效的影响

Lukas Budach,Moritz Feuerpfeil,Nina Ihde,Andrea Nathansen,Nele Noack,Hendrik Patzlaff,Hazar Harmouch,Felix Naumann

Modern artificial intelligence (AI) applications require large quantities of training and test data. This need creates critical challenges not only concerning the availability of such data, but also regarding its quality. For example, incomplete, erroneous or inappropriate training data can lead to unreliable models that produce ultimately poor decisions. Trustworthy AI applications require high-quality training and test data along many dimensions, such as accuracy, completeness, consistency, and uniformity. We explore empirically the relationship between six of the traditional data quality dimensions and the performance of fifteen widely used machine learning (ML) algorithms covering the tasks of classification, regression, and clustering, with the goal of explaining their performance in terms of data quality. Our experiments distinguish three scenarios based on the AI pipeline steps that were fed with polluted data: polluted training data, test data, or both. We conclude the paper with an extensive discussion of our observations.

翻译：现代人工智能(AI)应用需要大量的培训和测试数据,这种需要不仅在提供这类数据方面,而且在数据质量方面都构成重大挑战,例如,不完整、错误或不当的培训数据可能导致模型不可靠,最终导致决策不力。可靠的AI应用要求从许多方面进行高质量的培训和测试数据,如准确性、完整性、一致性和统一性。我们从经验上探讨传统数据质量的六个方面与15种广泛使用的机器学习算法的运作情况之间的关系,这些算法涵盖分类、回归和集中等任务,目的是解释它们在数据质量方面的性能。我们的实验根据AI编审中的步骤区分了三种情况,即污染的培训数据、测试数据或两者。我们通过广泛讨论我们的意见来完成这份文件。

0

相关内容

Performer

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

126+阅读 · 2022年4月21日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

深部充填开采留巷球应力壳与偏应力场演化及协同控制

国家自然科学基金

0+阅读 · 2015年12月31日

模型驱动的移动应用测试方法研究

国家自然科学基金

1+阅读 · 2014年12月31日

BAG3与MACC1相互作用在甲状腺癌细胞上皮间质转化(EMT) 及侵袭中的作用

国家自然科学基金

0+阅读 · 2013年12月31日

Newmark算法在复杂色散介质FDTD分析中的应用和多离子磁等离子体FDTD算法研究

国家自然科学基金

0+阅读 · 2013年12月31日

海水入侵问题的并行子空间校正算法研究

国家自然科学基金

0+阅读 · 2013年12月31日

磁性纳米粒蓄积诱导ROS激活自噬杀伤胶质瘤细胞的机制初探

国家自然科学基金

0+阅读 · 2013年12月31日

Intraflagellar Transport运输纤毛蛋白的分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

介尺度磁性复合囊泡状结构材料的可控构筑及性能

国家自然科学基金

0+阅读 · 2012年12月31日

ING3：原发性肝癌的诊断与治疗新靶点

国家自然科学基金

0+阅读 · 2012年12月31日

气体吸附诱致应力煤岩膨胀的多尺度模型研究

国家自然科学基金

0+阅读 · 2011年12月31日

Systematic Evaluation of Predictive Fairness

Arxiv

0+阅读 · 2022年10月17日

New Secure Sparse Inner Product with Applications to Machine Learning

Arxiv

0+阅读 · 2022年10月16日

Modular machine learning-based elastoplasticity: generalization in the context of limited data

Arxiv

0+阅读 · 2022年10月15日

Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods

Arxiv

0+阅读 · 2022年10月13日

Deep Multiagent Reinforcement Learning: Challenges and Directions

Arxiv

0+阅读 · 2022年10月12日

Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey

Arxiv

17+阅读 · 2022年5月10日

Challenges of Artificial Intelligence -- From Machine Learning and Computer Vision to Emotional Intelligence

Arxiv

19+阅读 · 2022年1月5日

A Survey of Human-in-the-loop for Machine Learning

Arxiv

37+阅读 · 2021年8月2日

A Survey on the Explainability of Supervised Machine Learning

Arxiv

24+阅读 · 2020年11月16日

A Survey on Distributed Machine Learning

Arxiv

45+阅读 · 2019年12月20日

VIP会员

文章信息

相关主题

Machine Learning

相关VIP内容

【2022新书】高效深度学习，Efficient Deep Learning Book

【2022新书】高效深度学习，Efficient Deep Learning Book

专知会员服务

126+阅读 · 2022年4月21日

哥伦比亚大学最新《机器学习》课程，Fall-B 2020 (Machine Learning)

专知会员服务

39+阅读 · 2020年11月3日

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

【新书：机器学习简介】《A Concise Introduction to Machine Learning》by A.C. Faul (CRC 2019)

专知会员服务

77+阅读 · 2020年2月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

163+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

机器学习入门的经验与建议

机器学习入门的经验与建议

专知会员服务

94+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《俄乌战争中的无人系统：新的战争方式与新兴趋势——来自前线的印象》报告

《海上自主水面船舶远程操作中心：安全可持续运行的多维度分析》

多模态大语言模型下游调优中“保持自我”的重要性

隐身自主无人水下航行器技术如何变革水下作战并重塑海军竞争

相关资讯

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

【ICIG2021】Latest News & Announcements of the Plenary Talk1

【ICIG2021】Latest News & Announcements of the Plenary Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年11月1日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Systematic Evaluation of Predictive Fairness

Arxiv

0+阅读 · 2022年10月17日

New Secure Sparse Inner Product with Applications to Machine Learning

Arxiv

0+阅读 · 2022年10月16日

Modular machine learning-based elastoplasticity: generalization in the context of limited data

Arxiv

0+阅读 · 2022年10月15日

Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods

Arxiv

0+阅读 · 2022年10月13日

Deep Multiagent Reinforcement Learning: Challenges and Directions

Arxiv

0+阅读 · 2022年10月12日

Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey

Arxiv

17+阅读 · 2022年5月10日

Challenges of Artificial Intelligence -- From Machine Learning and Computer Vision to Emotional Intelligence

Arxiv

19+阅读 · 2022年1月5日

A Survey of Human-in-the-loop for Machine Learning

Arxiv

37+阅读 · 2021年8月2日

A Survey on the Explainability of Supervised Machine Learning

Arxiv

24+阅读 · 2020年11月16日

A Survey on Distributed Machine Learning

Arxiv

45+阅读 · 2019年12月20日

相关基金

深部充填开采留巷球应力壳与偏应力场演化及协同控制

国家自然科学基金

0+阅读 · 2015年12月31日

模型驱动的移动应用测试方法研究

国家自然科学基金

1+阅读 · 2014年12月31日

BAG3与MACC1相互作用在甲状腺癌细胞上皮间质转化(EMT) 及侵袭中的作用

国家自然科学基金

0+阅读 · 2013年12月31日

Newmark算法在复杂色散介质FDTD分析中的应用和多离子磁等离子体FDTD算法研究

国家自然科学基金

0+阅读 · 2013年12月31日

海水入侵问题的并行子空间校正算法研究

国家自然科学基金

0+阅读 · 2013年12月31日

磁性纳米粒蓄积诱导ROS激活自噬杀伤胶质瘤细胞的机制初探

国家自然科学基金

0+阅读 · 2013年12月31日

Intraflagellar Transport运输纤毛蛋白的分子机理

国家自然科学基金

0+阅读 · 2012年12月31日

介尺度磁性复合囊泡状结构材料的可控构筑及性能

国家自然科学基金

0+阅读 · 2012年12月31日

ING3：原发性肝癌的诊断与治疗新靶点

国家自然科学基金

0+阅读 · 2012年12月31日

气体吸附诱致应力煤岩膨胀的多尺度模型研究

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员