Not all data are equal. Misleading or unnecessary data can critically hinder the accuracy of Machine Learning (ML) models. When data are plentiful, misleading effects can be averaged out, but in many real-world applications data are sparse and expensive to acquire. We present a method that substantially reduces the amount of data needed to accurately train ML models, potentially opening the door to many new, limited-data applications of ML. Our method extracts the most informative data while omitting data that misleads the ML model toward inferior generalization. Specifically, the method eliminates the "double descent" phenomenon, in which more data leads to worse performance. This approach offers several key features to the ML community. Notably, the method converges naturally and removes the traditional need to divide the dataset into training, testing, and validation sets; instead, the selection metric inherently assesses testing error. This ensures that key information is never wasted on testing or validation.
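As a rough illustration of this kind of data selection, the sketch below implements a simple greedy loop that adds a candidate point to the training subset only if doing so lowers an internal error estimate on the remaining, unselected pool, and stops (converges) once no candidate helps. This is a hypothetical minimal example, not the paper's actual criterion or model: the names (greedy_select), the ridge-regression learner, and the pool-error stopping rule are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

def greedy_select(X, y, budget, seed=0):
    """Greedily grow a training subset, keeping only points that reduce
    the model's error on the remaining (unselected) pool.
    Hypothetical illustration; not the paper's exact selection metric."""
    rng = np.random.default_rng(seed)
    n = len(X)
    selected = list(rng.choice(n, size=2, replace=False))  # small seed set
    pool = [i for i in range(n) if i not in selected]
    best_err = np.inf
    while pool and len(selected) < budget:
        # Score each candidate by the pool error after adding it.
        scores = []
        for i in pool:
            trial = selected + [i]
            model = Ridge(alpha=1e-3).fit(X[trial], y[trial])
            rest = [j for j in pool if j != i]
            err = np.mean((model.predict(X[rest]) - y[rest]) ** 2)
            scores.append((err, i))
        err, i = min(scores)
        if err >= best_err:  # no candidate improves the estimate: converge
            break
        best_err = err
        selected.append(i)
        pool.remove(i)
    return selected

# Toy usage: noisy linear data, select an informative subset.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
subset = greedy_select(X, y, budget=30)
print(f"selected {len(subset)} of {len(X)} points")
```

Because the selection score is computed on data the model has not trained on, the loop needs no separate validation split, which mirrors the property claimed above under these simplified assumptions.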