Data pruning and neural scaling laws: fundamental limitations of score-based algorithms (Zhuanzhi Papers)

February 14, 2023

Fadhel Ayed, Soufiane Hayou

Data pruning algorithms are commonly used to reduce the memory and computational cost of the optimization process. Recent empirical results reveal that random data pruning remains a strong baseline and outperforms most existing data pruning methods in the high compression regime, i.e., where a fraction of 30% or less of the data is kept. This regime has recently attracted a lot of interest as a result of the role of data pruning in improving the so-called neural scaling laws; in [Sorscher et al.], the authors showed the need for high-quality data pruning algorithms in order to beat the sample power law. In this work, we focus on score-based data pruning algorithms and show theoretically and empirically why such algorithms fail in the high compression regime. We demonstrate "No Free Lunch" theorems for data pruning and present calibration protocols that enhance the performance of existing pruning algorithms in this high compression regime using randomization.
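The abstract contrasts score-based pruning with the random-pruning baseline in the high compression regime. The sketch below, which assumes NumPy and uses illustrative helper names (`budget`, `top_k_indices`, `score_based_prune`, `random_prune`, `mixed_prune`) that do not come from the paper, places the two selection rules side by side. The toy scores, in which one class is uniformly "harder", and the `mixed_prune` rule with its budget share `alpha` are hypothetical stand-ins chosen for illustration; they are not the authors' theory and not their calibration protocol.

```python
import numpy as np

# Hypothetical helper names for illustration; not the paper's code or API.


def budget(n, keep_frac):
    """Number of examples kept when a fraction keep_frac of n is retained."""
    return int(round(keep_frac * n))


def top_k_indices(scores, k):
    """Indices of the k largest scores (ties broken by a stable sort)."""
    order = np.argsort(scores, kind="stable")
    return order[len(scores) - k:]  # empty when k == 0


def score_based_prune(scores, keep_frac):
    """Score-based pruning: keep only the highest-scoring examples."""
    scores = np.asarray(scores)
    return np.sort(top_k_indices(scores, budget(scores.size, keep_frac)))


def random_prune(n, keep_frac, rng):
    """Random pruning baseline: keep a uniform sample without replacement."""
    return np.sort(rng.choice(n, size=budget(n, keep_frac), replace=False))


def mixed_prune(scores, keep_frac, alpha, rng):
    """Hypothetical randomized variant (an assumption, not the paper's
    protocol): a share alpha of the budget goes to the top scores and the
    rest is drawn uniformly from the examples not already selected."""
    scores = np.asarray(scores)
    n = scores.size
    k = budget(n, keep_frac)
    k_top = int(round(alpha * k))
    top = top_k_indices(scores, k_top)
    rest = np.setdiff1d(np.arange(n), top)
    extra = rng.choice(rest, size=k - k_top, replace=False)
    return np.sort(np.concatenate([top, extra]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1000
    labels = np.arange(n) % 2  # toy binary labels
    # Toy assumption: class 1 is uniformly "harder", so it always scores higher.
    scores = labels + rng.uniform(0.0, 0.5, size=n)

    keep = 0.3  # high compression regime: keep 30% of the data
    for name, idx in [
        ("score-based", score_based_prune(scores, keep)),
        ("random", random_prune(n, keep, rng)),
        ("mixed (alpha=0.5)", mixed_prune(scores, keep, 0.5, rng)),
    ]:
        present = sorted(set(labels[idx].tolist()))
        print(f"{name}: {len(idx)} kept, classes present: {present}")
```

Under this toy assumption, keeping the top 30% of scores selects only class-1 examples, while uniform sampling and the mixed rule retain both classes. This is one simple intuition for how deterministic top-k selection can lose coverage of the data distribution at high compression; it does not reproduce the paper's analysis or results.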

