A Universal Trade-off Between the Model Size, Test Loss, and Training Loss of Linear Predictors


Linear prediction · Loss · Training data · Asymptotic analysis · Whitening

April 18, 2023



Nikhil Ghosh, Mikhail Belkin

From arXiv. Comment: Further polished writing.

In this work we establish an algorithm- and distribution-independent non-asymptotic trade-off between the model size, excess test loss, and training loss of linear predictors. Specifically, we show that models that perform well on the test data (have low excess loss) are either "classical", having training loss close to the noise level, or "modern", having a much larger number of parameters than the minimum needed to fit the training data exactly. We also provide a more precise asymptotic analysis when the limiting spectral distribution of the whitened features is Marchenko-Pastur. Remarkably, while the Marchenko-Pastur analysis is far more precise near the interpolation peak, where the number of parameters is just enough to fit the training data, it coincides exactly with the distribution-independent bound as the level of overparametrization increases.

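To make the two regimes concrete, here is a minimal Python sketch of a toy setting. It is an assumption for illustration only, not the paper's general algorithm- and distribution-independent setting: isotropic Gaussian features x ~ N(0, I_p), a well-specified linear model y = x·β + ε with ‖β‖ = 1 and noise variance σ², and the least-squares fit computed with np.linalg.pinv (ordinary least squares when p < n, the minimum-norm interpolator when p > n). The helper names simulate and theory are hypothetical. In this toy, the standard closed forms give expected training loss σ²(1 - p/n) and expected excess test loss σ²p/(n - p - 1) for p < n - 1. Writing t for the training loss divided by σ², the excess is therefore roughly σ²(1 - t)/t, so low excess test loss forces the training loss toward the noise level. For p > n + 1 the training loss is zero and the expected excess test loss is ‖β‖²(1 - n/p) + σ²n/(p - n - 1), whose variance term diverges as p approaches the interpolation peak and decays as p grows.

```python
import numpy as np


def simulate(n, p, sigma=1.0, signal_norm=1.0, trials=200, seed=0):
    """Monte Carlo estimate of (training MSE, excess test loss) in a toy setup.

    Assumed toy setup (hypothetical, for illustration only):
    x ~ N(0, I_p), y = x @ beta + noise, noise ~ N(0, sigma^2), ||beta|| = signal_norm.
    Since E[x x^T] = I, the excess test loss of beta_hat is ||beta_hat - beta||^2.
    """
    rng = np.random.default_rng(seed)
    train, excess = [], []
    for _ in range(trials):
        beta = rng.standard_normal(p)
        beta *= signal_norm / np.linalg.norm(beta)
        X = rng.standard_normal((n, p))
        y = X @ beta + sigma * rng.standard_normal(n)
        # The pseudoinverse gives OLS when p < n and the minimum-norm
        # interpolator when p > n (X has full rank almost surely).
        beta_hat = np.linalg.pinv(X) @ y
        train.append(np.mean((X @ beta_hat - y) ** 2))
        excess.append(np.sum((beta_hat - beta) ** 2))
    return float(np.mean(train)), float(np.mean(excess))


def theory(n, p, sigma=1.0, signal_norm=1.0):
    """Closed-form expectations of (training MSE, excess test loss) for the same toy setup."""
    if p < n - 1:
        # Underparametrized regime: OLS residuals keep n - p degrees of freedom of noise.
        return sigma**2 * (n - p) / n, sigma**2 * p / (n - p - 1)
    if p > n + 1:
        # Overparametrized regime: exact interpolation; bias from the part of beta
        # orthogonal to the row space of X, plus a variance term that decays as p grows.
        return 0.0, signal_norm**2 * (1 - n / p) + sigma**2 * n / (p - n - 1)
    # Near the interpolation peak p ~ n the expected excess test loss is infinite.
    raise ValueError("expected excess test loss is not finite for |p - n| <= 1")
```

For a quick comparison across regimes, one might run `for p in (25, 90, 400): print(p, simulate(100, p), theory(100, p))`. Using pinv for both regimes keeps a single code path; at this small scale its SVD cost is negligible.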
