As language models scale up, it becomes increasingly expensive to verify research ideas, because conclusions on small models do not trivially transfer to large ones. A possible solution is to establish a generic system that directly predicts some metrics for large models based solely on the results and hyperparameters of small models. Existing methods based on scaling laws require hyperparameter search on the largest models, which is impractical with limited resources. We address this issue by presenting our discovery that Maximal Update Parametrization (muP) enables accurate fitting of scaling laws for hyperparameters close to common loss basins, without any search on large models. Thus, different models can be directly compared at large scale via loss prediction even before training starts. We propose this new paradigm as a first step towards reliable academic research at any model scale without heavy computation. Code will be publicly available shortly.
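To make the paradigm concrete, the following is a minimal sketch, not the authors' released code, of the loss-prediction step: fit a saturating power law to final losses measured on small muP-parametrized models that share the same transferred hyperparameters, then extrapolate to a larger model before training it. The functional form `L(N) = a * N^(-b) + c`, the model sizes, and the loss values are all illustrative assumptions.

```python
# Minimal sketch of scaling-law fitting for loss prediction.
# All (size, loss) pairs below are hypothetical placeholder values,
# not measurements from the paper.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n_params, a, b, c):
    """Saturating power law commonly used for loss-vs-size fits."""
    return a * n_params ** (-b) + c

# Hypothetical results from small muP runs that all share the same
# transferred hyperparameters (learning rate, initialization, etc.).
sizes = np.array([10e6, 25e6, 50e6, 100e6, 200e6])   # parameter counts
losses = np.array([3.90, 3.55, 3.32, 3.12, 2.96])    # final training losses

(a, b, c), _ = curve_fit(scaling_law, sizes, losses,
                         p0=(10.0, 0.1, 2.0), maxfev=10000)

# Predict the loss of a much larger model without ever training it.
target = 7e9
print(f"fitted: a={a:.3f}, b={b:.3f}, c={c:.3f}")
print(f"predicted loss at {target:.0e} params: "
      f"{scaling_law(target, a, b, c):.3f}")
```

Because muP keeps the optimal hyperparameters stable across widths, the fit can, under this assumption, be performed once on small models and reused to compare candidate designs at the target scale.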