M-FAC: 高效的无矩阵式第二顺序信息接近 (M-FAC: Efficient Matrix-Free Approximations of Second-Order Information) - 专知论文

会员服务 ·

0

INFORMS · 近似 · 优化器 · 估计/估计量 · 滑动窗口 ·

2021 年 11 月 18 日

M-FAC: Efficient Matrix-Free Approximations of Second-Order Information

翻译：M-FAC: 高效的无矩阵式第二顺序信息接近

Elias Frantar,Eldar Kurtic,Dan Alistarh

from arxiv, Accepted to NeurIPS 2021

Efficiently approximating local curvature information of the loss function is a key tool for optimization and compression of deep neural networks. Yet, most existing methods to approximate second-order information have high computational or storage costs, which can limit their practicality. In this work, we investigate matrix-free, linear-time approaches for estimating Inverse-Hessian Vector Products (IHVPs) for the case when the Hessian can be approximated as a sum of rank-one matrices, as in the classic approximation of the Hessian by the empirical Fisher matrix. We propose two new algorithms as part of a framework called M-FAC: the first algorithm is tailored towards network compression and can compute the IHVP for dimension $d$, if the Hessian is given as a sum of $m$ rank-one matrices, using $O(dm^2)$ precomputation, $O(dm)$ cost for computing the IHVP, and query cost $O(m)$ for any single element of the inverse Hessian. The second algorithm targets an optimization setting, where we wish to compute the product between the inverse Hessian, estimated over a sliding window of optimization steps, and a given gradient direction, as required for preconditioned SGD. We give an algorithm with cost $O(dm + m^2)$ for computing the IHVP and $O(dm + m^3)$ for adding or removing any gradient from the sliding window. These two algorithms yield state-of-the-art results for network pruning and optimization with lower computational overhead relative to existing second-order methods. Implementations are available at [9] and [17].

翻译：高效地接近损失函数的本地曲线信息是优化和压缩深层神经网络的关键工具。然而,大多数现有的近似二阶信息的方法都具有高计算或存储成本,这可能会限制其实用性。在这项工作中,我们调查了用于估算反赫西西亚矢量产品(IHVP)的无矩阵、线性时间方法,当赫西安人可以被比作一级矩阵的总和时,就像经验化渔业矩阵对赫西亚人的典型快速缩略图一样。我们建议了两个新的算法,作为称为M-FAC的框架的一部分:第一个算法是针对网络压缩的,并且可以按维特价计算IHVP的平价,如果赫西安人以一级矩阵总和美元计算,那么计算IHVP的第二位成本,然后用我们所希望的任何单元素在 Heseria-rorma 的平流成本。

1

相关内容

INFORMS

《计算机信息》杂志发表高质量的论文，扩大了运筹学和计算的范围，寻求有关理论、方法、实验、系统和应用方面的原创研究论文、新颖的调查和教程论文，以及描述新的和有用的软件工具的论文。官网链接：https://pubsonline.informs.org/journal/ijoc

【经典书】线性代数，436页pdf

专知会员服务

78+阅读 · 2021年3月16日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【KDD2020】基于矩阵和张量因子分解的高效自动机器学习搜索，Efficient AutoML Pipeline Search with Matrix and Tensor Factorization

【KDD2020】基于矩阵和张量因子分解的高效自动机器学习搜索，Efficient AutoML Pipeline Search with Matrix and Tensor Factorization

专知会员服务

13+阅读 · 2020年6月10日

【论文推荐】逆问题，深度学习，对称性破缺，Inverse Problems, Deep Learning, and Symmetry Breaking

【论文推荐】逆问题，深度学习，对称性破缺，Inverse Problems, Deep Learning, and Symmetry Breaking

专知会员服务

26+阅读 · 2020年3月27日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

Graph Neural Network（GNN）最全资源整理分享

Graph Neural Network（GNN）最全资源整理分享

深度学习与NLP

339+阅读 · 2019年7月9日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文推荐】最新十篇推荐系统相关论文—内容感知、图卷积神经网络、博弈论、个性化排序、元学习、xDeepFM

【论文推荐】最新十篇推荐系统相关论文—内容感知、图卷积神经网络、博弈论、个性化排序、元学习、xDeepFM

专知

21+阅读 · 2018年6月18日

【论文推荐】最新5篇网络节点表示（Network Embedding）相关论文—高阶网络、矩阵分解、多视角、虚拟网络、云计算

【论文推荐】最新5篇网络节点表示（Network Embedding）相关论文—高阶网络、矩阵分解、多视角、虚拟网络、云计算

专知

7+阅读 · 2018年2月9日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

Faster ILP Algorithms for Problems with Sparse Matrices and Their Applications to Multipacking and Multicover Problems in Graphs and Hypergraphs

Arxiv

0+阅读 · 2022年1月22日

Approximating the discrete and continuous median line segments in $d$ dimensions

Arxiv

0+阅读 · 2022年1月21日

On the Satisfaction Probability of $k$-CNF Formulas

Arxiv

0+阅读 · 2022年1月21日

Asymptotic approximation of the likelihood of stationary determinantal point processes

Asymptotic approximation of the likelihood of stationary determinantal point processes

Arxiv

0+阅读 · 2022年1月21日

Dictionary-based Low-Rank Approximations and the Mixed Sparse Coding problem

Arxiv

0+阅读 · 2022年1月21日

Scalable Sampling for Nonsymmetric Determinantal Point Processes

Arxiv

0+阅读 · 2022年1月20日

Matrix Decomposition and Applications

Arxiv

54+阅读 · 2022年1月1日

Testing Matrix Rank, Optimally

Arxiv

3+阅读 · 2018年10月18日

Practical sketching algorithms for low-rank matrix approximation

Arxiv

4+阅读 · 2018年1月2日

Optimal Algorithms for Distributed Optimization

Arxiv

3+阅读 · 2017年12月1日

VIP会员

文章信息

相关主题

估计/估计量

相关VIP内容

【经典书】线性代数，436页pdf

专知会员服务

78+阅读 · 2021年3月16日

Linux导论，Introduction to Linux，96页ppt

Linux导论，Introduction to Linux，96页ppt

专知会员服务

81+阅读 · 2020年7月26日

【KDD2020】基于矩阵和张量因子分解的高效自动机器学习搜索，Efficient AutoML Pipeline Search with Matrix and Tensor Factorization

【KDD2020】基于矩阵和张量因子分解的高效自动机器学习搜索，Efficient AutoML Pipeline Search with Matrix and Tensor Factorization

专知会员服务

13+阅读 · 2020年6月10日

【论文推荐】逆问题，深度学习，对称性破缺，Inverse Problems, Deep Learning, and Symmetry Breaking

【论文推荐】逆问题，深度学习，对称性破缺，Inverse Problems, Deep Learning, and Symmetry Breaking

专知会员服务

26+阅读 · 2020年3月27日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

Risk Sensitive Portfolio Optimization with Regime-Switching and Default Contagion，香港理工大学应用数学系余翔助理教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

10+阅读 · 2019年10月24日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

热门VIP内容

开通专知VIP会员享更多权益服务

唯快不破：大型语言模型高效架构综述

光纤无人机：反无人机系统的重大挑战

【CMU博士论文】数据驱动决策中的激励、信息与不确定性

DGP双粒度提示框架：图增强大模型助力欺诈检测

相关资讯

Graph Neural Network（GNN）最全资源整理分享

Graph Neural Network（GNN）最全资源整理分享

深度学习与NLP

339+阅读 · 2019年7月9日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【SIGIR2018】五篇对抗训练文章

【SIGIR2018】五篇对抗训练文章

专知

12+阅读 · 2018年7月9日

【论文推荐】最新十篇推荐系统相关论文—内容感知、图卷积神经网络、博弈论、个性化排序、元学习、xDeepFM

【论文推荐】最新十篇推荐系统相关论文—内容感知、图卷积神经网络、博弈论、个性化排序、元学习、xDeepFM

专知

21+阅读 · 2018年6月18日

【论文推荐】最新5篇网络节点表示（Network Embedding）相关论文—高阶网络、矩阵分解、多视角、虚拟网络、云计算

【论文推荐】最新5篇网络节点表示（Network Embedding）相关论文—高阶网络、矩阵分解、多视角、虚拟网络、云计算

专知

7+阅读 · 2018年2月9日

【论文】变分推断（Variational inference)的总结

【论文】变分推断（Variational inference)的总结

机器学习研究会

39+阅读 · 2017年11月16日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

强化学习 cartpole_a3c

强化学习 cartpole_a3c

CreateAMind

9+阅读 · 2017年7月21日

相关论文

Faster ILP Algorithms for Problems with Sparse Matrices and Their Applications to Multipacking and Multicover Problems in Graphs and Hypergraphs

Arxiv

0+阅读 · 2022年1月22日

Approximating the discrete and continuous median line segments in $d$ dimensions

Arxiv

0+阅读 · 2022年1月21日

On the Satisfaction Probability of $k$-CNF Formulas

Arxiv

0+阅读 · 2022年1月21日

Asymptotic approximation of the likelihood of stationary determinantal point processes

Asymptotic approximation of the likelihood of stationary determinantal point processes

Arxiv

0+阅读 · 2022年1月21日

Dictionary-based Low-Rank Approximations and the Mixed Sparse Coding problem

Arxiv

0+阅读 · 2022年1月21日

Scalable Sampling for Nonsymmetric Determinantal Point Processes

Arxiv

0+阅读 · 2022年1月20日

Matrix Decomposition and Applications

Arxiv

54+阅读 · 2022年1月1日

Testing Matrix Rank, Optimally

Arxiv

3+阅读 · 2018年10月18日

Practical sketching algorithms for low-rank matrix approximation

Arxiv

4+阅读 · 2018年1月2日

Optimal Algorithms for Distributed Optimization

Arxiv

3+阅读 · 2017年12月1日

微信扫码咨询专知VIP会员