Efficiently approximating local curvature information of the loss function is a key tool for the optimization and compression of deep neural networks. Yet, most existing methods for approximating second-order information have high computational or storage costs, which can limit their practicality. In this work, we investigate matrix-free, linear-time approaches for estimating Inverse-Hessian Vector Products (IHVPs) in the case where the Hessian can be approximated as a sum of rank-one matrices, as in the classic approximation of the Hessian by the empirical Fisher matrix. We propose two new algorithms as part of a framework called M-FAC: the first algorithm is tailored towards network compression and computes the IHVP in dimension $d$, when the Hessian is given as a sum of $m$ rank-one matrices, using $O(dm^2)$ precomputation, $O(dm)$ cost per IHVP, and query cost $O(m)$ for any single element of the inverse Hessian. The second algorithm targets an optimization setting, where we wish to compute the product between the inverse Hessian, estimated over a sliding window of optimization steps, and a given gradient direction, as required for preconditioned SGD. We give an algorithm with cost $O(dm + m^2)$ for computing the IHVP and $O(dm + m^3)$ for adding or removing any gradient from the sliding window. These two algorithms yield state-of-the-art results for network pruning and optimization at lower computational overhead than existing second-order methods. Implementations are available at [10] and [18].
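To make the stated complexities concrete, the following is a minimal NumPy sketch of the rank-one (Sherman-Morrison) recursion underlying such matrix-free IHVP estimates, assuming the damped empirical Fisher $F = \lambda I + \frac{1}{m}\sum_{i=1}^m g_i g_i^\top$. The function names `precompute` and `ihvp`, the damping parameter `lam`, and the dense-gradient layout are illustrative assumptions, not the released M-FAC API; the actual implementations at [10] and [18] use further optimizations.

```python
import numpy as np

def precompute(G, lam=1e-4):
    """Precompute Sherman-Morrison vectors in O(d m^2).

    G: (m, d) array of gradients g_1..g_m; lam: damping.
    With F_i = lam*I + (1/m) * sum_{j<=i} g_j g_j^T and B_i = F_i^{-1},
    the rank-one update gives B_i = B_{i-1} - v_i v_i^T / c_i,
    where v_i = B_{i-1} @ g_i and c_i = m + g_i^T v_i.
    """
    m, d = G.shape
    V = np.empty_like(G)  # V[i] stores v_{i+1}
    c = np.empty(m)
    for i in range(m):
        g = G[i]
        v = g / lam  # B_0 @ g
        if i > 0:
            # apply the earlier rank-one corrections to get B_{i-1} @ g
            coeffs = (V[:i] @ g) / c[:i]
            v = v - coeffs @ V[:i]
        V[i] = v
        c[i] = m + g @ v
    return V, c, lam

def ihvp(x, V, c, lam):
    """IHVP B_m @ x in O(d m), since by telescoping
    B_m = (1/lam) * I - sum_i v_i v_i^T / c_i."""
    return x / lam - ((V @ x) / c) @ V
```

A quick sanity check against a dense solve (feasible only for small $d$) confirms the recursion:

```python
rng = np.random.default_rng(0)
m, d = 8, 32
G = rng.standard_normal((m, d))
V, c, lam = precompute(G, lam=0.1)
F = lam * np.eye(d) + G.T @ G / m
x = rng.standard_normal(d)
assert np.allclose(ihvp(x, V, c, lam), np.linalg.solve(F, x))
```

Note that `precompute` touches $m$ vectors of length $d$, once per prior gradient, giving the $O(dm^2)$ setup cost quoted above, while each subsequent `ihvp` call is a single $(m \times d)$ matrix-vector product pair, i.e. $O(dm)$.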