Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks - 专知论文 (Zhuanzhi Papers)

Topics: estimation/estimators · binary · Networking · discretization · Weight

February 2, 2021

Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks

Alexander Shekhovtsov, Viktor Yanush

From arXiv, 30 pages, ICLR version (rejected)

Training neural networks with binary weights and activations is challenging because gradients are unavailable and optimization over discrete weights is difficult. Many successful experimental results have been achieved with empirical straight-through (ST) approaches, which propose a variety of ad hoc rules for propagating gradients through non-differentiable activations and for updating discrete weights. At the same time, ST methods can actually be derived as estimators in the stochastic binary network (SBN) model with Bernoulli weights. We advance these derivations into a more complete and systematic study. We analyze properties and estimation accuracy, obtain different forms of correct ST estimators for activations and weights, explain existing empirical approaches and their shortcomings, and explain how latent weights arise from the mirror descent method when optimizing over probabilities. This allows us to reintroduce ST methods, once purely empirical, as sound approximations, to apply them with clarity, and to develop further improvements.
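
To make the two ideas in the abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' code. It assumes a single stochastic binary unit with states in {-1, +1} and P(x = +1) = sigmoid(a), and a single Bernoulli weight probability theta updated by mirror descent with the negative binary entropy as mirror map, for which the dual variable is logit(theta). All function names (`st_sample_gradient`, `st_expected_gradient`, `true_gradient`, `mirror_descent_step`) are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def st_sample_gradient(a, loss_grad, rng):
    """One-sample straight-through estimate of d/da E[loss(x)].

    Assumption: x is in {-1, +1} with P(x = +1) = sigmoid(a).
    """
    p = sigmoid(a)
    x = 1.0 if rng.random() < p else -1.0  # forward pass: sample the binary state
    dmean_da = 2.0 * p * (1.0 - p)         # d/da E[x], because E[x] = 2p - 1
    return loss_grad(x) * dmean_da         # backward pass: use dE[x]/da in place of dx/da


def st_expected_gradient(a, loss_grad):
    """Exact expectation of the ST estimate (x has only two states)."""
    p = sigmoid(a)
    mean_grad = p * loss_grad(1.0) + (1.0 - p) * loss_grad(-1.0)
    return mean_grad * 2.0 * p * (1.0 - p)


def true_gradient(a, loss):
    """Exact d/da E[loss(x)] = (loss(+1) - loss(-1)) * dp/da."""
    p = sigmoid(a)
    return (loss(1.0) - loss(-1.0)) * p * (1.0 - p)


def mirror_descent_step(eta, grad_theta, lr):
    """Mirror descent on a Bernoulli probability theta = sigmoid(eta).

    With the negative binary entropy as mirror map, the dual variable is
    eta = logit(theta), so the update is a plain gradient step on eta using
    the gradient taken with respect to theta. Here eta plays the role of a
    real-valued latent weight.
    """
    return eta - lr * grad_theta


if __name__ == "__main__":
    a = 0.7
    linear, linear_grad = (lambda x: 3.0 * x), (lambda x: 3.0)
    quad, quad_grad = (lambda x: (x - 0.5) ** 2), (lambda x: 2.0 * (x - 0.5))

    print("linear loss:", st_expected_gradient(a, linear_grad), true_gradient(a, linear))
    print("quadratic loss:", st_expected_gradient(a, quad_grad), true_gradient(a, quad))

    eta = mirror_descent_step(0.0, grad_theta=2.0, lr=0.5)
    print("latent weight:", eta, "probability:", sigmoid(eta))
```

In this toy setting the expected ST estimate coincides with the true gradient when the loss is linear in x, and generally differs from it when the loss is nonlinear, which is one way to see ST as an approximation rather than an exact gradient. The mirror descent step keeps theta strictly inside (0, 1) without any clipping, because the update is applied to the unconstrained latent value eta.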
