Sparse deep neural networks have proven to be efficient for predictive model building in large-scale studies. Although several works have studied the theoretical and numerical properties of sparse neural architectures, they have primarily focused on edge selection. Sparsity through edge selection might be intuitively appealing; however, it does not necessarily reduce the structural complexity of a network. Instead, pruning excessive nodes in each layer leads to a structurally sparse network with lower computational complexity and a smaller memory footprint. We propose a Bayesian sparse solution using spike-and-slab Gaussian priors to allow for node selection during training. The use of a spike-and-slab prior alleviates the need for an ad-hoc thresholding rule to prune redundant nodes from a network. In addition, we adopt a variational Bayes approach to circumvent the computational challenges of a traditional Markov chain Monte Carlo (MCMC) implementation. In the context of node selection, we establish the fundamental result of variational posterior consistency together with a characterization of the prior parameters. In contrast to previous works, our theoretical development relaxes the assumptions of an equal number of nodes per layer and uniform bounds on all network weights, thereby accommodating sparse networks with layer-dependent node structures or coefficient bounds. With a layer-wise characterization of the prior inclusion probabilities, we also discuss optimal contraction rates of the variational posterior. Finally, we provide empirical evidence to substantiate that our theoretical work facilitates layer-wise optimal node recovery together with competitive predictive performance.
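The node-selection mechanism described above can be illustrated with a minimal sketch. The snippet below is not the paper's implementation; it only shows, under simplifying assumptions, how a per-node Bernoulli inclusion variable (the "spike") gates the output of a hidden layer whose weights come from the Gaussian "slab", and how the fitted inclusion probabilities yield a pruning rule without an ad-hoc magnitude threshold (here, the standard median-probability rule of keeping nodes with inclusion probability at least 1/2). All function names and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_node_mask(inclusion_probs, rng):
    """Draw a binary node-inclusion mask z ~ Bernoulli(pi), one bit per node."""
    return rng.random(inclusion_probs.shape) < inclusion_probs

def spike_and_slab_layer(x, W, b, inclusion_probs, rng):
    """One hidden layer whose output nodes are gated by spike-and-slab draws.

    W: (in_dim, out_dim) slab (Gaussian) weights
    inclusion_probs: (out_dim,) variational Bernoulli probabilities, one per node
    """
    z = sample_node_mask(inclusion_probs, rng)
    h = np.maximum(x @ W + b, 0.0)  # ReLU activation
    return h * z                    # a pruned node contributes exactly zero

def select_nodes(inclusion_probs):
    """Layer-wise node selection: keep nodes whose (variational) posterior
    inclusion probability is >= 1/2, so no ad-hoc weight threshold is needed."""
    return inclusion_probs >= 0.5
```

During training, the `inclusion_probs` would themselves be variational parameters optimized alongside the Gaussian means and variances of `W`; at convergence, `select_nodes` reads off a structurally sparse, layer-dependent architecture.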