Sparse deep neural networks have proven to be efficient for predictive model building in large-scale studies. Although several works have studied the theoretical and numerical properties of sparse neural architectures, they have primarily focused on edge selection. Sparsity through edge selection might be intuitively appealing; however, it does not necessarily reduce the structural complexity of a network. Instead, pruning excessive nodes in each layer leads to a structurally sparse network with lower computational complexity and a smaller memory footprint. We propose a Bayesian sparse solution using spike-and-slab Gaussian priors to allow for node selection during training. The use of a spike-and-slab prior alleviates the need for an ad-hoc thresholding rule to prune redundant nodes from a network. In addition, we adopt a variational Bayes approach to circumvent the computational challenges of a traditional Markov chain Monte Carlo (MCMC) implementation. In the context of node selection, we establish the fundamental result of variational posterior consistency together with a characterization of the prior parameters. In contrast to previous works, our theoretical development relaxes the assumptions of an equal number of nodes per layer and uniform bounds on all network weights, thereby accommodating sparse networks with layer-dependent node structures or coefficient bounds. With a layer-wise characterization of the prior inclusion probabilities, we also discuss optimal contraction rates of the variational posterior. Finally, we provide empirical evidence to substantiate that our theoretical work facilitates layer-wise optimal node recovery together with competitive predictive performance.
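The node-selection mechanism described above can be illustrated with a minimal sketch. The snippet below is not the paper's implementation; it only shows, under simplifying assumptions, how a per-node Bernoulli inclusion variable (the "spike") gates the output of a hidden layer whose weights come from the Gaussian "slab", and how the fitted inclusion probabilities yield a pruning rule without an ad-hoc magnitude threshold (here, the standard median-probability rule of keeping nodes with inclusion probability at least 1/2). All function names and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_node_mask(inclusion_probs, rng):
    """Draw a binary node-inclusion mask z ~ Bernoulli(pi), one bit per node."""
    return rng.random(inclusion_probs.shape) < inclusion_probs

def spike_and_slab_layer(x, W, b, inclusion_probs, rng):
    """One hidden layer whose output nodes are gated by spike-and-slab draws.

    W: (in_dim, out_dim) slab (Gaussian) weights
    inclusion_probs: (out_dim,) variational Bernoulli probabilities, one per node
    """
    z = sample_node_mask(inclusion_probs, rng)
    h = np.maximum(x @ W + b, 0.0)  # ReLU activation
    return h * z                    # a pruned node contributes exactly zero

def select_nodes(inclusion_probs):
    """Layer-wise node selection: keep nodes whose (variational) posterior
    inclusion probability is >= 1/2, so no ad-hoc weight threshold is needed."""
    return inclusion_probs >= 0.5
```

During training, the `inclusion_probs` would themselves be variational parameters optimized alongside the Gaussian means and variances of `W`; at convergence, `select_nodes` reads off a structurally sparse, layer-dependent architecture.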