Layer Adaptive Node Selection in Bayesian Neural Networks: Statistical Guarantees and Implementation Details

Sparse deep neural networks have proven to be efficient for predictive model building in large-scale studies. Although several works have studied the theoretical and numerical properties of sparse neural architectures, they have focused primarily on edge selection. Sparsity through edge selection may be intuitively appealing; however, it does not necessarily reduce the structural complexity of a network. Pruning excessive nodes instead leads to a structurally sparse network with significant computational speedup during inference. To this end, we propose a Bayesian sparse solution using spike-and-slab Gaussian priors to allow for automatic node selection during training. The use of a spike-and-slab prior alleviates the need for an ad hoc thresholding rule for pruning. In addition, we adopt a variational Bayes approach to circumvent the computational challenges of a traditional Markov Chain Monte Carlo (MCMC) implementation. In the context of node selection, we establish the fundamental result of variational posterior consistency together with a characterization of the prior parameters. In contrast to previous works, our theoretical development relaxes the assumptions of an equal number of nodes in every layer and of uniform bounds on all network weights, thereby accommodating sparse networks with layer-dependent node structures or coefficient bounds. With a layer-wise characterization of prior inclusion probabilities, we discuss the optimal contraction rates of the variational posterior. We empirically demonstrate that our proposed approach outperforms the edge selection method in computational complexity while achieving similar or better predictive performance. Our experimental evidence further substantiates that our theoretical work facilitates layer-wise optimal node recovery.
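
The difference between edge selection and node selection can be illustrated with a minimal sketch. It is not the authors' implementation: it only samples from a node-level spike-and-slab prior, with no variational training, and it assumes a point-mass spike at zero, omits biases, and uses hypothetical function names, layer sizes, and layer-wise inclusion probabilities. Each node has a single Bernoulli indicator shared by all of its incoming weights, so an inactive node can be removed outright, which shrinks both its own layer and the input dimension of the next layer.

```python
import random


def sample_node_mask(n_nodes, inclusion_prob, rng):
    """Draw one Bernoulli indicator per node (1 = slab, 0 = spike)."""
    return [1 if rng.random() < inclusion_prob else 0 for _ in range(n_nodes)]


def sample_layer(n_in, n_out, inclusion_prob, slab_std, rng):
    """Sample an n_out x n_in weight matrix from a node-level spike-and-slab prior.

    Row j holds the incoming weights of node j. The whole row is drawn from
    N(0, slab_std^2) when z[j] == 1 and is exactly zero otherwise.
    """
    z = sample_node_mask(n_out, inclusion_prob, rng)
    weights = [
        [rng.gauss(0.0, slab_std) if z[j] else 0.0 for _ in range(n_in)]
        for j in range(n_out)
    ]
    return weights, z


def prune_nodes(weights, next_weights, z):
    """Remove inactive nodes: their rows here, their columns in the next layer."""
    keep = [j for j, zj in enumerate(z) if zj]
    pruned = [weights[j] for j in keep]
    pruned_next = [[row[j] for j in keep] for row in next_weights]
    return pruned, pruned_next


rng = random.Random(0)
# Hypothetical layer-wise prior inclusion probabilities (one per hidden layer).
layer_probs = [0.5, 0.25]
w1, z1 = sample_layer(n_in=4, n_out=8, inclusion_prob=layer_probs[0], slab_std=1.0, rng=rng)
w2, z2 = sample_layer(n_in=8, n_out=6, inclusion_prob=layer_probs[1], slab_std=1.0, rng=rng)

w1_small, w2_small = prune_nodes(w1, w2, z1)
print("layer 1 nodes:", len(w1), "->", len(w1_small))
```

Zeroing individual edges would leave both matrices at their original shapes, whereas removing a node deletes a full row and a full column. This is the structural reduction that the abstract credits for the inference-time speedup.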
