Approximate inference in Bayesian deep networks faces a dilemma: how to obtain high-fidelity posterior approximations while remaining computationally efficient and scalable. We tackle this challenge by introducing a novel variational structured approximation inspired by the Bayesian interpretation of Dropout regularization. Concretely, we focus on the inflexibility of the factorized structure in the Dropout posterior and propose an improved method called Variational Structured Dropout (VSD). VSD employs an orthogonal transformation to learn a structured representation of the variational Gaussian noise at manageable computational complexity, and consequently induces statistical dependencies in the approximate posterior. Theoretically, VSD resolves the pathologies of previous Variational Dropout methods and thus admits a standard Bayesian justification. We further show that VSD induces an adaptive regularization term with several desirable properties that contribute to better generalization. Finally, we conduct extensive experiments on standard benchmarks to demonstrate the effectiveness of VSD over state-of-the-art variational methods in terms of predictive accuracy, uncertainty estimation, and out-of-distribution detection.
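To illustrate the core idea stated above, the following is a minimal, hypothetical PyTorch sketch of a linear layer with structured multiplicative Gaussian noise: i.i.d. Gaussian dropout noise is passed through a learned orthogonal map (here a product of Householder reflections) so the noise, and hence the implied posterior, is no longer fully factorized. The class name `VSDLinearSketch`, the Householder parameterization, and all hyperparameters are illustrative assumptions; the paper's exact noise parameterization and the KL/regularization term are omitted.

```python
import torch
import torch.nn as nn

class VSDLinearSketch(nn.Module):
    """Sketch of structured multiplicative Gaussian noise for a linear layer.

    Assumption: correlations are induced by applying a learned orthogonal
    transform (product of Householder reflections) to factorized Gaussian
    dropout noise. This is an illustration of the general idea, not the
    paper's exact construction.
    """

    def __init__(self, in_features: int, out_features: int, num_householder: int = 2):
        super().__init__()
        self.weight = nn.Parameter(0.05 * torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Log-variance of the per-unit Gaussian dropout noise (alpha in log space).
        self.log_alpha = nn.Parameter(torch.full((in_features,), -3.0))
        # Vectors defining the Householder reflections H_k ... H_1.
        self.householder_vs = nn.Parameter(torch.randn(num_householder, in_features))

    def _apply_orthogonal(self, eps: torch.Tensor) -> torch.Tensor:
        # Apply each reflection H = I - 2 v v^T / ||v||^2 to the rows of eps.
        for v in self.householder_vs:
            v = v / v.norm()
            eps = eps - 2.0 * (eps @ v).unsqueeze(-1) * v
        return eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Factorized zero-mean Gaussian fluctuations with per-unit variance alpha.
        std = self.log_alpha.exp().sqrt()
        eps = std * torch.randn_like(x)
        # Correlate the fluctuations via the orthogonal map, keep mean 1,
        # and apply the noise multiplicatively to the inputs.
        noise = 1.0 + self._apply_orthogonal(eps)
        return (x * noise) @ self.weight.t() + self.bias
```

In this sketch the orthogonal transform adds only a handful of Householder vectors per layer, which is one way to read the abstract's claim that structure is introduced at manageable computational cost.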