Nested dropout is a variant of the dropout operation that orders network parameters or features according to a pre-defined importance during training. It has been explored in two directions. (i) Constructing nested nets: nested nets are neural networks whose architectures can be adjusted instantly at test time, e.g., to meet computational constraints. Nested dropout implicitly ranks the network parameters, generating a set of sub-networks such that any smaller sub-network forms the basis of a larger one. (ii) Learning an ordered representation: applied to the latent representation of a generative model (e.g., an auto-encoder), nested dropout ranks the features, enforcing an explicit order over the dimensions of the dense representation. However, in both cases the dropout rate is fixed as a hyper-parameter throughout training. For nested nets, when network parameters are removed, performance decays along a human-specified trajectory rather than one learned from data. For generative models, the importance of the features is specified as a constant vector, restricting the flexibility of representation learning. To address these problems, we focus on the probabilistic counterpart of nested dropout. We propose a variational nested dropout (VND) operation that draws samples of multi-dimensional ordered masks at low cost, providing useful gradients for the parameters of nested dropout. Based on this approach, we design a Bayesian nested neural network that learns the order of the parameter distributions from data. We further exploit VND within different generative models to learn ordered latent distributions. In experiments, the proposed approach outperforms nested networks in terms of accuracy, calibration, and out-of-domain detection on classification tasks, and outperforms related generative models on data generation tasks.
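As a concrete illustration of the ordering described above, the following is a minimal sketch of a (non-variational) nested dropout mask, assuming the cut-off index is drawn from a geometric distribution as in the original nested dropout formulation; the function name and the `rate` parameter are illustrative, not from this paper.

```python
import numpy as np

def nested_dropout_mask(num_units, rate=0.1, rng=None):
    """Sample an ordered (nested) dropout mask over num_units units."""
    rng = np.random.default_rng() if rng is None else rng
    # Draw the cut-off index from a geometric distribution and keep the
    # first b units; everything after the cut-off is zeroed, so any
    # smaller kept prefix is nested inside a larger one.
    b = min(int(rng.geometric(rate)), num_units)
    mask = np.zeros(num_units)
    mask[:b] = 1.0
    return mask

# Earlier units are kept far more often than later ones, which is what
# induces the importance ordering during training.
rng = np.random.default_rng(0)
print([nested_dropout_mask(8, rate=0.3, rng=rng) for _ in range(3)])
```

Note that, as the abstract points out, the rate here is a fixed hyper-parameter: the sampling is not differentiable with respect to it, which is the limitation VND targets.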
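To illustrate the claim that ordered masks can be sampled while still providing useful gradients, the sketch below shows one generic construction: a binary Concrete (relaxed Bernoulli) gate per unit, combined by a running product so the mask stays ordered. This is our own assumption for illustration only, not the paper's actual VND construction.

```python
import numpy as np

def relaxed_ordered_mask(logits, temperature=0.5, rng=None):
    """Differentiable ordered mask: binary Concrete gates followed by a
    running product (illustrative assumption, not the paper's method)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=np.shape(logits))
    # Binary Concrete sample: sigmoid((logits + logistic noise) / temperature).
    logistic_noise = np.log(u) - np.log1p(-u)
    gates = 1.0 / (1.0 + np.exp(-(np.asarray(logits) + logistic_noise) / temperature))
    # The cumulative product makes each unit depend on all earlier gates:
    # once a gate (softly) closes, every later unit is suppressed, so the
    # mask is ordered yet remains differentiable w.r.t. the logits.
    return np.cumprod(gates)

# Higher logits for earlier units -> earlier units survive more often.
print(relaxed_ordered_mask(np.array([3.0, 2.0, 1.0, 0.0, -1.0]),
                           rng=np.random.default_rng(0)))
```

Because the logits parameterize the mask distribution and the sample is a smooth function of them, gradient-based learning of the drop probabilities becomes possible, which is the flexibility the abstract attributes to VND.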