Nested networks, or slimmable networks, are neural networks whose architectures can be adjusted on the fly at test time, e.g., to meet computational constraints. Recent studies have focused on the "nested dropout" layer, which orders the nodes of a layer by importance during training, thereby generating a nested set of sub-networks that are optimal for different resource configurations. However, the dropout rate is fixed as a hyper-parameter across layers for the whole training process; when nodes are removed, performance therefore decays along a human-specified trajectory rather than one learned from data. Another drawback is that the generated sub-networks are deterministic networks without well-calibrated uncertainty. To address these two problems, we develop a Bayesian approach to nested neural networks. We propose a variational ordering unit that draws samples for nested dropout at low cost from a proposed Downhill distribution, which provides useful gradients to the parameters of nested dropout. Based on this approach, we design a Bayesian nested neural network that learns the order knowledge of the node distributions. In experiments, we show that the proposed approach outperforms the deterministic nested network in accuracy, calibration, and out-of-domain detection on classification tasks, and that it also outperforms related approaches on uncertainty-critical tasks in computer vision.
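To make the ordering mechanism concrete, below is a minimal sketch of how a "downhill" mask for nested dropout could be drawn with reparameterized gradients. It assumes the sample can be formed as a reversed cumulative sum over a Gumbel-softmax draw, so the mask is approximately 1 for the first k nodes and approximately 0 afterwards; the paper's exact parameterization of the Downhill distribution may differ, and `sample_downhill` and the temperature value are illustrative names, not the authors' API.

```python
import torch
import torch.nn.functional as F

def sample_downhill(logits, temperature=0.1):
    """Draw a soft, monotonically non-increasing mask (a "downhill" sample).

    A Gumbel-softmax sample c over K node positions is converted into a
    nested mask by a reversed cumulative sum: if c concentrates at
    position k, the mask is ~1 for nodes 0..k and ~0 for the rest, so
    truncating trailing nodes yields the nested sub-networks. The
    relaxation keeps the sample differentiable w.r.t. `logits`.
    """
    c = F.gumbel_softmax(logits, tau=temperature)  # (K,) relaxed one-hot
    # Reversed cumulative sum turns the one-hot draw into a step-like mask.
    mask = torch.flip(torch.cumsum(torch.flip(c, [-1]), [-1]), [-1])
    return mask

# Usage sketch: mask the units of one hidden layer during training.
K = 8
logits = torch.zeros(K, requires_grad=True)  # learnable ordering parameters
hidden = torch.randn(2, K)                   # a batch of layer activations
masked = hidden * sample_downhill(logits)    # gradients flow back to `logits`
```

Because the mask is sampled rather than fixed, the per-layer truncation point becomes a learned, stochastic quantity instead of a hand-set dropout rate, which is the behavior the abstract attributes to the variational ordering unit.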