In the context of statistical learning, the Information Bottleneck method seeks the right balance between accuracy and generalization capability through a suitable tradeoff between compression complexity, measured by minimum description length, and distortion, evaluated under the logarithmic loss measure. In this paper, we study a variation of the problem, called the scalable information bottleneck, in which the encoder outputs multiple descriptions of the observation with increasingly richer features. The model, which is of the successive-refinement type with degraded side-information streams at the decoders, is motivated by application scenarios that require varying levels of accuracy depending on the allowed (or targeted) level of complexity. We establish an analytic characterization of the optimal relevance-complexity region for vector Gaussian sources. We then derive a variational-inference-type algorithm for general sources with unknown distribution, and show how to parametrize it using neural networks. Finally, we provide experimental results on the MNIST dataset which illustrate that the proposed method generalizes better to data unseen during the training phase.
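For context, the single-description tradeoff underlying this work can be written as the classical IB Lagrangian (in our notation: $X$ the observation, $Y$ the relevance variable, $U$ the representation, with the Markov chain $Y - X - U$):
\[
\max_{P_{U|X}} \; I(U;Y) \;-\; \beta\, I(U;X), \qquad \beta \ge 0,
\]
where $I(U;Y)$ measures relevance (accuracy under logarithmic loss) and $I(U;X)$ measures complexity. A standard variational treatment introduces a decoder $q(y\,|\,u)$ and a prior $r(u)$ and maximizes
\[
\mathcal{L} \;=\; \mathbb{E}_{P_{X,Y}}\,\mathbb{E}_{P_{U|X}}\big[\log q(Y\,|\,U)\big] \;-\; \beta\, \mathbb{E}_{P_X}\big[D_{\mathrm{KL}}\big(P_{U|X}\,\|\,r(U)\big)\big],
\]
which lower-bounds $I(U;Y) - \beta\, I(U;X)$ up to the constant $H(Y)$.

The sketch below illustrates how such an objective can be parametrized with neural networks, using a Gaussian encoder and the reparameterization trick. It covers only the single-description case, not the scalable (multi-description) algorithm of the paper; all names, layer sizes, and the choice of PyTorch are our own assumptions rather than the paper's implementation.

```python
# Minimal, illustrative sketch of a variational IB objective (single description).
# Not the paper's scalable algorithm; names and dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VIBEncoder(nn.Module):
    """Stochastic encoder q(u|x): outputs mean and log-variance of a diagonal Gaussian."""
    def __init__(self, in_dim=784, latent_dim=32):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU())
        self.mu = nn.Linear(512, latent_dim)
        self.logvar = nn.Linear(512, latent_dim)

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.logvar(h)

def vib_loss(encoder, decoder, x, y, beta):
    """Cross-entropy (relevance term under log-loss) + beta * KL (complexity term)."""
    mu, logvar = encoder(x)
    # Reparameterization trick: u = mu + sigma * eps, eps ~ N(0, I)
    u = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
    ce = F.cross_entropy(decoder(u), y)  # bounds the -E[log q(Y|U)] term
    # Closed-form KL( q(u|x) || N(0, I) ), a variational bound on the complexity I(U;X)
    kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
    return ce + beta * kl
```

For an MNIST-style setup, `decoder` could be as simple as `nn.Linear(32, 10)`; sweeping `beta` then traces out an empirical relevance-complexity tradeoff, whereas the scalable variant described in the abstract instead produces several nested descriptions from a single encoder.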