We address the question of characterizing and finding optimal representations for supervised learning. Traditionally, this question has been tackled using the Information Bottleneck, which compresses the inputs while retaining information about the targets, in a decoder-agnostic fashion. In machine learning, however, our goal is not compression but rather generalization, which is intimately linked to the predictive family or decoder of interest (e.g. linear classifier). We propose the Decodable Information Bottleneck (DIB) that considers information retention and compression from the perspective of the desired predictive family. As a result, DIB gives rise to representations that are optimal in terms of expected test performance and can be estimated with guarantees. Empirically, we show that the framework can be used to enforce a small generalization gap on downstream classifiers and to predict the generalization ability of neural networks.
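To make the decoder-aware notion of "retained information" concrete, below is a minimal sketch assuming a linear predictive family V and PyTorch; the function name `linear_decodable_information` and the probe-based estimator are illustrative assumptions for exposition, not the paper's actual objective or implementation. It scores a fixed representation Z by how much a linear probe reduces the label cross-entropy relative to the marginal-label baseline, i.e. how much information about Y a linear decoder can actually extract from Z.

```python
# Hypothetical sketch: a decoder-aware measure of retained information,
# using a linear probe as the predictive family V. Not the paper's code.
import torch
import torch.nn as nn


def linear_decodable_information(z, y, n_classes, epochs=200, lr=0.1):
    """Proxy for the V-decodable information I_V(Z -> Y) with V = linear classifiers:
    the reduction in cross-entropy achieved by the best linear probe on Z."""
    probe = nn.Linear(z.shape[1], n_classes)
    opt = torch.optim.SGD(probe.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(probe(z), y)
        loss.backward()
        opt.step()
    # Baseline: cross-entropy of always predicting the marginal label distribution.
    marginal = torch.bincount(y, minlength=n_classes).float() / len(y)
    baseline = -(marginal * torch.log(marginal + 1e-12)).sum()
    return (baseline - loss.detach()).item()  # higher => more linearly decodable


if __name__ == "__main__":
    torch.manual_seed(0)
    y = torch.randint(0, 2, (512,))
    # Representation whose labels are linearly decodable vs. one where they are not.
    z_good = torch.randn(512, 16) + 3.0 * y.float().unsqueeze(1)
    z_bad = torch.randn(512, 16)
    print(linear_decodable_information(z_good, y, n_classes=2))
    print(linear_decodable_information(z_bad, y, n_classes=2))
```

Under these assumptions, the same probe-based quantity could also be used as a compression penalty (e.g. penalizing what a linear decoder can recover about nuisance variables), which is the sense in which DIB treats both retention and compression from the perspective of the chosen predictive family.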