The Information Bottleneck (IB) provides an information-theoretic principle for representation learning: retain all information relevant for predicting the label while minimizing redundancy. Although the IB principle has been applied to a wide range of applications, its optimization remains a challenging problem that relies heavily on accurate estimation of mutual information. In this paper, we present a new strategy, Variational Self-Distillation (VSD), which provides a scalable, flexible, and analytic solution that essentially fits the mutual information without explicitly estimating it. Under a rigorous theoretical guarantee, VSD enables the IB to grasp the intrinsic correlation between representation and label for supervised training. Furthermore, by extending VSD to multi-view learning, we introduce two additional strategies, Variational Cross-Distillation (VCD) and Variational Mutual-Learning (VML), which significantly improve the robustness of the representation to view changes by eliminating view-specific and task-irrelevant information. To verify our theoretically grounded strategies, we apply our approaches to cross-modal person Re-ID and conduct extensive experiments, where superior performance against state-of-the-art methods is demonstrated. Our intriguing findings highlight the need to rethink the way to estimate mutual information.
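
As background for the claims above, the standard IB objective (following Tishby et al.; the notation $x$ for the observation, $y$ for the label, $z$ for the representation, and the trade-off weight $\beta$ is ours, not the abstract's) seeks a representation that is maximally informative about the label while maximally compressing the input. The second line below is only a minimal sketch of one way to read "fitting the mutual information without explicitly estimating it", not the paper's exact VSD loss:

$$\min_{p(z \mid x)} \; I(x; z) \;-\; \beta \, I(z; y)$$

$$\mathcal{L}_{\mathrm{VSD}} \;\approx\; \mathbb{E}_{x}\!\left[\, \mathrm{KL}\!\left( p(y \mid x) \,\Vert\, q_{\theta}(y \mid z) \right) \right] \quad \text{(illustrative sketch only)}$$

Here $q_{\theta}(y \mid z)$ denotes a variational predictive distribution over labels given the bottleneck representation; minimizing such a KL term encourages $z$ to preserve exactly the label-relevant information without invoking a separate mutual-information estimator.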