Variational mutual information (MI) estimators are widely used in unsupervised representation learning methods such as contrastive predictive coding (CPC). A lower bound on MI can be obtained from a multi-class classification problem, where a critic attempts to distinguish a positive sample drawn from the underlying joint distribution from $(m-1)$ negative samples drawn from a suitable proposal distribution. Using this approach, MI estimates are bounded above by $\log m$ and can thus severely underestimate the true MI unless $m$ is very large. To overcome this limitation, we introduce a novel estimator based on a multi-label classification problem, where the critic needs to jointly identify multiple positive samples at the same time. We show that, using the same number of negative samples, multi-label CPC is able to exceed the $\log m$ bound while still being a valid lower bound on mutual information. We demonstrate that the proposed approach leads to better mutual information estimation, yields empirical improvements in unsupervised representation learning, and outperforms a current state-of-the-art knowledge distillation method on 10 out of 13 tasks.
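For context, the $\log m$ ceiling follows directly from the standard form of the multi-class CPC (InfoNCE) bound. The sketch below uses a generic critic $f$ and the usual batch construction, which may differ from this paper's exact notation:
$$
I(X;Y) \;\geq\; \mathbb{E}\!\left[\log \frac{e^{f(x,\,y_1)}}{\frac{1}{m}\sum_{j=1}^{m} e^{f(x,\,y_j)}}\right],
$$
where $(x, y_1) \sim p(x, y)$ is the positive pair and $y_2, \dots, y_m \sim p(y)$ are the negative samples. Because the positive term $e^{f(x,\,y_1)}$ also appears in the denominator average, the ratio inside the logarithm is at most $m$, so the estimate can never exceed $\log m$; exceeding this ceiling is precisely what the multi-label formulation targets.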