Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context. For example, we can derive multiple views of a given image by applying data augmentation, or we can split a sequence into views comprising the past and future of some step in the sequence. Contrastive lower bounds on MI are easy to optimize, but suffer a strong underestimation bias when the true MI is large. We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and applying the chain rule of MI to the decomposed views. The resulting expression is a sum of unconditional and conditional MI terms, each measuring a modest chunk of the total MI, which facilitates approximation via contrastive bounds. To maximize this sum, we formulate a contrastive lower bound on conditional MI that can be approximated efficiently. We refer to our general approach as Decomposed Estimation of Mutual Information (DEMI). We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting, and that it learns better representations in a vision domain and for dialogue generation.
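The motivation for the decomposition can be made concrete with a small numerical sketch. The standard InfoNCE-style contrastive bound on MI is capped at log n nats for a batch of n negatives, no matter how good the critic is; splitting I(x; y) via the chain rule into I(x; y_a) + I(x; y_b | y_a) lets two contrastive terms jointly certify up to 2 log n nats. The code below is illustrative only, not the paper's implementation: the critic-score matrices are synthetic stand-ins, and the second call to `info_nce` is a placeholder for the conditional bound the paper derives.

```python
import numpy as np

def info_nce(scores):
    """InfoNCE lower bound on MI from an (n, n) critic-score matrix,
    where scores[i, j] = f(x_i, y_j) and positive pairs sit on the
    diagonal. The bound is E[log softmax_j(scores)[i, i]] + log n,
    and therefore can never exceed log n."""
    n = scores.shape[0]
    logits = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(np.mean(np.diag(log_softmax)) + np.log(n))

n = 128
# Even a perfect critic (huge diagonal scores) saturates at log n nats.
perfect = 50.0 * np.eye(n)
single = info_nce(perfect)  # ~ log(128) ~ 4.85 nats

# DEMI-style decomposition: I(x; y) = I(x; y_a) + I(x; y_b | y_a), so an
# unconditional term plus a conditional term (approximated here by a second
# synthetic score matrix) can together certify up to 2 * log n nats.
decomposed = info_nce(perfect) + info_nce(perfect)
```

The point of the sketch is only the ceiling: with n = 128 negatives a single contrastive bound cannot report more than about 4.85 nats, while the decomposed sum of two such terms doubles that ceiling at the same batch size.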