We present a noisy-channel generative model of two sequences, for example text and speech, which makes it possible to uncover the association between the two modalities when only limited paired data is available. To address the intractability of the exact model under a realistic data setup, we propose a variational inference approximation. To train this variational model with categorical data, we propose a KL encoder loss approach, which has connections to the wake-sleep algorithm. Identifying the joint or conditional distributions from unpaired samples of the marginals alone is possible only under certain conditions on the data distribution; we discuss the types of conditional independence assumptions under which this might be achieved, which in turn guide the architecture design. Experimental results show that even a tiny amount of paired data (5 minutes) is sufficient to learn to relate the two modalities (here, graphemes and phonemes) when a massive amount of unpaired data is available, paving the way for adopting this principled approach for all seq2seq models in low-resource data regimes.