SentenceMIM is a probabilistic auto-encoder for language data, trained with Mutual Information Machine (MIM) learning to provide a fixed-length representation of variable-length language observations (i.e., similar to a VAE). Previous attempts to learn VAEs for language data faced challenges due to posterior collapse. MIM learning encourages high mutual information between observations and latent variables, and is robust against posterior collapse. As such, it learns informative representations whose dimension can be an order of magnitude higher than existing language VAEs. Importantly, the SentenceMIM loss has no hyper-parameters, simplifying optimization. We compare SentenceMIM with VAE and AE on multiple datasets. SentenceMIM yields excellent reconstruction, comparable to AEs, with a rich structured latent space, comparable to VAEs. The structured latent representation is demonstrated with interpolation between sentences of different lengths. We demonstrate the versatility of SentenceMIM by utilizing a trained model for question-answering and transfer learning, without fine-tuning, outperforming VAE and AE with similar architectures.
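The interpolation experiment mentioned above relies on the fixed-length latent codes: two sentences are encoded, their codes are blended, and each blend is decoded back into text. A minimal sketch of the latent-blending step, assuming hypothetical encoder outputs `z_a` and `z_b` and an assumed latent dimension of 512 (the actual model's encoder, decoder, and dimensionality are not shown here):

```python
import numpy as np

def interpolate_latents(z_a, z_b, num_steps=5):
    """Linearly interpolate between two fixed-length latent codes.

    Each intermediate code could then be passed to the decoder to
    generate a sentence "between" the two inputs.
    """
    alphas = np.linspace(0.0, 1.0, num_steps)
    return [(1.0 - a) * z_a + a * z_b for a in alphas]

# Hypothetical latent codes for two sentences (dimension assumed).
z_a = np.zeros(512)
z_b = np.ones(512)
path = interpolate_latents(z_a, z_b, num_steps=5)
# path[0] equals z_a, path[-1] equals z_b, with even steps in between.
```

Because both inputs map to vectors of the same length regardless of sentence length, this blending is well-defined even when the two sentences differ in length.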