Structuring the latent space in probabilistic deep generative models, e.g., variational autoencoders (VAEs), is important for obtaining more expressive models and interpretable representations, and for avoiding overfitting. One way to achieve this objective is to impose a sparsity constraint on the latent variables, e.g., via a Laplace prior. However, such approaches usually complicate the training phase and sacrifice reconstruction quality to promote sparsity. In this paper, we propose a simple yet effective methodology to structure the latent space via a sparsity-promoting dictionary model, which assumes that each latent code can be written as a sparse linear combination of a dictionary's columns. In particular, we leverage a computationally efficient and tuning-free method, which relies on a zero-mean Gaussian latent prior with learnable variances. We derive a variational inference scheme to train the model. Experiments on speech generative modeling demonstrate the advantage of the proposed approach over competing techniques, since it promotes sparsity without degrading the output speech quality.
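To make the dictionary model concrete, the following is a minimal illustrative sketch, not the authors' implementation: it assumes a fixed dictionary D and per-component variances gamma (both names and values are hypothetical), and shows how a zero-mean Gaussian prior with learnable variances yields sparse codes, since driving a variance gamma_i toward zero effectively prunes the corresponding dictionary column.

```python
# Illustrative sketch (not the paper's code): a latent code z modeled as
# z = D @ w, where w has a zero-mean Gaussian prior with per-component
# variances gamma. Small gamma_i switches off column i of D, which is how
# this kind of prior promotes sparsity.
import numpy as np

rng = np.random.default_rng(0)

latent_dim = 16   # dimension of the latent code z (illustrative value)
dict_size = 64    # number of dictionary columns (illustrative value)

D = rng.standard_normal((latent_dim, dict_size))  # dictionary (columns = atoms)
gamma = rng.uniform(0.0, 1.0, size=dict_size)     # per-component variances
gamma[rng.random(dict_size) < 0.8] = 1e-6         # most variances near zero -> sparse w

# Draw sparse coefficients w ~ N(0, diag(gamma)) and form the latent code.
w = rng.standard_normal(dict_size) * np.sqrt(gamma)
z = D @ w

print("active atoms:", int(np.sum(np.abs(w) > 1e-2)), "out of", dict_size)
```

In the actual model the variances would be learned jointly with the generative network under the variational inference scheme mentioned above; the hard thresholding here only mimics that outcome for illustration.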