We address speech enhancement based on variational autoencoders, which involves learning a speech prior distribution in the time-frequency (TF) domain. A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable. While this is the commonly used approach, in this paper we propose a weighted variance generative model, in which the contribution of each TF point to parameter learning is weighted. We impose a Gamma prior distribution on the weights, which effectively leads to a Student's t-distribution instead of a Gaussian for speech modeling. We develop efficient training and speech enhancement algorithms based on the proposed generative model. Our experimental results on spectrogram modeling and speech enhancement demonstrate the effectiveness and robustness of the proposed approach compared to the standard unweighted variance model.
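As a minimal sketch of the Gaussian scale-mixture construction referred to above: placing a Gamma prior on a per-TF-point weight that scales the Gaussian variance and then marginalizing the weight out yields a Student's t marginal. The notation below ($s_{fn}$ for the complex STFT coefficient, $\sigma_f^2(\mathbf{z}_n)$ for the decoder variance, $w_{fn}$ for the weight, and the symmetric $\mathcal{G}(\nu/2,\nu/2)$ parameterization) is illustrative and not necessarily the paper's exact formulation.

\[
s_{fn} \mid \mathbf{z}_n, w_{fn} \sim \mathcal{N}_c\!\left(0,\ \frac{\sigma_f^2(\mathbf{z}_n)}{w_{fn}}\right),
\qquad
w_{fn} \sim \mathcal{G}\!\left(\frac{\nu}{2},\ \frac{\nu}{2}\right),
\]
\[
p(s_{fn} \mid \mathbf{z}_n)
= \int_0^{\infty} \mathcal{N}_c\!\left(s_{fn};\ 0,\ \frac{\sigma_f^2(\mathbf{z}_n)}{w}\right)\, \mathcal{G}\!\left(w;\ \frac{\nu}{2},\ \frac{\nu}{2}\right) dw
\;=\; \mathcal{T}_{\nu}\!\left(s_{fn};\ 0,\ \sigma_f^2(\mathbf{z}_n)\right),
\]

i.e., the marginal over the weight is a zero-mean Student's t-distribution with $\nu$ degrees of freedom and scale $\sigma_f^2(\mathbf{z}_n)$, whose heavier tails make the likelihood less sensitive to outlying TF points than the unweighted Gaussian model.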