Recently, variational autoencoders have been successfully used to learn a probabilistic prior over speech signals, which is then used to perform speech enhancement. However, variational autoencoders are trained on clean speech only, which limits their ability to extract the speech signal from noisy speech compared to supervised approaches. In this paper, we propose to guide the variational autoencoder with a supervised classifier separately trained on noisy speech. The estimated label is a high-level categorical variable describing the speech signal (e.g., speech activity), allowing for a more informed latent distribution than that of the standard variational autoencoder. We evaluate our method with different types of labels on real recordings from different noisy environments. Provided that the label better informs the latent distribution and that the classifier achieves good performance, the proposed approach outperforms both the standard variational autoencoder and a conventional neural network-based supervised approach.