In previous work, we proposed a variational autoencoder (VAE) based Bayesian permutation training speech enhancement (SE) method (PVAE), which showed that the SE performance of the traditional deep neural network (DNN) based method can be improved by deep representation learning (DRL). Building on that work, in this paper we propose to use $\beta$-VAE to further improve PVAE's representation learning ability. More specifically, our $\beta$-VAE improves PVAE's capacity to disentangle the different latent variables of the observed signal without the trade-off between disentanglement and signal reconstruction that widely exists in previous $\beta$-VAE algorithms. Unlike previous $\beta$-VAE algorithms, the proposed $\beta$-VAE strategy can also be used to optimize the DNN structure, which means that the proposed method not only improves PVAE's SE performance but also reduces the number of PVAE training parameters. Experimental results show that the proposed method acquires better speech and noise latent representations than PVAE, while also obtaining a higher scale-invariant signal-to-distortion ratio, speech quality, and speech intelligibility.
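For context, the trade-off mentioned above stems from the standard $\beta$-VAE objective in the literature (Higgins et al., 2017); the formula below is that baseline formulation, not the proposed method. With encoder $q_\phi(z|x)$, decoder $p_\theta(x|z)$, and prior $p(z)$, setting $\beta > 1$ strengthens the KL penalty, which encourages disentangled latents but degrades reconstruction:
$$\mathcal{L}_{\beta\text{-VAE}} = \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] - \beta \, D_{\mathrm{KL}}\!\left(q_\phi(z|x) \,\|\, p(z)\right)$$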