Unsupervised blind source separation methods do not require a training phase and thus cannot suffer from a train-test mismatch, which is a common concern in neural-network-based source separation. The unsupervised techniques can be categorized into two classes: those building upon the sparsity of speech in the short-time Fourier transform (STFT) domain, and those exploiting the non-Gaussianity or non-stationarity of the source signals. In this contribution, spatial mixture models, which fall into the first category, and independent vector analysis (IVA), as a representative of the second category, are compared with respect to their separation performance and the performance of a downstream speech recognizer on a reverberant dataset of reasonable size. Furthermore, we introduce a serial concatenation of the two, in which the result of the mixture model serves as the initialization of IVA. This combination achieves a significantly better word error rate (WER) than either algorithm alone and even approaches the performance of a much more complex neural-network-based technique.
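As a rough illustration of the serial concatenation, the following minimal sketch runs an auxiliary-function IVA update (one common IVA variant; the abstract does not say which variant is used) and optionally replaces the first source-activity estimate with one derived from time-frequency masks. How the mixture-model output actually enters IVA is not specified here, so the mask-to-activity mapping is an assumption, and the names `activity_from_masks` and `auxiva` are hypothetical.

```python
import numpy as np


def activity_from_masks(X, masks):
    """Assumed mapping from mixture-model masks to an initial source activity.

    X:     (F, T, M) complex STFT of the M-channel mixture.
    masks: (F, T, N) class posteriors from a spatial mixture model, N == M.
    Returns r of shape (T, N): a broadband magnitude estimate per source.
    """
    power = np.mean(np.abs(X) ** 2, axis=2, keepdims=True)  # (F, T, 1)
    return np.sqrt(np.sum(masks * power, axis=0))           # (T, N)


def auxiva(X, n_iter=20, init_activity=None, eps=1e-10):
    """Auxiliary-function IVA with a Laplace-type source model (sketch).

    X: (F, T, M) complex STFT. Returns Y (F, T, M) and W (F, M, M),
    where row n of W[f] is the demixing filter w_n^H at frequency f.
    """
    F, T, M = X.shape
    eye = np.eye(M, dtype=complex)
    W = np.tile(eye, (F, 1, 1))
    Y = X.copy()
    for it in range(n_iter):
        if it == 0 and init_activity is not None:
            r = init_activity                            # (T, M), from masks
        else:
            r = np.sqrt(np.sum(np.abs(Y) ** 2, axis=0))  # (T, M), from Y
        phi = 1.0 / np.maximum(r, eps)                   # frame weights
        for n in range(M):
            # Weighted spatial covariance per frequency: (F, M, M)
            V = np.einsum('t,ftm,ftk->fmk', phi[:, n], X, X.conj()) / T
            V = V + eps * eye                            # diagonal loading
            # Solve (W V) w = e_n for every frequency at once
            e_n = np.tile(eye[:, n:n + 1], (F, 1, 1))    # (F, M, 1)
            w = np.linalg.solve(W @ V, e_n)[..., 0]      # (F, M)
            # Normalize so that w^H V w = 1
            norm = np.sqrt(np.einsum('fm,fmk,fk->f', w.conj(), V, w).real)
            w = w / np.maximum(norm, eps)[:, None]
            W[:, n, :] = w.conj()
        Y = np.einsum('fnm,ftm->ftn', W, X)              # demixed signals
    return Y, W
```

With `init_activity=None` the routine starts from the identity demixing matrix, which corresponds to running IVA on its own. Passing `activity_from_masks(X, masks)` gives the first iteration a source-dependent weighting, which is the intuition behind using the mixture model as an initializer. The sketch omits scale restoration (for example, projection back to a reference microphone).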