Speaker verification (SV) performs poorly in far-field scenarios because of environmental noise and the adverse impact of room reverberation. This work presents a benchmark of multichannel speech enhancement for far-field speaker verification. Two approaches are considered: one is deep neural network (DNN)-based, and the other combines a DNN with signal processing. We integrated a DNN architecture with signal processing techniques to carry out various experiments. Our approach is compared to existing state-of-the-art approaches. We also examine the importance of enrollment in pre-processing, which has been largely overlooked in previous studies. Experimental evaluation shows that pre-processing can improve SV performance as long as the enrollment files are processed similarly to the test data and test and enrollment occur within similar SNR ranges. Considerable improvement is obtained on the generated and on all the noise conditions of the VOiCES dataset.
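The point about enrollment can be illustrated with a minimal sketch of a verification trial in which the same pre-processing is applied to both the enrollment and the test recordings. Everything named here is an assumption made for illustration and not the system evaluated in this work: `toy_enhance` (plain channel averaging) stands in for a multichannel enhancement front-end, `toy_embed` (log band energies) stands in for a speaker embedding extractor, and cosine similarity is assumed as the scoring back-end.

```python
import numpy as np


def cosine_score(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def toy_enhance(multichannel):
    """Hypothetical enhancement front-end: average the channels.

    Expects an array of shape (channels, samples) and returns (samples,).
    A real system would use a DNN and/or a beamformer here.
    """
    return np.mean(multichannel, axis=0)


def toy_embed(wave, n_bands=8):
    """Hypothetical embedding: log energy in n_bands spectral bands.

    A real system would use a trained speaker embedding network.
    """
    spectrum = np.abs(np.fft.rfft(wave))
    bands = np.array_split(spectrum, n_bands)
    energies = np.array([np.sum(band ** 2) for band in bands])
    return np.log(energies + 1e-8)


def score_trial(enroll, test, enhance, embed, enhance_enrollment=True):
    """Score one trial on multichannel recordings of shape (channels, samples).

    With enhance_enrollment=True, enrollment and test pass through the same
    front-end (matched condition). With False, the enrollment embedding is
    taken from the raw first channel (mismatched condition).
    """
    test_emb = embed(enhance(test))
    if enhance_enrollment:
        enroll_emb = embed(enhance(enroll))
    else:
        enroll_emb = embed(enroll[0])
    return cosine_score(enroll_emb, test_emb)
```

Calling `score_trial(enroll, test, toy_enhance, toy_embed)` scores a trial in the matched condition, while passing `enhance_enrollment=False` reproduces the mismatched setup in which only the test side is enhanced. Comparing the two over a trial list is one simple way to check how much the treatment of enrollment matters for a given front-end.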