After their introduction for robust speech recognition, power normalized cepstral coefficient (PNCC) features were successfully adopted for other tasks, including speaker verification. However, because PNCC extraction applies long-term operations to the power spectrogram, its temporal processing and amplitude scaling steps, which are dedicated to environmental compensation, may be redundant. Further, they might suppress intrinsic speaker variations that are useful for speaker verification based on deep neural networks (DNN). Therefore, in this study, we revisit and optimize PNCCs by ablating the medium-time processor and by introducing channel energy normalization. Experimental results with a DNN-based speaker verification system indicate substantial improvement over baseline PNCCs in both in-domain and cross-domain scenarios, with maximum relative reductions in equal error rate of 5.8% on VoxCeleb1 and 61.2% on VoxMovies.
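The improvements above are stated as relative changes in equal error rate (EER), the operating point at which the false acceptance and false rejection rates coincide. As a minimal sketch of how such figures are typically obtained, the following Python snippet approximates the EER from trial scores by a simple threshold sweep and then computes a relative reduction. The function names `equal_error_rate` and `relative_reduction` and all score values are hypothetical illustrations, not the evaluation code or data of this study.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    Illustrative sketch only: a trial is accepted when its score is at or
    above the threshold.
    """
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    # False rejection rate: target trials scored below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance rate: non-target trials scored at or above the threshold.
    far = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    # Pick the threshold where the two error rates are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2.0)


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction in percent with respect to the baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Hypothetical scores, chosen only to exercise the functions.
eer = equal_error_rate([0.1, 0.9], [0.2, 0.8])
gain = relative_reduction(5.0, 4.0)
```

Under this definition, a relative reduction of 5.8% means the new EER is 94.2% of the baseline EER, whatever the absolute values are.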