In this study, we focus on nonlinear compression of spectral features for speaker verification based on deep neural networks. We consider several channel-dependent (CD) nonlinear compression methods that are optimized in a data-driven manner. Our methods are based on power nonlinearities and dynamic range compression (DRC). We also propose a multi-regime (MR) design of the nonlinearities, aimed at improving robustness. Results on VoxCeleb1 and VoxMovies data demonstrate improvements brought by the proposed compression methods over both the commonly used logarithm and their static counterparts, especially for the methods based on the power function. While the CD generalization improves performance on VoxCeleb1, the MR design provides more robustness on VoxMovies, with a maximum relative equal error rate reduction of 21.6%.
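The difference between the logarithm, a static power nonlinearity, and its channel-dependent generalization can be illustrated with a minimal sketch. It is not the implementation used in the study: the function names, the `eps` floor, and the hand-picked exponents are assumptions made for illustration. In a data-driven setup the exponents would be learned rather than fixed, and DRC and the MR design are not covered here.

```python
import numpy as np


def log_compress(spec, eps=1e-6):
    """Logarithmic compression of a non-negative spectrogram.

    eps is a hypothetical floor that avoids log(0).
    """
    return np.log(spec + eps)


def power_compress(spec, alpha):
    """Power-law compression: spec ** alpha.

    spec  : array of shape (channels, frames), non-negative.
    alpha : scalar (static: one exponent shared by all channels) or
            array of shape (channels,) (channel-dependent: one exponent
            per frequency channel).
    """
    alpha = np.asarray(alpha, dtype=spec.dtype)
    if alpha.ndim == 1:
        # Broadcast each channel's exponent across the frames axis.
        alpha = alpha[:, None]
    return np.power(spec, alpha)


# Toy input: 4 frequency channels, 3 frames (illustrative values only).
spec = np.array([[1.0, 4.0, 9.0],
                 [1.0, 4.0, 9.0],
                 [1.0, 4.0, 9.0],
                 [1.0, 4.0, 9.0]])

log_out = log_compress(spec)

# Static variant: the same exponent for every channel.
static_out = power_compress(spec, 0.5)

# Channel-dependent variant: hand-picked exponents standing in for
# values that would be optimized from data.
cd_alpha = np.array([1.0, 0.5, 0.5, 0.0])
cd_out = power_compress(spec, cd_alpha)
```

With a scalar exponent every channel is compressed identically, whereas the per-channel vector lets each frequency channel use its own degree of compression, which is the sense in which the CD variant generalizes its static counterpart.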