Speaker identification using i-vectors has gradually been replaced by speaker identification using deep learning. Speaker identification based on convolutional neural networks (CNNs), which learn low-level speech representations from raw waveforms, has been widely used in recent years. Building on this approach, a CNN architecture called SincNet introduces a distinctive convolutional layer whose kernels are constrained to be band-pass filters. Whereas a standard CNN learns every coefficient of each filter, SincNet learns only the low and high cutoff frequencies of each filter. This paper proposes an improved CNN architecture called LineNet, which encourages the first convolutional layer to implement more specific filters than SincNet. LineNet parameterizes the shape of each filter in the frequency domain and realizes band-pass filters by learning a set of deformation points in the frequency domain. Compared with a standard CNN, LineNet learns the characteristics of each filter directly. Compared with SincNet, LineNet learns more characteristic parameters than the low and high cutoff frequencies alone. This provides a filter bank tailored to each task. Our experiments show that LineNet converges faster than a standard CNN and performs better than SincNet.
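To illustrate the SincNet-style parameterization mentioned above, the following is a minimal NumPy sketch of a band-pass filter defined entirely by its low and high cutoff frequencies. The function name `sinc_bandpass`, the 251-tap kernel, the 16 kHz sample rate, and the Hamming window are assumptions chosen for illustration, not settings taken from this paper.

```python
import numpy as np

def sinc_bandpass(f_low, f_high, kernel_size=251, sample_rate=16000):
    """Windowed-sinc band-pass kernel defined by two cutoffs in Hz.

    Illustrative sketch only: kernel_size, sample_rate and the Hamming
    window are assumed defaults, not values from the paper.
    """
    # Symmetric time axis in samples, centred on zero.
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # Express the cutoffs in cycles per sample.
    f1 = f_low / sample_rate
    f2 = f_high / sample_rate
    # A band-pass filter is the difference of two ideal low-pass filters.
    # np.sinc(x) computes sin(pi*x)/(pi*x), hence the factor 2*f*n.
    h = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    # Taper the truncated kernel to reduce ripple in the frequency response.
    return h * np.hamming(kernel_size)

# Example: a kernel passing roughly 300 Hz to 3400 Hz.
kernel = sinc_bandpass(300.0, 3400.0)
```

In a SincNet-style layer only `f_low` and `f_high` would be trainable for each filter, so every kernel is a band-pass filter by construction. LineNet's deformation points are not sketched here because the abstract does not specify their exact form.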