Deep learning is progressively gaining popularity as a viable alternative to i-vectors for speaker recognition. Promising results have recently been obtained with Convolutional Neural Networks (CNNs) fed directly by raw speech samples. Rather than employing standard hand-crafted features, these CNNs learn low-level speech representations from waveforms, potentially allowing the network to better capture important narrow-band speaker characteristics such as pitch and formants. Proper design of the neural network is crucial to achieve this goal. This paper proposes a novel CNN architecture, called SincNet, that encourages the first convolutional layer to discover more meaningful filters. SincNet is based on parametrized sinc functions, which implement band-pass filters. In contrast to standard CNNs, which learn all elements of each filter, only the low and high cutoff frequencies are directly learned from data with the proposed method. This offers a very compact and efficient way to derive a customized filter bank specifically tuned for the desired application. Our experiments, conducted on both speaker identification and speaker verification tasks, show that the proposed architecture converges faster and performs better than a standard CNN on raw waveforms.
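The core idea is that each first-layer kernel is a band-pass filter fully determined by two learnable cutoff frequencies, built as the difference of two windowed sinc low-pass filters. The sketch below is a minimal NumPy illustration of that construction, not the authors' implementation; the function name, kernel size, window choice, and normalization are assumptions made for the example.

```python
import numpy as np

def sinc_bandpass(f1, f2, kernel_size=251, sample_rate=16000):
    """Band-pass FIR kernel parametrized only by its two cutoff frequencies (Hz).

    Illustrative sketch of the SincNet idea: the kernel is the difference of two
    windowed sinc low-pass filters, so only f1 and f2 would be learned rather
    than every filter tap.
    """
    # Symmetric time axis in samples around n = 0
    n = np.arange(-(kernel_size // 2), kernel_size // 2 + 1)
    f1_norm = f1 / sample_rate          # normalized cutoffs in [0, 0.5]
    f2_norm = f2 / sample_rate
    # np.sinc(x) = sin(pi*x)/(pi*x), so 2*f*np.sinc(2*f*n) is an ideal
    # low-pass filter with normalized cutoff f
    low1 = 2 * f1_norm * np.sinc(2 * f1_norm * n)
    low2 = 2 * f2_norm * np.sinc(2 * f2_norm * n)
    band = low2 - low1                  # band-pass = difference of two low-passes
    band *= np.hamming(len(n))          # smooth the truncation of the ideal filter
    return band / np.abs(band).max()    # simple amplitude normalization (assumed)

# Example: a kernel passing roughly the 300-3400 Hz telephone band
kernel = sinc_bandpass(300.0, 3400.0)
```

In a trained SincNet layer, f1 and f2 would be network parameters updated by backpropagation, so each filter needs only two values instead of hundreds of taps.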