This paper proposes a CNN-based structure for time-frequency localization of audio signal information in the acoustic model of an ASR system, applied to Persian speech recognition. Research has shown that the time-frequency flexibility of receptive fields in the auditory neural systems of some mammals improves recognition performance. Because biological systems are highly efficient and perform well, they have inspired many artificial systems, and time-frequency localization has been used extensively to improve system performance. In recent years, much work has gone into localizing time-frequency information in ASR systems by exploiting the spatial invariance properties of methods such as TDNN, CNN and LSTM-RNN. However, most of these models have a large number of parameters and are challenging to train. In the proposed structure, called the Time-Frequency Convolutional Maxout Neural Network (TFCMNN), two parallel 1D-CMNN blocks, each with weight sharing along one dimension, are applied simultaneously but independently to the feature vectors. Their outputs are then concatenated and fed to a fully connected Maxout network for classification. To improve the performance of this structure, we use recently developed techniques such as maxout, Dropout, and weight normalization. Two sets of experiments were designed and run on the Persian FARSDAT speech data set to compare this model with conventional 1D-CMNN models. According to the experimental results, the average recognition score of the TFCMNN models is about 1.6% higher than that of the conventional models. In addition, the average training time of the TFCMNN models is about 17 hours shorter than that of the conventional models. These results agree with other reports that time-frequency localization in ASR systems increases accuracy and speeds up model training.
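The two-branch structure described above can be illustrated with a minimal NumPy forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the input is assumed to be a (time, frequency) patch of context frames, one branch shares weights along time and the other along frequency, and all sizes (`T`, `F`, `width`, `channels`, `hidden`, `n_classes`) and function names are hypothetical. Dropout, weight normalization, pooling, and training are omitted.

```python
import numpy as np


def maxout(z, k):
    """Maxout: split the last axis into groups of k units and keep each group's max."""
    *lead, n = z.shape
    assert n % k == 0, "last axis must be divisible by the maxout group size"
    return z.reshape(*lead, n // k, k).max(axis=-1)


def conv1d(x, w, b):
    """Valid 1D convolution sliding along axis 0 of x (weights shared along that axis).

    x: (L, D), w: (width, D, C), b: (C,)  ->  (L - width + 1, C)
    """
    width = w.shape[0]
    steps = x.shape[0] - width + 1
    rows = [np.tensordot(x[i:i + width], w, axes=([0, 1], [0, 1])) for i in range(steps)]
    return np.stack(rows) + b


def init_params(rng, T=11, F=40, width=3, channels=8, k=2, hidden=64, n_classes=30):
    """Random parameters; every size here is a hypothetical choice for illustration."""
    n_time = (T - width + 1) * (channels // k)   # flattened time-branch output size
    n_freq = (F - width + 1) * (channels // k)   # flattened frequency-branch output size
    s = 0.1
    return {
        "w_t": s * rng.standard_normal((width, F, channels)), "b_t": np.zeros(channels),
        "w_f": s * rng.standard_normal((width, T, channels)), "b_f": np.zeros(channels),
        "w_fc": s * rng.standard_normal((n_time + n_freq, hidden)), "b_fc": np.zeros(hidden),
        "w_out": s * rng.standard_normal((hidden // k, n_classes)), "b_out": np.zeros(n_classes),
    }


def tfcmnn_forward(x, p, k=2):
    """Forward pass for one (T, F) feature patch; returns class probabilities."""
    t = maxout(conv1d(x, p["w_t"], p["b_t"]), k).ravel()     # branch sliding along time
    f = maxout(conv1d(x.T, p["w_f"], p["b_f"]), k).ravel()   # branch sliding along frequency
    h = np.concatenate([t, f])                               # merge the two branches
    h = maxout(h @ p["w_fc"] + p["b_fc"], k)                 # fully connected maxout layer
    logits = h @ p["w_out"] + p["b_out"]
    e = np.exp(logits - logits.max())                        # numerically stable softmax
    return e / e.sum()
```

For example, `tfcmnn_forward(rng.standard_normal((11, 40)), init_params(rng))` with `rng = np.random.default_rng(0)` returns a probability vector over the assumed 30 classes. The two branches see the same input independently, so neither depends on the other until the concatenation step.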

