Time-Frequency Localization Using Deep Convolutional Maxout Neural Network in Persian Speech Recognition

In this paper, a CNN-based structure for the time-frequency localization of information is proposed for Persian speech recognition. Research has shown that the spectrotemporal plasticity of the receptive fields of some neurons in the mammalian primary auditory cortex and midbrain provides localization capabilities that improve recognition performance. Over the past few years, much work has been done to localize time-frequency information in ASR systems, using the spatial or temporal invariance properties of methods such as HMMs, TDNNs, CNNs, and LSTM-RNNs. However, most of these models have a large number of parameters and are challenging to train. To address this, we present a structure called the Time-Frequency Convolutional Maxout Neural Network (TFCMNN), in which parallel time-domain and frequency-domain 1D-CMNNs are applied simultaneously and independently to the spectrogram; their outputs are then concatenated and fed jointly to a fully connected Maxout network for classification. To improve the performance of this structure, we use recently developed methods and models such as dropout, maxout, and weight normalization. Two sets of experiments were designed and run on the FARSDAT dataset to compare this model with conventional 1D-CMNN models. According to the experimental results, the average recognition score of the TFCMNN models is about 1.6% higher than that of the conventional 1D-CMNN models. In addition, the average training time of the TFCMNN models is about 17 hours shorter than that of the conventional models. These results agree with findings reported elsewhere that time-frequency localization in ASR systems increases accuracy and speeds up training.
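
The data flow described above (two parallel 1D convolutional maxout branches over the spectrogram, concatenation, then a fully connected maxout layer) can be illustrated with a minimal forward-pass sketch. It is not the authors' implementation. The function names, the toy sizes, the random weights, and the choice to share one small kernel bank across all bins of the other axis are assumptions made only for illustration. Dropout, weight normalization, and training are omitted.

```python
import random


def conv1d_maxout(signal, kernels):
    """Valid 1D convolution with k kernels, then a max over the k responses."""
    width = len(kernels[0])
    out = []
    for start in range(len(signal) - width + 1):
        window = signal[start:start + width]
        responses = [sum(w * x for w, x in zip(kernel, window))
                     for kernel in kernels]
        out.append(max(responses))  # maxout: keep the largest linear piece
    return out


def time_branch(spec, kernels):
    """Slide the kernels along the time axis, one frequency bin at a time.

    spec[t][f] is the value at frame t and frequency bin f (assumed layout).
    """
    features = []
    for f in range(len(spec[0])):
        column = [frame[f] for frame in spec]
        features.extend(conv1d_maxout(column, kernels))
    return features


def freq_branch(spec, kernels):
    """Slide the kernels along the frequency axis, one frame at a time."""
    features = []
    for frame in spec:
        features.extend(conv1d_maxout(frame, kernels))
    return features


def dense_maxout(x, weights, biases):
    """Fully connected maxout layer.

    weights[j][p] is the weight vector of linear piece p of output unit j,
    and biases[j][p] is the matching bias.
    """
    return [
        max(sum(w * xi for w, xi in zip(piece_w, x)) + piece_b
            for piece_w, piece_b in zip(unit_w, unit_b))
        for unit_w, unit_b in zip(weights, biases)
    ]


def tfcmnn_forward(spec, time_kernels, freq_kernels, weights, biases):
    """Run both branches independently, concatenate, and classify."""
    joint = time_branch(spec, time_kernels) + freq_branch(spec, freq_kernels)
    return dense_maxout(joint, weights, biases)


def demo():
    """Toy run with hypothetical sizes: 6 frames, 5 bins, 4 output units."""
    rng = random.Random(0)
    n_frames, n_bins, width, pieces, n_out = 6, 5, 3, 2, 4
    spec = [[rng.random() for _ in range(n_bins)] for _ in range(n_frames)]
    time_k = [[rng.uniform(-1, 1) for _ in range(width)] for _ in range(pieces)]
    freq_k = [[rng.uniform(-1, 1) for _ in range(width)] for _ in range(pieces)]
    # Joint feature length: time branch plus frequency branch.
    n_joint = n_bins * (n_frames - width + 1) + n_frames * (n_bins - width + 1)
    weights = [[[rng.uniform(-1, 1) for _ in range(n_joint)]
                for _ in range(pieces)] for _ in range(n_out)]
    biases = [[0.0] * pieces for _ in range(n_out)]
    return tfcmnn_forward(spec, time_k, freq_k, weights, biases)


if __name__ == "__main__":
    print(demo())
```

In this sketch the two branches never see each other's intermediate outputs, which mirrors the "simultaneously and independently" wording of the abstract. Only the final maxout layer combines time-localized and frequency-localized features.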
