Conventional automatic speech recognition (ASR) systems use a second-order Minkowski loss at inference time, which is suboptimal because it incorporates only first-order statistics in posterior estimation [2]. In this paper, we propose using higher-order Minkowski losses (4th and 6th order) at inference time, without any change to training. The main contribution of the paper is to show that a higher-order loss uses higher-order statistics in posterior estimation, which improves the predictive ability of the acoustic model in an ASR system. We show mathematically that the posterior probability obtained under a higher-order loss is a function of the second-order posterior, so the method can be incorporated easily into a standard ASR system. All changes are made at test (inference) time; the training pipeline is left unchanged. Multiple baseline systems, namely TDNN1, TDNN2, DNN and LSTM, are developed to verify the improvement due to the higher-order Minkowski loss. All experiments are conducted on the LibriSpeech dataset, and the performance metric is word error rate (WER) on the "dev-clean", "test-clean", "dev-other" and "test-other" sets.
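To illustrate the claim that a higher-order posterior is a function of the second-order posterior, the following is a minimal sketch based on the textbook binary-target case, and not necessarily the exact expression derived in the paper. It assumes each output y is chosen to minimise q*|1-y|^p + (1-q)*|y|^p, where q is the usual (second-order) posterior and p is the loss order; setting the derivative to zero gives y = q^a / (q^a + (1-q)^a) with a = 1/(p-1). The function names and the optional renormalisation across classes are hypothetical choices made for illustration.

```python
import numpy as np


def minkowski_optimal_output(q, order):
    """Per-output minimiser of q*|1-y|**order + (1-q)*|y|**order on [0, 1].

    q     : second-order posterior(s), values in [0, 1]
    order : loss order p (2, 4, 6, ...); p = 2 returns q unchanged
    """
    q = np.asarray(q, dtype=float)
    a = 1.0 / (order - 1)           # exponent from the stationarity condition
    num = q ** a
    den = num + (1.0 - q) ** a
    return num / den


def rescore_posteriors(posteriors, order, renormalise=True):
    """Apply the mapping to each class of a (frames, classes) posterior matrix.

    Renormalising rows to sum to 1 is an assumption of this sketch,
    not something stated in the abstract.
    """
    y = minkowski_optimal_output(posteriors, order)
    if renormalise:
        y = y / y.sum(axis=-1, keepdims=True)
    return y


if __name__ == "__main__":
    # Toy posteriors for two frames over three classes (made-up values).
    post = np.array([[0.7, 0.2, 0.1],
                     [0.4, 0.35, 0.25]])
    for p in (2, 4, 6):
        print(p, rescore_posteriors(post, p))
```

Because the mapping only takes the existing acoustic-model posteriors as input, a transformation of this kind can be applied at inference time without retraining, which is consistent with the test-time-only setting described above.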