Many studies have examined the shortcomings of word error rate (WER) as an evaluation metric for automatic speech recognition (ASR) systems, particularly when used for spoken language understanding tasks such as intent recognition and dialogue systems. In this paper, we propose Hybrid-SD (H_SD), a new hybrid evaluation metric for ASR systems that takes into account both semantic correctness and error rate. To generate sentence dissimilarity (SD) scores, we built a fast and lightweight SNanoBERT model using distillation techniques. Our experiments show that the SNanoBERT model is 25.9x smaller and 38.8x faster than SRoBERTa while achieving comparable results on well-known benchmarks, making it suitable for deployment alongside ASR models on edge devices. We also show that H_SD correlates more strongly with performance on downstream tasks such as intent recognition and named-entity recognition (NER).
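As an illustration of the shortcoming that motivates a semantics-aware metric, the following minimal sketch computes WER with a standard word-level edit distance and applies it to two hypothetical ASR outputs. The example sentences and the helper name `wer` are assumptions made for illustration; this is not the paper's implementation, and it does not reproduce the H_SD formula, which the abstract does not specify.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical utterances (assumed for illustration only).
reference = "book a flight to boston"
harmless = "book the flight to boston"  # article error, intent preserved
harmful = "book a flight to austin"     # destination changed

wer_harmless = wer(reference, harmless)
wer_harmful = wer(reference, harmful)
print(wer_harmless, wer_harmful)
```

Both hypotheses contain a single substitution and therefore receive the same WER, even though only the second one changes the destination that a downstream intent or NER model would extract. A sentence dissimilarity score is intended to separate such cases.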