Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation

We previously proposed contextual spelling correction (CSC) to correct the output of end-to-end (E2E) automatic speech recognition (ASR) models using contextual information such as names and places. Although CSC achieves a reasonable improvement on the biasing problem, two drawbacks limit further accuracy gains. First, because a text-only hypothesis carries limited information, or because the ASR model performs poorly on rare domains, the CSC model may fail to correct phrases with similar pronunciation, and may fail on anti-context cases, where none of the biasing phrases is present in the utterance. Second, there is a discrepancy between the training and inference of CSC: the bias list in training is randomly selected, whereas at inference the ground-truth phrase may be much more similar to the other phrases in the list. To address these limitations, in this paper we propose an improved non-autoregressive (NAR) spelling correction model for contextual biasing in E2E neural transducer-based ASR systems, improving the previous CSC model from two perspectives. First, we incorporate acoustic information through an external attention, alongside the text hypotheses, into CSC to better distinguish the target phrase from dissimilar or irrelevant phrases. Second, we design a semantic-aware data augmentation scheme for the training phase that reduces the mismatch between training and inference and further boosts biasing accuracy. Experiments show that the improved method outperforms the baseline ASR+Biasing system by as much as 20.3% relative name recall gain and achieves stable improvement over the previous CSC method across different bias list name coverage ratios.
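The training/inference mismatch described above can be illustrated with a minimal sketch. This is not the authors' augmentation scheme, which the abstract does not detail. It only shows the general idea of mixing distractors that resemble the ground-truth phrase into an otherwise random training bias list. The function names (`similarity`, `build_bias_list`), the parameters, and the example names are hypothetical, and character-bigram Jaccard overlap is used purely as a stand-in for whatever semantic similarity measure a real system would use.

```python
import random


def char_bigrams(phrase):
    """Return the set of lowercase character bigrams in a phrase."""
    s = phrase.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def similarity(a, b):
    """Jaccard overlap of character bigrams.

    Assumption: a crude stand-in for a semantic (embedding-based) similarity.
    """
    ba, bb = char_bigrams(a), char_bigrams(b)
    if not ba or not bb:
        return 0.0
    return len(ba & bb) / len(ba | bb)


def build_bias_list(target, pool, size, num_hard, rng):
    """Build a training bias list of `size` phrases containing `target`.

    `num_hard` distractors are the pool phrases most similar to the target;
    the remaining slots are filled with randomly sampled phrases.
    """
    candidates = [p for p in pool if p != target]
    ranked = sorted(candidates, key=lambda p: similarity(target, p), reverse=True)
    hard = ranked[:num_hard]
    rest = [p for p in candidates if p not in hard]
    n_random = max(0, min(size - 1 - len(hard), len(rest)))
    easy = rng.sample(rest, n_random)
    bias_list = [target] + hard + easy
    rng.shuffle(bias_list)  # avoid leaking the target through its position
    return bias_list


# Hypothetical example data
pool = ["Jon Smith", "John Smyth", "Joan Smith",
        "Alice Wong", "Pedro Alvarez", "Mei Tanaka"]
bias_list = build_bias_list("John Smith", pool, size=4, num_hard=2,
                            rng=random.Random(0))
print(bias_list)
```

Setting `num_hard=0` gives the purely random selection that the abstract identifies as the source of the mismatch. Raising it makes the training lists look more like inference-time lists, where several entries may closely resemble the ground-truth phrase.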
