We propose new, data-efficient training tasks for BERT models that improve the performance of automatic speech recognition (ASR) systems on conversational speech. We incorporate past conversational context and fine-tune BERT on transcript disambiguation, without external data, to rescore ASR candidates. Our results show word error rate recoveries of up to 37.2%. We test our methods in low-resource settings along three axes: language (Norwegian), speaking style (spontaneous, conversational), and topic (parliamentary proceedings and customer service phone calls). These techniques are applicable to any ASR system and require no additional data beyond a pre-trained BERT model. We also show that the performance of our context-augmented rescoring methods depends strongly on the degree of spontaneity and the nature of the conversation.
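To make the rescoring setup concrete, the following is a minimal illustrative sketch, not the paper's exact recipe: a pre-trained BERT scores each N-best hypothesis by masked-token pseudo-log-likelihood, optionally with past conversational context prepended as a second segment, and that score is interpolated with the ASR score. The multilingual checkpoint and the interpolation weight `alpha` are assumptions for the example; a Norwegian BERT would be the natural substitute.

```python
# Sketch of context-augmented BERT rescoring of ASR N-best lists.
# Assumptions: model choice and alpha are illustrative, not from the paper.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL = "bert-base-multilingual-cased"  # placeholder; swap in a Norwegian BERT
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForMaskedLM.from_pretrained(MODEL)
model.eval()

def pseudo_log_likelihood(hypothesis: str, context: str = "") -> float:
    """Sum of log-probs of each non-special token, masked one at a time.

    With context given, it is encoded as the first segment so BERT can
    condition on it; for simplicity this sketch scores all non-special
    tokens, while a real implementation would score only the hypothesis
    segment.
    """
    enc = (tokenizer(context, hypothesis, return_tensors="pt")
           if context else tokenizer(hypothesis, return_tensors="pt"))
    input_ids = enc["input_ids"][0]
    special = set(tokenizer.all_special_ids)
    total = 0.0
    for i in range(len(input_ids)):
        if input_ids[i].item() in special:
            continue
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[input_ids[i]].item()
    return total

def rescore(nbest, context: str = "", alpha: float = 0.5):
    """nbest: list of (hypothesis, asr_score) pairs. Returns the best pair."""
    return max(nbest, key=lambda h: h[1] + alpha * pseudo_log_likelihood(h[0], context))
```

Masking every token makes this O(sequence length) forward passes per hypothesis, so practical systems batch the masked copies or fine-tune a discriminative scorer instead.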