Punctuation prediction for automatic speech recognition (ASR) output transcripts plays a crucial role in improving the readability of ASR transcripts and the performance of downstream natural language processing applications. However, achieving good punctuation prediction performance often requires large amounts of labeled speech transcripts, which are expensive and laborious to obtain. In this paper, we propose a Discriminative Self-Training approach with weighted loss and discriminative label smoothing to exploit unlabeled speech transcripts. Experimental results on the English IWSLT2011 benchmark test set and an internal Chinese spoken language dataset demonstrate that the proposed approach achieves significant improvements in punctuation prediction accuracy over strong baselines, including BERT, RoBERTa, and ELECTRA models. The proposed Discriminative Self-Training approach also outperforms vanilla self-training. We establish a new state of the art (SOTA) on the IWSLT2011 test set, surpassing the previous SOTA model by 1.3% absolute in F$_1$.
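To make the approach concrete, the sketch below shows one plausible form of a discriminative self-training loss for token-level punctuation prediction: a teacher model produces pseudo-labels on unlabeled transcripts, each token's loss is weighted by the teacher's confidence, and label smoothing is applied to the pseudo-labels. The confidence-based weighting and the smoothing scheme here are illustrative assumptions for a PyTorch setting; the paper's exact weighted loss and discriminative label smoothing may differ.

```python
# A minimal sketch of a self-training loss with confidence weighting and
# label smoothing, assuming a PyTorch token-classification model.
# (Illustrative only; not the paper's exact formulation.)
import torch
import torch.nn.functional as F

def self_training_loss(student_logits, teacher_logits, num_classes, eps=0.1):
    """Pseudo-label loss over unlabeled tokens.

    student_logits, teacher_logits: (num_tokens, num_classes) tensors.
    eps: label-smoothing mass spread over the non-target classes.
    """
    with torch.no_grad():
        teacher_probs = teacher_logits.softmax(dim=-1)        # (N, C)
        confidence, pseudo_labels = teacher_probs.max(dim=-1)  # both (N,)
    # Build smoothed one-hot targets for the teacher's pseudo-labels.
    targets = torch.full_like(student_logits, eps / (num_classes - 1))
    targets.scatter_(-1, pseudo_labels.unsqueeze(-1), 1.0 - eps)
    log_probs = F.log_softmax(student_logits, dim=-1)
    per_token = -(targets * log_probs).sum(dim=-1)             # (N,)
    # Weight each token's loss by the teacher's confidence in its pseudo-label,
    # so low-confidence pseudo-labels contribute less to the student's update.
    return (confidence * per_token).sum() / confidence.sum()
```

In a full self-training loop, this loss would be combined with the standard supervised loss on the labeled transcripts, and the student could replace the teacher over successive rounds.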