In this work, we unify several existing decoding strategies for punctuation prediction in a single framework and introduce a novel strategy that utilises multiple predictions for each word, obtained across different windows. We show that significant improvements can be achieved by optimising these strategies after a model has been trained, with no need for retraining; the only potential cost is an increase in inference time. We further use our decoding strategy framework to carry out the first comparison of tagging and classification approaches for punctuation prediction in a real-time setting. Our results show that a classification approach can be beneficial when little or no right-side context is available.
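The abstract does not specify how the multiple predictions a word receives from different windows are combined. The sketch below is a minimal illustration under our own assumptions: a fixed-size sliding window with a fixed stride, and simple averaging of each word's label probabilities across all windows that contain it. It is not necessarily the authors' method. `decode_with_window_averaging`, `toy_model`, and the two-label scheme (0 = no punctuation, 1 = full stop) are hypothetical names introduced only for illustration.

```python
from typing import Callable, List, Sequence

# Hypothetical model interface (an assumption, not from the paper): given a
# window of words, return one probability distribution over punctuation
# labels for every word in that window.
WindowModel = Callable[[Sequence[str]], List[List[float]]]


def decode_with_window_averaging(
    words: Sequence[str], model: WindowModel, window_size: int, stride: int
) -> List[int]:
    """Slide a window of `window_size` words over `words` with step `stride`,
    average the label distributions each word receives from every window that
    contains it, and return the argmax label index per word."""
    # stride <= window_size guarantees every word is covered by some window
    if window_size <= 0 or not 0 < stride <= window_size:
        raise ValueError("need window_size > 0 and 0 < stride <= window_size")
    n = len(words)
    if n == 0:
        return []
    sums: List[List[float]] = [[] for _ in range(n)]
    counts = [0] * n

    start = 0
    while True:
        end = min(start + window_size, n)
        # Accumulate this window's prediction for each word it contains
        for offset, dist in enumerate(model(words[start:end])):
            i = start + offset
            if not sums[i]:
                sums[i] = [0.0] * len(dist)
            for k, p in enumerate(dist):
                sums[i][k] += p
            counts[i] += 1
        if end == n:  # the last window reached the end of the input
            break
        start += stride

    # Average the accumulated distributions and pick the most probable label
    labels = []
    for total, c in zip(sums, counts):
        mean = [s / c for s in total]
        labels.append(max(range(len(mean)), key=mean.__getitem__))
    return labels


# Toy stand-in for a trained model (hypothetical): a word is tagged as
# sentence-final when the next word in the same window is capitalised; with
# no right context inside the window, it makes a weak "no punctuation" guess.
def toy_model(window: Sequence[str]) -> List[List[float]]:
    out = []
    for j in range(len(window)):
        if j + 1 < len(window):
            out.append([0.1, 0.9] if window[j + 1][0].isupper() else [0.9, 0.1])
        else:
            out.append([0.6, 0.4])
    return out


words = "the model works well Results follow".split()
print(decode_with_window_averaging(words, toy_model, window_size=3, stride=1))
# → [0, 0, 0, 1, 0, 0]
```

In this example, "well" is the last word of one window, where it has no right-side context and gets a weak "no punctuation" guess, but the two later windows that see "Results" outvote it. With non-overlapping windows (`window_size=2, stride=2`) the toy model would miss that full stop. Averaging is only one possible combination rule; weighting each window by how much right-side context it gives a word would be another.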