Lattices are compact representations that encode multiple hypotheses, such as speech recognition results or different word segmentations. It has been shown that encoding lattices, as opposed to the 1-best results produced by an automatic speech recognizer (ASR), boosts the performance of spoken language understanding (SLU). Recently, pretrained language models based on the transformer architecture have achieved state-of-the-art results on natural language understanding, but their ability to encode lattices has not been explored. This paper therefore adapts pretrained transformers to lattice inputs in order to perform understanding tasks specifically for spoken language. Our experiments on the benchmark ATIS dataset show that fine-tuning pretrained transformers with lattice inputs yields clear improvements over fine-tuning with 1-best results. Further evaluation demonstrates the effectiveness of our methods under different acoustic conditions. Our code is available at https://github.com/MiuLab/Lattice-SLU.
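To make the idea of feeding a lattice to a transformer concrete, the sketch below shows one common way such inputs can be handled: self-attention is restricted so that each token only attends to tokens that lie on a shared path through the lattice. This is a minimal illustrative assumption, not the exact adaptation used in the paper; the toy lattice, token names, and masking rule are all hypothetical.

```python
import numpy as np

# Toy word lattice for an utterance like "cheapest fare/fair to boston":
# nodes are token hypotheses, edges follow the (hypothetical) ASR lattice topology.
tokens = ["cheapest", "fare", "fair", "to", "boston"]
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]  # DAG over token indices

n = len(tokens)

# Reachability matrix: reach[i, j] = True if node j is reachable from node i.
reach = np.eye(n, dtype=bool)
for _ in range(n):  # simple fixed point over the DAG
    for i, j in edges:
        reach[i] |= reach[j]
        reach[i, j] = True

# Lattice attention mask: token i may attend to token j only if one is reachable
# from the other, i.e. they lie on a common path. Competing hypotheses on
# alternative branches ("fare" vs. "fair") are masked out for each other.
allowed = reach | reach.T

# Toy self-attention with the lattice mask (random "embeddings" for brevity).
rng = np.random.default_rng(0)
d = 8
q = rng.normal(size=(n, d))
k = rng.normal(size=(n, d))
scores = q @ k.T / np.sqrt(d)
scores[~allowed] = -1e9                              # block cross-branch attention
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

print(np.round(weights, 2))  # rows for "fare" and "fair" assign no weight to each other
```

In this sketch the 1-best baseline corresponds to keeping only a single path through the lattice, whereas the masked attention lets the model see all hypotheses while still respecting the lattice structure.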