Title: Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks

Esaú Villatoro-Tello,Srikanth Madikeri,Juan Zuluaga-Gomez,Bidisha Sharma,Seyyed Saeed Sarfjoo,Iuliia Nigmatulina,Petr Motlicek,Alexei V. Ivanov,Aravind Ganapathiraju

From arXiv. Accepted at ICASSP 2023.

In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup. We benchmark three types of systems on the SLU intent detection task: 1) text-based, 2) lattice-based, and 3) a novel multimodal approach. Our work provides a comprehensive analysis of the performance achievable by different state-of-the-art SLU systems under different circumstances, e.g., automatically vs. manually generated transcripts. We evaluate the systems on the publicly available SLURP spoken language resource corpus. Our results indicate that using richer forms of Automatic Speech Recognition (ASR) output, namely word consensus networks, allows the SLU system to improve over the 1-best setup (5.5% relative improvement). However, crossmodal approaches, i.e., those that learn from acoustic and text embeddings, obtain performance similar to the oracle setup, a relative improvement of 17.8% over the 1-best configuration, and are therefore a recommended alternative for overcoming the limitations of working with automatically generated transcripts.
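
The relative improvements quoted in the abstract can be read as a percentage gain over the 1-best baseline. The sketch below shows that arithmetic, assuming the metric is higher-is-better (such as accuracy); the function name and the accuracy values are hypothetical, chosen only to illustrate the calculation, and are not results from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, as a percentage.

    Assumes a higher-is-better metric such as accuracy.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (improved - baseline) / baseline


# Hypothetical accuracies, used only to illustrate the arithmetic;
# they are NOT figures reported in the paper.
one_best_acc = 0.800
richer_input_acc = 0.844

gain = relative_improvement(one_best_acc, richer_input_acc)
print(f"{gain:.1f}% relative improvement over the 1-best baseline")
```

A relative figure depends on the baseline value, so the same absolute gain yields a larger relative improvement when the baseline is lower.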
