Spoken Language Understanding (SLU) infers semantic meaning directly from audio data, and thus promises to reduce error propagation and misunderstandings in end-user applications. However, publicly available SLU resources are limited. In this paper, we release SLURP, a new SLU package containing the following: (1) a new, challenging dataset in English spanning 18 domains, which is substantially bigger and linguistically more diverse than existing datasets; (2) competitive baselines based on state-of-the-art NLU and ASR systems; (3) a new transparent metric for entity labelling, which enables a detailed error analysis for identifying potential areas of improvement. SLURP is available at https://github.com/pswietojanski/slurp.