Despite recent progress in Natural Language Understanding (NLU), the creation of multilingual NLU systems remains a challenge. NLU systems are commonly limited to a subset of languages due to a lack of available data, and they often vary widely in performance. We launch a three-phase approach to address these limitations and help propel NLU technology to new heights. We release a 52 language dataset called the Multilingual Amazon SLU resource package (SLURP) for Slot-filling, Intent classification, and Virtual assistant Evaluation, or MASSIVE, in an effort to address parallel data availability for voice assistants. We organize the Massively Multilingual NLU 2022 Challenge to provide a competitive environment and push the state of the art in the transferability of models across languages. Finally, we host the first Massively Multilingual NLU workshop, which brings these components together. The MMNLU workshop seeks to advance the science behind multilingual NLU by providing a platform for the presentation of new research in the field and connecting teams working on this research direction. This paper summarizes the dataset, workshop, and competition, as well as the findings of each phase.