Bootstrapping Multilingual Semantic Parsers using Large Language Models

Despite the cross-lingual generalization demonstrated by pre-trained multilingual models, the translate-train paradigm of transferring English datasets across multiple languages remains the key ingredient for training task-specific multilingual models. However, for many low-resource languages, a reliable translation service requires significant amounts of costly human-annotated translation pairs. Further, translation services for low-resource languages may remain brittle due to domain mismatch between the task-specific input text and the general-purpose text used to train the translation models. We consider the task of multilingual semantic parsing and demonstrate the effectiveness and flexibility offered by large language models (LLMs) for translating English datasets into several languages via few-shot prompting. We provide (i) extensive comparisons with prior translate-train methods across 50 languages, demonstrating that LLMs can serve as highly effective data translators, outperforming prior translation-based methods on 40 out of 50 languages; and (ii) a comprehensive study of the key design choices that enable effective data translation via prompted LLMs.
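
To make the idea of few-shot prompting for data translation concrete, the following is a minimal sketch of how such a prompt might be assembled. It is not the authors' prompt format: the function name `build_translation_prompt`, the instruction wording, and the example utterances are all assumptions made for illustration, and the call to an actual LLM is deliberately left out.

```python
from typing import List, Tuple


def build_translation_prompt(
    exemplars: List[Tuple[str, str]],
    query: str,
    target_language: str,
) -> str:
    """Assemble a few-shot prompt asking an LLM to translate `query`.

    Hypothetical format: each exemplar is an (English, translation) pair
    shown before the query, and the prompt ends with an open slot for the
    model to complete.
    """
    lines = [f"Translate the following English utterances into {target_language}."]
    for english, translated in exemplars:
        lines.append("")
        lines.append(f"English: {english}")
        lines.append(f"{target_language}: {translated}")
    # The query goes last, with the target-language slot left empty.
    lines.append("")
    lines.append(f"English: {query}")
    lines.append(f"{target_language}:")
    return "\n".join(lines)


# Illustrative exemplars (made up for this sketch, not taken from any dataset).
exemplars = [
    ("set an alarm for 7 am", "pon una alarma para las 7 am"),
    ("what is the weather today", "qué tiempo hace hoy"),
]
prompt = build_translation_prompt(exemplars, "play some jazz music", "Spanish")
print(prompt)
```

The resulting string would be sent to an LLM, whose completion is taken as the translated utterance. How exemplars are chosen and how many are used are among the kinds of design choices the abstract refers to; this sketch fixes them arbitrarily.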
