JamPatoisNLI provides the first dataset for natural language inference in a creole language, Jamaican Patois. Many of the most-spoken low-resource languages are creoles. These languages commonly have a lexicon derived from a major world language and a distinctive grammar reflecting the languages of the original speakers and the process of language birth by creolization. This gives them a distinctive place in exploring the effectiveness of transfer from large monolingual or multilingual pretrained models. While our work, along with previous work, shows that transfer from these models to low-resource languages that are unrelated to languages in their training set is not very effective, we would expect stronger results from transfer to creoles. Indeed, our experiments show considerably better results from few-shot learning on JamPatoisNLI than on such unrelated languages, and help us begin to understand how the unique relationship between creoles and their high-resource base languages affects cross-lingual transfer. JamPatoisNLI, which consists of naturally occurring premises and expert-written hypotheses, is a step towards steering research into a traditionally underserved language and a useful benchmark for understanding cross-lingual NLP.