In recent years, the natural language processing (NLP) community has paid increasing attention to the disparity between the effort devoted to high-resource languages and that devoted to low-resource ones. Efforts to close this gap often begin by translating existing English datasets into other languages. However, this approach ignores the fact that different language communities have different needs. We focus on one group of low-resource languages: Creole languages. Creoles are largely absent from the NLP literature and are also often ignored by society at large because of stigma, despite having sizable and vibrant speaker communities. Through conversations with Creole experts and surveys of Creole-speaking communities, we demonstrate how the needs for language technology can differ dramatically from one language to another, even among languages considered very similar to each other, as Creoles are. We discuss the prominent themes arising from these conversations and ultimately demonstrate that useful language technology cannot be built without involving the relevant community.