Natural language processing (NLP) systems have become a central technology in communication, education, medicine, artificial intelligence, and many other domains of research and development. While the performance of NLP methods has grown enormously over the last decade, this progress has been restricted to a minuscule subset of the world's 6,500 languages. We introduce a framework for estimating the global utility of language technologies as revealed in a comprehensive snapshot of recent publications in NLP. Our analyses cover the field at large and include more in-depth studies of both user-facing technologies (machine translation, language understanding, question answering, text-to-speech synthesis) and more linguistically oriented NLP tasks (dependency parsing, morphological inflection). In the process, we (1) quantify disparities in the current state of NLP research, (2) explore some of the associated societal and academic factors, and (3) produce tailored recommendations for evidence-based policy making aimed at promoting more global and equitable language technologies.