More than 43% of the languages spoken in the world are endangered, and language loss currently occurs at an accelerated rate because of globalization and neocolonialism. Saving and revitalizing endangered languages has become very important for maintaining the cultural diversity on our planet. In this work, we focus on discussing how NLP can help revitalize endangered languages. We first suggest three principles that may help NLP practitioners foster mutual understanding and collaboration with language communities, and we discuss three ways in which NLP can potentially assist in language education. We then take Cherokee, a severely endangered Native American language, as a case study. After reviewing the language's history, linguistic features, and existing resources, we (in collaboration with Cherokee community members) arrive at a few meaningful ways NLP practitioners can collaborate with community partners. We suggest two approaches to enrich the Cherokee language's resources with machine-in-the-loop processing, and discuss several NLP tools that people from the Cherokee community have shown interest in. We hope that our work serves not only to inform the NLP community about Cherokee, but also to provide inspiration for future work on endangered languages in general. Our code and data will be open-sourced at https://github.com/ZhangShiyue/RevitalizeCherokee.