St. Lawrence Island Yupik (ISO 639-3: ess) is an endangered polysynthetic language in the Inuit-Yupik language family indigenous to Alaska and Chukotka. This work presents a step-by-step pipeline for the digitization of written texts, and the first publicly available digital corpus for St. Lawrence Island Yupik, created using that pipeline. This corpus has great potential for future linguistic inquiry and research in NLP. It was also developed for use in Yupik language education and revitalization, with a primary goal of enabling easy access to Yupik texts by educators and by members of the Yupik community. A secondary goal is to support development of language technology such as spell-checkers, text-completion systems, interactive e-books, and language learning apps for use by the Yupik community.