Preparing an Endangered Language for the Digital Age: The Case of Judeo-Spanish

We develop machine translation and speech synthesis systems to complement efforts to revitalize Judeo-Spanish, the exiled language of Sephardic Jews, which survived for centuries but now faces the threat of extinction in the digital age. Building on resources created by the Sephardic community of Turkey and elsewhere, we create corpora and tools that help preserve this language for future generations. For machine translation, we first develop a Spanish-to-Judeo-Spanish rule-based machine translation system in order to generate large volumes of synthetic parallel data for the relevant language pairs, which pair Judeo-Spanish with Turkish, English and Spanish. We then train baseline neural machine translation engines using this synthetic data together with authentic parallel data created from translations by the Sephardic community. For text-to-speech synthesis, we present a 3.5-hour single-speaker speech corpus for building a neural speech synthesis engine. Resources, model weights and online inference engines are shared publicly.
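The synthetic-data step described above can be illustrated with a small sketch. It assumes an existing list of Spanish sentences paired with translations in another language, and rewrites the Spanish side with a rule-based converter so that each pair becomes a synthetic Judeo-Spanish pair. The three-entry lexicon and the names `TOY_LEXICON`, `toy_es_to_lad` and `make_synthetic_pairs` are hypothetical and chosen only for illustration; they are not the rule set or the code of the system described in the abstract, which is not specified here.

```python
import re

# Toy word-level substitutions. These are illustrative assumptions only,
# not the rules used by the system described in the abstract.
TOY_LEXICON = {"que": "ke", "casa": "kaza", "y": "i"}


def toy_es_to_lad(sentence: str) -> str:
    """Replace each word found in TOY_LEXICON; leave all other text unchanged."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        repl = TOY_LEXICON.get(word.lower())
        if repl is None:
            return word
        # Preserve an initial capital letter from the source word.
        return repl.capitalize() if word[0].isupper() else repl

    return re.sub(r"\w+", swap, sentence)


def make_synthetic_pairs(pairs):
    """Turn (Spanish, other-language) pairs into synthetic (Judeo-Spanish, other-language) pairs."""
    return [(toy_es_to_lad(es), other) for es, other in pairs]


# Hypothetical input: one Spanish-English sentence pair.
spanish_english = [("La casa y el jardín", "The house and the garden")]
synthetic = make_synthetic_pairs(spanish_english)
```

A real rule-based system would need morphological and lexical rules well beyond word substitution; the sketch only shows how converting one side of an existing parallel corpus yields parallel data for a new language pair.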
