The availability of computationally lightweight but high-quality translators prompts consideration of new applications that address neglected languages. Locally run translators for less popular languages may assist data projects involving protected or personal data, which may require specific compliance checks before being posted to a public translation API but which could be handled reasonably and cost-effectively by an army of local, small-scale pair translators. As with handling a specialist's dialect, this research illustrates translating two historically interesting but obfuscated languages: 1) hacker-speak ("l33t") and 2) reverse (or "mirror") writing as practiced by Leonardo da Vinci. The work generalizes a deep learning architecture to translatable variants of hacker-speak with lite, medium, and hard vocabularies. The original contribution is a fluent translator of hacker-speak in under 50 megabytes, together with a generator for augmenting future datasets with more than a million bilingual sentence pairs. The long short-term memory recurrent neural network (LSTM-RNN) extends previous work demonstrating an English-to-foreign translation service built from as few as 10,000 bilingual sentence pairs. This work further solves the equivalent translation problem in twenty-six additional (non-obfuscated) languages and ranks those models quantitatively by proficiency, with Italian as the most successful and Mandarin Chinese as the most challenging. For neglected languages, the method prototypes novel services for smaller niche translations such as Kabyle (Algerian dialect), which has between 5 and 7 million speakers but has not yet reached development in most enterprise translators. One anticipates the extension of this approach to other important dialects, such as translating technical (medical or legal) jargon and processing health records.
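To make the idea of a bilingual pair generator concrete, the following is a minimal sketch of how English sentences might be paired with hacker-speak or mirror-writing counterparts. The substitution tables, the level names as dictionary keys, and the function names (to_leet, to_mirror, make_pairs) are illustrative assumptions; the abstract does not list the actual lite, medium, and hard vocabularies, and plain character reversal is only a rough textual stand-in for mirror writing.

```python
# Hypothetical substitution tables (assumptions for illustration only);
# the actual "lite" and "medium" vocabularies are not given in the abstract.
LEET_TABLES = {
    "lite": {"a": "4", "e": "3", "i": "1", "o": "0"},
    "medium": {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"},
}


def to_leet(text, level="lite"):
    """Replace each character found in the chosen table; keep all others."""
    table = LEET_TABLES[level]
    return "".join(table.get(ch.lower(), ch) for ch in text)


def to_mirror(text):
    """Approximate mirror writing by reversing the character order."""
    return text[::-1]


def make_pairs(sentences, encoder):
    """Build (source, obfuscated) pairs suitable as translation training data."""
    return [(s, encoder(s)) for s in sentences]


pairs = make_pairs(["leet", "hello world"], to_leet)
# pairs == [("leet", "l33t"), ("hello world", "h3ll0 w0rld")]
```

Because the obfuscated side is produced by a deterministic rule, a corpus of any size can be generated from monolingual English text, which is the sense in which such a generator can augment a dataset with arbitrarily many sentence pairs.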