We introduce ChrEnTranslate, an online machine translation demonstration system for translation between English and Cherokee, an endangered language. It supports both statistical and neural translation models and provides quality estimation to inform users of reliability, two user feedback interfaces (one for experts and one for common users), example inputs to collect human translations for monolingual data, word alignment visualization, and relevant terms from the Cherokee-English dictionary. The quantitative evaluation demonstrates that our backbone translation models achieve state-of-the-art translation performance and that our quality estimation correlates well with both BLEU and human judgment. By analyzing 216 pieces of expert feedback, we find that NMT is preferable because it copies less than SMT, and that, in general, current models can translate fragments of the source sentence but make major mistakes. When we add these 216 expert-corrected parallel texts to the training set and retrain the models, we observe equal or slightly better performance, which indicates the potential of human-in-the-loop learning. Our online demo is at https://chren.cs.unc.edu/; our code is open-sourced at https://github.com/ZhangShiyue/ChrEnTranslate; and our data is available at https://github.com/ZhangShiyue/ChrEn.