We introduce ChrEnTranslate, an online machine translation demonstration system for translation between English and Cherokee, an endangered language. It supports both statistical and neural translation models and provides quality estimation to inform users of translation reliability, two user feedback interfaces (one for experts and one for common users), example inputs for collecting human translations of monolingual data, word alignment visualization, and relevant terms from the Cherokee-English dictionary. Quantitative evaluation demonstrates that our backbone translation models achieve state-of-the-art translation performance and that our quality estimation correlates well with both BLEU and human judgment. By analyzing 216 pieces of expert feedback, we find that NMT is preferable because it copies less than SMT and that, in general, current models can translate fragments of the source sentence but make major mistakes. When we add these 216 expert-corrected parallel texts back into the training set and retrain the models, we observe equal or slightly better performance, which indicates the potential of human-in-the-loop learning. Our online demo is at https://chren.cs.unc.edu/ , our code is open-sourced at https://github.com/ZhangShiyue/ChrEnTranslate , and our data is available at https://github.com/ZhangShiyue/ChrEn