In this paper, we describe our efforts to build a bidirectional neural machine translation system between Congolese Swahili (SWC) and French (FRA), with the aim of improving humanitarian translation workflows. For training, we created a 25,302-sentence general-domain parallel corpus and combined it with publicly available data. Experimenting with low-resource methods such as cross-dialect transfer and semi-supervised learning, we recorded improvements of up to 2.4 and 3.5 BLEU points in the SWC-FRA and FRA-SWC directions, respectively. We performed human evaluations to assess the usability of our models in a COVID-domain chatbot that operates in the Democratic Republic of Congo (DRC). Direct assessment in the SWC-FRA direction yielded an average quality ranking of 6.3 out of 10, with 75% of the target strings conveying the main message of the source text. For the FRA-SWC direction, our preliminary post-editing assessment tests showed the model's potential usefulness for machine-assisted translation. We make our models, datasets containing up to 1 million sentences, our development pipeline, and a translator web-app available for public use.