Machine translation automatically translates text between natural languages, and Neural Machine Translation (NMT) has gained attention for its context-aware analysis and fluent, accurate output. However, processing low-resource languages, which lack training resources such as supervised data, remains a challenge for Natural Language Processing (NLP). We combined a technique known as active learning with the NMT toolkit Joey NMT to reach sufficient accuracy and robust predictions for low-resource language translation. In active learning, a semi-supervised machine learning strategy, the training algorithm uses selected query techniques to determine which unlabeled data would be most beneficial to label. We implemented two model-driven acquisition functions for selecting the samples to be validated. This work uses four transformer-based NMT systems for English-to-Hindi translation: a baseline model (BM), a fully trained model (FTM), an active learning least-confidence-based model (ALLCM), and an active learning margin-sampling-based model (ALMSM). System results are evaluated with the Bilingual Evaluation Understudy (BLEU) metric. The BLEU scores of the BM, FTM, ALLCM, and ALMSM systems are 16.26, 22.56, 24.54, and 24.20, respectively. The findings in this paper demonstrate that active learning techniques help the model converge early and improve the overall quality of the translation system.
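To make the two acquisition functions concrete, the following is a minimal sketch of least-confidence and margin-sampling selection over a pool of unlabeled sentences. It is an illustration under stated assumptions, not the implementation used in this work: the function names, the `pool` structure, and the choice to average token-level scores into a sentence-level score are all hypothetical, and nothing here uses the Joey NMT API.

```python
from typing import Dict, List, Sequence

# Assumption: for each unlabeled source sentence, the model yields one
# probability distribution per generated target token (at least 2 entries each).
TokenDists = Sequence[Sequence[float]]


def least_confidence(dists: TokenDists) -> float:
    """Mean of (1 - top probability) over tokens; higher means more uncertain."""
    return sum(1.0 - max(d) for d in dists) / len(dists)


def margin(dists: TokenDists) -> float:
    """Mean gap between the two most probable tokens; lower means more uncertain."""
    total = 0.0
    for d in dists:
        top1, top2 = sorted(d, reverse=True)[:2]
        total += top1 - top2
    return total / len(dists)


def select_for_labeling(pool: Dict[str, TokenDists], k: int, strategy: str) -> List[str]:
    """Return the k sentence ids the chosen strategy considers most uncertain."""
    if strategy == "least_confidence":
        ranked = sorted(pool, key=lambda s: least_confidence(pool[s]), reverse=True)
    elif strategy == "margin":
        ranked = sorted(pool, key=lambda s: margin(pool[s]))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return ranked[:k]


# Toy pool with made-up distributions over a 3-token vocabulary.
pool = {
    "sent_a": [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]],
    "sent_b": [[0.4, 0.3, 0.3], [0.4, 0.3, 0.3]],
    "sent_c": [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0]],
}
```

On this toy pool, `select_for_labeling(pool, 1, "least_confidence")` picks `sent_b` (lowest top probability), whereas `select_for_labeling(pool, 1, "margin")` picks `sent_c` (the two best tokens are tied), which shows that the two criteria can rank the same samples differently.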