Machine Translation (MT) systems generally aim to render source-language text automatically in a target language while preserving the original context, using various Natural Language Processing (NLP) techniques. Among these techniques, Statistical Machine Translation (SMT) uses probabilistic and statistical methods to analyze text and carry out the conversion. This paper discusses the development of bilingual SMT models for translating English into fifteen low-resource Indian Languages (ILs) and vice versa. At the outset, all 15 languages are briefly described as they relate to our experimental needs. Further, a detailed analysis of the Samanantar and OPUS datasets for model building, along with the standard benchmark dataset (Flores-200) for fine-tuning and testing, is carried out as part of our experiment. Different preprocessing approaches are proposed in this paper to handle noise in the datasets. To create the system, the MOSES open-source SMT toolkit is explored. Distance reordering is utilized with the aim of capturing grammar rules and context-dependent adjustments through a phrase reordering categorization framework. In our experiment, translation quality is evaluated using standard metrics such as BLEU, METEOR, and RIBES.
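To make the idea of distance reordering concrete, the following is a minimal sketch of the distance-based distortion cost commonly used in phrase-based SMT. The abstract does not give the exact formulation used in the experiments, so this should be read as an illustration under assumptions: the function names and the decay parameter `alpha` are hypothetical and are not taken from the paper or from the MOSES toolkit.

```python
def distortion_distance(phrase_spans):
    """Total reordering distance for a sequence of source phrase spans.

    phrase_spans: list of (start, end) source word indices (0-based,
    inclusive), listed in the order the phrases are translated.
    Each step contributes |start_i - end_{i-1} - 1|, which is zero
    when the phrase directly follows the previous one in the source.
    """
    total = 0
    prev_end = -1  # position just before the first source word
    for start, end in phrase_spans:
        total += abs(start - prev_end - 1)
        prev_end = end
    return total


def reordering_score(phrase_spans, alpha=0.5):
    """Exponentially decaying score alpha ** distance (alpha is an assumed value)."""
    return alpha ** distortion_distance(phrase_spans)


# Monotone translation: phrases are taken in source order, so no penalty.
monotone = [(0, 1), (2, 2), (3, 4)]
# Reordered translation: the last two source phrases are swapped.
swapped = [(0, 1), (3, 4), (2, 2)]

print(distortion_distance(monotone), reordering_score(monotone))
print(distortion_distance(swapped), reordering_score(swapped))
```

A monotone translation incurs no penalty, while swapping phrases increases the distance and lowers the score, which is how such a model discourages long jumps unless the other features justify them.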