Previous work suggests that the performance of cross-lingual information retrieval correlates highly with the quality of machine translation (MT). However, there may be a threshold beyond which improving query translation quality yields little or no further gain in retrieval performance. This threshold may depend on multiple factors, including the source and target languages, the quality of the existing MT system, and the search pipeline. To identify the benefit of improving an MT system for a given search pipeline, we investigate the sensitivity of retrieval quality to different levels of MT quality, using experimental datasets collected from actual traffic. We systematically improve the quality of our MT systems on language pairs, as measured by MT evaluation metrics including BLEU and chrF, to determine the impact of these improvements on search precision metrics and to extract signals that help guide improvement strategies. Using this information, we develop techniques to compare query translations across multiple language pairs and to identify the most promising language pairs to invest in and improve.
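The threshold idea in the abstract can be made concrete with a small sketch. The code below is not the authors' method: it assumes, for illustration only, that each MT system version for one language pair is summarized by a single MT score (for example BLEU) and a single search precision value, and it looks for the MT score beyond which the marginal precision gain per MT point stays under a chosen tolerance. The function names, the `min_gain` tolerance, and the sample numbers are all hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (mt_score, search_precision), both hypothetical summaries


def marginal_gains(points: List[Point]) -> List[Tuple[float, float]]:
    """Return (starting_mt_score, precision gain per MT point) for consecutive versions."""
    pts = sorted(points)
    gains = []
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x1 == x0:
            continue  # skip versions with identical MT scores to avoid dividing by zero
        gains.append((x0, (y1 - y0) / (x1 - x0)))
    return gains


def plateau_threshold(points: List[Point], min_gain: float = 0.001) -> Optional[float]:
    """Smallest MT score from which every later marginal gain is below min_gain.

    Returns None when the last observed interval still shows a gain of at least min_gain.
    """
    threshold = None
    # Walk backwards from the best MT system while the gains remain negligible.
    for score, slope in reversed(marginal_gains(points)):
        if slope < min_gain:
            threshold = score
        else:
            break
    return threshold


# Made-up measurements for one language pair (not data from the paper).
example_points = [(20, 0.50), (25, 0.58), (30, 0.63), (35, 0.632), (40, 0.633)]
example_threshold = plateau_threshold(example_points)
```

Under these assumptions, a language pair whose current MT score sits well below its estimated threshold would be a more promising candidate for investment than one already on the plateau. A real analysis would also need to account for measurement noise in both metrics.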