Code-mixing is the phenomenon of mixing words and phrases from two or more languages in a single utterance of speech or text. Due to its high linguistic diversity, code-mixing poses several challenges for evaluating standard natural language generation (NLG) tasks, and widely popular metrics perform poorly on code-mixed NLG tasks. To address this challenge, we present MIPE, a metric-independent evaluation pipeline that significantly improves the correlation between evaluation metrics and human judgments on generated code-mixed text. As a use case, we demonstrate the performance of MIPE on machine-generated Hinglish (code-mixing of Hindi and English) sentences from the HinGE corpus. The proposed evaluation strategy can be extended to other code-mixed language pairs, NLG tasks, and evaluation metrics with minimal to no effort.