Estonian WinoGrande Dataset: Comparative Analysis of LLM Performance on Human and Machine Translation

In this paper, we present a localized and culturally adapted Estonian translation of the test set from the widely used commonsense reasoning benchmark WinoGrande. We detail the translation and adaptation process carried out by translation specialists and evaluate the performance of both proprietary and open-source models on the human-translated benchmark. We also explore whether high-quality machine translation can be achieved by incorporating insights from the manual translation process into a detailed prompt, one tailored to the linguistic characteristics of Estonian and to the translation challenges specific to the WinoGrande dataset. Our findings show that model performance on the human-translated Estonian dataset is slightly lower than on the original English test set, while performance on machine-translated data is notably worse. Our experiments further indicate that prompt engineering offers limited improvement in translation quality or model accuracy. These results highlight the importance of involving language specialists in dataset translation and adaptation to ensure reliable and interpretable evaluations of language competency and reasoning in large language models.

