This position paper discusses the problem of multilingual evaluation. Relying on simple statistics, such as average performance across languages, can inject linguistic biases in favor of dominant language families into the evaluation methodology. We argue that multilingual results need a qualitative analysis informed by comparative linguistics to detect this kind of bias. In a case study, we show that results in published works can indeed be linguistically biased, and we demonstrate that visualization based on the URIEL typological database can detect this bias.
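As a rough illustration of how a plain average can favor a dominant language family, the sketch below compares a per-language mean with a mean computed over family-level means. The scores are hypothetical numbers invented for this example, and the function names are assumptions. The family-weighted mean is shown only as a contrast; it is not the analysis the paper proposes, which relies on qualitative inspection and URIEL-based visualization.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical scores for illustration only; not results from any paper.
# Each entry maps a language to (language family, score).
scores = {
    "English": ("Indo-European", 0.90),
    "German": ("Indo-European", 0.88),
    "Spanish": ("Indo-European", 0.89),
    "Hindi": ("Indo-European", 0.80),
    "Finnish": ("Uralic", 0.70),
    "Swahili": ("Atlantic-Congo", 0.60),
}


def language_average(scores):
    """Plain mean over languages; every language counts once."""
    return mean(score for _, score in scores.values())


def family_average(scores):
    """Mean of per-family means; every family counts once."""
    by_family = defaultdict(list)
    for family, score in scores.values():
        by_family[family].append(score)
    return mean(mean(values) for values in by_family.values())


if __name__ == "__main__":
    print(f"per-language mean: {language_average(scores):.4f}")
    print(f"per-family mean:   {family_average(scores):.4f}")
```

In this toy sample, four of the six languages are Indo-European, so the per-language mean is pulled toward their scores, while the per-family mean gives the two smaller families equal weight and comes out lower.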