Retrieval-based open-domain question answering (QA) systems retrieve documents and then select answer spans from those documents to find the best answer candidates. We hypothesize that multilingual QA systems are prone to information inconsistency when they draw on documents written in different languages, because such documents tend to provide a model with differing information about the same topic. To understand the effects of biased information availability and cultural influence, we analyze the behavior of multilingual open-domain QA models with a focus on retrieval bias. Specifically, we examine whether different retriever models return different passages for the same question posed in different languages on TyDi QA and XOR-TyDi QA, two multilingual QA datasets. We speculate that content differences in documents across languages may reflect cultural divergences and/or social biases.
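One simple way to quantify whether a retriever returns different passages for the same question asked in different languages is to compare the top-k retrieved passage sets pairwise. The sketch below uses Jaccard overlap of passage IDs for this. It is an illustrative assumption, not the measure used in this work: the input format (a mapping from a language code to a ranked list of passage IDs) and the helper names `jaccard` and `cross_language_overlap` are hypothetical. The sketch also assumes that the IDs come from a shared passage pool, for example a single-language index, so that IDs are comparable across languages.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def jaccard(a: List[str], b: List[str]) -> float:
    """Jaccard similarity between two collections of passage IDs."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty retrievals are treated as identical
    return len(sa & sb) / len(sa | sb)


def cross_language_overlap(
    retrieved: Dict[str, List[str]], k: int = 10
) -> Dict[Tuple[str, str], float]:
    """Pairwise top-k overlap for one question asked in several languages.

    `retrieved` (hypothetical format) maps a language code to the ranked
    passage IDs a retriever returned for that language's version of the
    question. IDs are assumed to refer to a shared passage pool.
    """
    langs = sorted(retrieved)
    return {
        (l1, l2): jaccard(retrieved[l1][:k], retrieved[l2][:k])
        for l1, l2 in combinations(langs, 2)
    }


if __name__ == "__main__":
    # Toy retrieval results for one question in three languages (made-up IDs).
    runs = {
        "en": ["p1", "p2", "p3"],
        "ja": ["p2", "p4", "p5"],
        "fi": ["p1", "p2", "p3"],
    }
    print(cross_language_overlap(runs, k=3))
    # {('en', 'fi'): 1.0, ('en', 'ja'): 0.2, ('fi', 'ja'): 0.2}
```

Low overlap for a language pair would suggest that the retriever surfaces different evidence depending on the question's language. A rank-aware measure could replace Jaccard if the ordering of passages matters.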