As Large Language Models (LLMs) are increasingly adopted in multilingual settings, ensuring hallucination-free factuality has become critical. However, existing benchmarks for evaluating the reliability of Multimodal Large Language Models (MLLMs) focus predominantly on the textual or visual modalities and primarily on English, leaving a gap in evaluating how these models handle multilingual input, especially speech. To bridge this gap, we propose CCFQA, a novel Cross-lingual and Cross-modal Factuality benchmark. CCFQA contains parallel speech-text factual questions in 8 languages and is designed to systematically evaluate the cross-lingual and cross-modal factuality of MLLMs. Our experimental results show that current MLLMs still face substantial challenges on CCFQA. Furthermore, we propose a few-shot transfer learning strategy that effectively transfers the English Question Answering (QA) capabilities of LLMs to multilingual Spoken Question Answering (SQA) tasks, achieving performance competitive with GPT-4o-mini-Audio with only 5-shot training. We release CCFQA as a foundational research resource to promote the development of MLLMs with more robust and reliable speech understanding. Our code and dataset are available at https://github.com/yxduir/ccfqa.
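The abstract does not specify CCFQA's data format or scoring procedure. As a minimal sketch, assuming each question is stored once per language in both text and speech form, and that a correct/incorrect judgment for each model answer is already available, the "parallel" property can be checked and factual accuracy broken down by language and modality. All names here (`ParallelItem`, `Prediction`, `check_parallel`, `accuracy_by_lang_modality`, `cross_modal_gap`) and the language codes are hypothetical and are not taken from the released CCFQA code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelItem:
    """One question in one language (hypothetical schema, not CCFQA's actual format)."""
    qid: str         # shared ID that aligns translations of the same question
    lang: str        # illustrative language code, e.g. "en" or "zh"
    text: str        # written form of the question
    audio_path: str  # spoken form of the question (hypothetical field)
    answer: str      # reference answer


@dataclass(frozen=True)
class Prediction:
    qid: str
    lang: str
    modality: str    # "text" or "speech"
    correct: bool    # assumed to come from some external grader


def check_parallel(items, langs):
    """Return the qids that are missing from at least one of the required languages."""
    by_qid = defaultdict(set)
    for it in items:
        by_qid[it.qid].add(it.lang)
    return sorted(q for q, seen in by_qid.items() if not set(langs) <= seen)


def accuracy_by_lang_modality(preds):
    """Map (lang, modality) -> fraction of predictions judged correct."""
    counts = defaultdict(lambda: [0, 0])  # (lang, modality) -> [n_correct, n_total]
    for p in preds:
        cell = counts[(p.lang, p.modality)]
        cell[0] += int(p.correct)
        cell[1] += 1
    return {key: c / n for key, (c, n) in counts.items()}


def cross_modal_gap(acc):
    """Text accuracy minus speech accuracy per language (positive means speech is harder)."""
    langs = sorted({lang for lang, _ in acc})
    return {
        lang: acc[(lang, "text")] - acc[(lang, "speech")]
        for lang in langs
        if (lang, "text") in acc and (lang, "speech") in acc
    }


if __name__ == "__main__":
    preds = [
        Prediction("q1", "en", "text", True),
        Prediction("q1", "en", "speech", False),
        Prediction("q2", "en", "text", True),
        Prediction("q2", "en", "speech", True),
        Prediction("q1", "zh", "text", True),
        Prediction("q1", "zh", "speech", False),
    ]
    print(cross_modal_gap(accuracy_by_lang_modality(preds)))  # {'en': 0.5, 'zh': 1.0}
```

Keying every item by a shared `qid` is what makes the comparison "parallel": the same fact is asked in every language and modality, so a drop in speech accuracy relative to text for a given language can be attributed to the input modality rather than to a different question set.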




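Automatic Question Answering (QA) is the task of using computers to automatically answer questions posed by users in order to meet their information needs. Unlike conventional search engines, a QA system is an advanced form of information service: instead of returning a list of documents ranked by keyword matching, it returns a precise natural-language answer. In recent years, with the rapid development of artificial intelligence, automatic QA has become a widely studied research direction with broad prospects.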
