This paper introduces FrenchMedMCQA, the first publicly available Multiple-Choice Question Answering (MCQA) dataset in French for the medical domain. It comprises 3,105 questions taken from real exams of the French medical specialization diploma in pharmacy, mixing single-answer and multiple-answer questions. Each instance of the dataset contains an identifier, a question, five possible answers, and the manually established correct answer(s). We also propose the first baseline models for this MCQA task, both to report current performance and to highlight the difficulty of the task. A detailed analysis of the results shows that representations adapted to the medical domain or to the MCQA task are necessary: in our case, specialized English models yielded better results than generic French ones, even though FrenchMedMCQA is in French. The corpus, models, and tools are available online.
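To make the instance structure described in the abstract concrete, the following is a minimal Python sketch of one record and of one strict way to score a prediction. Everything beyond the four components named in the abstract is an assumption for illustration: the field names, the answer labels `a` to `e`, the `exact_match` function, and the placeholder question are hypothetical and do not reflect the released corpus format or the evaluation used for the baselines.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class MCQAInstance:
    """One question record: identifier, question, five options, correct answer(s).

    Field names and option labels are assumptions, not the corpus schema.
    """
    identifier: str
    question: str
    answers: Dict[str, str]   # five options keyed by a label such as "a".."e"
    correct: FrozenSet[str]   # labels of the correct option(s); one or several

    def __post_init__(self) -> None:
        # The abstract states that every instance has five possible answers.
        if len(self.answers) != 5:
            raise ValueError("expected exactly five possible answers")
        # At least one correct answer, and each must refer to an existing option.
        if not self.correct or not self.correct <= set(self.answers):
            raise ValueError("correct answers must be a non-empty subset of the options")

    @property
    def is_single_answer(self) -> bool:
        return len(self.correct) == 1


def exact_match(instance: MCQAInstance, predicted: Iterable[str]) -> bool:
    """Return True only if the predicted label set equals the correct set exactly."""
    return frozenset(predicted) == instance.correct


# Placeholder content only; this is not a question from the dataset.
example = MCQAInstance(
    identifier="example-0001",
    question="(placeholder question text)",
    answers={label: f"(placeholder option {label})" for label in "abcde"},
    correct=frozenset({"a", "c"}),
)
print(example.is_single_answer)
print(exact_match(example, ["c", "a"]))
print(exact_match(example, ["a"]))
```

Representing the correct answers as a set lets single-answer and multiple-answer questions share one structure, and makes the comparison independent of the order in which a model emits its labels.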