Question answering (QA) systems have attracted considerable attention in recent years. However, few datasets exist for QA in Vietnamese, and almost none cover the medical domain. We therefore built the Vietnamese Healthcare Question Answering dataset (ViHealthQA), which contains 10,015 question-answer passage pairs. The questions were asked by health-interested users on prestigious health websites, and the answers were written by highly qualified experts. This paper proposes a two-stage QA system based on Sentence-BERT (SBERT) with multiple negatives ranking (MNR) loss, combined with BM25. We then conduct experiments against several bag-of-words models to assess the system's performance. The results show that the system outperforms these traditional methods.
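To make the two ingredients named above concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that BM25 serves as the first-stage retriever and that embedding similarity reranks its candidates; the abstract does not specify the order or the combination rule. The function names (`bm25_scores`, `mnr_loss`, `retrieve_then_rerank`), the parameter values `k1=1.5`, `b=0.75`, `scale=20.0`, and the use of plain Python lists in place of real SBERT embeddings are all illustrative assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every tokenized passage for a tokenized query.

    k1 and b are common defaults, assumed here, not taken from the paper.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of passages containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * c for a, c in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(c * c for c in v)))


def mnr_loss(q_embs, p_embs, scale=20.0):
    """Multiple negatives ranking loss over one batch.

    Passage i is the positive for question i; every other passage in the
    batch acts as a negative. The loss is the mean cross-entropy over
    scaled cosine similarities (scale is an assumed value).
    """
    total = 0.0
    for i, q in enumerate(q_embs):
        logits = [scale * cosine(q, p) for p in p_embs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(q_embs)


def retrieve_then_rerank(query_tokens, query_emb, docs_tokens, doc_embs, k=2):
    """Assumed two-stage pipeline: BM25 keeps the top-k passages, then the
    passage whose embedding is closest to the question embedding is returned."""
    scores = bm25_scores(query_tokens, docs_tokens)
    order = sorted(range(len(docs_tokens)), key=lambda i: scores[i], reverse=True)
    candidates = order[:k]
    return max(candidates, key=lambda i: cosine(query_emb, doc_embs[i]))
```

In practice the embeddings would come from an SBERT encoder fine-tuned by minimizing `mnr_loss` over batches of (question, answer passage) pairs; the hand-written vectors used when trying out this sketch are only placeholders for that output.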