Over the past two years, from 2020 to 2021, COVID-19 has overwhelmed disease prevention measures in many countries, including Vietnam, and has negatively affected many aspects of human life and society. Misleading information and fake news about the pandemic circulating in the community are also a serious problem. We therefore present UIT-ViCoV19QA, the first Vietnamese community-based question answering dataset for developing COVID-19 question answering systems. The dataset comprises 4,500 question-answer pairs collected from trusted medical sources, with at least one and at most four unique paraphrased answers per question. Along with the dataset, we set up various deep learning models as baselines to assess the quality of the dataset and to establish benchmark results for further research, using commonly used metrics such as BLEU, METEOR, and ROUGE-L. We also illustrate the positive effect of having multiple paraphrased answers in experiments on these models, especially on the Transformer, a dominant architecture in the field.