Question Answering (QA) is a task in which a machine reads a given document and a question and finds the answer. Despite impressive progress in NLP, QA remains a challenging problem, especially for non-English languages, because annotated datasets are scarce. In this paper, we present the Japanese Question Answering Dataset, JaQuAD, which is annotated by humans. JaQuAD consists of 39,696 extractive question-answer pairs on Japanese Wikipedia articles. We fine-tuned a baseline model that achieves an F1 score of 78.92% and an exact match (EM) score of 63.38% on the test set. The dataset and our experiments are available at https://github.com/SkelterLabsInc/JaQuAD.
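For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how EM and F1 are commonly computed for a single extractive QA prediction. It assumes character-level overlap, a frequent choice for Japanese because the text is not whitespace-delimited; the evaluation script used for the reported numbers may normalize or tokenize differently. The function names and the example strings are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the predicted span equals the reference span, else 0.0."""
    return float(prediction.strip() == reference.strip())


def char_f1(prediction: str, reference: str) -> float:
    """Character-level F1 between a predicted and a reference answer span.

    Assumption: characters are used as the overlap unit. This is an
    illustrative choice, not necessarily the one used in the paper.
    """
    pred_chars = list(prediction.strip())
    ref_chars = list(reference.strip())
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_chars or not ref_chars:
        return float(pred_chars == ref_chars)
    # Multiset intersection counts each shared character at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_chars) & Counter(ref_chars)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_chars)
    recall = overlap / len(ref_chars)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical example: the prediction contains one extra character.
    print(exact_match("東京都", "東京"))
    print(char_f1("東京都", "東京"))
```

Dataset-level scores are typically obtained by averaging these per-question values over the test set, taking the maximum over reference answers when a question has more than one.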