We demonstrate an end-to-end question answering system that integrates BERT with the open-source Anserini information retrieval toolkit. In contrast to most question answering and reading comprehension models today, which operate over small amounts of input text, our system integrates best practices from IR with a BERT-based reader to identify answers from a large corpus of Wikipedia articles in an end-to-end fashion. We report large improvements over previous results on a standard benchmark test collection, showing that fine-tuning pretrained BERT with SQuAD is sufficient to achieve high accuracy in identifying answer spans.
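The retriever-reader pipeline the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the retrieval and reading stages are stubbed with toy scores, the names `Candidate`, `combined_score`, and `best_answer` are hypothetical, and the interpolation weight `mu` between retriever and reader scores is an assumed mechanism for combining the two stages.

```python
# Hedged sketch of a retriever-reader QA pipeline: an IR stage (e.g. Anserini
# returning BM25-scored passages) feeds candidate spans to a reader stage
# (e.g. a SQuAD-fine-tuned BERT scoring answer spans), and the two scores are
# combined to rank final answers. All names and scores here are illustrative.

from dataclasses import dataclass


@dataclass
class Candidate:
    span: str
    retriever_score: float  # e.g. BM25 score of the source passage
    reader_score: float     # e.g. span score from the BERT reader


def combined_score(c: Candidate, mu: float = 0.5) -> float:
    # Linearly interpolate the two stages (mu weights the reader);
    # this weighting scheme is an assumption for illustration.
    return (1 - mu) * c.retriever_score + mu * c.reader_score


def best_answer(candidates, mu: float = 0.5) -> str:
    # Return the span with the highest combined score.
    return max(candidates, key=lambda c: combined_score(c, mu)).span


candidates = [
    Candidate("Warsaw", retriever_score=12.0, reader_score=0.9),
    Candidate("Krakow", retriever_score=15.0, reader_score=0.2),
]
# With a reader-heavy weight the reader's preference wins:
print(best_answer(candidates, mu=0.9))
```

With `mu=0.9` the reader-preferred span ("Warsaw") outranks the passage with the higher retrieval score; with `mu=0.0` ranking reduces to pure retrieval.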