Open-book question answering is a subset of question answering tasks in which the system aims to find answers in a given set of documents (open-book) and common knowledge about a topic. This article proposes a solution for answering natural language questions from a corpus of Amazon Web Services (AWS) technical documents with no domain-specific labeled data (zero-shot). These questions can have yes-no-none answers, short answers, long answers, or any combination of these. The solution comprises a two-step architecture in which a retriever finds the right document and an extractor finds the answers in the retrieved document. We introduce a new test dataset for open-book QA based on real customer questions on AWS technical documentation. We experimented with several information retrieval systems and with extractor models based on extractive language models; the resulting solution attempts to find the yes-no-none answers and text answers in the same pass. The model is trained on the Stanford Question Answering Dataset (SQuAD; Rajpurkar et al., 2016) and Natural Questions (Kwiatkowski et al., 2019) datasets. We achieve an F1 score of 49% and an exact match (EM) score of 39% end-to-end with no domain-specific training.
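The two-step retriever and extractor flow, together with the F1 and exact match metrics, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the TF-IDF-style retriever and the sentence-overlap extractor are simple stand-ins for the information retrieval systems and extractive language models evaluated in the article, the toy corpus and question are hypothetical, and the metrics use a simplified token normalization rather than any official evaluation script.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question, docs):
    """Step 1 (retriever): return the id of the best-scoring document.

    Stand-in scoring: term frequency weighted by a smoothed inverse
    document frequency, summed over the distinct question tokens.
    """
    n_docs = len(docs)
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter(tok for c in counts.values() for tok in c)
    q_tokens = set(tokenize(question))

    def score(doc_id):
        c = counts[doc_id]
        return sum(c[t] * math.log(1 + n_docs / doc_freq[t])
                   for t in q_tokens if t in c)

    return max(counts, key=score)


def extract(question, document):
    """Step 2 (extractor): return the sentence sharing the most question tokens.

    A toy replacement for a neural extractive reader, which would
    instead predict an answer span inside the document.
    """
    q_tokens = set(tokenize(question))
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return max(sentences, key=lambda s: len(q_tokens & set(tokenize(s))))


def exact_match(prediction, truth):
    """1.0 if the token sequences are identical after normalization, else 0.0."""
    return float(tokenize(prediction) == tokenize(truth))


def f1_score(prediction, truth):
    """Token-level F1 between a predicted and a reference answer."""
    pred, gold = tokenize(prediction), tokenize(truth)
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical two-document corpus and question, for illustration only.
DOCS = {
    "s3": "Amazon S3 stores objects in buckets. "
          "Bucket names must be globally unique.",
    "ec2": "Amazon EC2 provides virtual servers called instances. "
           "Instances can be stopped and restarted.",
}
QUESTION = "Do bucket names have to be unique?"

doc_id = retrieve(QUESTION, DOCS)
answer = extract(QUESTION, DOCS[doc_id])
print(doc_id, "|", answer)
print("F1:", f1_score(answer, "must be globally unique"))
print("EM:", exact_match(answer, "must be globally unique"))
```

In this sketch the extracted sentence is longer than the reference answer, so it earns partial F1 credit but no exact match, which shows why the two metrics can diverge in end-to-end evaluation.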