The objective of automated Question Answering (QA) systems is to provide answers to user queries in a time-efficient manner. The answers are usually found either in databases (or knowledge bases) or in a collection of documents commonly referred to as the corpus. The past few decades have seen a proliferation of knowledge acquisition and, consequently, exponential growth in the number of new scientific articles in the field of biomedicine. As a result, it has become difficult to keep track of all the information in the domain, even for domain experts. With improvements in commercial search engines, users can type in a query and get a small set of the documents most relevant to it, in some cases along with relevant snippets from those documents. However, manually searching these results for the required information or answers can still be tedious and time-consuming. This has necessitated the development of efficient QA systems that aim to find exact and precise answers to user-provided natural language questions in the domain of biomedicine. In this paper, we introduce the basic methodologies used for developing general-domain QA systems, followed by a thorough investigation of different aspects of biomedical QA systems, including benchmark datasets and several proposed approaches, both those using structured databases and those using collections of texts. We also examine the limitations of current systems and explore potential avenues for further advancement.