Coronavirus disease (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus. Because the literature on COVID-19 is growing so quickly, precise and up-to-date information about the virus is hard to obtain. Practitioners, front-line workers, and researchers need specialized, expert-oriented methods to stay current with scientific knowledge and research findings, yet the sheer volume of papers being published on the subject makes it difficult to keep up with the latest work. This problem motivates us to propose the COVID-19 Search Engine (CO-SE), an algorithmic system that finds the documents relevant to each user query and answers complex questions by searching a large corpus of publications. CO-SE has a retriever component, built on a TF-IDF vectorizer, that retrieves the relevant documents from the corpus. It also has a reader component, a Transformer-based model that reads the paragraphs of the retrieved documents and extracts the answers to the query. The proposed model outperforms previous models, achieving an exact match ratio of 71.45% and a semantic answer similarity score of 78.55%. It also outperforms competing approaches on other benchmark datasets, demonstrating the generalizability of the proposed approach.
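The retriever stage described above can be illustrated with a minimal, dependency-free TF-IDF sketch. It is not the CO-SE implementation: the class name `TfidfRetriever`, the regex tokenizer, the smoothed IDF formula (borrowed from scikit-learn's default), and the `exact_match` helper are all illustrative assumptions. The `exact_match` function is a simplified SQuAD-style check; the normalization CO-SE actually uses for its exact match ratio is not stated here. The Transformer reader is not shown; it would consume the top-ranked documents returned by `retrieve`.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfRetriever:
    """Hypothetical toy retriever: ranks documents by TF-IDF cosine similarity."""

    def __init__(self, documents: list[str]):
        self.documents = documents
        tokenized = [tokenize(d) for d in documents]
        n_docs = len(documents)
        # Document frequency: how many documents contain each term.
        df = Counter(term for toks in tokenized for term in set(toks))
        # Smoothed IDF, same form as scikit-learn's smooth_idf=True default.
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
        self.doc_vectors = [self._vectorize(toks) for toks in tokenized]

    def _vectorize(self, tokens: list[str]) -> dict[str, float]:
        counts = Counter(tokens)
        # Terms never seen while fitting have no IDF and are ignored.
        vec = {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # L2-normalize so a dot product equals cosine similarity.
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def retrieve(self, query: str, top_k: int = 2) -> list[tuple[int, float]]:
        """Return (document index, cosine score) pairs for the top_k documents."""
        q = self._vectorize(tokenize(query))
        scores = [sum(w * d.get(t, 0.0) for t, w in q.items()) for d in self.doc_vectors]
        ranked = sorted(range(len(self.documents)), key=lambda i: scores[i], reverse=True)
        return [(i, scores[i]) for i in ranked[:top_k]]


def exact_match(prediction: str, reference: str) -> int:
    """Simplified exact match: 1 if the normalized token sequences are identical."""
    return int(tokenize(prediction) == tokenize(reference))


if __name__ == "__main__":
    docs = [
        "SARS-CoV-2 is the virus that causes COVID-19.",
        "Masks reduce transmission of respiratory droplets.",
        "Vaccines train the immune system to recognize the spike protein.",
    ]
    retriever = TfidfRetriever(docs)
    print(retriever.retrieve("Which virus causes COVID-19?", top_k=1))
```

Averaging `exact_match` over a set of question/answer pairs gives an exact match ratio; a semantic answer similarity score would instead compare answers with a learned model, which this sketch does not attempt.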