Structured and unstructured data and facts about drugs, genes, proteins, viruses, and their mechanisms are spread across a huge number of scientific articles. These articles are a large-scale knowledge source and can have a huge impact on disseminating knowledge about the mechanisms of certain biological processes. A domain-specific knowledge graph (KG) is an explicit conceptualization of a specific subject-matter domain, represented in terms of semantically interrelated entities and relations. A KG can be constructed by integrating such facts and data, and can then be used for data integration, exploration, and federated queries. However, exploring and querying a large-scale KG is tedious for certain groups of users because they lack knowledge of the underlying data assets or of semantic technologies. Such a KG not only supports question answering (QA) and the deduction of new knowledge but also lets domain experts explore it. Since cross-disciplinary explanations are important for accurate diagnosis, it is important to be able to query the KG to provide interactive explanations about learned biomarkers. Motivated by these needs, we construct a domain-specific KG for cancer-specific biomarker discovery. The KG is constructed by integrating cancer-related knowledge and facts from multiple sources. First, we construct a domain-specific ontology, which we call the OncoNet Ontology (ONO). ONO is developed to enable semantic reasoning for verifying predicted relations between diseases and genes. The KG is then developed and enriched by harmonizing ONO with additional metadata schemas, ontologies, controlled vocabularies, and concepts from external sources, using a BERT-based information extraction method. BioBERT and SciBERT are fine-tuned on selected articles crawled from PubMed. We list several example queries and show examples of QA and knowledge deduction based on the KG.
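To make the idea of querying a KG and deducing new knowledge concrete, the following is a minimal sketch in plain Python. It is not the paper's implementation: the triples, the entity names (`GeneA`, `CancerX`, `ProteinA`, `DrugD`), the relation names, and the helper functions `match` and `drugs_for_disease` are all hypothetical placeholders chosen for illustration. A real KG of this kind would more plausibly be stored as RDF and queried with SPARQL.

```python
from typing import Iterable, Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical toy facts; placeholders only, not taken from the paper's KG.
TRIPLES: set[Triple] = {
    ("GeneA", "isBiomarkerOf", "CancerX"),
    ("GeneB", "isBiomarkerOf", "CancerX"),
    ("GeneA", "encodes", "ProteinA"),
    ("DrugD", "targets", "ProteinA"),
}


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> list[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def drugs_for_disease(triples: Iterable[Triple], disease: str) -> list[str]:
    """Deduce candidate drugs for a disease by chaining three relations:
    gene -isBiomarkerOf-> disease, gene -encodes-> protein,
    drug -targets-> protein.
    """
    facts = set(triples)
    drugs: set[str] = set()
    for gene, _, _ in match(facts, p="isBiomarkerOf", o=disease):
        for _, _, protein in match(facts, s=gene, p="encodes"):
            for drug, _, _ in match(facts, p="targets", o=protein):
                drugs.add(drug)
    return sorted(drugs)


if __name__ == "__main__":
    # QA-style lookup: which genes are biomarkers of CancerX?
    print([s for s, _, _ in match(TRIPLES, p="isBiomarkerOf", o="CancerX")])
    # Deduction: no triple links DrugD to CancerX directly.
    print(drugs_for_disease(TRIPLES, "CancerX"))
```

The first call is a direct lookup, corresponding to simple QA. The second returns a drug that no single stored triple connects to the disease, which is the sense in which a KG allows knowledge to be deduced rather than only retrieved.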