A large number of online document data sources are available nowadays. Their lack of structure and the differences between their formats are the main obstacles to automatically extracting information from them, which also hinders their use and reuse. In the biomedical domain, the DISNET platform emerged to provide researchers with a resource for obtaining information on human disease networks from large-scale heterogeneous sources. In this domain specifically, it is critical to offer not only the information extracted from different sources, but also the evidence that supports it. This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and the associations between them, and (ii) the evidence supporting these associations, with the objective of providing a schema to improve the publication and description of evidence and biomedical associations in this domain. The ontology has been successfully evaluated to ensure that it contains no errors or modelling pitfalls and that it meets the previously defined functional requirements. Test data coming from a subset of DISNET and from automatic association extractions from texts have been transformed according to the proposed ontology to create a Knowledge Graph that can be used in real scenarios, and which has also been used to evaluate the presented ontology.