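Project title: Research on Hidden Knowledge Discovery Methods Based on Biomedical Literature
Project number: No.61272373
Project type: General Program
Year of approval: 2013
Discipline: Automation technology, computer technology
Project author: Yang Zhihao (杨志豪)
Affiliation: Dalian University of Technology
Funding: 800,000 CNY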
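Chinese abstract (translated): Massive biomedical literature resources, typified by MEDLINE, contain a large amount of hidden knowledge that is implicit in published papers but has not yet been recognized. The need to extract this hidden biomedical knowledge effectively with text mining techniques has become pressing. Using the biomedical literature in MEDLINE as the main data source, and building on biomedical entity recognition and name normalization, this project uses biomedical resources such as UMLS, MeSH, GO, UniProt, DrugBank, and the KEGG Database to build two representations: a concept space based on biomedical concepts, and a latent semantic space based on probabilistic latent semantic indexing. It uses information extraction and passage retrieval to identify co-occurrences of medical concepts. It introduces supervised learning, selecting multiple features (latent semantic space similarity, medical semantic relatedness, global corpus statistics, and statistical features from information extraction) to assess the relatedness between concepts comprehensively and ultimately obtain the hidden knowledge in the biomedical literature. In addition, the project will visualize the mined knowledge and build a practical platform for hidden knowledge discovery from biomedical literature.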
Chinese keywords (translated): Text mining; entity recognition; information extraction; hidden knowledge discovery; bioinformatics
English abstract: In the huge biomedical literature repositories represented by MEDLINE, a vast amount of hidden biomedical knowledge remains undiscovered, which offers an excellent opportunity to uncover this knowledge by applying text mining technologies. Based on biomedical entity recognition and name normalization, and utilizing biomedical resources such as UMLS, MeSH, GO, UniProt, DrugBank, and the KEGG Database, our project constructs two text representations for the biomedical literature in MEDLINE: a biomedical concept space and a latent semantic space based on Probabilistic Latent Semantic Indexing. It detects the co-occurrence of biomedical concepts using information extraction and passage retrieval technologies. It introduces a supervised learning method and chooses rich features, including latent semantic similarity, biomedical semantic relatedness, global corpus statistics, and statistical information from information extraction, to evaluate the correlation between biomedical concepts and finally obtain the hidden knowledge in the literature. Our project also implements visualization of the mined knowledge and establishes a practically useful platform for hidden knowledge discovery based on biomedical literature.
English keywords: Text Mining; Entity Recognition; Information Extraction; Hidden Knowledge Discovery; Bioinformatics
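
Illustrative sketch (not part of the project record): the abstract lists concept co-occurrence and global corpus statistics among the features used to rate how related two concepts are. The Python below shows one possible statistic of that kind, pointwise mutual information (PMI) over document-level co-occurrence counts. The choice of PMI, the function name `cooccurrence_pmi`, and the toy concept IDs "A", "B", "C" are assumptions made for illustration; the record does not say which statistics the project actually uses.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_pmi(docs):
    """Return PMI for every concept pair that co-occurs in at least one document.

    docs: list of sets, each holding the normalized concept IDs found in one
    abstract (hypothetical input; entity recognition is assumed done upstream).
    """
    n = len(docs)
    concept_df = Counter()  # number of documents containing each concept
    pair_df = Counter()     # number of documents containing both concepts
    for concepts in docs:
        concept_df.update(concepts)
        pair_df.update(frozenset(p) for p in combinations(sorted(concepts), 2))

    pmi = {}
    for pair, joint in pair_df.items():
        a, b = tuple(pair)
        # PMI = log2( P(a, b) / (P(a) * P(b)) ), with document-level probabilities
        pmi[pair] = math.log2((joint * n) / (concept_df[a] * concept_df[b]))
    return pmi


# Toy corpus of four "abstracts", already reduced to concept IDs.
toy_docs = [{"A", "B"}, {"A", "B"}, {"B", "C"}, {"C"}]
scores = cooccurrence_pmi(toy_docs)
```

In a supervised setup such as the one the abstract describes, a score like this would be only one column of the feature vector for a concept pair, alongside latent semantic similarity and the other listed features.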