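Project title: Deep Learning-Based Text Mining of Breast Cancer Molecular Biological Information
Project number: No.61502243
Project type: Young Scientists Fund project
Year of approval: 2016
Discipline: Computer Science
Principal investigator: 龚乐君
Affiliation: Nanjing University of Posts and Telecommunications (南京邮电大学)
Funding: 210,000 yuan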
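Abstract (translated from the Chinese): Breast cancer seriously threatens the health of women worldwide and is a problem that modern biomedicine urgently needs to solve; the molecular biological information associated with it has become a point of breakthrough for research. With the development of information technology, new results and discoveries in breast cancer research are mostly published as electronic literature, which carries a large amount of molecular biological information. Mining this literature can distill rich molecular biological information on breast cancer and uncover new biomedical knowledge, helping to explain the mechanisms by which breast cancer arises. Against this background, the project studies: (1) recognition of static molecular biological information on breast cancer, developing a method that combines deep learning with ontologies to recognize molecular entities in text; (2) extraction of dynamic molecular biological information on breast cancer, proposing a method that combines statistical inference with deep linguistic parsing to quantitatively analyze the relations between molecular entities, clarify breast cancer molecular function information, and reveal the molecular mechanisms of breast cancer; (3) construction of a corpus by combining the extracted molecular biological information with the collected breast cancer literature, enriching biomedical corpus resources, together with the establishment of a breast cancer molecular biological information platform to form a breast cancer biomedical knowledge base.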
Keywords (translated from the Chinese): bioinformation; massive data; biomedical text mining; deep learning; breast cancer
English abstract: Breast cancer severely threatens the health of women worldwide and is a problem that biomedicine urgently needs to solve. Molecular biological information has become a point of breakthrough in breast cancer research. With the development of information technology, the latest achievements and discoveries in breast cancer research are mostly published in journals in electronic text form. These publications carry a large amount of molecular biological information, which gives text mining great potential in breast cancer research. Mining the biomedical literature can extract rich molecular biological information on breast cancer and uncover new biomedical knowledge. Against this background, the research covers the following: (1) recognition of static molecular biological information on breast cancer, with a recognition approach based on deep learning and ontologies for identifying breast cancer-related molecular entities in the literature; (2) extraction of dynamic molecular biological information on breast cancer, with an extraction approach based on statistical reasoning and deep language parsing that characterizes the relationships between molecular entities quantitatively and in detail, in order to reveal the molecular mechanisms of breast cancer; (3) corpus construction from the extracted molecular biological information and the collected breast cancer literature to enrich biomedical corpora, and development of a breast cancer molecular biological information platform to serve as a biomedical knowledge base.
English keywords: bioinformation; big data; biomedical text mining; deep learning; breast cancer