Gene/protein interactions provide critical information for a thorough understanding of cellular processes. Recently, considerable interest and effort have been focused on the construction and analysis of genome-wide gene networks. The large body of biomedical literature is an important source of gene/protein interaction information, and recent advances in text-mining tools have made it possible to extract such documented interactions automatically from free-text literature. In this paper, we propose a comprehensive framework for constructing and analyzing large-scale gene functional networks based on gene/protein interactions extracted from biomedical literature repositories with text-mining tools. The framework comprises three analyses: network topology, the relationship between network topology and gene function, and temporal network evolution. Together they distill the information embedded in the gene functional interactions reported in the literature. We demonstrate the framework on a testbed of P53-related PubMed abstracts and show that literature-based P53 networks exhibit small-world and scale-free properties. We also find that high-degree genes in the literature-based networks have a high probability of appearing in a manually curated database, and that genes in the same pathway tend to form local clusters in the literature-based networks. Temporal analysis shows that genes interacting with many other genes tend to be involved in a large number of newly discovered interactions.
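The topology measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes the extracted interactions are already available as undirected gene pairs, and the edge list below is a made-up toy example (the partner names `G1` to `G5` are placeholders, not real extracted interactions). The sketch computes node degree, the average clustering coefficient, and the average shortest-path length, the quantities commonly used to assess scale-free and small-world behavior.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (gene, gene) pairs."""
    graph = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions in this sketch
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degrees(graph):
    """Number of distinct interaction partners per gene."""
    return {node: len(nbrs) for node, nbrs in graph.items()}


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist among this node's neighbors.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def average_path_length(graph):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical toy edge list for illustration only.
TOY_EDGES = [
    ("P53", "G1"), ("P53", "G2"), ("P53", "G3"),
    ("P53", "G4"), ("G1", "G2"), ("G4", "G5"),
]

if __name__ == "__main__":
    g = build_graph(TOY_EDGES)
    print("degrees:", degrees(g))
    print("average clustering:", round(average_clustering(g), 4))
    print("average path length:", round(average_path_length(g), 4))
```

On a real literature-derived network, a short average path length combined with clustering well above that of a random graph of the same size would suggest small-world structure, and a heavy-tailed degree distribution would suggest scale-free structure. A toy graph of six nodes is far too small to support either conclusion.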