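Project title: Transfer Learning Theory and Methods for Genome-Wide Association Studies
Project number: No. 11471256
Project type: General Program (面上项目)
Approval year: 2015
Discipline: Mathematical sciences and chemistry
Principal investigator: Li Limin (李丽敏)
Institution: Xi'an Jiaotong University (西安交通大学)
Funding: CNY 700,000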
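Abstract (translated from Chinese): Genome-wide association studies have become a research focus in bioinformatics in recent years. Their main goal is to search the whole genome of a given species for genes or loci associated with a particular disease. Because of the high cost of data collection or factors beyond the researchers' control, studies of some species or populations inevitably face small sample sizes or strong noise. In this project, we put forward the novel hypothesis that certain disease characteristics can be transferred between different species or populations, and that characteristics can also be transferred between different diseases within the same species or population, so that relatively mature knowledge in one domain can help with data interpretation or learning in another. We plan to study these problems using the idea of transfer learning. To apply transfer learning principles to genome-wide association studies, we focus on three unsolved problems: (1) under what conditions transfer can be carried out; (2) how to perform transfer learning with multiple source domains; and (3) how to avoid negative transfer. By solving these three problems, the project aims to develop new transfer learning theory suited to genome-wide association studies, providing new theoretical and methodological support for the field. The genome-wide association study of Arabidopsis thaliana (拟南芥) will serve as a case study for discussion and validation, with the goal of extending the approach to other species and diseases.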
Keywords (translated from Chinese): genome-wide association study; transfer learning; pattern recognition; bioinformatics
English abstract: Genome-wide association studies (GWAS) have been a popular research topic in recent years. They aim to identify SNPs or genes across the whole genome that are associated with a disease. Because collecting samples is costly, studies in some populations or species can be difficult. In this project, we propose to explore how to transfer disease knowledge from one population to another, and how to transfer population or species knowledge from one disease to another, so that well-established knowledge in one domain can support the study of another domain. We plan to do this by borrowing the idea of transfer learning in manifold learning. However, the theory and methods of transfer learning are far from complete, which hampers its application in GWAS. In this project, we will explore theories, models, and applications of transfer learning motivated by problems in genome-wide association studies. We focus on three unsolved problems in transfer learning: (1) when knowledge can be transferred from one domain to another; (2) the theory and methods of multi-source domain transfer learning; and (3) how to avoid negative transfer. The research will help complete the theory and methods of transfer learning, help develop theories and methods for GWAS, and influence applications in fields such as computer science, engineering, biology, and medicine. Introducing transfer learning to genome-wide association studies will invigorate the field of bioinformatics, and the completion of this project will advance biology and medicine.
English keywords: genome-wide association study; transfer learning; bioinformatics