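Project title: Clustering and Variable Selection Analysis of High-Dimensional Time-Course Data
Project number: No.11301064
Project type: Young Scientists Fund
Year of approval: 2014
Discipline: Mathematical and Physical Sciences, and Chemistry
Principal investigator: Huang Wei (黄伟)
Affiliation: Northeast Normal University (东北师范大学)
Funding: 220,000 CNY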
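Abstract (translated from Chinese): The analysis of high-dimensional time-course data has long been an important tool in biology and medicine for studying natural processes such as cell differentiation and the growth of cancerous cells. Time-course data not only reflect how each variable changes over a time interval, but often also carry cluster structure information. This project studies cluster analysis and variable selection for high-dimensional time-course data whose cluster structure changes over time. By combining clustering and variable selection algorithms, it comprehensively assesses how the cluster structure changes over the whole time interval and which feature variables in each cluster share similar response patterns. The project mainly uses a hidden tree hierarchical mixed-effects model to describe time-course data with this special cluster structure. In the first layer of the model, the cluster structure, i.e. the cluster indicator variables, is described by a random branching model, and the response curve of each variable on each branch is described by a linear model with a B-spline basis; in the second layer, the observed data are linked to the latent part of the model through the cluster indicator variables. Finally, the project designs an efficient MCMC algorithm to infer the model parameters (cluster structure, response-curve parameters, and key feature variables). This project has important theoretical and applied value.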
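Keywords (translated from Chinese): Cluster analysis; High-dimensional time-course data; Hidden tree mixture model; Embryonic cells; Cell differentiation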
English abstract: The analysis of high-dimensional time-course data is an important method in biology and medicine for studying natural processes such as cell differentiation and cancer cell growth. Besides describing how each variable changes over time, time-course data often also contain clustering structure information. In this project, we study clustering and variable selection methods for high-dimensional time-course data whose clustering structure varies over time. The clustering structure and the features with similar response curves are inferred by combining new clustering and variable selection algorithms. A special hidden tree hierarchical mixture model is used to describe the time-course data and their clustering structure. In the first (hidden) layer, the clustering structure is described by a modified random tree branching process, while the response curve of each variable on each branch is represented by a B-spline linear model. In the second layer, the observed data are connected to the first layer through unobserved cluster indicator variables. An efficient MCMC algorithm is developed to infer the model parameters, such as the clustering structure and the response curves. This project has important theoretical and practical value.
English keywords: Clustering analysis; High-dimensional time-course data; Hidden tree mixture model; Stem cells; Cell differentiation