Time-course gene expression datasets provide insight into the dynamics of complex biological processes, such as immune response and organ development. It is of interest to identify genes with similar temporal expression patterns because such genes are often biologically related. However, this task is challenging due to the high dimensionality of these datasets and the nonlinearity of gene expression time dynamics. We propose an empirical Bayes approach to estimating ordinary differential equation (ODE) models of gene expression, from which we derive a similarity metric between genes called the Bayesian lead-lag $R^2$ (LLR2). Importantly, the calculation of the LLR2 leverages biological databases that document known interactions among genes; this information is automatically used to define informative prior distributions on the ODE model's parameters. As a result, the LLR2 is a biologically informed metric that can be used to identify clusters or networks of functionally related genes with co-moving or time-delayed expression patterns. We then derive data-driven shrinkage parameters from Stein's unbiased risk estimate that optimally balance the ODE model's fit to both data and external biological information. Using real gene expression data, we demonstrate that our methodology allows us to recover interpretable gene clusters and sparse networks. These results reveal new insights about the dynamics of biological systems.