We present a new combinatorial model for identifying regulatory modules in gene co-expression data using a decomposition into weighted cliques. To capture complex interaction effects, we generalize the previously studied weighted edge clique partition problem. As a first step, we restrict ourselves to the noise-free setting and show that the problem is fixed-parameter tractable when parameterized by the number of modules (cliques). We present two new algorithms for finding these decompositions, which use linear programming and integer partitioning to determine the clique weights. Further, we implement these algorithms in Python and test them on a biologically inspired synthetic corpus generated using real-world data from transcription factors and a latent variable analysis of co-expression in varying cell types.
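To make the noise-free model concrete, the following is a minimal sketch of what a weighted clique decomposition could look like in Python. It is not the authors' implementation: the function names `edge_weights` and `is_exact_decomposition`, the gene labels, and the convention that a pair's weight is the sum of the weights of all cliques containing both endpoints (with diagonal entries ignored) are assumptions made for illustration only.

```python
from itertools import combinations


def edge_weights(cliques):
    """Return the edge weights induced by a list of (members, weight) cliques.

    Assumption: the weight of a pair (u, v) is the sum of the weights of
    every clique that contains both u and v. Pairs covered by no clique
    are simply absent from the result.
    """
    weights = {}
    for members, gamma in cliques:
        # Sorting gives each unordered pair a single canonical key (u < v).
        for u, v in combinations(sorted(members), 2):
            weights[(u, v)] = weights.get((u, v), 0) + gamma
    return weights


def is_exact_decomposition(graph, cliques):
    """Check that the cliques reproduce the graph exactly (noise-free case)."""
    return edge_weights(cliques) == graph


# Two hypothetical overlapping modules with weights 2 and 3.
modules = [({"g1", "g2", "g3"}, 2), ({"g2", "g3", "g4"}, 3)]
graph = edge_weights(modules)

print(graph[("g2", "g3")])                     # pair shared by both modules
print(is_exact_decomposition(graph, modules))  # the generating modules fit exactly
```

Here the pair `("g2", "g3")` belongs to both modules, so its weight is the sum of the two clique weights. The sketch only verifies a candidate decomposition; finding the cliques and their weights is the hard part that the algorithms in this work address.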