In medical genetics, each genetic variant is evaluated as an independent entity regarding its clinical importance. However, in most complex diseases, combinations of variants in specific gene networks, rather than the presence of any single variant, predominate. In such diseases, disease status can be evaluated by considering how well a team of specific variants performs together. We propose a method based on high-dimensional modelling to analyse all the variants in a gene network together. To evaluate our method, we selected two gene networks, mTOR and TGF-β. For each pathway, we generated 400 control and 400 patient group samples. The mTOR and TGF-β pathways contain 31 and 93 genes of varying sizes, respectively. We produced Chaos Game Representation images for each gene sequence to obtain 2-D binary patterns. These patterns were arranged in succession to form a 3-D tensor structure for each gene network. Features for each data sample were acquired by applying Enhanced Multivariance Products Representation to the 3-D data. The features were split into training and testing vectors, and the training vectors were used to train a Support Vector Machine classification model. We achieved classification accuracies of more than 96% and 99% for the mTOR and TGF-β networks, respectively, using a limited number of training samples.
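The step from gene sequences to a 3-D tensor can be illustrated with a minimal sketch. It is not the authors' implementation: the corner assignment for the four nucleotides, the image size, and the function names `cgr_binary_image` and `network_tensor` are all assumptions chosen for illustration, and the abstract does not state how genes of different lengths are mapped to a common resolution.

```python
from typing import List

# Corner assignment is a common Chaos Game Representation convention,
# assumed here; the abstract does not specify one.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_binary_image(sequence: str, size: int = 64) -> List[List[int]]:
    """Return a size x size binary CGR pattern for one DNA sequence."""
    image = [[0] * size for _ in range(size)]
    x, y = 0.5, 0.5  # start at the centre of the unit square
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the base's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        image[row][col] = 1  # binary pattern: mark the cell as visited
    return image


def network_tensor(gene_sequences: List[str], size: int = 64) -> List[List[List[int]]]:
    """Stack one CGR pattern per gene into a genes x size x size tensor."""
    return [cgr_binary_image(seq, size) for seq in gene_sequences]
```

For a network of 31 genes and the assumed `size=64`, `network_tensor` would return a 31 x 64 x 64 nested list, one binary slice per gene. Feature extraction and classification would then operate on that tensor.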