A major task in genetic studies is to identify genes related to human diseases and traits, in order to understand the functional characteristics of genetic mutations and improve patient diagnosis. Besides marginal analyses of individual genes, identification of gene pathways, i.e., sets of genes with known interactions that collectively contribute to specific biological functions, can provide more biologically meaningful results. Such gene pathway analysis can be formulated as a high-dimensional two-sample testing problem. Because gene expression datasets typically have limited sample sizes, most existing two-sample tests may have compromised power, as they ignore or only inefficiently incorporate the auxiliary pathway information on gene interactions. We propose T2-DAG, a Hotelling's $T^2$-type test for detecting differentially expressed gene pathways, which efficiently leverages the auxiliary pathway information on gene interactions through a linear structural equation model. We establish the asymptotic distribution of the test statistic under pertinent assumptions. Simulation studies under various scenarios show that T2-DAG outperforms several representative existing methods, with well-controlled type-I error rates and substantially improved power, even with incomplete or inaccurate pathway information or unadjusted confounding effects. We also illustrate the performance of T2-DAG in an application to detect differentially expressed KEGG pathways between different stages of lung cancer.
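
For reference, the two-sample testing problem and the classical Hotelling's $T^2$ statistic, from which T2-DAG takes its name, can be sketched as follows. This is the textbook form, not the T2-DAG statistic itself. The notation is introduced here for illustration and does not appear in the abstract: $p$ genes in the pathway, group sizes $n_1$ and $n_2$, population mean expression vectors $\mu_1$ and $\mu_2$, sample means $\bar{X}_1$ and $\bar{X}_2$, and per-group sample covariance matrices $S_1$ and $S_2$. A common covariance matrix across the two groups is also assumed.

$$
H_0:\ \mu_1 = \mu_2 \quad \text{versus} \quad H_1:\ \mu_1 \neq \mu_2,
$$

$$
T^2 = \frac{n_1 n_2}{n_1 + n_2}\,(\bar{X}_1 - \bar{X}_2)^{\top} S^{-1} (\bar{X}_1 - \bar{X}_2),
\qquad
S = \frac{(n_1 - 1)\,S_1 + (n_2 - 1)\,S_2}{n_1 + n_2 - 2}.
$$

Under Gaussian data and $H_0$, with $p \le n_1 + n_2 - 2$,

$$
\frac{n_1 + n_2 - p - 1}{p\,(n_1 + n_2 - 2)}\; T^2 \ \sim\ F_{p,\; n_1 + n_2 - p - 1}.
$$

When $p$ approaches or exceeds $n_1 + n_2 - 2$, the pooled covariance $S$ becomes ill-conditioned or singular, so the classical statistic loses power or is undefined. This is the limited-sample regime the abstract refers to.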