Differential analysis in Transcriptomic: The strength of randomly picking 'reference' genes

Transcriptomic analyses are not directly quantitative: they provide only relative measurements of expression levels, each known up to an unknown individual scaling factor. This difficulty is compounded in differential expression analysis. Several methods have been proposed to work around it by estimating the unknown individual scaling factors; however, even the most widely used ones either rest on biological hypotheses that are hard to justify or have a weak statistical foundation. Only two methods withstand this scrutiny: one based on the largest connected graph component, which is hardly usable for large numbers of expressions such as in NGS, and one based on $\log$-linear fits, which unfortunately requires a first step that uses one of the methods described above. We introduce a new procedure for differential analysis of transcriptomic data. It pools together several differential analyses, each using randomly picked genes as reference genes. It provides a differential analysis that requires neither an estimate of the individual scaling factors nor any other prior knowledge. Theoretical properties are investigated in terms of both family-wise error rate (FWER) and power. Moreover, in the context of Poisson or negative binomial modeling of the transcriptomic expressions, we derive a test with non-asymptotic control of its bounds. We complete our study with empirical simulations and apply our procedure to a real data set of hepatic miRNA expressions from a mouse model of non-alcoholic steatohepatitis (NASH), the CDAHFD model. This study on real data provides new hits with good biological explanations.
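
To make the idea of random reference genes concrete, here is a minimal Python sketch. It is not the procedure of the paper. It only illustrates why dividing by a reference gene removes the individual scaling factor: if sample $i$ reports $s_i e_{ig}$ for gene $g$, the ratio to a reference gene $r$ is $e_{ig}/e_{ir}$, which no longer depends on $s_i$. The function names, the Welch statistic, and the median pooling rule are all assumptions chosen for illustration, and expressions are assumed strictly positive.

```python
import math
import random
import statistics


def log_ratios(sample, ref):
    """Log expression of every gene relative to gene `ref` within one sample.

    Any per-sample scaling factor cancels in the ratio.
    Assumes all values are strictly positive.
    """
    return [math.log(x / sample[ref]) for x in sample]


def welch_t(a, b):
    """Welch t statistic comparing the means of two lists of values."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        va / len(a) + vb / len(b)
    )


def pooled_statistic(group_a, group_b, gene, n_refs, rng):
    """Median Welch t of `gene` over `n_refs` randomly picked reference genes.

    Hypothetical pooling rule, for illustration only: the paper's actual
    pooling and its FWER control are not reproduced here.
    """
    n_genes = len(group_a[0])
    candidates = [g for g in range(n_genes) if g != gene]
    refs = rng.sample(candidates, n_refs)  # random 'reference' genes
    stats = []
    for ref in refs:
        a = [log_ratios(s, ref)[gene] for s in group_a]
        b = [log_ratios(s, ref)[gene] for s in group_b]
        stats.append(welch_t(a, b))
    return statistics.median(stats)
```

Rescaling any sample by an arbitrary positive constant leaves `pooled_statistic` unchanged up to floating-point rounding, which is the property that lets such an analysis skip the estimation of scaling factors.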
