Motivation: Combining the results of different experiments to reveal complex patterns or to improve statistical power is a typical aim of data integration. The starting point of the statistical analysis is often a collection of p-value sets obtained from previous analyses, which must be combined flexibly to explore complex hypotheses while guaranteeing a low proportion of false discoveries. Results: We introduce the generic concept of a composed hypothesis, which corresponds to an arbitrarily complex combination of simple hypotheses. We rephrase the problem of testing a composed hypothesis as a classification task, and show that finding the items for which the composed null hypothesis is rejected boils down to fitting a mixture model and classifying the items according to their posterior probabilities. We show that inference can be performed efficiently and provide a thorough classification rule to control the type I error. The performance and usefulness of the approach are illustrated on simulations and on two different applications. The method is scalable, does not require any parameter tuning, and provides valuable biological insight in the application cases considered. Availability: The QCH methodology is implemented in the qch R package hosted on CRAN.
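To make the classification view concrete, the following is a minimal Python sketch, not the qch package API. It assumes that each item is tested Q times, that each configuration in {0,1}^Q has a known prior weight, and that the tests are conditionally independent given the configuration, so the joint density is a product of marginal null and alternative densities. In practice these quantities would be estimated when fitting the mixture model. All function names are hypothetical, and the rejection rule shown is the standard "average posterior null probability below alpha" rule, used here only as an illustration.

```python
from itertools import product


def config_posteriors(f0, f1, weights):
    """Posterior probability of each H0/H1 configuration for one item.

    f0[q], f1[q]: null / alternative density values of the item for test q
    (assumed known here). weights: dict mapping a configuration tuple such
    as (0, 1) to its prior probability.
    """
    joint = {}
    for config, weight in weights.items():
        density = weight
        for q, is_alt in enumerate(config):
            density *= f1[q] if is_alt else f0[q]
        joint[config] = density
    total = sum(joint.values())
    return {config: value / total for config, value in joint.items()}


def composed_null_posterior(posteriors, in_alternative):
    """Sum the posteriors of all configurations outside the composed H1."""
    return sum(p for config, p in posteriors.items() if not in_alternative(config))


def reject_set(null_posteriors, alpha):
    """Reject the largest set of items, taken in increasing order of their
    posterior null probability, whose running mean stays at or below alpha."""
    order = sorted(range(len(null_posteriors)), key=lambda i: null_posteriors[i])
    rejected, running = [], 0.0
    for k, i in enumerate(order, start=1):
        running += null_posteriors[i]
        if running / k > alpha:
            break
        rejected.append(i)
    return rejected


# Illustrative values only: Q = 2 tests, uniform prior over the 4 configurations.
weights = {config: 0.25 for config in product((0, 1), repeat=2)}
posteriors = config_posteriors(f0=[1.0, 1.0], f1=[9.0, 9.0], weights=weights)

# Composed alternative: the item is under H1 in both tests.
null_post = composed_null_posterior(posteriors, in_alternative=lambda c: all(c))
print(null_post)
print(reject_set([0.01, 0.5, 0.02, 0.9], alpha=0.05))
```

Changing the `in_alternative` predicate (for example, "H1 in at least one test") is all that is needed to query a different composed hypothesis from the same configuration posteriors.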