Focusing on polygenic signal detection in high-dimensional genetic association studies of complex traits, we develop an adaptive test for generalized linear models that accommodates different alternatives. To facilitate valid post-selection inference for high-dimensional data, we adhere to the original sample-splitting principle but apply it repeatedly to increase the stability of the inference. We derive the asymptotic null distributions of the proposed test for both fixed and diverging numbers of variants. We also establish the asymptotic properties of the proposed test under local alternatives, which offers insight into why the power gained from variable selection and weighting can compensate for the efficiency lost to sample splitting. We support our analytical findings with extensive simulation studies and two applications. The proposed procedure is computationally efficient and has been implemented as the R package DoubleCauchy.
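The abstract does not spell out the algorithm, so the following is only a minimal sketch of the repeated sample-splitting idea under stated assumptions: a linear model with standard-normal predictors in place of genotypes, marginal-score selection of the top `k` variants, a weighted burden-type score test on the held-out half, and a Cauchy combination of the per-split p-values (suggested by the package name, but not stated here). All function names and parameters are hypothetical and are not the DoubleCauchy API.

```python
import math
import numpy as np


def simulate(n, p, n_causal, effect, seed):
    """Hypothetical toy data: the first n_causal variants have equal effects."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:n_causal] = effect
    y = X @ beta + rng.standard_normal(n)
    return X, y


def split_pvalue(X, y, rng, k=10):
    """One split: choose variants and weights on one half, test on the other."""
    n = X.shape[0]
    perm = rng.permutation(n)
    sel, tst = perm[: n // 2], perm[n // 2:]

    # Selection half: marginal score z-statistics, one per variant.
    Xs = X[sel] - X[sel].mean(axis=0)
    ys = y[sel] - y[sel].mean()
    z_sel = Xs.T @ ys / (np.linalg.norm(Xs, axis=0) * ys.std() + 1e-12)
    top = np.argsort(-np.abs(z_sel))[:k]
    w = z_sel[top]  # signed weights learned from the selection half only

    # Testing half: weighted score test restricted to the selected variants.
    Xt = X[tst][:, top]
    Xt = Xt - Xt.mean(axis=0)
    yt = y[tst] - y[tst].mean()
    g = Xt @ w                       # weighted burden score per individual
    stat = g @ yt
    var = yt.var(ddof=1) * (g @ g)   # null variance of stat, given g
    z = stat / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value


def cauchy_combine(pvals):
    """Cauchy combination of p-values with equal weights (an assumption)."""
    p = np.clip(np.asarray(pvals, dtype=float), 1e-15, 1 - 1e-15)
    t = np.mean(np.tan((0.5 - p) * np.pi))
    return 0.5 - math.atan(t) / math.pi


def repeated_split_test(X, y, n_splits=20, k=10, seed=0):
    """Repeat the random split and combine the resulting p-values."""
    rng = np.random.default_rng(seed)
    pvals = [split_pvalue(X, y, rng, k) for _ in range(n_splits)]
    return cauchy_combine(pvals)


X, y = simulate(n=400, p=200, n_causal=10, effect=0.5, seed=1)
print(repeated_split_test(X, y, n_splits=10, k=10, seed=2))
```

Because selection and testing use disjoint halves, each per-split p-value is valid on its own. Repeating the split trades extra computation for less dependence on any one random partition. The p-values from different splits are dependent, which is one reason a Cauchy-type combination is a plausible choice in this sketch.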