Accurate power and sample size estimation is crucial to the design and analysis of genetic association studies. When a binary trait is analyzed via logistic regression, important covariates such as age and sex are typically included in the model. However, their effects are rarely properly considered in power or sample size computation during study planning. Unlike for a continuous trait, the power of association testing between a binary trait and a genetic variant depends explicitly on covariate effects, even under the assumption of gene-environment independence. Earlier work recognized this hidden factor, but the implemented methods are not flexible. We thus propose and implement a generalized method for estimating power and sample size for (discovery or replication) association studies of binary traits that a) accommodates different types of non-genetic covariates E, b) handles different types of G-E relationships, and c) is computationally efficient. Extensive simulation studies show that the proposed method is accurate and computationally efficient for both prospective and retrospective sampling designs with various covariate structures. A proof-of-principle application focuses on the understudied African sample in the UK Biobank data. The results show that, in contrast to the continuous blood pressure trait, for the binary hypertension trait, ignoring the covariate effects of age and sex leads to overestimated power and an underestimated replication sample size.
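The dependence of power on covariate effects can be illustrated with a minimal sketch. This is not the method proposed here; it is a textbook Wald approximation based on the expected Fisher information, under assumptions chosen purely for illustration: prospective sampling, an additive genotype G in Hardy-Weinberg equilibrium, a single binary covariate E independent of G, and an intercept calibrated so that the trait prevalence stays fixed. All function names and parameter values are hypothetical.

```python
from math import ceil, exp, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def expit(x):
    return 1.0 / (1.0 + exp(-x))

def cells(maf, p_e):
    """Joint distribution of (G, E): additive G under HWE, binary E, G-E independent."""
    geno = [(0, (1 - maf) ** 2), (1, 2 * maf * (1 - maf)), (2, maf ** 2)]
    expo = [(0, 1 - p_e), (1, p_e)]
    return [(g, e, pg * pe) for g, pg in geno for e, pe in expo]

def prevalence(b0, b_g, b_e, maf, p_e):
    """Population prevalence implied by logit P(Y=1) = b0 + b_g*G + b_e*E."""
    return sum(w * expit(b0 + b_g * g + b_e * e) for g, e, w in cells(maf, p_e))

def solve_intercept(prev, b_g, b_e, maf, p_e):
    """Bisection for the intercept that yields the target prevalence."""
    lo, hi = -30.0, 30.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if prevalence(mid, b_g, b_e, maf, p_e) < prev:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def info_g(b_g, b_e, maf, p_e, prev):
    """Per-subject expected information for b_g, adjusted for intercept and E."""
    b0 = solve_intercept(prev, b_g, b_e, maf, p_e)
    s00 = s0e = see = s0g = seg = sgg = 0.0
    for g, e, w in cells(maf, p_e):
        p = expit(b0 + b_g * g + b_e * e)
        v = w * p * (1 - p)  # logistic variance weight of this (G, E) cell
        s00 += v
        s0e += v * e
        see += v * e * e
        s0g += v * g
        seg += v * e * g
        sgg += v * g * g
    # Schur complement of the (intercept, E) block of the information matrix.
    det = s00 * see - s0e ** 2
    adj = (see * s0g ** 2 - 2 * s0e * s0g * seg + s00 * seg ** 2) / det
    return sgg - adj

def wald_power(n, b_g, b_e, maf, p_e, prev, alpha=5e-8):
    """Approximate power of the two-sided Wald test of b_g = 0 with n subjects."""
    shift = abs(b_g) * sqrt(n * info_g(b_g, b_e, maf, p_e, prev))
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(shift - z) + STD_NORMAL.cdf(-shift - z)

def sample_size(power, b_g, b_e, maf, p_e, prev, alpha=5e-8):
    """Smallest n reaching the target power (dominant tail of the test only)."""
    z_a = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_b = STD_NORMAL.inv_cdf(power)
    return ceil((z_a + z_b) ** 2 / (b_g ** 2 * info_g(b_g, b_e, maf, p_e, prev)))

if __name__ == "__main__":
    # Illustrative values only: log-odds ratio 0.2 for G, MAF 0.3, P(E=1) = 0.5,
    # prevalence 0.5, and a nominal alpha of 0.05.
    for b_e in (0.0, 2.0):
        pw = wald_power(2000, 0.2, b_e, 0.3, 0.5, 0.5, alpha=0.05)
        n80 = sample_size(0.8, 0.2, b_e, 0.3, 0.5, 0.5, alpha=0.05)
        print(f"b_e={b_e}: power at n=2000 is {pw:.3f}; n for 80% power is {n80}")
```

In this toy setting, a planning calculation that sets the covariate effect to zero sees larger logistic variance weights p(1-p) than the covariate-adjusted model actually has. It therefore reports higher power and a smaller required sample size, which is the direction of bias described above for the hypertension example.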