Robust model-based estimation for binary outcomes in genomics studies

In quantitative genetics, statistical modeling techniques have advanced the understanding of which genes underlie agronomically important traits and have enabled the use of genome-wide markers to accelerate genetic gain. The logistic regression model is a statistically optimal approach for quantitative genetics analysis of binary traits. To encourage more widespread use of the logistic model in such analyses, separation must be addressed. Separation occurs whenever a specific combination of predictors perfectly predicts the value of a binary trait, and it is especially prevalent in applications where the number of predictors is near the sample size. In this study we motivate a logistic model that is robust to separation, and we develop a novel prediction procedure for this robust model that is appropriate when separation exists. We show that this robust model offers superior inferences and comparable predictions relative to existing approaches while remaining true to the logistic model. This improves on existing approaches, which treat separation as a modeling shortcoming rather than an antagonistic data configuration and therefore change the modeling paradigm to accommodate separation, which was problematic for logistic models before our robust model. Our comparisons are conducted on several didactic examples and a genomics study of kernel color in maize. These analyses confirm the claimed superior inferences and comparable predictive performance of our robust model. Our approach therefore provides scientists with an appropriate statistical modeling framework for analyses involving agronomically important binary traits.
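To make the notion of separation concrete, the following is a minimal sketch and not the robust model proposed in this study. It assumes a single hypothetical predictor, a no-intercept logistic model, and made-up toy data; the function names are illustrative only. When the classes are completely separated, the log-likelihood keeps increasing toward zero as the slope grows, so no finite maximum likelihood estimate exists. With overlapping classes the log-likelihood eventually decreases, so its maximum is attained at a finite slope.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def log_likelihood(beta, xs, ys):
    """Log-likelihood of a no-intercept logistic model with slope beta."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = beta * x
        # y = 1 contributes log p; y = 0 contributes log(1 - p) = log sigmoid(-z)
        total += log_sigmoid(z) if y == 1 else log_sigmoid(-z)
    return total

def is_completely_separated(xs, ys):
    """True if a threshold on the single predictor splits the classes perfectly."""
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    return max(x0) < min(x1) or max(x1) < min(x0)

# Hypothetical toy data (assumption, not taken from the maize study).
separated_x, separated_y = [-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1]
overlap_x, overlap_y = [-2.0, -1.0, 1.0, 2.0], [0, 1, 0, 1]

betas = [0.5, 1.0, 5.0, 25.0]
sep_ll = [log_likelihood(b, separated_x, separated_y) for b in betas]
ovl_ll = [log_likelihood(b, overlap_x, overlap_y) for b in betas]

for b, s, o in zip(betas, sep_ll, ovl_ll):
    print(f"beta={b:5.1f}  separated={s:.6g}  overlapping={o:.6g}")
```

In the separated case the printed log-likelihood rises monotonically with the slope, which is the behavior that destabilizes standard logistic fits when the number of predictors approaches the sample size.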
