Genome-wide association studies (GWAS) require accurate cohort phenotyping, but expert labeling can be costly, time-intensive, and variable. Here we develop a machine learning (ML) model to predict glaucomatous optic nerve head features from color fundus photographs. We used the model to predict vertical cup-to-disc ratio (VCDR), a diagnostic parameter and cardinal endophenotype for glaucoma, in 65,680 Europeans in the UK Biobank (UKB). A GWAS of ML-based VCDR identified 299 independent genome-wide significant (GWS; $P\leq5\times10^{-8}$) hits in 156 loci. The ML-based GWAS replicated 62 of 65 GWS loci from a recent VCDR GWAS in the UKB, in which two ophthalmologists manually labeled images from 67,040 Europeans. The ML-based GWAS also identified 92 novel loci, substantially expanding our understanding of the genetic etiologies of glaucoma and VCDR. Pathway analyses support the biological significance of the novel hits to VCDR, with select loci near genes involved in neuronal and synaptic biology or known to cause severe Mendelian ophthalmic disease. Finally, the ML-based GWAS results significantly improve polygenic prediction of VCDR and primary open-angle glaucoma in the independent EPIC-Norfolk cohort.