Biological phenotypes are products of complex evolutionary processes in which selective forces influence multiple biological trait measurements in unknown ways. Phylogenetic factor analysis disentangles these relationships across the evolutionary history of a group of organisms. Scientists seeking to employ this modeling framework confront numerous modeling and implementation decisions, the details of which pose computational and replicability challenges. Broad and impactful community adoption requires a data-scientific analysis plan that balances flexibility, speed, and ease of use while minimizing model and algorithm tuning. Even in the presence of non-trivial phylogenetic model constraints, we show that one may analytically address latent factor uncertainty in a way that (a) aids model flexibility, (b) accelerates computation (by as much as 500-fold) and (c) decreases required tuning. We further present practical guidance on inference and modeling decisions as well as on diagnosing and solving common problems in these analyses. We codify this analysis plan in an automated pipeline that distills the potentially overwhelming array of modeling decisions into a small handful of (typically binary) choices. We demonstrate the utility of these methods and this analysis plan in four real-world problems of varying scales.