Human microbiome studies based on genetic sequencing produce compositional longitudinal data: the relative abundances of microbial taxa measured over time. Mixed-effects modeling of these data makes it possible to understand how microbial communities evolve in response to clinical interventions, environmental changes, or disease progression. In particular, the Zero-Inflated Beta Regression (ZIBR) model jointly describes the presence and the abundance of each microbial taxon over time, accounting for the compositional nature of the data, their skewness, and the excess of zeros. However, as with other complex random-effects models, maximum likelihood estimation is hampered by intractable likelihood integrals. Available estimation methods rely on approximating the log-likelihood, which can lead to biased estimates or unstable convergence. In this work we develop an alternative maximum likelihood estimation approach for the ZIBR model, based on the Stochastic Approximation Expectation Maximization (SAEM) algorithm. The proposed methodology can handle unbalanced data, which is not always possible with existing approaches. We also provide estimates of the standard errors and of the log-likelihood of the fitted model. The performance of the algorithm is evaluated through simulation, and its use is demonstrated on two microbiome studies, showing its ability to detect changes in both the presence and the abundance of bacterial taxa over time and in response to treatment.
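To make the structure of the data concrete, the following is a minimal sketch of how zero-inflated beta longitudinal data with subject-level random intercepts could be simulated. It assumes the usual two-part formulation (a logistic model for presence and a beta model with a logit link for the nonzero relative abundance); the function name `simulate_zibr`, the single binary treatment covariate, and all parameter values are hypothetical choices made for illustration. It is not the SAEM estimation procedure itself.

```python
import numpy as np


def simulate_zibr(n_subjects=50, n_times=5, alpha=(0.0, 0.5), beta=(-0.5, 0.5),
                  sigma_a=1.0, sigma_b=0.7, phi=5.0, seed=0):
    """Simulate unbalanced zero-inflated beta data (illustrative assumptions only).

    Returns an array with columns: subject id, time index, treatment, abundance.
    """
    rng = np.random.default_rng(seed)

    def expit(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Hypothetical covariate: one binary treatment indicator per subject.
    treat = rng.integers(0, 2, size=n_subjects)
    # Subject-level random intercepts for the presence and abundance parts.
    a = rng.normal(0.0, sigma_a, size=n_subjects)
    b = rng.normal(0.0, sigma_b, size=n_subjects)

    rows = []
    for i in range(n_subjects):
        # Unbalanced design: each subject is observed at a random subset of times.
        n_obs = int(rng.integers(2, n_times + 1))
        times = np.sort(rng.choice(n_times, size=n_obs, replace=False))
        # Probability that the taxon is present, and mean abundance when present.
        p = expit(alpha[0] + alpha[1] * treat[i] + a[i])
        mu = expit(beta[0] + beta[1] * treat[i] + b[i])
        for t in times:
            present = rng.random() < p
            # Beta parameterized by mean mu and precision phi; zero when absent.
            y = rng.beta(mu * phi, (1.0 - mu) * phi) if present else 0.0
            rows.append((i, int(t), int(treat[i]), y))
    return np.array(rows, dtype=float)


data = simulate_zibr()
```

Each row is one observation; subjects contribute between 2 and `n_times` rows, which gives the kind of unbalanced design mentioned above, and a zero abundance marks a visit at which the taxon was absent.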