Mixture models are used in a wide range of applications to identify subpopulations in noisy, overlapping distributions. For example, in multiplexed immunofluorescence (mIF), cell image intensities represent expression levels, and each cell population is a noisy mixture of expressed and unexpressed cells. Among mixture models, the gamma mixture model has the advantage of flexibly fitting the skewed, strictly positive data that occur in many biological measurements. However, the current estimation method relies on numerical optimization within the expectation-maximization (EM) algorithm and is computationally expensive, which makes it infeasible to apply across the many large data sets that mIF studies require. Building on a recently developed closed-form estimator for the gamma distribution, we propose a closed-form gamma mixture model that is not only more computationally efficient but can also incorporate constraints from known biological information into the fitted distribution. We derive the closed-form estimators for the gamma mixture model and use simulations to demonstrate that our model produces results comparable to those of the current model in significantly less time and performs well in constrained model fitting.
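To make the idea of a closed-form update concrete, the following is a minimal sketch rather than the estimators derived in the paper. It assumes the well-known closed-form gamma estimators, scale = mean(x ln x) - mean(x) mean(ln x) and shape = mean(x) / scale, and assumes, purely for illustration, that the mixture M-step can be obtained by replacing those sample means with responsibility-weighted means. The function names, the quantile-based initialization, and the fixed iteration count are all hypothetical choices. Because these estimators are not the exact maximum-likelihood solution, this loop is not guaranteed to increase the likelihood at every iteration.

```python
import numpy as np
from math import lgamma


def gamma_closed_form(x, w=None):
    """Closed-form estimates of gamma (shape, scale) from strictly positive data.

    With w=None this is the unweighted estimator; passing weights is an
    assumed extension for use inside EM (not taken from the paper).
    """
    x = np.asarray(x, dtype=float)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    w = w / w.sum()                      # normalize so weighted sums are means
    log_x = np.log(x)
    mean_x = np.sum(w * x)
    mean_log_x = np.sum(w * log_x)
    mean_x_log_x = np.sum(w * x * log_x)
    scale = mean_x_log_x - mean_x * mean_log_x   # weighted Cov(x, ln x)
    shape = mean_x / scale
    return shape, scale


def gamma_logpdf(x, shape, scale):
    """Log density of a gamma distribution in the shape/scale parameterization."""
    return ((shape - 1.0) * np.log(x) - x / scale
            - shape * np.log(scale) - lgamma(shape))


def fit_gamma_mixture(x, n_components=2, n_iter=100):
    """EM-style loop whose M-step has no inner numerical optimization."""
    x = np.asarray(x, dtype=float)
    # Hypothetical initialization: hard-assign points to equal-size quantile bins.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_components + 1))
    labels = np.clip(np.searchsorted(edges, x, side="right") - 1,
                     0, n_components - 1)
    resp = np.eye(n_components)[labels]          # responsibilities, shape (n, K)
    for _ in range(n_iter):
        # M-step: mixing weights and per-component closed-form gamma estimates.
        weights = resp.mean(axis=0)
        params = [gamma_closed_form(x, resp[:, k]) for k in range(n_components)]
        # E-step: posterior component probabilities, computed in log space.
        log_p = np.column_stack([
            np.log(weights[k]) + gamma_logpdf(x, *params[k])
            for k in range(n_components)
        ])
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
    return weights, params


# Simulated intensities: a low ("unexpressed") and a high ("expressed") group.
rng = np.random.default_rng(0)
data = np.concatenate([rng.gamma(2.0, 1.0, 5000), rng.gamma(20.0, 2.0, 5000)])
weights, params = fit_gamma_mixture(data)
print("mixing weights:", weights)
print("(shape, scale) per component:", params)
```

Each M-step here costs a handful of weighted sums per component, which is the source of the speed advantage over an M-step that runs a numerical optimizer. The sketch does not implement the biological constraints mentioned above.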