Mixtures of regression are a powerful class of models for regression learning when the response variable of interest is highly uncertain and heterogeneous. In addition to yielding a rich predictive model for the response given the covariates, this model class carries parameters that provide useful information about the heterogeneity in the data population, which is represented by the conditional distributions of the response given the covariates associated with a number of distinct but latent subpopulations. In this paper, we investigate conditions of strong identifiability, rates of convergence for conditional density and parameter estimation, and the Bayesian posterior contraction behavior arising in finite mixture of regression models, under exact-fitted and over-fitted settings and when the number of components is unknown. This theory is applicable to common choices of link functions and families of conditional distributions employed by practitioners. We provide simulation studies and data illustrations, which shed light on the parameter learning behavior found in several popular regression mixture models reported in the literature.
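For concreteness, a finite mixture of regressions is commonly written as the following conditional density; the notation here is an illustrative sketch rather than the paper's own, with $h$ a link function, $f$ a family of component conditional densities, and $G$ the mixing measure:
\[
p_G(y \mid x) \;=\; \sum_{k=1}^{K} \pi_k \, f\!\big(y \mid h(x, \theta_k), \sigma_k\big),
\qquad
G \;=\; \sum_{k=1}^{K} \pi_k \, \delta_{(\theta_k, \sigma_k)},
\]
where the mixing weights $\pi_k$ sum to one and each component $k$ corresponds to one latent subpopulation. In this notation, the exact-fitted setting takes $K$ equal to the true number of components, while the over-fitted setting allows $K$ to exceed it.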