Regularized regression models, such as the lasso and its variants, are well studied and, under appropriate conditions, offer fast and statistically interpretable results. However, large datasets in many applications are heterogeneous in the sense of harboring distributional differences between latent groups. In such settings, the assumption that the conditional distribution of response Y given features X is the same for all samples may not hold, even approximately. Furthermore, in scientific applications, the covariance structure of the features may contain important signals, and its learning is also affected by latent group structure. We propose a class of regularized mixture models for paired data of the form (X,Y) that couples together the distribution of X (modeled using sparse graphical models) and the conditional Y | X (modeled using sparse regression). Both the regression and graphical models are specific to the latent groups, and model parameters are estimated jointly (hence we call the approach "regularized joint mixtures"). This allows signals in either or both of the feature distribution and the regression model to inform learning of latent structure and provides automatic control of confounding by such structure. Estimation is handled via an expectation-maximization algorithm, whose convergence is established theoretically. We illustrate the key ideas via empirical examples.
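To make the construction concrete, the following is a minimal sketch, not the authors' implementation, of how such a regularized joint mixture could be fit by EM under simplifying assumptions: Gaussian X with sparse precision per group (graphical lasso) and Gaussian Y | X with a group-specific lasso regression. The function name `fit_rjm` and all tuning parameters are hypothetical illustrations.

```python
# Hypothetical sketch of a regularized joint mixture fit via EM.
# Assumes K latent groups, Gaussian X (sparse precision via graphical lasso)
# and Gaussian Y | X (lasso regression), with all parameters group-specific.
import numpy as np
from scipy.stats import multivariate_normal, norm
from sklearn.covariance import graphical_lasso
from sklearn.linear_model import Lasso


def fit_rjm(X, Y, K=2, alpha_graph=0.1, alpha_reg=0.1, n_iter=50, seed=0):
    n, p = X.shape
    rng = np.random.default_rng(seed)
    # Soft random initialization of group responsibilities.
    R = rng.dirichlet(np.ones(K), size=n)
    for _ in range(n_iter):
        pi = R.mean(axis=0)
        comps = []
        # M-step: weighted fits of the graphical model for X and the regression Y | X.
        for k in range(K):
            w = R[:, k] / R[:, k].sum()
            mu = w @ X
            Xc = X - mu
            # Weighted empirical covariance, then sparse estimate via graphical lasso.
            S = (Xc * w[:, None]).T @ Xc + 1e-6 * np.eye(p)
            cov_k, _ = graphical_lasso(S, alpha=alpha_graph)
            # Group-specific sparse regression of Y on X (weighted lasso).
            sw = R[:, k] * n / R[:, k].sum()
            reg = Lasso(alpha=alpha_reg).fit(X, Y, sample_weight=sw)
            resid = Y - reg.predict(X)
            sigma2 = np.average(resid ** 2, weights=R[:, k]) + 1e-8
            comps.append((mu, cov_k, reg, sigma2))
        # E-step: responsibilities combine the X likelihood and the Y | X likelihood,
        # so signal in either part of the model can drive the latent grouping.
        logR = np.zeros((n, K))
        for k, (mu, cov_k, reg, sigma2) in enumerate(comps):
            logR[:, k] = (
                np.log(pi[k])
                + multivariate_normal.logpdf(X, mean=mu, cov=cov_k, allow_singular=True)
                + norm.logpdf(Y, loc=reg.predict(X), scale=np.sqrt(sigma2))
            )
        logR -= logR.max(axis=1, keepdims=True)
        R = np.exp(logR)
        R /= R.sum(axis=1, keepdims=True)
    return R, comps, pi
```

Calling, say, `fit_rjm(X, Y, K=2)` would return soft group assignments together with per-group sparse covariance, regression, and mixing-weight estimates; the point of the sketch is only that both the feature distribution and the conditional model enter the E-step jointly.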