We study the problem of regression in a generalized linear model (GLM) with multiple signals and latent variables. This model, which we call a matrix GLM, covers many widely studied problems in statistical learning, including mixed linear regression, max-affine regression, and mixture-of-experts. In mixed linear regression, each observation comes from one of $L$ signal vectors (regressors), but we do not know which one; in max-affine regression, each observation comes from the maximum of $L$ affine functions, each defined via a different signal vector. The goal in all these problems is to estimate the signals, and possibly some of the latent variables, from the observations. We propose a novel approximate message passing (AMP) algorithm for estimation in a matrix GLM and rigorously characterize its performance in the high-dimensional limit. This characterization is in terms of a state evolution recursion, which allows us to precisely compute performance measures such as the asymptotic mean-squared error. The state evolution characterization can be used to tailor the AMP algorithm to take advantage of any structural information known about the signals. Using state evolution, we derive an optimal choice of AMP `denoising' functions that minimizes the estimation error in each iteration. The theoretical results are validated by numerical simulations for mixed linear regression, max-affine regression, and mixture-of-experts. For max-affine regression, we propose an algorithm that combines AMP with expectation-maximization to estimate intercepts of the model along with the signals. The numerical results show that AMP significantly outperforms other estimators for mixed linear regression and max-affine regression in most parameter regimes.
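To make the two observation models described above concrete, the following is a minimal data-generation sketch, not the AMP algorithm itself and not taken from the paper. Several things in it are illustrative assumptions: the function name `simulate`, the i.i.d. standard Gaussian features and signals, the uniformly drawn latent labels in mixed linear regression, and the additive Gaussian noise.

```python
import random


def simulate(n, d, L, model, noise_std=0.1, seed=0):
    """Draw n observations from a toy mixed linear or max-affine model.

    All distributional choices here are assumptions made for illustration.
    """
    rng = random.Random(seed)
    # L hypothetical signal vectors with i.i.d. standard Gaussian entries.
    signals = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(L)]
    # Intercepts are used only by the max-affine model.
    intercepts = [rng.gauss(0, 1) for _ in range(L)]

    X, y, latent = [], [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(d)]
        # Inner product of the feature vector with each signal.
        proj = [sum(a * b for a, b in zip(x, s)) for s in signals]
        if model == "mixed_linear":
            # Latent variable: which signal produced this observation
            # (drawn uniformly here, an assumption).
            c = rng.randrange(L)
            value = proj[c]
        elif model == "max_affine":
            # Observation is the maximum of L affine functions of x;
            # the latent variable is the index attaining the maximum.
            affine = [p + b for p, b in zip(proj, intercepts)]
            c = max(range(L), key=lambda l: affine[l])
            value = affine[c]
        else:
            raise ValueError("model must be 'mixed_linear' or 'max_affine'")
        X.append(x)
        y.append(value + rng.gauss(0, noise_std))
        latent.append(c)
    return X, y, signals, intercepts, latent


if __name__ == "__main__":
    X, y, signals, intercepts, latent = simulate(5, 3, 2, "max_affine")
    for yi, ci in zip(y, latent):
        print(f"y = {yi:+.3f}, active component = {ci}")
```

In both models an estimator sees only `X` and `y`; the signals, the intercepts, and the latent indices are the quantities to be recovered.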