In a mixed generalized linear model, the objective is to learn multiple signals from unlabeled observations: each sample comes from exactly one signal, but it is not known which one. We consider the prototypical problem of estimating two statistically independent signals in a mixed generalized linear model with Gaussian covariates. Spectral methods are a popular class of estimators which output the top two eigenvectors of a suitable data-dependent matrix. However, despite their wide applicability, their design is still based on heuristic considerations, and existing recovery guarantees require a number of samples $n$ that is super-linear in the signal dimension $d$. In this paper, we develop exact asymptotics for spectral methods in the challenging proportional regime, in which $n$ and $d$ grow large and their ratio converges to a finite constant. This allows us to optimize the design of the spectral method, and to combine it with a simple linear estimator, in order to minimize the estimation error. Our characterization exploits a mix of tools from random matrix theory, free probability and the theory of approximate message passing algorithms. Numerical simulations for mixed linear regression and phase retrieval demonstrate the advantage of our analysis over existing designs of spectral methods.
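To make the setting concrete, the following is a minimal sketch of a heuristic spectral estimator for noiseless mixed linear regression with Gaussian covariates. It is not the optimized design developed in the paper: the preprocessing function $y \mapsto y^2$, the unit-norm signals, the function names, and all parameter values are assumptions chosen for illustration, and the combination with a linear estimator is not shown.

```python
import numpy as np


def simulate_mixed_linear_regression(n, d, mix_prob=0.5, noise_std=0.0, seed=0):
    """Draw n samples; each one is generated by exactly one of two signals."""
    rng = np.random.default_rng(seed)
    # Two independent signals, normalized to unit norm (illustrative assumption).
    signals = rng.standard_normal((2, d))
    signals /= np.linalg.norm(signals, axis=1, keepdims=True)
    # Gaussian covariates a_i ~ N(0, I_d), stored as rows.
    A = rng.standard_normal((n, d))
    # Latent labels (hidden from the estimator): 0 with probability mix_prob.
    labels = (rng.random(n) >= mix_prob).astype(int)
    # y_i = <a_i, x_{label_i}> + noise
    y = np.einsum("ij,ij->i", A, signals[labels])
    y = y + noise_std * rng.standard_normal(n)
    return A, y, signals


def spectral_estimate(A, y, preprocess=np.square):
    """Top two eigenvectors of D = (1/n) * sum_i T(y_i) a_i a_i^T."""
    n = A.shape[0]
    weights = preprocess(y)  # T(y) = y^2 is a heuristic choice, not an optimized one
    D = (A * weights[:, None]).T @ A / n
    _, eigvecs = np.linalg.eigh(D)  # eigenvalues are returned in ascending order
    return eigvecs[:, -2:]  # columns form an orthonormal basis of the top-2 eigenspace


def subspace_overlap(basis, x):
    """Norm of the projection of x onto span(basis), relative to the norm of x."""
    return np.linalg.norm(basis.T @ x) / np.linalg.norm(x)


if __name__ == "__main__":
    A, y, signals = simulate_mixed_linear_regression(n=20000, d=40)
    U = spectral_estimate(A, y)
    for k in range(2):
        print(f"overlap of signal {k} with the spectral subspace:",
              subspace_overlap(U, signals[k]))
```

Note that the two eigenvectors only estimate the subspace spanned by the signals, so the sketch measures how much of each signal lies in that subspace rather than matching eigenvectors to signals one by one. The sample size here is deliberately much larger than the dimension; the proportional regime studied in the paper corresponds to keeping `n / d` fixed as both grow.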