In this manuscript we consider the problem of generalized linear estimation on Gaussian mixture data with labels given by a single-index model. Our first result is a sharp asymptotic expression for the test and training errors in the high-dimensional regime. Motivated by the recent stream of results on the Gaussian universality of the test and training errors in generalized linear estimation, we ask ourselves the question: "when is a single Gaussian enough to characterize the error?". Our formulas allow us to give sharp answers to this question, both in the positive and negative directions. More precisely, we show that the sufficient conditions for Gaussian universality (or lack thereof) crucially depend on the alignment between the target weights and the means and covariances of the mixture clusters, which we precisely quantify. In the particular case of least-squares interpolation, we prove a strong universality property of the training error and show that it follows a simple, closed-form expression. Finally, we apply our results to real datasets, clarifying some recent discussions in the literature about Gaussian universality of the errors in this context.