For a model of high-dimensional linear regression with random design, we analyze the performance of an estimator given by the mean of a log-concave Bayesian posterior distribution with Gaussian prior. The model is mismatched in the following sense: as in the model assumed by the statistician, the label-generating process is linear in the input data, but both the prior of the ground-truth classifier and the variance of the Gaussian noise are unknown to her. This inference model can be rephrased as a version of the Gardner model in spin glasses. Using the cavity method, we provide fixed-point equations for various overlap order parameters, which yield in particular an expression for the mean-square reconstruction error on the classifier (under an assumption of uniqueness of solutions). As a direct corollary, we obtain an expression for the free energy. Similar models have already been studied by Shcherbina and Tirozzi and by Talagrand, but our arguments are more straightforward and some assumptions are relaxed. An interesting consequence of our analysis is that, in the random design setting of ridge regression, the performance of the posterior mean is independent of the noise variance (or "temperature") assumed by the statistician, and matches that of the usual (zero-temperature) ridge estimator.
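The ridge-regression remark can be illustrated numerically in the simplest, exactly Gaussian case. The sketch below is not the authors' derivation. It assumes a particular parameterization, not stated in the abstract, in which the inverse temperature `beta` multiplies the whole energy, so that the posterior density is proportional to exp(-beta * (||y - Xw||^2 / 2 + lam * ||w||^2 / 2)). The names `ridge_estimator`, `tempered_posterior_mean`, `lam`, and `beta`, as well as the synthetic data, are hypothetical and chosen only for illustration. Under this assumption the posterior is Gaussian and its mean coincides with the ridge estimator for every `beta`.

```python
import numpy as np


def ridge_estimator(X, y, lam):
    """Usual (zero-temperature) ridge estimator: argmin of the energy."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def tempered_posterior_mean(X, y, lam, beta):
    """Mean of the Gaussian density proportional to exp(-beta * energy).

    Assumed energy: ||y - Xw||^2 / 2 + lam * ||w||^2 / 2.
    The density has precision beta * (X^T X + lam * I) and linear term
    beta * X^T y, so beta cancels when solving for the mean.
    """
    d = X.shape[1]
    precision = beta * (X.T @ X + lam * np.eye(d))
    linear = beta * (X.T @ y)
    return np.linalg.solve(precision, linear)


# Synthetic random-design data (illustrative values only).
rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.standard_normal((n, d)) / np.sqrt(d)
w_star = rng.standard_normal(d)
y = X @ w_star + 0.5 * rng.standard_normal(n)

lam = 0.3
w_ridge = ridge_estimator(X, y, lam)

# Largest coordinate-wise gap between the posterior mean and ridge,
# for several assumed inverse temperatures.
gaps = {
    beta: float(np.max(np.abs(tempered_posterior_mean(X, y, lam, beta) - w_ridge)))
    for beta in (0.1, 1.0, 10.0)
}
print(gaps)
```

The gaps are zero up to floating-point error, which only checks the finite-dimensional Gaussian identity under the assumed parameterization. The statement in the abstract concerns the high-dimensional performance of the posterior mean and is obtained there through the cavity method.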