Given an optimization problem, the Hessian matrix and its eigenspectrum can be used in many ways, ranging from designing more efficient second-order algorithms to performing model analysis and regression diagnostics. When nonlinear models and non-convex problems are considered, strong simplifying assumptions are often made to make Hessian spectral analysis more tractable. This raises the question of how relevant the conclusions of such analyses are for more realistic nonlinear models. In this paper, we exploit deterministic equivalent techniques from random matrix theory to give a \emph{precise} characterization of the Hessian eigenspectra for a broad family of nonlinear models, including models that extend the classical generalized linear models, without relying on the strong simplifying assumptions used previously. We show that, depending on the data properties, the nonlinear response model, and the loss function, the Hessian can have \emph{qualitatively} different spectral behaviors: bounded or unbounded support, a single bulk or multiple bulks, and isolated eigenvalues on the left- or right-hand side of the bulk. By focusing on such a simple but nontrivial nonlinear model, our analysis takes a step toward unveiling the theoretical origin of many visually striking features observed in more complex machine learning models.
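To make the object of study concrete, the following is a minimal numerical sketch of the Hessian eigenspectrum for one member of the generalized linear model family. It is not the deterministic equivalent analysis described above: it only forms the empirical Hessian $\frac{1}{n} X^\top D X$, with $D$ the diagonal matrix of second derivatives of the loss, and computes its eigenvalues. The logistic loss, the standard Gaussian data, the dimensions `n` and `p`, and the names `glm_hessian` and `logistic_second_derivative` are all assumptions chosen for illustration.

```python
import numpy as np


def glm_hessian(X, w, second_derivative):
    """Empirical Hessian (1/n) * X^T diag(l''(x_i^T w)) X of a GLM-type loss.

    `second_derivative` maps the linear predictors z = X @ w to the
    per-sample second derivatives of the loss with respect to z.
    """
    n = X.shape[0]
    z = X @ w                      # linear predictors, shape (n,)
    d = second_derivative(z)       # per-sample curvature, shape (n,)
    return (X * d[:, None]).T @ X / n


def logistic_second_derivative(z):
    """Second derivative of the logistic loss log(1 + exp(-y z)), y in {-1, +1}.

    It equals sigma(z) * (1 - sigma(z)) and does not depend on the label y.
    """
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)


# Illustrative setting (assumed): i.i.d. standard Gaussian features.
rng = np.random.default_rng(0)
n, p = 2000, 400
X = rng.standard_normal((n, p))
w = rng.standard_normal(p) / np.sqrt(p)   # parameter vector of norm about 1

H = glm_hessian(X, w, logistic_second_derivative)
eigvals = np.linalg.eigvalsh(H)           # real eigenvalues, ascending order

print("smallest eigenvalue:", eigvals[0])
print("largest eigenvalue:", eigvals[-1])
```

A histogram of `eigvals` gives the empirical spectrum for this particular setting. Swapping in a different `second_derivative` or a different data distribution is one way to explore numerically how the loss and the data change the shape of the spectrum.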