When analyzing parametric statistical models, a useful approach is to model the parameter space geometrically. However, even for very simple and commonly used hierarchical models, such as statistical mixtures or stochastic deep neural networks, the smoothness assumption underlying manifolds is violated at singular points, which have non-smooth neighborhoods in the parameter space. Such singular models have been analyzed in the context of learning dynamics, where singularities can act as attractors on the learning trajectory and thereby slow down convergence. We propose a general approach to circumvent the problems arising from singularities by using stratifolds, a concept from algebraic topology, to formally model singular parameter spaces. We exploit the property that specific stratifolds are equipped with a resolution method, which we use to construct a smooth manifold approximation of the singular space. We empirically show that performing (natural) gradient descent on this smooth manifold approximation instead of on the singular space avoids the attractor behavior and thereby improves the speed of convergence in learning.
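The central claim above, that singularities of the parameter space can attract and slow down gradient-based learning while a smooth reparameterization avoids this, can be illustrated on a standard toy singular model. The sketch below is not the paper's stratifold construction or its resolution method; it assumes a hypothetical product-parameterized regression y = a·b·x, whose parameterization is non-identifiable and degenerate at the origin a = b = 0, and compares plain gradient descent in the singular coordinates with gradient descent in the smooth effective coordinate w = a·b.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy singular model: y = a * b * x + noise.  The effective parameter is
# w = a * b, so the parameterization is non-identifiable along each fibre
# {a * b = const}, with the worst degeneracy at the origin a = b = 0.
w_true = 1.0
x = rng.normal(size=200)
y = w_true * x + 0.1 * rng.normal(size=200)

def grad_ab(a, b):
    """Gradient of the squared loss 0.5 * mean((y - a*b*x)^2) w.r.t. (a, b)."""
    r = y - a * b * x
    return -np.mean(r * b * x), -np.mean(r * a * x)

# (1) Plain gradient descent in the singular (a, b) coordinates, started near
#     the singularity.  Both partial derivatives scale with |a| and |b|, so
#     the origin acts as an approximate attractor and early progress is slow.
a, b, lr = 1e-4, 1e-4, 0.1
for _ in range(50):
    ga, gb = grad_ab(a, b)
    a, b = a - lr * ga, b - lr * gb
print("singular coords after 50 steps: a*b =", a * b)

# (2) Gradient descent in the smooth coordinate w = a * b (a crude stand-in
#     for optimizing on a resolved, non-singular parameter space): the loss
#     is an ordinary well-conditioned quadratic in w and converges quickly.
w = 1e-8
for _ in range(50):
    r = y - w * x
    w -= lr * (-np.mean(r * x))
print("resolved coord after 50 steps:  w  =", w)
```

The comparison only illustrates why optimizing on a smooth, non-singular space helps; the paper's contribution is the general stratifold-based construction of such a smooth approximation and the use of (natural) gradient descent on it.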