A common way to learn and analyze statistical models is to consider operations in the model parameter space. But what happens if we optimize in the parameter space and there is no one-to-one mapping between the parameter space and the underlying statistical model space? Such cases frequently occur for hierarchical models, including statistical mixtures and stochastic neural networks, and these models are said to be singular. Singular models give rise to several important and well-studied problems in machine learning, such as the slowdown of learning trajectories caused by attractor behaviors near singularities. In this work, we propose a relative reparameterization technique of the parameter space, which yields a general method for extracting regular submodels from singular models. Our method enforces model identifiability during training, and we study the learning dynamics of gradient descent and expectation maximization for Gaussian Mixture Models (GMMs) under relative parameterization, showing faster experimental convergence and an improved manifold shape of the dynamics around the singularity. Extending the analysis beyond GMMs, we further analyze the Fisher information matrix under relative reparameterization and its influence on the generalization error, and show how the method can be applied to more complex models such as deep neural networks.
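As a concrete illustration of the singular case (a standard textbook example, not a construction specific to this work), consider a two-component Gaussian mixture with shared unit variance:

\[
p(x \mid \pi, \mu_1, \mu_2) \;=\; \pi\,\mathcal{N}(x;\,\mu_1, 1) \;+\; (1-\pi)\,\mathcal{N}(x;\,\mu_2, 1), \qquad \pi \in [0,1].
\]

On the subset of parameter space where \(\mu_1 = \mu_2\) or \(\pi \in \{0,1\}\), distinct parameter values yield the same density, so the map from parameters to distributions is not one-to-one and the Fisher information matrix degenerates there; this is the kind of singularity the relative reparameterization is meant to avoid.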