This paper analyzes the convergence and generalization of training a one-hidden-layer neural network when the input features follow a Gaussian mixture model consisting of a finite number of Gaussian distributions. Assuming the labels are generated by a teacher model with unknown ground-truth weights, the learning problem is to estimate the underlying teacher model by minimizing a non-convex risk function over a student neural network. With a finite number of training samples, referred to as the sample complexity, the iterates are proved to converge linearly to a critical point with a guaranteed generalization error. In addition, for the first time, this paper characterizes the impact of the input distributions on the sample complexity and the learning rate.
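To make the setting concrete, a minimal sketch of the teacher–student formulation described above is given below; the specific notation ($W^{\star}$, the activation $\phi$, the mixture parameters $\lambda_\ell, \mu_\ell, \Sigma_\ell$, and the averaging over $K$ hidden neurons) is assumed here for illustration and may differ from the paper's exact definitions.
\begin{align}
  x_n &\sim \sum_{\ell=1}^{L} \lambda_\ell \,\mathcal{N}(\mu_\ell, \Sigma_\ell),
  \qquad
  y_n = \frac{1}{K}\sum_{j=1}^{K} \phi\!\big(w_j^{\star\top} x_n\big),
  \qquad n = 1,\dots,N,\\
  \hat{W} &\in \arg\min_{W}\; \hat{f}_N(W)
  := \frac{1}{2N}\sum_{n=1}^{N}\Big(y_n - \frac{1}{K}\sum_{j=1}^{K}\phi\!\big(w_j^{\top} x_n\big)\Big)^{2}.
\end{align}
Here the inputs $x_n$ are drawn from a Gaussian mixture with $L$ components, the labels $y_n$ are produced by the teacher weights $W^{\star}$, and the student is trained by minimizing the non-convex empirical risk $\hat{f}_N$ over its weights $W$ using $N$ samples (the sample complexity).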