Numerous research questions about deep learning (DL) remain unanswered within the classical learning theory framework, including the remarkable generalization of overparameterized neural networks (NNs), the efficiency of optimization despite non-convex objectives, the role of flat minima in generalization, and the exceptional performance of deep architectures. This paper introduces a novel theoretical framework, General Distribution Learning (GD Learning), designed to address a broad range of machine learning and statistical tasks, including classification, regression, and parameter estimation. Departing from classical statistical learning, GD Learning focuses on the true underlying distribution. In GD Learning, the learning error, which corresponds to the expected error in the classical statistical learning framework, is decomposed into fitting error, caused by the model and the fitting algorithm, and sampling error, introduced by the limited amount of sampled data. The framework also incorporates prior knowledge, particularly in data-scarce scenarios; integrating such external knowledge helps reduce the learning error over the entire distribution and thereby improves performance. Within the GD Learning framework, we show that the global optimum of non-convex optimization problems, such as minimizing the fitting error, can be approached by jointly minimizing the gradient norm and the non-uniformity of the eigenvalues of the model's Jacobian matrix; this insight leads to the gradient structure control algorithm. GD Learning also offers a fresh perspective on open questions in deep learning, including overparameterization, non-convex optimization, the bias-variance trade-off, and the mechanism of flat minima.
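To make the composite objective suggested by this abstract concrete, the sketch below augments an ordinary fitting loss with a gradient-norm penalty and a penalty on the non-uniformity of the Jacobian spectrum. It is a minimal sketch, not the paper's gradient structure control algorithm: the toy two-layer model, the penalty weights lam_grad and lam_spec, the normalized-variance measure of non-uniformity, and the use of the eigenvalues of J J^T (the Gram matrix of the output-parameter Jacobian J) as a surrogate spectrum are all illustrative assumptions.

```python
import jax
import jax.numpy as jnp

# Toy two-layer network standing in for the model.
def model(params, x):
    W1, b1, W2, b2 = params
    h = jnp.tanh(x @ W1 + b1)
    return h @ W2 + b2

def fit_loss(params, X, y):
    # Fitting error: mean squared error on the observed sample.
    return jnp.mean((model(params, X) - y) ** 2)

def spectrum_nonuniformity(evals):
    # Normalized variance of the spectrum; zero when all values are equal.
    return jnp.var(evals) / (jnp.mean(evals) ** 2 + 1e-12)

def objective(params, X, y, lam_grad=1e-2, lam_spec=1e-3):
    loss = fit_loss(params, X, y)
    # (i) Squared norm of the gradient of the fitting error.
    grads = jax.grad(fit_loss)(params, X, y)
    grad_norm_sq = sum(jnp.sum(g ** 2) for g in jax.tree_util.tree_leaves(grads))
    # (ii) Non-uniformity of the spectrum of the Jacobian of the model outputs
    # with respect to the parameters; eigenvalues of J J^T are used as a
    # surrogate spectrum, since this Jacobian is not square.
    flat_outputs = lambda p: model(p, X).ravel()
    jac = jax.jacobian(flat_outputs)(params)
    J = jnp.concatenate(
        [j.reshape(X.shape[0], -1) for j in jax.tree_util.tree_leaves(jac)], axis=1
    )
    evals = jnp.linalg.eigvalsh(J @ J.T)
    return loss + lam_grad * grad_norm_sq + lam_spec * spectrum_nonuniformity(evals)

# Synthetic data and parameters.
key = jax.random.PRNGKey(0)
k1, k2, kx, ky = jax.random.split(key, 4)
params = (
    0.1 * jax.random.normal(k1, (10, 16)), jnp.zeros(16),
    0.1 * jax.random.normal(k2, (16, 1)), jnp.zeros(1),
)
X = jax.random.normal(kx, (64, 10))
y = jax.random.normal(ky, (64, 1))

# One plain gradient-descent step on the composite objective.
value, grads = jax.value_and_grad(objective)(params, X, y)
params = jax.tree_util.tree_map(lambda p, g: p - 1e-2 * g, params, grads)
print("composite objective:", float(value))
```

In this toy setup both penalties are differentiated through directly, so a single optimizer acts on the fitting error, the gradient norm, and the spectrum non-uniformity at once; how these terms are actually balanced and scheduled is specified by the paper's gradient structure control algorithm, not by this sketch.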