This paper introduces a novel theoretical learning framework, termed probability distribution learning (PD learning). Departing from the traditional statistical learning framework, PD learning focuses on learning the underlying probability distribution, which is modeled as a random variable within the probability simplex. Within this framework, the optimization objective is learning error, which quantifies the posterior expected discrepancy between the model's predicted distribution and the underlying true distribution, given available sample data and prior knowledge. To optimize the learning error, this paper proposes the necessary conditions for loss functions, models, and optimization algorithms, ensuring that these conditions are all satisfied in real-world machine learning scenarios. Based on these conditions, the non-convex optimization mechanism corresponding to model training can be theoretically resolved. Moreover, the paper provides both model-dependent and model-independent bounds on learning error, offering new insights into the model's fitting ability and generalization capabilities. Furthermore, the paper applies the PD learning framework to elucidate the mechanisms by which various techniques, including random parameter initialization, over-parameterization, and dropout, influence deep model training. Finally, the paper substantiates the key conclusions of the proposed framework through experimental results.
翻译:暂无翻译