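The goal of this course is to present recent results in learning theory for the most widely used learning architectures. It is aimed at theory-oriented students, as well as students who want a basic mathematical understanding of the algorithms used throughout the master's program.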
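A particular effort will be made to prove many results from first principles, while keeping the exposition as simple as possible. This naturally leads to a selection of key results that showcase the important concepts of learning theory on simple but relevant instances. Some general results will also be stated without proof.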
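The course is organized in 9 sessions of 3 hours each. Every session has a precise topic, except the last one, which is devoted to recent results in learning theory. See the tentative schedule below.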
Contents:
- Learning with infinite data (population setting)
  - Decision theory (loss, risk, optimal predictors)
  - Decomposition of excess risk into approximation and estimation errors
  - No free lunch theorems
  - Basic notions of concentration inequalities (McDiarmid, Hoeffding, Bernstein)
- Linear least-squares regression
  - Guarantees in the fixed design setting (simple, in closed form)
  - Guarantees in the random design setting
  - Ridge regression: dimension-independent bounds
- Classical risk decomposition
  - Approximation error
  - Convex surrogates
  - Estimation error through covering numbers (basic example of ellipsoids)
  - Modern tools (no proof): Rademacher complexity, Gaussian complexity (+ Slepian/Lipschitz)
  - Minimax rates (at least one proof)
- Optimization for machine learning
  - Gradient descent
  - Stochastic gradient descent
  - Generalization bounds through stochastic gradient descent
- Local averaging techniques
  - Kernel density estimation
  - Nadaraya-Watson estimators (simplest proof to be found, with apparent curse of dimensionality)
  - K-nearest-neighbors
  - Decision trees and associated methods
- Kernel methods
  - Modern analysis of non-parametric techniques (simplest proof with results depending on s and d)
- Model selection
  - L0 penalty with AIC
  - L1 penalty
  - High-dimensional estimation
- Neural networks
  - Approximation properties (simplest approximation result)
  - Two layers
  - Deep networks
- Special topics
  - Generalization/optimization properties of infinitely wide neural networks
  - Double descent