Curriculum learning (CL) - training using samples that are generated and presented in a meaningful order - was introduced in the machine learning context around a decade ago. While CL has been extensively used and analysed empirically, there has been very little mathematical justification for its advantages. We introduce a CL model for learning the class of k-parities on d bits of a binary string with a neural network trained by stochastic gradient descent (SGD). We show that a wise choice of training examples, involving two or more product distributions, allows us to significantly reduce the computational cost of learning this class of functions, compared to learning under the uniform distribution. We conduct experiments to support our analysis. Furthermore, we show that for another class of functions - namely the `Hamming mixtures' - CL strategies involving a bounded number of product distributions are not beneficial, while we conjecture that CL with unboundedly many curriculum steps can learn this class efficiently.
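To make the learning setup concrete, the following is a minimal sketch of a k-parity target on d bits and of samples drawn from a biased product distribution, as would be used in an early curriculum step before switching to the uniform distribution. The specific values of d, k, and the bias p are illustrative assumptions, not parameters from the paper.

```python
import numpy as np

def k_parity(x, S):
    """Parity (+1/-1) of the bits of x indexed by S, for x in {0,1}^d."""
    return (-1) ** int(np.sum(x[list(S)]) % 2)

rng = np.random.default_rng(0)
d, k, p = 20, 3, 0.1          # illustrative choices (assumptions)
S = set(range(k))             # the k relevant coordinates

# Curriculum step: a biased product distribution Ber(p)^d, under which
# the parity correlates with individual relevant coordinates; a later
# step would draw from the uniform distribution Ber(1/2)^d.
X_easy = (rng.random((1000, d)) < p).astype(int)
X_hard = (rng.random((1000, d)) < 0.5).astype(int)
y_easy = np.array([k_parity(x, S) for x in X_easy])
```

Under the biased distribution most relevant bits are 0, so flipping a single relevant coordinate visibly flips the label, which is the intuition behind why the first curriculum step is computationally easier than learning directly under the uniform distribution.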