In machine learning, data augmentation (DA) is a technique for improving generalization performance. In this paper, we mainly consider gradient descent of linear regression under DA using noisy copies of datasets, in which noise is injected into the inputs. We analyze the situation in which random noisy copies are newly generated and used at each epoch, i.e., the case of online noisy copies. This setting can therefore be viewed as an analysis of a method that injects noise into the training process in a DA manner, i.e., an online version of DA. We derive the averaged behavior of the training process in three settings: full-batch training under the sum of squared errors, and full-batch and mini-batch training under the mean squared error. We show that, in all cases, training under DA with online copies is approximately equivalent to a ridge regularization whose regularization parameter corresponds to the variance of the injected noise. On the other hand, we show that the learning rate is multiplied by the number of noisy copies plus one in full-batch training under the sum of squared errors and in mini-batch training under the mean squared error; that is, DA with online copies yields an apparent acceleration of training. The apparent acceleration comes from the original part of each copy, and the regularization effect from the injected noise. These results are confirmed in numerical experiments, in which we found that our result approximately applies to the usual offline DA in the under-parameterized scenario but not in the over-parameterized scenario. Moreover, we experimentally investigated the training of neural networks under DA with offline noisy copies and found that our analysis of linear regression can be applied to neural networks.
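The equivalence claimed above can be illustrated with a minimal numerical sketch. Assuming i.i.d. Gaussian input noise of standard deviation sigma and k online copies per epoch, taking the expectation of the sum-of-squared-errors gradient over the noise gives (k+1) times a ridge gradient; under that standard calculation the effective ridge parameter works out to k·n·sigma²/(k+1) in this parameterization. All names and constants below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)

k = 4          # number of online noisy copies per epoch
sigma = 0.3    # standard deviation of the injected input noise
lr = 1e-4      # step size on the summed squared error
epochs = 20000

w = np.zeros(d)
w_avg = np.zeros(d)
for t in range(epochs):
    # augmented batch: the original data plus k freshly drawn noisy copies
    Xa = np.vstack([X] + [X + sigma * rng.normal(size=X.shape) for _ in range(k)])
    ya = np.tile(y, k + 1)
    # gradient of the sum of squared errors; in expectation over the noise
    # this equals (k+1) times a ridge gradient, i.e., apparent acceleration
    grad = -2 * Xa.T @ (ya - Xa @ w)
    w -= lr * grad
    if t >= epochs // 2:               # average late iterates to smooth out noise
        w_avg += w / (epochs - epochs // 2)

# ridge solution predicted by the expected-gradient calculation
lam = k * n * sigma**2 / (k + 1)
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(w_avg, w_ridge)
```

With a small step size, the averaged iterate of noisy-copy training lands close to the closed-form ridge solution, which is the under-parameterized (n > d) regime where the abstract reports the approximation holds.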