In the context of supervised learning of a function by a neural network, we claim and empirically verify that the neural network yields better results when the distribution of the data set focuses on regions where the function to learn is steep. We first translate this assumption into a mathematically workable statement using Taylor expansion, and derive a new training distribution based on the derivatives of the function to learn. Theoretical derivations then allow us to construct a methodology that we call Variance Based Samples Weighting (VBSW). VBSW uses the local variance of the labels to weight the training points. This methodology is general, scalable, cost-effective, and significantly improves the performance of a large class of neural networks for various classification and regression tasks on image, text, and multivariate data. We highlight its benefits with experiments involving neural networks ranging from linear models to ResNet and Bert.
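To make the core idea concrete, below is a minimal sketch of variance-based sample weighting, assuming the local variance of the labels is estimated over each point's k nearest neighbours in input space and that the weights are normalised to unit mean; the neighbourhood size k and the normalisation are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def vbsw_weights(X, y, k=10):
    """Weight each training point by the local variance of its labels.

    For every sample, the variance of the labels of its k nearest
    neighbours (in input space) is used as an importance weight, so
    that steep regions of the target function receive more weight.
    """
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    _, idx = nn.kneighbors(X)              # idx: (n_samples, k) neighbour indices
    local_var = y[idx].var(axis=1)         # label variance in each neighbourhood
    if local_var.ndim > 1:                 # multi-output labels: sum per-dim variances
        local_var = local_var.sum(axis=1)
    # Normalise so the weights have mean 1 (illustrative choice).
    return local_var / local_var.mean()
```

The resulting vector can then be fed to any weighted training loss, for instance via the sample_weight argument of a Keras model.fit call, so that points lying in steep regions of the target function contribute more to the gradient updates.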