In the context of supervised learning of a function by a Neural Network (NN), we claim and empirically justify that an NN yields better results when the distribution of the data set focuses on regions where the function to learn is steeper. We first translate this assumption into a mathematically workable form using Taylor expansion. Then, theoretical derivations allow us to construct a methodology that we call Variance Based Samples Weighting (VBSW). VBSW uses the local variance of the labels to weight the training points. This methodology is general, scalable, cost-effective, and significantly increases the performance of a large class of NNs for various classification and regression tasks on image, text, and multivariate data. We highlight its benefits with experiments involving NNs ranging from shallow linear NNs to Resnet or Bert.
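As an illustration of the weighting step, the sketch below shows how local label variance could be turned into per-sample weights. It is a minimal sketch under our own assumptions: the use of k-nearest neighbours (via scikit-learn's NearestNeighbors), the value of k, and the normalisation are illustrative choices, not the authors' exact procedure.

```python
# Hypothetical sketch of variance-based sample weighting: each training
# point is weighted by the variance of the labels among its k nearest
# neighbours, so that steep regions of the target function (where labels
# vary quickly) receive more weight during training.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_variance_weights(X, y, k=10):
    """Return one weight per sample: the variance of the labels
    among that sample's k nearest neighbours in input space."""
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    _, idx = nn.kneighbors(X)         # neighbour indices, shape (n, k)
    weights = y[idx].var(axis=1)      # label variance in each neighbourhood
    return weights / weights.sum()    # normalise so the weights sum to 1
```

In practice, such weights could be passed as per-sample weights to a weighted training loss, e.g. as the `sample_weight` argument of Keras' `model.fit`.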