Instance-dependent label noise, in which the label-corruption process depends directly on the instances, is realistic but challenging. It causes a severe shift between the training and test distributions, which impairs the generalization of trained models. Prior works have put great effort into tackling this issue. Unfortunately, they either rely heavily on strong assumptions or remain heuristic, without theoretical guarantees. In this paper, we adopt a dynamic distribution-calibration strategy to address the distribution shift in learning with instance-dependent label noise. Specifically, we hypothesize that, before the training data are corrupted by label noise, each class conforms to a multivariate Gaussian distribution at the feature level. Label noise produces outliers that shift this Gaussian distribution. To calibrate the shifted distribution during training, we propose two methods, based respectively on the mean and the covariance of the multivariate Gaussian distribution. The mean-based method performs robust mean estimation in a recursive dimension-reduction manner and is theoretically guaranteed to train a high-quality model against label noise. The covariance-based method works in a distribution-disturbance manner and is experimentally verified to improve model robustness. We demonstrate the utility and effectiveness of our methods on datasets with synthetic label noise and real-world unknown noise.
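To make the calibration idea more concrete, the following is a minimal sketch of how the features of a single class might be calibrated under the Gaussian hypothesis. It is not the method of the paper: the abstract does not specify the recursive dimension-reduction estimator or the exact form of the distribution disturbance, so an iterative distance-based trimming step stands in for the robust mean, and a small random positive semi-definite perturbation stands in for the covariance disturbance. All function names and parameters (`trimmed_mean`, `disturbed_covariance`, `sample_calibrated`, `trim_frac`, `scale`) are hypothetical, and the features are assumed to have at least two dimensions.

```python
import numpy as np


def trimmed_mean(features, trim_frac=0.1, n_iters=5):
    """Illustrative robust mean (an assumption, not the paper's estimator).

    Repeatedly drops the points farthest from the coordinate-wise median
    and returns the mean of the surviving points together with them.
    """
    kept = features
    for _ in range(n_iters):
        center = np.median(kept, axis=0)
        dists = np.linalg.norm(kept - center, axis=1)
        cutoff = np.quantile(dists, 1.0 - trim_frac)
        mask = dists <= cutoff
        # Stop if trimming would remove nothing or leave too few points.
        if mask.all() or mask.sum() < 2:
            break
        kept = kept[mask]
    return kept.mean(axis=0), kept


def disturbed_covariance(features, scale=0.05, rng=None):
    """Sample covariance plus a small random PSD perturbation (assumed form)."""
    rng = np.random.default_rng() if rng is None else rng
    cov = np.cov(features, rowvar=False)
    dim = cov.shape[0]
    noise = rng.normal(size=(dim, dim))
    # noise @ noise.T is symmetric positive semi-definite by construction.
    return cov + scale * (noise @ noise.T) / dim


def sample_calibrated(features, n_samples, rng=None):
    """Draw features from the calibrated class-wise Gaussian."""
    rng = np.random.default_rng() if rng is None else rng
    mean, kept = trimmed_mean(features)
    cov = disturbed_covariance(kept, rng=rng)
    return rng.multivariate_normal(mean, cov, size=n_samples)
```

In this sketch, `features` would hold the feature vectors currently assigned to one (possibly noisy) class, and the samples returned by `sample_calibrated` could supplement the training signal for that class. How the calibrated distribution is actually used during training is a design choice that the abstract leaves open.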