Label shift is widely believed to harm the generalization performance of machine learning models. Researchers have proposed many approaches to mitigate its impact, e.g., balancing the training data. However, these methods typically consider the underparametrized regime, where the sample size is much larger than the data dimension; research in the overparametrized regime is very limited. To bridge this gap, we propose a new asymptotic analysis of the Fisher Linear Discriminant classifier for binary classification with label shift. Specifically, we prove that a phase transition exists: in a certain overparametrized regime, the classifier trained on imbalanced data outperforms its counterpart trained on reduced balanced data. Moreover, we investigate the impact of regularization under label shift: the aforementioned phase transition vanishes as the regularization becomes strong.
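The comparison described above can be explored numerically with a small simulation. The following is a minimal sketch under stated assumptions, not the setup analyzed in the paper: the two classes are isotropic Gaussians with means -mu and +mu, the classifier is a ridge-regularized Fisher Linear Discriminant with weight vector (S + lam*I)^{-1}(m1 - m0) and a midpoint threshold, label shift is modeled by evaluating on a balanced test set, and the "reduced balanced data" is obtained by downsampling the majority class. All function names, parameter names, and default values are hypothetical choices made for illustration.

```python
import numpy as np


def fit_regularized_fld(X, y, lam):
    """Ridge-regularized FLD (assumed form): w = (S + lam*I)^{-1} (m1 - m0)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance; singular when p > n, hence the ridge term.
    centered = np.vstack([X0 - m0, X1 - m1])
    S = centered.T @ centered / max(len(X) - 2, 1)
    w = np.linalg.solve(S + lam * np.eye(X.shape[1]), m1 - m0)
    # Threshold at the midpoint between the two class means.
    b = -w @ (m0 + m1) / 2.0
    return w, b


def sample(n0, n1, mu, rng):
    """Draw n0 points from N(-mu, I) (label 0) and n1 from N(+mu, I) (label 1)."""
    p = mu.shape[0]
    X = np.vstack([rng.standard_normal((n0, p)) - mu,
                   rng.standard_normal((n1, p)) + mu])
    y = np.concatenate([np.zeros(n0, dtype=int), np.ones(n1, dtype=int)])
    return X, y


def accuracy(w, b, X, y):
    """Fraction of points whose predicted label (X @ w + b > 0) matches y."""
    return float(np.mean((X @ w + b > 0).astype(int) == y))


def compare(p=200, n0=80, n1=20, lam=1e-2, n_test=2000, seed=0):
    """Return (accuracy with full imbalanced data, accuracy with downsampled data)."""
    rng = np.random.default_rng(seed)
    mu = np.full(p, 2.0 / np.sqrt(p))  # illustrative signal strength, ||mu|| = 2
    X, y = sample(n0, n1, mu, rng)

    # Reduced balanced training set: keep only n1 majority-class points.
    keep0 = rng.choice(np.flatnonzero(y == 0), size=n1, replace=False)
    idx = np.concatenate([keep0, np.flatnonzero(y == 1)])

    # Balanced test set, so test label proportions differ from training.
    X_test, y_test = sample(n_test // 2, n_test // 2, mu, rng)

    acc_full = accuracy(*fit_regularized_fld(X, y, lam), X_test, y_test)
    acc_bal = accuracy(*fit_regularized_fld(X[idx], y[idx], lam), X_test, y_test)
    return acc_full, acc_bal
```

Calling `compare` over a grid of `p` and `lam` values, averaged over several seeds, gives one way to look for regimes where either training set is preferable. Which one wins at any given setting depends on the assumed data model, so the sketch makes no claim about reproducing the paper's phase transition.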