Dataset bias has recently attracted increasing attention for its detrimental effect on the generalization ability of fine-tuned models. The current mainstream solution is to design an additional shallow model that pre-identifies biased instances. However, such two-stage methods increase the computational cost of training and obstruct valid feature information while mitigating bias. To address this issue, we utilize representation normalization, which aims to disentangle the correlations between the features of encoded sentences. We find it also promising for eliminating the bias problem by providing an isotropic data distribution. We further propose Kernel-Whitening, a Nystrom kernel approximation method, to achieve more thorough debiasing of nonlinear spurious correlations. Our framework is end-to-end, with time consumption similar to that of fine-tuning. Experiments show that Kernel-Whitening significantly improves the performance of BERT on out-of-distribution datasets while maintaining in-distribution accuracy.
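To make the approach concrete, below is a minimal NumPy sketch of whitening sentence embeddings and of a Nystrom-style kernel feature map. The RBF kernel, the landmark count, and all function and variable names are illustrative assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

def whiten(X, eps=1e-6):
    """PCA-whiten rows of X so features are zero-mean, decorrelated, and isotropic."""
    Xc = X - X.mean(axis=0, keepdims=True)
    cov = Xc.T @ Xc / len(Xc)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Scale each principal direction to unit variance.
    return Xc @ (eigvecs / np.sqrt(eigvals + eps))

def rbf_kernel(A, B, gamma=0.1):
    """RBF kernel matrix between rows of A and rows of B (illustrative kernel choice)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def nystrom_features(X, landmarks, gamma=0.1, eps=1e-6):
    """Approximate an explicit kernel feature map with the Nystrom method:
    phi(X) ~= K(X, L) @ K(L, L)^{-1/2}, where L is a small set of landmark points."""
    K_nm = rbf_kernel(X, landmarks, gamma)
    K_mm = rbf_kernel(landmarks, landmarks, gamma)
    vals, vecs = np.linalg.eigh(K_mm)
    K_mm_inv_sqrt = (vecs / np.sqrt(np.maximum(vals, eps))) @ vecs.T
    return K_nm @ K_mm_inv_sqrt

# Toy usage: X stands in for a batch of sentence embeddings (e.g. BERT [CLS] vectors).
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 768))
landmarks = X[rng.choice(len(X), size=64, replace=False)]
Z = whiten(nystrom_features(X, landmarks))  # kernel-whitened representations
```

Whitening the Nystrom features, rather than the raw embeddings, is what allows the decorrelation to also remove nonlinear spurious correlations in this sketch.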