Classification on long-tailed data is a challenging problem: it suffers from severe class imbalance and hence poor performance on tail classes, which contain only a few samples. Owing to this paucity of samples, learning on the tail classes is especially difficult during fine-tuning when transferring a pretrained model to a downstream task. In this work, we present a simple modification of standard fine-tuning to cope with these challenges. Specifically, we propose a two-stage fine-tuning: we first fine-tune the final layer of the pretrained model with a class-balanced reweighting loss, and then we perform standard fine-tuning. Our modification has several benefits: (1) it leverages pretrained representations by fine-tuning only a small portion of the model parameters while keeping the rest untouched; (2) it allows the model to learn an initial representation of the specific task; and, importantly, (3) it protects the learning of tail classes from being at a disadvantage during the model update. We conduct extensive experiments on synthetic datasets for both two-class and multi-class text classification tasks, as well as on a real-world application to ADME (i.e., absorption, distribution, metabolism, and excretion) semantic labeling. The experimental results show that the proposed two-stage fine-tuning outperforms both fine-tuning with the conventional loss and fine-tuning with a reweighting loss on these datasets.
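The following is a minimal sketch of the two-stage fine-tuning procedure described above, assuming a PyTorch setup and a class-balanced reweighting loss in the style of effective-number weighting. Names such as `two_stage_finetune`, `model.classifier`, `train_loader`, and `num_per_class` are hypothetical placeholders, not identifiers from the paper, and the learning rates and epoch counts are illustrative only.

```python
# Hypothetical sketch of two-stage fine-tuning for long-tailed classification.
# Stage 1: fine-tune only the final layer with a class-balanced reweighting loss.
# Stage 2: standard fine-tuning of the whole model with the conventional loss.
import torch
import torch.nn as nn


def class_balanced_weights(num_per_class, beta=0.999):
    # Effective-number reweighting: weight_c proportional to (1 - beta) / (1 - beta^n_c),
    # normalized so the weights sum to the number of classes.
    n = torch.tensor(num_per_class, dtype=torch.float)
    weights = (1.0 - beta) / (1.0 - torch.pow(beta, n))
    return weights / weights.sum() * len(num_per_class)


def run_epoch(model, loader, criterion, optimizer, device="cpu"):
    model.train()
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()


def two_stage_finetune(model, train_loader, num_per_class, device="cpu",
                       stage1_epochs=3, stage2_epochs=3):
    # Stage 1: freeze everything except the final classification head
    # (assumed here to be `model.classifier`) and fine-tune it with the
    # class-balanced reweighting loss.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.classifier.parameters():
        p.requires_grad = True

    weights = class_balanced_weights(num_per_class).to(device)
    cb_loss = nn.CrossEntropyLoss(weight=weights)
    opt1 = torch.optim.AdamW(model.classifier.parameters(), lr=1e-3)
    for _ in range(stage1_epochs):
        run_epoch(model, train_loader, cb_loss, opt1, device)

    # Stage 2: unfreeze all parameters and perform standard fine-tuning
    # with the conventional (unweighted) loss.
    for p in model.parameters():
        p.requires_grad = True
    std_loss = nn.CrossEntropyLoss()
    opt2 = torch.optim.AdamW(model.parameters(), lr=2e-5)
    for _ in range(stage2_epochs):
        run_epoch(model, train_loader, std_loss, opt2, device)
    return model
```

In this sketch, stage 1 updates only the classification head, so the pretrained representation is left untouched while the head is initialized under a loss that upweights tail classes; stage 2 then proceeds as ordinary fine-tuning from that initialization.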