Adapting pre-trained language models (PrLMs) (e.g., BERT) to new domains has gained much attention recently. Instead of fine-tuning the PrLMs as done in most previous work, we investigate how to adapt the features of PrLMs to new domains without fine-tuning. We explore unsupervised domain adaptation (UDA) in this paper: using features from PrLMs, we adapt models trained with labeled data from a source domain to an unlabeled target domain. Self-training, which predicts pseudo labels on the target-domain data for training, is widely used for UDA. However, the predicted pseudo labels inevitably contain noise, which negatively affects the training of a robust model. To improve the robustness of self-training, we present class-aware feature self-distillation (CFd), which learns discriminative features from PrLMs by self-distilling PrLM features into a feature adaptation module and clustering features from the same class more tightly. We further extend CFd to a cross-language setting, in which language discrepancy is studied. Experiments on two monolingual and multilingual Amazon review datasets show that CFd consistently improves the performance of self-training in both cross-domain and cross-language settings.
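The abstract describes two ingredients: self-training with pseudo labels on the unlabeled target domain, and distilling frozen PrLM features into a feature adaptation module while pulling same-class features together. The sketch below is a minimal, hypothetical PyTorch illustration of those two ideas only; the names (`FeatureAdapter`, `distill_loss`, `cluster_loss`, `train_step`), the confidence-threshold pseudo-labeling, and the MSE/centroid losses are simplifying assumptions for illustration, not the paper's actual CFd objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureAdapter(nn.Module):
    """Hypothetical feature adaptation module trained on top of frozen PrLM features."""

    def __init__(self, in_dim=768, hid_dim=256, num_classes=2):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())
        self.classifier = nn.Linear(hid_dim, num_classes)
        # Reconstruction head used only for the simplified self-distillation term.
        self.recon = nn.Linear(hid_dim, in_dim)

    def forward(self, prlm_feat):
        z = self.mlp(prlm_feat)  # adapted feature
        return z, self.classifier(z)


def distill_loss(adapter, z, prlm_feat):
    # Distill the frozen PrLM feature into the adapted space by reconstructing
    # it from the adapted feature (MSE as a simple stand-in objective).
    return F.mse_loss(adapter.recon(z), prlm_feat.detach())


def cluster_loss(z, labels):
    # Pull same-class features toward their batch centroid, a simple stand-in
    # for the "class-aware" clustering idea.
    loss = z.new_zeros(())
    for c in labels.unique():
        zc = z[labels == c]
        loss = loss + ((zc - zc.mean(dim=0, keepdim=True)) ** 2).mean()
    return loss


def train_step(adapter, optimizer, feat_src, y_src, feat_tgt, threshold=0.9):
    """One self-training step: supervised loss on source data, pseudo-label loss
    on confident target examples, plus distillation and clustering terms."""
    optimizer.zero_grad()

    # Source domain: labeled data.
    z_src, logits_src = adapter(feat_src)
    loss = F.cross_entropy(logits_src, y_src)
    loss = loss + distill_loss(adapter, z_src, feat_src) + cluster_loss(z_src, y_src)

    # Target domain: pseudo labels, kept only above a confidence threshold
    # to limit the noise the abstract warns about.
    z_tgt, logits_tgt = adapter(feat_tgt)
    conf, pseudo_y = logits_tgt.softmax(dim=-1).max(dim=-1)
    mask = conf > threshold
    if mask.any():
        loss = loss + F.cross_entropy(logits_tgt[mask], pseudo_y[mask])
        loss = loss + cluster_loss(z_tgt[mask], pseudo_y[mask])
    loss = loss + distill_loss(adapter, z_tgt, feat_tgt)

    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, `feat_src` and `feat_tgt` would be sentence representations produced once by the frozen PrLM (e.g., BERT), and the loss weights, threshold, and distillation objective would differ from this simplified sketch.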