A reasonable amount of annotated data is required for fine-tuning pre-trained language models (PLMs) on downstream tasks. However, obtaining labeled examples for different language varieties can be costly. In this paper, we investigate the zero-shot performance on Dialectal Arabic (DA) when fine-tuning a PLM on Modern Standard Arabic (MSA) data only -- identifying a significant performance drop when evaluating such models on DA. To remedy this performance drop, we propose self-training with unlabeled DA data and apply it in the context of named entity recognition (NER), part-of-speech (POS) tagging, and sarcasm detection (SRD) on several DA varieties. Our results demonstrate the effectiveness of self-training with unlabeled DA data: improving zero-shot MSA-to-DA transfer by as much as \texttildelow 10\% F$_1$ (NER), 2\% accuracy (POS tagging), and 4.5\% F$_1$ (SRD). We conduct an ablation experiment and show that the observed performance boost results directly from the unlabeled DA examples used for self-training. Our work opens up opportunities for leveraging the relatively abundant labeled MSA datasets to develop DA models for zero- and low-resource dialects. We also report new state-of-the-art performance on all three tasks and open-source our fine-tuned models for the research community.
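The self-training procedure referenced above amounts to a pseudo-labeling loop: fine-tune the PLM on labeled MSA data, label the unlabeled DA data with the resulting model, and retrain on the union. The following is a minimal sketch of that loop, assuming a confidence-thresholded pseudo-labeling scheme; the helpers \texttt{fine\_tune} and \texttt{predict\_with\_scores}, the number of rounds, and the threshold value are illustrative assumptions and not the paper's exact configuration.

\begin{verbatim}
# Minimal self-training sketch for zero-shot MSA-to-DA transfer.
# fine_tune() and predict_with_scores() are hypothetical helpers standing in
# for the usual supervised fine-tuning and inference routines of a PLM.

def self_train(plm, msa_labeled, da_unlabeled, rounds=3, threshold=0.9):
    """Augment MSA training data with confident pseudo-labeled DA examples."""
    model = fine_tune(plm, msa_labeled)        # 1) fine-tune on labeled MSA only
    for _ in range(rounds):
        pseudo = []
        for example in da_unlabeled:
            label, score = predict_with_scores(model, example)  # 2) pseudo-label DA
            if score >= threshold:                              # 3) keep confident ones
                pseudo.append((example, label))
        model = fine_tune(plm, list(msa_labeled) + pseudo)      # 4) retrain on the union
    return model
\end{verbatim}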