Over-parameterized models, typically pre-trained language models (LMs), have shown appealing expressive power thanks to their small learning bias. However, the huge learning capacity of LMs can also lead to large learning variance. In a pilot study, we find that, when faced with multiple domains, a critical portion of parameters behave unexpectedly in a domain-specific manner, while the others behave in a domain-general one. Motivated by this phenomenon, we posit for the first time that domain-general parameters can underpin a domain-general LM that can be derived from the original LM. To uncover the domain-general LM, we propose to identify domain-general parameters by playing lottery tickets (dubbed doge tickets). To intervene in the lottery, we propose a domain-general score, which depicts how domain-invariant a parameter is by associating it with its variance across domains. Comprehensive experiments are conducted on the Amazon, MNLI, and OntoNotes datasets. The results show that doge tickets achieve improved out-of-domain generalization compared with a range of competitive baselines. Analysis further hints at the existence of domain-general parameters and the performance consistency of doge tickets.
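The sketch below illustrates the idea behind a variance-aware parameter score and the resulting ticket selection; it is a minimal illustration, not the authors' released implementation. The importance measure, the trade-off weight `lam`, and the keep ratio are assumptions made for the example.

```python
# Hedged sketch: score each parameter by its average importance across domains,
# penalized by how much that importance varies from domain to domain, then keep
# the top-scoring fraction as the "doge ticket" (domain-general sub-network).
import numpy as np

def domain_general_scores(importance_per_domain: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """importance_per_domain: shape (num_domains, num_params), e.g. |weight * grad|
    estimated separately on each training domain (an illustrative choice)."""
    mean_imp = importance_per_domain.mean(axis=0)   # expected importance across domains
    std_imp = importance_per_domain.std(axis=0)     # domain-specific variability
    return mean_imp - lam * std_imp                 # penalize domain-variant parameters

def draw_ticket(scores: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Return a binary mask keeping the top `keep_ratio` fraction of parameters."""
    k = int(len(scores) * keep_ratio)
    mask = np.zeros_like(scores, dtype=bool)
    mask[np.argsort(scores)[-k:]] = True
    return mask

# Toy usage: 3 domains, 10 parameters with random importance estimates.
rng = np.random.default_rng(0)
imp = rng.random((3, 10))
mask = draw_ticket(domain_general_scores(imp))
print(mask.sum(), "parameters kept for the domain-general sub-network")
```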