In this article, we use probing to investigate phenomena that occur during fine-tuning and knowledge distillation of a BERT-based natural language understanding (NLU) model. Our ultimate purpose is to use probing to better understand practical production problems and, consequently, to build better NLU models. We designed experiments to see how fine-tuning changes the linguistic capabilities of BERT, what the optimal size of the fine-tuning dataset is, and how much information is contained in a distilled NLU model based on a tiny Transformer. The results of the experiments show that the probing paradigm in its current form is not well suited to answering such questions. Structural, Edge, and Conditional probes do not take into account how easy it is to decode the probed information. Consequently, we conclude that quantifying information decodability is critical for many practical applications of the probing paradigm.
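To make the probing setup concrete, below is a minimal sketch of the general idea: a linear probe trained on frozen BERT representations. This is not the paper's experimental code; the sentences, labels, and the probed property are illustrative placeholders, and the model name `bert-base-uncased` is only an assumed example checkpoint.

```python
# Minimal sketch of a linear probe over frozen BERT representations.
# NOT the paper's experimental setup: the toy sentences, labels, and the
# probed property (a placeholder sentence-level label) are illustrative only.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
bert.eval()  # the encoder stays frozen; only the probe is trained

# Toy corpus with placeholder binary labels.
sentences = ["the cat sat on the mat", "dogs chase cars",
             "she reads books", "birds fly south"]
labels = [0, 1, 0, 1]

with torch.no_grad():
    enc = tokenizer(sentences, padding=True, return_tensors="pt")
    hidden = bert(**enc).last_hidden_state          # (batch, seq_len, hidden)
    # Mean-pool over non-padding tokens to get one vector per sentence.
    mask = enc["attention_mask"].unsqueeze(-1)
    features = (hidden * mask).sum(1) / mask.sum(1)

# The probe itself: a linear classifier on the frozen features.
probe = LogisticRegression(max_iter=1000).fit(features.numpy(), labels)
print("probe train accuracy:", probe.score(features.numpy(), labels))
```

A more expressive probe (e.g., a small MLP) trained on the same frozen features may recover more of the encoded information than the linear one, which illustrates the decodability issue raised above: probe results depend on how easily the information can be extracted, not only on whether it is present.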